User:Jiaanyang/sandbox

Protein Folding Shape Code (PFSC) is a comprehensive description, in alphabetic code, of the folding patterns of five consecutive amino acid residues. By mathematical derivation, the free rotations of five connected points, modeled as a ball-and-stick chain, can be completely described by a set of 27 folding patterns, which are represented by the 26 letters of the alphabet and the $ symbol. Applied to proteins, this set of 27 symbols for the folds of five consecutive amino acid residues is called the PFSC. For any given 3D protein structure, the folding conformation can be completely expressed as a PFSC string without gaps, covering both secondary structure fragments and tertiary structure fragments.

Protein Folding Shape Code (PFSC)
PFSC is a complete description, in alphabetic code, of the folding patterns of five consecutive amino acid residues, which is then applied to protein structural conformation. A “folden”, a unit of five amino acid residues, is taken as the folding element, and a complete description of its folding space is obtained mathematically. The backbone of five consecutive amino acid residues has two consecutive dihedral angles (across residues 1-4 and 2-5), which makes it the smallest element that reveals whether, and to what degree, a twist is continued or reversed.
The derivation proceeds in four steps. First, a folden of five connected points is analyzed mathematically to expose its possible folds. Without constraints, the chain rotates freely around each joint with topological uniformity, and all possible folds form a complete and continuous set in geometric space. Second, by matrix transformation, the equation for a folden of five points, which initially has fifteen variables (three coordinates for each of the five points), is reduced to three effective variables. Third, the continuous folding space is partitioned into a discrete (quantized) description and applied to proteins, giving a set of 27 folding shapes that completely covers the folding patterns of five successive amino acid residues. Fourth, these 27 folding shapes are represented digitally by the 26 letters of the alphabet and the “$” symbol. Applied to protein systems, this set of 27 symbols is called the Protein Folding Shape Code (PFSC), and together the symbols represent the full space of folding shapes of five amino acid residues.
Each PFSC letter represents a specific folding pattern. For example, “A” is the typical alpha-helix; “V”, “J”, “Y”, “P”, “D” and “H” contain partial features of an alpha-helix; “B” is the typical beta-strand; “E”, “G”, “S”, “M”, “V” and “J” contain partial features of a beta-strand; and “X”, “U”, “R”, “I”, “F”, “L”, “O”, “C”, “Z”, “W”, “T”, “K”, “$”, “N” and “Q” are irregular tertiary folding shapes.
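The letter groups listed above can be written down as a small lookup table. The following is only an illustrative sketch: the set names and the `describe` helper are hypothetical and are not part of PFSC itself. It also makes it easy to check that the listed letters add up to 27 symbols, with “V” and “J” appearing in both the alpha-helix and the beta-strand groups.

```python
# Letter groups copied from the description above.
# The variable and function names are illustrative assumptions.
ALPHA_TYPICAL = {"A"}
ALPHA_PARTIAL = set("VJYPDH")
BETA_TYPICAL = {"B"}
BETA_PARTIAL = set("EGSMVJ")
IRREGULAR = set("XURIFLOCZWTK$NQ")

ALL_LETTERS = (
    ALPHA_TYPICAL | ALPHA_PARTIAL | BETA_TYPICAL | BETA_PARTIAL | IRREGULAR
)


def describe(letter):
    """Return the structural labels that the text assigns to one PFSC letter."""
    labels = []
    if letter in ALPHA_TYPICAL:
        labels.append("typical alpha-helix")
    if letter in ALPHA_PARTIAL:
        labels.append("partial alpha-helix")
    if letter in BETA_TYPICAL:
        labels.append("typical beta-strand")
    if letter in BETA_PARTIAL:
        labels.append("partial beta-strand")
    if letter in IRREGULAR:
        labels.append("irregular tertiary fold")
    if not labels:
        raise ValueError(f"not a PFSC letter: {letter!r}")
    return labels


if __name__ == "__main__":
    print(len(ALL_LETTERS))  # number of distinct symbols in the groups
    print(describe("V"))     # a letter that belongs to two groups
```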
In this alphabetic code, one PFSC letter represents the folding pattern of five amino acid residues and, conversely, the folding shape given by the 3D coordinates of any five consecutive amino acid residues can be represented by one PFSC letter.
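As a first step toward describing the shape of five points, the two dihedral angles mentioned above (points 1-4 and points 2-5) can be computed directly from coordinates. The sketch below assumes one representative point per residue (for example the alpha carbon, which is an assumption not stated here) and uses the standard atan2 formula for a signed dihedral angle. It does not reproduce the PFSC derivation, and how the resulting values are partitioned into the 27 shapes is not specified in this text.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    b2_len = math.sqrt(_dot(b2, b2))
    b2_hat = tuple(c / b2_len for c in b2)
    x = _dot(n1, n2)
    y = _dot(b2_hat, _cross(n1, n2))
    return math.degrees(math.atan2(y, x))


def folden_dihedrals(points):
    """Return the 1-4 and 2-5 dihedral angles of five consecutive points."""
    if len(points) != 5:
        raise ValueError("exactly five points are required")
    return dihedral(*points[0:4]), dihedral(*points[1:5])


if __name__ == "__main__":
    # Made-up coordinates, used only to exercise the functions.
    pts = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 2)]
    print(folden_dihedrals(pts))
```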

PFSC String for Complete Protein Conformation
The conformation of a given protein 3D structure can be completely represented by a PFSC string. One PFSC letter represents the folding shape of five amino acid residues; two adjacent PFSC letters represent two folding shapes that overlap by four amino acid residues; and so on from the N-terminus to the C-terminus. As an alphabetic, digital representation of protein conformation, a PFSC string describes the protein fold without any gap from the N-terminus to the C-terminus, covering regular secondary structure fragments as well as tertiary structure fragments. Thus, any protein 3D structure, whether taken from the PDB or from computational simulation, can be converted into a PFSC string.
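The overlapping scheme described above amounts to sliding a five-residue window along the chain one residue at a time. A minimal sketch follows; the function names are hypothetical, and `classify` stands in for whatever procedure assigns a PFSC letter to five residues, which is not defined here. Under this simple scheme a chain of n residues yields n - 4 windows; how the terminal residues are treated in an actual PFSC string is not specified in this text.

```python
def five_residue_windows(residues):
    """Return all windows of five consecutive residues, N- to C-terminus."""
    return [tuple(residues[i:i + 5]) for i in range(len(residues) - 4)]


def to_pfsc_string(residues, classify):
    """Build a string with one letter per window.

    `classify` is a caller-supplied function (an assumption of this sketch)
    that maps a five-residue window to a single PFSC letter.
    """
    return "".join(classify(w) for w in five_residue_windows(residues))


if __name__ == "__main__":
    chain = list(range(1, 11))  # residue numbers 1..10 as placeholders
    windows = five_residue_windows(chain)
    print(windows[0], windows[1])  # adjacent windows share four residues
    print(to_pfsc_string(chain, lambda w: "A"))  # dummy classifier
```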

Conformation Comparison
With PFSC, protein conformations can be compared by aligning PFSC strings, since each PFSC string represents the conformation of its protein. The alignment yields a similarity score between 0.00 and 1.00, where a score of 1.00 means the conformations are 100% identical. The alignment also shows explicitly where the folding differs along the sequence. PFSC alignment offers three benefits for protein comparison. First, sequence alignment and conformation alignment can be integrated in one comparison through PFSC strings. Second, it avoids the ambiguity and arbitrariness of 3D structural superposition. Third, it can compare proteins with any degree of homology and is not limited to highly homologous proteins.
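The scoring scheme used for PFSC alignment is not given here, so the following sketch substitutes the simplest possible stand-in: the fraction of identical positions between two strings that are already aligned to equal length, with “-” assumed as a gap character. It only illustrates how a score in the range 0.00 to 1.00 and a list of differing positions could be read off an alignment; the function names are hypothetical.

```python
def conformation_similarity(a, b, gap="-"):
    """Fraction of aligned positions holding the same PFSC letter.

    Both strings must already be aligned to the same length.
    Positions containing the gap character never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("strings must be aligned to equal length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    return matches / len(a)


def differing_positions(a, b):
    """Zero-based positions where two aligned strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    # Made-up PFSC strings for illustration only.
    s1, s2 = "AAABB", "AAAVB"
    print(conformation_similarity(s1, s2))
    print(differing_positions(s1, s2))
```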

High-Throughput Screening of the PDB
With PFSC, proteins with similar conformations can be found easily, which can benefit protein research and drug discovery. The PDB is first converted into PFSC strings. Then, given a query protein expressed as a PFSC string, proteins with similar conformations are found by high-throughput screening of the converted PDB.
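Once structures are stored as strings, screening reduces to ranking the stored strings against a query string. The sketch below is an assumption-laden illustration, not the PFSC screening method: it uses the `SequenceMatcher.ratio` measure from Python's standard `difflib` module as a generic string-similarity stand-in, and the entry names and PFSC strings in the example are made-up placeholders rather than real PDB entries.

```python
from difflib import SequenceMatcher


def screen(query, database, top_n=3):
    """Rank database entries by string similarity to the query.

    `database` maps an entry name to its PFSC string. Returns a list of
    (name, score) pairs, best first, with scores between 0.0 and 1.0.
    """
    scored = [
        (SequenceMatcher(None, query, pfsc, autojunk=False).ratio(), name)
        for name, pfsc in database.items()
    ]
    # Highest score first; ties are broken alphabetically by name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(name, score) for score, name in scored[:top_n]]


if __name__ == "__main__":
    # Placeholder entries, invented for this example.
    db = {
        "entry_1": "AAAAAAVBBBBB",
        "entry_2": "BBBBEBBBBSBB",
        "entry_3": "AAAAAAAAAAAA",
    }
    for name, score in screen("AAAAAAVBBBBB", db):
        print(name, round(score, 2))
```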

Convert Protein Structure to PFSC
Any protein 3D structure in PDB format can be converted to a PFSC string. The conversion server is linked from http://www.micropht.com/. Steps:
1. Click the green bar “Protein Structure Fingerprint Technology”.
2. Click Login.
3. Enter Username = public and Password = public.
4. Select “Conversion”.