CS23D





CS23D is a web server to generate 3D structural models from NMR chemical shifts. CS23D combines maximal fragment assembly with chemical shift threading, de novo structure generation, chemical shift-based torsion angle prediction, and chemical shift refinement. CS23D makes use of RefDB and ShiftX.

CS23D input formats
CS23D accepts chemical shift files in either SHIFTY or BMRB formats.

CS23D options
A user can
 * 1) Exclude a protein from being used as the template
 * 2) Ignore high-identity homologs in the list of available templates
 * 3) Change the number of models in the final ensemble
 * 4) Change the number of model optimization steps

CS23D output
CS23D output consists of a set of 10 best-score PDB coordinates. A hyperlink to the single best score structure is also provided. The overall CS23D score, knowledge-based score, chemical shift score, Ramachandran plot statistics, correlations between the observed and calculated shifts before and after refinement are displayed. A conclusion about structure reliability is given to the user.

CS23D protocol
Homology search: The query sequence is used to find homologous proteins or/and protein fragments in a non-redundant database of PDB sequences and secondary structures of PPT-DB using BLAST.

Homology modelling: Homology modelling is done by the Homodeller program, which is a part of the PROTEUS2 program. The proteins that are identified during the homology search step are used as the templates in homology modelling.

Chemical shift re-referencing: Chemical shifts are re-referenced by the RefCor, which is a part of the RCI webserver backend.

Secondary structure prediction from chemical shifts: Secondary structure is predicted from chemical shifts by CSI.

Torsion angle prediction from chemical shifts: Torsion angles are predicted from chemical shifts by PREDITOR.

Chemical shift threading: Backbone Phi and Psi torsion angles predicted from chemical shifts by PREDITOR are  mapped into nine different regions in Ramachandran space, each of which are assigned specific letters. A protein can be represented by a sequence of these nine "torsion angle letters". Thrifty is using these sequences of torsion angle letters to identify good templates in a database of ~18,500 nonredundant PDB structures that have had their structures converted to the nine-letter Ramachandran "alphabet".

In a similar manner, chemical shift threading is additionally done using three-letter secondary structure alphabet (H for helix, B for beta-strand, C for coil) and secondary structure predicted from chemical shifts by the CSI program.

Model assembly: Subfragments identified by homology modelling and chemical shift threading steps are assembled into initial 3D models using CS23D SFassembler (SubFragment assembler). The initial models are evaluated by the GAFolder scoring function (see below) and the best model is further refined by GAFolder (see more info about GAFolder below).

Ab initio folding: Ab initio folding is done by Rosetta when no template was identified by the homology modelling and chemical shift threading steps. Rosetta models are evaluated by GAFolder scoring function and the best Rosetta models are refined by GAFolder (see below).

Model optimization: Model optimization in CS23D is done by a torsion-angle-based minimizer GAfolder (Genetic Algorithm folder) that uses a genetic algorithm to sample conformation space. The method is similar to that employed by GENFOLD. GAFolder makes torsion angles moves within the ranges defined by the values and uncertainties of torsion angles predicted by PREDITOR. GAFolder evaluates protein models by the scoring function described below.

Scoring function: Scoring function of GAFolder consists of knowledge based scores and chemical shift scores.

The knowledge-based scores include:
 * 1) radius of gyration score,
 * 2) hydrogen bond energy,
 * 3) number of hydrogen bonds,
 * 4) bad contacts score,
 * 5) disulfide bond score,
 * 6) modified threading energy based on the Bryant and Lawrence potential.
 * 7) Ramachandran score that evaluates normality of model torsion angles Phi and Psi
 * 8) Omega score that evaluates normality of model torsion omega angles
 * 9) Chi score that is based on expected chi angles for different phi and psi combinations.

The chemical shift component of the GAfolder scoring function uses:
 * 1) weighted coefficients of correlation between the experimental chemical shifts (CA, CB, CO, N, HA, HN) and chemical shifts calculated by SHIFTX 1.0.
 * 2) agreement between model secondary structure and secondary structure predicted by CSI from experimental chemical shifts.

CS23D sub-programs

 * 1) CSI - prediction of secondary structure from chemical shifts
 * 2) BLAST - sequence alignment, homology search
 * 3) PROTEUS2 - homology modelling
 * 4) PREDITOR - prediction of torsion angles from chemical shifts
 * 5) Pepmake - building protein models from torsion angles and sequence
 * 6) PPT-DB- secondary structure database
 * 7) Rosetta - ab initio structure generation
 * 8) RCI- estimating uncertainty of torsion angles predicted from chemical shifts by PREDITOR
 * 9) ShiftX 1.0 - is used to generate coefficients of correlation between observed chemical shifts and shifts predicted by ShiftX from protein models
 * 10) SFAssembler - maximal fragment assembly
 * 11) GAFolder - chemical shift refinement via a genetic algorithm
 * 12) Thrifty - chemical shift threading
 * 13) RefCor - chemical shift re-referencing

CS23D dependence on template sequence identity
CS23D is a template-based method. Therefore, its performance depends on sequence identity of the selected template(s), see the adjacent picture. Likewise, Rosetta is a fragment-biased method. Its performance depends on the quality of selected fragments. Fragment quality and, thus, Rosetta performance can be improved by using chemical shifts during the fragment selection step (e.g. in CS-Rosetta protocol). For a structural solution that is not biased by a template structure or fragment structure, one may want to consider obtaining NOE-based distance restraints (8-10 per residue) and using them with the GeNMR program in its ab initio mode.