Protein chemical shift prediction

Protein chemical shift prediction is a branch of biomolecular nuclear magnetic resonance spectroscopy that aims to accurately calculate protein chemical shifts from protein coordinates. Protein chemical shift prediction was first attempted in the late 1960s using semi-empirical methods applied to protein structures solved by X-ray crystallography. Since then, protein chemical shift prediction has evolved to employ much more sophisticated approaches, including quantum mechanics, machine learning and empirically derived chemical shift hypersurfaces. The most recently developed methods achieve high precision and accuracy.

Protein chemical shifts
NMR chemical shifts are often called the mileposts of nuclear magnetic resonance spectroscopy. Chemists have used chemical shifts for more than 50 years as highly reproducible, easily measured parameters to map out the covalent structure of small organic molecules. Indeed, the sensitivity of NMR chemical shifts to the type and character of neighbouring atoms, combined with their reasonably predictable tendencies, has made them invaluable for both deciphering and describing the structures of thousands of newly synthesized or newly isolated compounds. The same sensitivity to a variety of important protein structural features has made protein chemical shifts equally valuable to protein chemists and biomolecular NMR spectroscopists. In particular, protein chemical shifts are sensitive not only to substituent or covalent atom effects (such as electronegativity, redox states or ring currents) but also to backbone torsion angles (i.e. secondary structure), hydrogen bonding, local atomic motions and solvent accessibility.

Importance of protein chemical shift prediction
Predicted or estimated protein chemical shifts can be used to assist with the chemical shift assignment process. This is especially useful if a similar (or identical) protein structure has already been solved by X-ray crystallography. In that case, the three-dimensional structure can be used to estimate what the NMR chemical shifts should be, which simplifies the assignment of the experimentally observed shifts. Predicted protein chemical shifts can also be used to identify mis-assignments, to correct incorrectly referenced chemical shifts, to optimize protein structures via chemical shift refinement and to identify the relative contributions of different electronic or geometric effects to nucleus-specific shifts. Protein chemical shifts can also be used to identify secondary structures, to estimate backbone torsion angles, to determine the location of aromatic rings, to assess cysteine oxidation states, to estimate solvent exposure and to measure backbone flexibility.
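How predicted shifts can expose referencing and assignment errors can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the procedure of any particular program: observed and predicted shifts for one nucleus type (for example Cα) are given as hypothetical dictionaries keyed by residue number; a referencing error is assumed to show up as a systematic offset across all residues, estimated here by the median difference; and an isolated residual larger than a user-chosen `tolerance` (in ppm) is treated as a possible mis-assignment. The function names and the threshold are illustrative.

```python
from statistics import median


def reference_offset(observed, predicted):
    """Estimate a systematic referencing offset (ppm) for one nucleus type.

    observed, predicted: dicts mapping residue number -> chemical shift (ppm).
    The median difference is used so a few mis-assigned residues do not
    dominate the estimate.
    """
    common = set(observed) & set(predicted)
    if not common:
        raise ValueError("no residues have both observed and predicted shifts")
    return median(observed[r] - predicted[r] for r in common)


def flag_suspect_assignments(observed, predicted, offset, tolerance):
    """Return residue numbers whose offset-corrected error exceeds tolerance (ppm)."""
    common = set(observed) & set(predicted)
    return sorted(
        r for r in common
        if abs(observed[r] - offset - predicted[r]) > tolerance
    )


# Toy C-alpha values (ppm), invented for illustration only.
obs = {1: 58.1, 2: 56.3, 3: 62.0, 4: 55.0}
pred = {1: 56.0, 2: 54.2, 3: 55.9, 4: 52.9}
offset = reference_offset(obs, pred)  # systematic offset of about +2.1 ppm
print(flag_suspect_assignments(obs, pred, offset, tolerance=1.0))  # [3]
```

Separating the two steps reflects the different signatures of the two errors: a referencing problem shifts every residue of a nucleus type by roughly the same amount, whereas a mis-assignment produces a large error at an isolated residue.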

Progress in chemical shift prediction programs
Significant progress in chemical shift prediction has come from continuous improvements in our understanding of the key physico-chemical factors that contribute to chemical shift changes. Progress has also been driven by major computational advances and the rapid expansion of biomolecular chemical shift databases. Over the past four decades, at least three different approaches to calculating or predicting protein chemical shifts have emerged. The first uses sequence/structure alignment against protein chemical shift databases, the second calculates shifts directly from atomic coordinates, and the third combines the two.
 * Predicting shifts via sequence homology: these are based on the simple observation that similar protein sequences share similar structures and similar chemical shifts
 * Predicting shifts from coordinate data / structure:
    * Semi-classical methods: employ empirical equations derived from classical physics and experimental data
    * Quantum mechanical (QM) methods: employ density functional theory (DFT)
    * Empirical methods: rely on chemical shift "hypersurfaces" or related "structure/shift" tables
 * Hybrid methods: combine the above two approaches
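The sequence-homology approach in the first bullet can be illustrated with a short Python sketch. It assumes a reference protein whose shifts for one atom type are already assigned (`ref_shifts`, keyed by 0-based residue index; a hypothetical format) and copies those shifts onto query residues that fall in identical aligned stretches. Python's `difflib.SequenceMatcher` stands in here for a real sequence alignment; it has no substitution matrix or gap penalties, so a practical tool would use a proper protein alignment method instead.

```python
from difflib import SequenceMatcher


def transfer_shifts(query_seq, ref_seq, ref_shifts):
    """Copy assigned shifts from a reference protein onto a query sequence.

    ref_shifts: dict mapping 0-based reference residue index -> shift (ppm).
    Returns a dict mapping 0-based query residue index -> transferred shift.
    Only residues in identical aligned stretches receive a value.
    """
    matcher = SequenceMatcher(None, query_seq, ref_seq, autojunk=False)
    predicted = {}
    for block in matcher.get_matching_blocks():  # last block is a size-0 sentinel
        for k in range(block.size):
            q, r = block.a + k, block.b + k
            if r in ref_shifts:
                predicted[q] = ref_shifts[r]
    return predicted


# Hypothetical sequences differing at one position; reference shifts are placeholders.
ref_shifts = {i: 50.0 + i for i in range(8)}
shifts = transfer_shifts("MKTAYIAK", "MKTAYLAK", ref_shifts)
print(sorted(shifts))  # [0, 1, 2, 3, 4, 6, 7]: the differing residue gets no value
```

Residues left without a value are exactly where a structure-based method is needed, which is one motivation for the hybrid approach.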

The emergence of hybrid prediction methods
By the early 2000s, several research groups realized that protein chemical shifts could be calculated more efficiently and accurately by combining different methods, as shown in Figure 1. This led to the development of several programs and web servers that rapidly calculate protein chemical shifts when provided with protein coordinate data. These "hybrid" programs, along with some of their features and URLs, are listed below in Table 1.
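A hybrid predictor needs some rule for merging its component predictions. The Python sketch below assumes one hypothetical rule, chosen only for illustration: when a homology-derived shift exists, it is blended with the structure-based shift using the fractional sequence identity to the reference protein as the weight; otherwise the structure-based value is used unchanged. Published hybrid programs use their own schemes, which this does not reproduce.

```python
def hybrid_shift(structure_pred, homology_pred=None, identity=0.0):
    """Blend structure-based and homology-based predictions for one atom (ppm).

    identity: fractional sequence identity (0-1) to the reference protein.
    The linear weighting is a hypothetical choice for illustration.
    """
    if homology_pred is None:
        return structure_pred  # no homologous shift available: fall back
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must lie between 0 and 1")
    return identity * homology_pred + (1.0 - identity) * structure_pred


print(hybrid_shift(55.0))             # 55.0
print(hybrid_shift(55.0, 57.0, 0.5))  # 56.0
```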

Performance comparison of modern protein chemical shift prediction programs
Figure 2 is a table of the correlation coefficients between experimentally observed backbone chemical shifts and the backbone shifts calculated by different chemical shift predictors, all evaluated on the same test set of 61 proteins.
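The correlation coefficients in such a comparison are Pearson correlations between paired observed and predicted shifts. A minimal self-contained Python version follows; the function name is illustrative. The coefficient is unchanged by a constant offset between the two series, so a systematic referencing error does not lower it, and it is worth pairing with an absolute error measure such as the RMS error when judging a predictor.

```python
from math import sqrt


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between paired observed and predicted shifts."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    sxy = sum((x - mean_obs) * (y - mean_pred) for x, y in zip(observed, predicted))
    sxx = sum((x - mean_obs) ** 2 for x in observed)
    syy = sum((y - mean_pred) ** 2 for y in predicted)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when either series is constant")
    return sxy / sqrt(sxx * syy)


# A constant 10 ppm offset leaves r unchanged.
print(pearson_r([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))  # 1.0
```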

Coverage and speed
Different methods have different levels of coverage and rates of calculation. Some methods calculate or predict chemical shifts only for backbone atoms (6 atom types), some calculate shifts for backbone atoms and certain side-chain atoms (C and N only), and still others can calculate shifts for all atoms (40 atom types). Speed matters most for chemical shift refinement: thousands of structures are generated during a molecular dynamics or simulated annealing run, and the chemical shifts of each must be calculated just as rapidly.

All the computational speed tests for SPARTA, SPARTA+, SHIFTS, CamShift, SHIFTX and SHIFTX2 were performed on the same computer using the same set of proteins. The calculation speed reported for PROSHIFT is based on the response time of its web server.
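A fair speed comparison of this kind times every program on the same machine and the same inputs. A small Python harness might look like the following; `predict` is any hypothetical callable that takes one structure and returns shifts, and the result is the mean wall-clock time per structure measured with `time.perf_counter`. A figure obtained from a web server typically also includes network and server overhead, so it is not a like-for-like measurement.

```python
import time


def time_predictor(predict, structures, repeats=1):
    """Mean wall-clock seconds per structure for a shift-prediction callable.

    predict: hypothetical function taking one structure and returning shifts.
    structures: the same input set should be used for every program compared.
    """
    if not structures or repeats < 1:
        raise ValueError("need at least one structure and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for structure in structures:
            predict(structure)
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(structures))


# Stand-in predictor for illustration: returns 0.0 ppm for every "atom".
def fake_predict(structure):
    return [0.0] * len(structure)


per_structure = time_predictor(fake_predict, [list(range(500))] * 100, repeats=3)
print(f"{per_structure * 1e3:.4f} ms per structure")
```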