Protein chemical shift re-referencing

Protein chemical shift re-referencing is a post-assignment process of adjusting the assigned NMR chemical shifts to match IUPAC and BMRB recommended standards in protein chemical shift referencing. In NMR, chemical shifts are normally referenced to an internal standard that is dissolved in the NMR sample. These internal standards include tetramethylsilane (TMS), 4,4-dimethyl-4-silapentane-1-sulfonic acid (DSS) and trimethylsilyl propionate (TSP). For protein NMR spectroscopy the recommended standard is DSS, which is insensitive to pH variations (unlike TSP). Furthermore, the DSS 1H signal may be used to indirectly reference 13C and 15N shifts using a simple ratio calculation [1]. Unfortunately, many biomolecular NMR spectroscopy labs use non-standard methods for determining the 1H, 13C or 15N “zero-point” chemical shift position. This lack of standardization makes it difficult to compare chemical shifts for the same protein between different laboratories. It also makes it difficult to use chemical shifts to properly identify or assign secondary structures, or to improve protein 3D structures via chemical shift refinement. Chemical shift re-referencing offers a means to correct these referencing errors and to standardize the reporting of protein chemical shifts across laboratories.
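The ratio calculation used for indirect referencing can be illustrated with a short sketch. It assumes the commonly cited IUPAC frequency ratios (Ξ) relative to the DSS 1H methyl signal; the function names and the 600.13 MHz example frequency are illustrative only and do not come from any particular software package.

```python
# Minimal sketch of indirect referencing from the DSS 1H signal.
# Assumption: the commonly cited IUPAC frequency ratios (Xi) for DSS-based
# referencing; function names are hypothetical, chosen for illustration.

XI_13C = 0.251449530  # 13C zero-point frequency / 1H DSS frequency
XI_15N = 0.101329118  # 15N zero-point frequency / 1H DSS frequency


def zero_ppm_frequency(dss_1h_hz, xi):
    """Absolute frequency (Hz) of 0 ppm for a heteronucleus, given the
    measured absolute 1H frequency of the DSS methyl signal."""
    return dss_1h_hz * xi


def chemical_shift_ppm(observed_hz, zero_hz):
    """Chemical shift (ppm) of a resonance relative to the 0 ppm frequency."""
    return (observed_hz - zero_hz) / zero_hz * 1e6


if __name__ == "__main__":
    dss_1h_hz = 600.13e6  # example value: DSS 1H frequency on a "600 MHz" magnet
    print(zero_ppm_frequency(dss_1h_hz, XI_13C))  # 13C 0 ppm frequency in Hz
    print(zero_ppm_frequency(dss_1h_hz, XI_15N))  # 15N 0 ppm frequency in Hz
```

Because the 13C and 15N zero points are both derived from a single measured 1H frequency, an error in locating the DSS signal propagates to every nucleus, which is one reason careful 1H referencing matters.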

Importance of NMR chemical shift re-referencing in biomolecular NMR
Incorrect chemical shift referencing is a particularly acute problem in biomolecular NMR. It has been estimated that up to 20% of 13C and up to 35% of 15N shift assignments are improperly referenced. Given that the structural and dynamic information contained within chemical shifts is often quite subtle, it is critical that protein chemical shifts be properly referenced so that these subtle differences can be detected. Fundamentally, the problem with chemical shift referencing comes from the fact that chemical shifts are relative frequency measurements rather than absolute frequency measurements. Because of the historic problems with chemical shift referencing, chemical shifts are perhaps the most precisely measurable but the least accurately measured parameters in all of NMR spectroscopy.

Programs for protein chemical shift re-referencing
Because of the magnitude and severity of the problems with chemical shift referencing in biomolecular NMR, a number of computer programs have been developed to help mitigate the problem (see Table 1 for a summary). The first program to comprehensively tackle chemical shift mis-referencing in biomolecular NMR was SHIFTCOR.

Table 1. Summary and comparison of different chemical shift re-referencing and mis-assignment detection programs.

SHIFTCOR: A structure-based chemical shift correction program
SHIFTCOR is an automated protein chemical shift correction program that uses statistical methods to compare and correct predicted NMR chemical shifts (derived from the 3D structure of the protein) relative to an input set of experimentally measured chemical shifts. SHIFTCOR uses several simple statistical approaches and pre-determined cut-off values to identify and correct potential referencing, assignment and typographical errors. SHIFTCOR identifies potential chemical shift referencing problems by comparing the difference between the average value of each set of observed backbone (1Hα, 13Cα, 13Cβ, 13CO, 15N and 1HN) shifts and their corresponding predicted chemical shifts. The difference between these two averages results in a nucleus-specific chemical shift offset or reference correction (i.e. one for 1H, one for 13C and one for 15N). In order to ensure that certain extreme outliers do not unduly bias these average offset values, the average of the observed shifts is only calculated after excluding potential mis-assignments or typographical errors.
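The offset calculation described above can be sketched roughly as follows. This is not SHIFTCOR's actual code: the two-standard-deviation outlier cut-off and the function name are assumptions made for illustration, since the program's real cut-off values are not given here.

```python
# Rough sketch of a structure-based reference offset estimate for one nucleus.
# Assumptions: observed and predicted shifts are paired per residue, and
# outliers are removed with a simple 2-SD filter (hypothetical cut-off,
# not SHIFTCOR's actual value).
from statistics import mean, stdev


def reference_offset(observed, predicted, cutoff_sd=2.0):
    """Return the mean (observed - predicted) difference in ppm after
    discarding differences that lie far from the overall mean."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    if len(diffs) < 3:
        return mean(diffs)
    m, s = mean(diffs), stdev(diffs)
    # Drop likely mis-assignments or typographical errors; if the filter
    # would remove everything, fall back to the unfiltered differences.
    kept = [d for d in diffs if abs(d - m) <= cutoff_sd * s] or diffs
    return mean(kept)


if __name__ == "__main__":
    predicted = [58.0, 56.2, 62.1, 54.3, 60.0, 57.5, 55.9, 61.2]
    observed = [p + 2.5 for p in predicted]
    observed[3] = predicted[3] + 30.0  # simulated typographical error
    offset = reference_offset(observed, predicted)
    corrected = [o - offset for o in observed]
    print(round(offset, 3))
```

Subtracting the returned offset from every observed shift of that nucleus type gives the re-referenced values; the flagged outliers would be reported separately as possible mis-assignments.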

SHIFTCOR output
SHIFTCOR generates and reports chemical shift offsets or differences for each nucleus. The results contain the chemical shift analyses (including lists of potential mis-assignments, the estimated referencing errors, the estimated error in the calculated reference offset (95% confidence interval), the applied or suggested reference offset, correlation coefficients and RMSD values) and the corrected BMRB-formatted chemical shift file (see Figure 1 for details).

SHIFTCOR uses the chemical shift calculation program SHIFTX to predict 1Hα, 13Cα and 15N shifts based on the 3D structure coordinates of the protein being analyzed. By comparing the predicted shifts to the observed shifts, SHIFTCOR is able to accurately identify chemical shift reference offsets as well as potential mis-assignments. A key limitation of the SHIFTCOR approach is that it requires the 3D structure of the target protein to be available in order to assess the chemical shift reference offsets. Given that chemical shift assignments are typically made before the structure is determined, it was soon realized that structure-independent approaches needed to be developed.

Structure-independent chemical shift correction programs
Several methods have been developed that make use of the estimated (via 1H or 13C shifts) or predicted (via sequence) secondary structure content of the protein being analyzed. These programs include PSSI, CheckShift, LACS and PANAV. Both PANAV and CheckShift are also available as web servers.

The PSSI and PANAV programs use the secondary structure determined by 1H shifts (which are almost never mis-referenced) to adjust the target protein’s 13C and 15N shifts to match the 1H-derived secondary structure. LACS uses the difference between secondary 13Cα and 13Cβ shifts plotted against secondary 13Cα shifts or secondary 13Cβ shifts to determine reference offsets. A more recent version of LACS has been adapted to identify 15N chemical shift mis-referencing. This new version of LACS exploits the well-known relationship between secondary 15N shifts and the secondary 13Cα and 13Cβ shifts of the preceding residue. In contrast to LACS and PANAV/PSSI, CheckShift uses secondary structure predicted from high-performance secondary structure prediction programs such as PSIPRED to iteratively adjust 13C and 15N chemical shifts so that their secondary shifts match the predicted secondary structure. These programs have all been shown to accurately identify mis-referenced protein chemical shifts deposited in the BMRB and to re-reference them properly. Note that both LACS and CheckShift are programmed to always predict the same offset for 13Cα and 13Cβ shifts, whereas PSSI and PANAV do not make this assumption. As a general rule, PANAV and PSSI typically exhibit a smaller spread (or standard deviation) in calculated reference offsets, indicating that these programs are slightly more precise than either LACS or CheckShift. Neither LACS nor CheckShift is able to handle proteins that have extremely large (above 40 ppm) reference offsets, whereas PANAV and PSSI seem to be able to deal with these kinds of anomalous proteins.
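The idea behind the LACS-style analysis can be illustrated with a simplified sketch. If 13Cα and 13Cβ share the same referencing error, that error cancels in the difference between their secondary shifts but remains in the secondary 13Cα shift itself, so the intercept of a line fitted through the points estimates the offset. The code below uses a single ordinary least-squares line, which is an assumption made for brevity; it is not the published LACS procedure, and the function name is hypothetical.

```python
# Simplified illustration of a LACS-style offset estimate.
# Assumptions: inputs are secondary shifts (observed minus random coil, ppm)
# paired per residue; a single ordinary least-squares line is used here
# instead of the fitting scheme of the actual LACS program.


def lacs_like_offset(sec_ca, sec_cb):
    """Estimate a shared 13C reference offset as the intercept of
    secondary Calpha shift versus (secondary Calpha - secondary Cbeta)."""
    x = [a - b for a, b in zip(sec_ca, sec_cb)]  # offset cancels here
    y = list(sec_ca)                             # offset remains here
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x  # value of y where x = 0


if __name__ == "__main__":
    # Synthetic data: true secondary shifts plus a common 2.0 ppm offset.
    sec_ca = [5.0, 4.0, 0.5, 1.0, 2.0]
    sec_cb = [-1.0, 0.0, 3.5, 3.0, 2.0]
    print(round(lacs_like_offset(sec_ca, sec_cb), 3))
```

Because the x-axis quantity is unaffected by a common offset, this construction inherently yields one correction for both 13Cα and 13Cβ, consistent with the shared-offset behaviour noted above.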

In a recent study, a chemical shift re-referencing program (PANAV) was run on a total of 2421 BMRB entries that had a sufficient proportion (>80%) of assigned chemical shifts to perform a robust chemical shift reference correction. A total of 243 entries were found with 13Cα shifts offset by more than 1.0 ppm, 238 entries with 13Cβ shifts offset by more than 1.0 ppm, 200 entries with 13C’ shifts offset by more than 1.0 ppm and 137 entries with 15N shifts offset by more than 1.5 ppm. From this study, 19.7% of the entries in the BMRB appear to be mis-referenced. Evidently, chemical shift referencing continues to be a significant and as yet unresolved problem for the biomolecular NMR community.