R-factor (crystallography)

In crystallography, the R-factor (sometimes called residual factor or reliability factor or the R-value or RWork) is a measure of the agreement between the crystallographic model and the experimental X-ray diffraction data. In other words, it is a measure of how well the refined structure predicts the observed data. The value is also sometimes called the discrepancy index, as it mathematically describes the difference between the experimental observations and the ideal calculated values. It is defined by the following equation:


 * $$R = \frac{ \sum{||F_\text{obs}| - |F_\text{calc}|| } }{ \sum{ |F_\text{obs}|}},$$

where F is the so-called structure factor and the sum extends over all the reflections of X-rays measured and their calculated counterparts respectively. The structure factor is closely related to the intensity of the reflection it describes:


 * $$I_{hkl} \propto |F(hkl)|^2$$.

The minimum possible value is zero, indicating perfect agreement between experimental observations and the structure factors predicted from the model. There is no theoretical maximum, but in practice, values are considerably less than one even for poor models, provided the model includes a suitable scale factor. Random experimental errors in the data contribute to $$R$$ even for a perfect model, and these have more leverage when the data are weak or few, such as for a low-resolution data set. Model inadequacies such as incorrect or missing parts and unmodeled disorder are the other main contributors to $$R$$, making it useful to assess the progress and final result of a crystallographic model refinement. For large molecules, the R-factor usually ranges between 0.6 (when computed for a random model and against an experimental data set) and 0.2 (for example for a well refined macro-molecular model at a resolution of 2.5 Ångström). Small molecules (up to ca. 1000 atoms) usually form better-ordered crystals than large molecules, and thus it is possible to attain lower R-factors. In the Cambridge Structural Database of small-molecule structures, more than 95% of the 500,000+ crystals have an R-factor lower than 0.15, and 9.5% have an R-factor lower than 0.03.

Crystallographers also use the Free R-Factor ($$R_{Free}$$) to assess possible overmodeling of the data. $$R_{Free}$$ is computed according to the same formula given above, but on a small, random sample of data that are set aside for the purpose and never included in the refinement. $$R_{Free}$$ will always be greater than $$R$$ because the model is not fitted to the reflections that contribute to $$R_{Free}$$, but the two statistics should be similar because a correct model should predict all the data with uniform accuracy. If the two statistics differ significantly then that indicates the model has been over-parameterized, so that to some extent it predicts not the ideal error-free data for the correct model, but rather the error-afflicted data actually observed.

The quantities $$R_\text{sym}$$ and $$R_\text{merge}$$ are similarly used to describe the internal agreement of measurements in a crystallographic data set.