Quantum chemistry composite methods

Quantum chemistry composite methods (also referred to as thermochemical recipes) are computational chemistry methods that aim for high accuracy by combining the results of several calculations. They combine methods with a high level of theory and a small basis set with methods that employ lower levels of theory with larger basis sets. They are commonly used to calculate thermodynamic quantities such as enthalpies of formation, atomization energies, ionization energies and electron affinities. They aim for chemical accuracy, which is usually defined as within 1 kcal/mol of the experimental value. The first systematic model chemistry of this type with broad applicability was Gaussian-1 (G1), introduced by John Pople. This was quickly replaced by Gaussian-2 (G2), which has been used extensively. Gaussian-3 (G3) was introduced later.

Gaussian-2 (G2)
The G2 uses seven calculations:
 * 1) The molecular geometry is obtained by an MP2 optimization using the 6-31G(d) basis set and all electrons included in the perturbation. This geometry is used for all subsequent calculations.
 * 2) The highest level of theory is a quadratic configuration interaction calculation with single and double excitations and a triples excitation contribution (QCISD(T)) with the 6-311G(d) basis set. Such a calculation in the Gaussian and Spartan programs also gives the MP2 and MP4 energies, which are also used.
 * 3) The effect of polarization functions is assessed using an MP4 calculation with the 6-311G(2df,p) basis set.
 * 4) The effect of diffuse functions is assessed using an MP4 calculation with the 6-311+G(d, p) basis set.
 * 5) The largest basis set is 6-311+G(3df,2p) used at the MP2 level of theory.
 * 6) A Hartree–Fock geometry optimization with the 6-31G(d) basis set, used to give a geometry for:
 * 7) A frequency calculation with the 6-31G(d) basis set to obtain the zero-point vibrational energy (ZPVE).

The various energy changes are assumed to be additive so the combined energy is given by:
 * E(QCISD(T)) from step 2 + [E(MP4) from step 3 - E(MP4) from step 2] + [E(MP4) from step 4 - E(MP4) from step 2] + [E(MP2) from step 5 + E(MP2) from step 2 - E(MP2) from step 3 - E(MP2) from step 4]

The second term corrects for the effect of adding the polarization functions. The third term corrects for the diffuse functions. The final term corrects for the larger basis set, with the terms from steps 2, 3 and 4 preventing contributions from being counted twice. Two final corrections are made to this energy. The ZPVE is scaled by 0.8929. An empirical correction is then added to account for factors not considered above. This is called the higher level correction (HLC) and is given by -0.00481 × (number of valence electrons) - 0.00019 × (number of unpaired valence electrons). The two numbers are obtained by calibrating the results against the experimental results for a set of molecules. The scaled ZPVE and the HLC are added to give the final energy. For some molecules containing one of the third row elements Ga–Xe, a further term is added to account for spin orbit coupling.
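The additive scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not code from any quantum chemistry package: the class and field names are hypothetical, all energies are assumed to be supplied in hartree, the HLC uses the two coefficients exactly as quoted in the text, and the optional spin-orbit term is left out.

```python
from dataclasses import dataclass

ZPVE_SCALE = 0.8929          # scale factor for the ZPVE, as quoted in the text
HLC_PER_VALENCE = -0.00481   # coefficient per valence electron, as quoted
HLC_PER_UNPAIRED = -0.00019  # coefficient per unpaired valence electron, as quoted


@dataclass
class G2Components:
    """Hypothetical container; the suffix is the step number in the list above."""
    e_qcisdt_2: float  # QCISD(T) energy from step 2
    e_mp4_2: float     # MP4 energy from step 2
    e_mp2_2: float     # MP2 energy from step 2
    e_mp4_3: float     # MP4 energy from step 3 (polarization functions)
    e_mp2_3: float     # MP2 energy from step 3
    e_mp4_4: float     # MP4 energy from step 4 (diffuse functions)
    e_mp2_4: float     # MP2 energy from step 4
    e_mp2_5: float     # MP2 energy from step 5 (largest basis set)
    zpve: float        # unscaled ZPVE from step 7
    n_valence: int     # number of valence electrons
    n_unpaired: int    # number of unpaired valence electrons


def higher_level_correction(n_valence: int, n_unpaired: int) -> float:
    """Empirical HLC in the form stated in the text."""
    return HLC_PER_VALENCE * n_valence + HLC_PER_UNPAIRED * n_unpaired


def g2_energy(c: G2Components) -> float:
    """Combine the component energies, assuming they are strictly additive."""
    polarization = c.e_mp4_3 - c.e_mp4_2
    diffuse = c.e_mp4_4 - c.e_mp4_2
    # The step 2, 3 and 4 MP2 terms prevent double counting.
    large_basis = c.e_mp2_5 + c.e_mp2_2 - c.e_mp2_3 - c.e_mp2_4
    electronic = c.e_qcisdt_2 + polarization + diffuse + large_basis
    hlc = higher_level_correction(c.n_valence, c.n_unpaired)
    return electronic + hlc + ZPVE_SCALE * c.zpve
```

Keeping each correction as a named intermediate makes it easy to inspect how much the polarization, diffuse and large-basis terms each contribute for a given molecule.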

Several variants of this procedure have been used. Removing steps 3 and 4 and relying only on the MP2 result from step 5 is significantly cheaper and only slightly less accurate. This is the G2MP2 method. Sometimes the geometry is obtained using a density functional theory method such as B3LYP and sometimes the QCISD(T) method in step 2 is replaced by the coupled cluster method CCSD(T).

The G2(+) variant, where the "+" symbol refers to added diffuse functions, better describes anions than conventional G2 theory. The 6-31+G(d) basis set is used in place of the 6-31G(d) basis set for both the initial geometry optimization, as well as the second geometry optimization and frequency calculation. Additionally, the frozen-core approximation is made for the initial MP2 optimization, whereas G2 usually uses the full calculation.

Gaussian-3 (G3)
G3 is very similar to G2 but learns from the experience with G2 theory. The 6-311G basis set is replaced by the smaller 6-31G basis. The final MP2 calculations use a larger basis set, generally just called G3large, and correlate all the electrons, not just the valence electrons as in G2 theory. This gives some core correlation contributions to the final energy. Additionally, a spin-orbit correction term and an empirical correction for valence electrons are introduced. The HLC takes the same form but with different empirical parameters.

Gaussian-4 (G4)
G4 is a compound method in the spirit of the other Gaussian theories and attempts to take the accuracy achieved with G3X one small step further. It is an approach for calculating the energies of molecular species containing first-row, second-row, and third-row main-group elements. The modifications relative to G3 theory are an extrapolation scheme for estimating the Hartree–Fock energy at the basis set limit, an expanded polarization set in the largest-basis-set MP2 calculation, a highest-level single-point calculation at the CCSD(T) level instead of QCISD(T), geometries and thermochemical corrections (including zero-point energies) calculated with density functional theory at the B3LYP/6-31G(2df,p) level, and two added higher level correction parameters. According to the developers, this theory gives a significant improvement over G3 theory.

The G4 and the related G4MP2 methods have been extended to cover transition metals. A variant of G4MP2, termed G4(MP2)-6X, has been developed with the aim of improving the accuracy with essentially identical quantum chemistry components. It applies scaling to the energy components in addition to using the HLC. In the related G4(MP2)-XK method, the Pople-type basis sets are replaced with customized Karlsruhe-type basis sets. Whereas G4(MP2)-6X covers main-group elements up to krypton, G4(MP2)-XK is applicable to main-group elements up to radon.

Feller-Peterson-Dixon approach (FPD)
Unlike fixed-recipe "model chemistries", the FPD approach consists of a flexible sequence of (up to) 13 components that vary with the nature of the chemical system under study and the desired accuracy in the final results. In most instances, the primary component relies on coupled cluster theory, such as CCSD(T), or configuration interaction theory combined with large Gaussian basis sets (up through aug-cc-pV8Z, in some cases) and extrapolation to the complete basis set limit. As with some other approaches, additive corrections for core/valence, scalar relativistic and higher order correlation effects are usually included. Attention is paid to the uncertainties associated with each of the components so as to permit a crude estimate of the uncertainty in the overall results. Accurate structural parameters and vibrational frequencies are a natural byproduct of the method. While the computed molecular properties can be highly accurate, the computationally intensive nature of the FPD approach limits the size of the chemical system to which it can be applied to roughly 10 or fewer first/second row atoms.

The FPD approach has been heavily benchmarked against experiment. When applied at the highest possible level, FPD is capable of yielding a root-mean-square (RMS) deviation with respect to experiment of 0.30 kcal/mol (311 comparisons covering atomization energies, ionization potentials, electron affinities and proton affinities). In terms of equilibrium, bottom-of-the-well structures, FPD gives an RMS deviation of 0.0020 Å (114 comparisons not involving hydrogen) and 0.0034 Å (54 comparisons involving hydrogen). Similarly good agreement was found for vibrational frequencies.

T1
The T1 method is an efficient computational approach developed for calculating accurate heats of formation of uncharged, closed-shell molecules comprising H, C, N, O, F, Si, P, S, Cl and Br, within experimental error. It is practical for molecules up to a molecular weight of ~500 a.m.u.

The T1 method as incorporated in Spartan consists of:
 * 1) HF/6-31G* optimization.
 * 2) RI-MP2/6-311+G(2d,p)[6-311G*] single point energy with dual basis set.
 * 3) An empirical correction using atom counts, Mulliken bond orders, HF/6-31G* and RI-MP2 energies as variables.

T1 follows the G3(MP2) recipe. However, by substituting an HF/6-31G* geometry for the MP2/6-31G* geometry, eliminating both the HF/6-31G* frequency and QCISD(T)/6-31G* energy calculations, and approximating the MP2/G3MP2large energy using dual basis set RI-MP2 techniques, the T1 method reduces computation time by up to 3 orders of magnitude. Atom counts, Mulliken bond orders and HF/6-31G* and RI-MP2 energies are introduced as variables in a linear regression fit to a set of 1126 G3(MP2) heats of formation. The T1 procedure reproduces these values with mean absolute and RMS errors of 1.8 and 2.5 kJ/mol, respectively. T1 reproduces experimental heats of formation for a set of 1805 diverse organic molecules from the NIST thermochemical database with mean absolute and RMS errors of 8.5 and 11.5 kJ/mol, respectively.
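The two error measures quoted above can be made concrete with a short Python sketch. The function names are hypothetical and the code is only an illustration of how mean absolute and RMS errors are computed from paired predicted and reference values. The conversion helper assumes the thermochemical calorie (1 kcal = 4.184 kJ), so that the kJ/mol figures can be compared with the 1 kcal/mol chemical accuracy target.

```python
import math
from typing import Sequence

KJ_PER_KCAL = 4.184  # thermochemical calorie


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average of |predicted - reference| over all pairs."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def rms_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Square root of the mean squared deviation over all pairs."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_sq = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)
    return math.sqrt(mean_sq)


def kj_to_kcal(value_kj_per_mol: float) -> float:
    """Convert an energy from kJ/mol to kcal/mol."""
    return value_kj_per_mol / KJ_PER_KCAL
```

With this conversion, the errors of 8.5 and 11.5 kJ/mol against experiment correspond to roughly 2.0 and 2.7 kcal/mol. The RMS error is never smaller than the mean absolute error, because squaring gives large deviations more weight.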

Correlation consistent composite approach (ccCA)
This approach, developed at the University of North Texas by Angela K. Wilson's research group, utilizes the correlation consistent basis sets developed by Dunning and co-workers. Unlike the Gaussian-n methods, ccCA does not contain any empirically fitted term. The B3LYP density functional method with the cc-pVTZ basis set, and cc-pV(T+d)Z for third row elements (Na–Ar), is used to determine the equilibrium geometry. Single point calculations are then used to find the reference energy and additional contributions to the energy. The total ccCA energy for main-group species is calculated by:
 * E(ccCA) = E(MP2/CBS) + ΔE(CC) + ΔE(CV) + ΔE(SR) + ΔE(ZPE) + ΔE(SO)

The reference energy E(MP2/CBS) is obtained from the MP2/aug-cc-pVnZ (where n = D, T, Q) energies, extrapolated to the complete basis set limit by the Peterson mixed Gaussian exponential extrapolation scheme. CCSD(T)/cc-pVTZ is used to account for correlation beyond MP2 theory:
 * ΔE(CC) = E(CCSD(T)/cc-pVTZ) - E(MP2/cc-pVTZ)

Core-core and core-valence interactions are accounted for using MP2(FC1)/aug-cc-pCVTZ:
 * ΔE(CV) = E(MP2(FC1)/aug-cc-pCVTZ) - E(MP2/aug-cc-pVTZ)

Scalar relativistic effects are also taken into account with a one-particle Douglas–Kroll–Hess Hamiltonian and recontracted basis sets:
 * ΔE(SR) = E(MP2-DK/cc-pVTZ-DK) - E(MP2/cc-pVTZ)

The last two terms are the zero-point energy correction, scaled by a factor of 0.989 to account for deficiencies in the harmonic approximation, and the spin-orbit correction, which is considered only for atoms.
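The ccCA bookkeeping above can be sketched in Python as follows. This is an illustrative sketch with hypothetical function and argument names, assuming all energies are in hartree. The text names the Peterson mixed Gaussian exponential scheme without giving its formula; the code assumes the commonly quoted three-point form E(n) = E(CBS) + B·exp(-(n-1)) + C·exp(-(n-1)²) with n = 2, 3, 4 for the D, T and Q basis sets, and solves it exactly for E(CBS).

```python
import math

ZPE_SCALE = 0.989  # harmonic ZPE scale factor, as quoted in the text


def peterson_cbs(e_dz: float, e_tz: float, e_qz: float) -> float:
    """Three-point CBS limit, assuming
    E(n) = E_CBS + B*exp(-(n-1)) + C*exp(-(n-1)**2) for n = 2, 3, 4."""
    a = [math.exp(-(n - 1)) for n in (2, 3, 4)]       # exponential terms
    g = [math.exp(-((n - 1) ** 2)) for n in (2, 3, 4)]  # Gaussian terms
    # Subtract consecutive equations to eliminate E_CBS, then solve the
    # remaining 2x2 linear system for B and C with Cramer's rule.
    d23, d34 = e_dz - e_tz, e_tz - e_qz
    a12, a23 = a[0] - a[1], a[1] - a[2]
    g12, g23 = g[0] - g[1], g[1] - g[2]
    det = a12 * g23 - g12 * a23
    b = (d23 * g23 - g12 * d34) / det
    c = (a12 * d34 - a23 * d23) / det
    return e_qz - b * a[2] - c * g[2]


def ccca_energy(
    e_mp2_cbs: float,        # extrapolated MP2/aug-cc-pVnZ reference energy
    e_ccsdt_tz: float,       # CCSD(T)/cc-pVTZ
    e_mp2_tz: float,         # MP2/cc-pVTZ
    e_mp2_fc1_acvtz: float,  # MP2(FC1)/aug-cc-pCVTZ
    e_mp2_avtz: float,       # MP2/aug-cc-pVTZ
    e_mp2_dk_tz: float,      # MP2-DK/cc-pVTZ-DK
    zpe_harmonic: float,     # unscaled harmonic zero-point energy
    e_spin_orbit: float = 0.0,  # spin-orbit correction (atoms only)
) -> float:
    """Sum the reference energy and the additive corrections."""
    delta_cc = e_ccsdt_tz - e_mp2_tz
    delta_cv = e_mp2_fc1_acvtz - e_mp2_avtz
    delta_sr = e_mp2_dk_tz - e_mp2_tz
    delta_zpe = ZPE_SCALE * zpe_harmonic
    return e_mp2_cbs + delta_cc + delta_cv + delta_sr + delta_zpe + e_spin_orbit
```

A typical call would first pass the three MP2/aug-cc-pVnZ energies to `peterson_cbs` and then feed the result to `ccca_energy` as `e_mp2_cbs`.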

The correlation consistent composite approach is available as a keyword in NWChem and GAMESS (ccCA-S4 and ccCA-CC(2,3)).

Complete Basis Set methods (CBS)
The Complete Basis Set (CBS) methods are a family of composite methods, the members of which are CBS-4M, CBS-QB3, and CBS-APNO, in increasing order of accuracy. These methods offer errors of 2.5, 1.1, and 0.7 kcal/mol, respectively, when tested against the G2 test set. The CBS methods were developed by George Petersson and coworkers, and they extrapolate several single-point energies to the "exact" energy. In comparison, the Gaussian-n methods perform their approximation using additive corrections. Similar to the modified G2(+) method, CBS-QB3 has been modified by the inclusion of diffuse functions in the geometry optimization step to give CBS-QB3(+). The CBS family of methods is available via keywords in the Gaussian 09 suite of programs.

Weizmann-n theories
The Weizmann-n ab initio methods (Wn, n = 1–4) are highly accurate composite theories devoid of empirical parameters. These theories are capable of sub-kJ/mol accuracies in prediction of fundamental thermochemical quantities such as heats of formation and atomization energies, and unprecedented accuracies in prediction of spectroscopic constants. The Wn-P34 variants further extend the applicability from first- and second-row species to include heavy main-group systems (up to xenon).

The ability of these theories to successfully reproduce the CCSD(T)/CBS (W1 and W2), CCSDT(Q)/CBS (W3), and CCSDTQ5/CBS (W4) energies relies on judicious combination of very large Gaussian basis sets with basis-set extrapolation techniques. Thus, the high accuracy of Wn theories comes with the price of a significant computational cost. In practice, for systems consisting of more than ~9 non-hydrogen atoms (with C1 symmetry), even the computationally more economical W1 theory becomes prohibitively expensive with current mainstream server hardware.

In an attempt to extend the applicability of the Wn ab initio thermochemistry methods, explicitly correlated versions of these theories have been developed: Wn-F12 (n = 1–3) and, more recently, a W4-F12 theory. W1-F12 was successfully applied to large hydrocarbons (e.g., dodecahedrane), as well as to systems of biological relevance (e.g., DNA bases). W4-F12 theory has been applied to systems as large as benzene. In a similar manner, the independently developed WnX protocols further reduce the requirements on computational resources by using more efficient basis sets and, for the minor components, electron-correlation methods that are computationally less demanding.