Parallel tempering

Parallel tempering, in physics and statistics, is a computer simulation method typically used to find the lowest energy state of a system of many interacting particles. It addresses the problem that the stable state at high temperature may differ from the stable state at low temperature, whereas simulations at low temperatures may become "stuck" in a metastable state. It does so by exploiting the fact that the high-temperature simulation may visit states typical of both the stable and the metastable low-temperature states.

More specifically, parallel tempering (also known as replica exchange MCMC sampling) is a simulation method aimed at improving the dynamic properties of Monte Carlo simulations of physical systems, and of Markov chain Monte Carlo (MCMC) sampling methods more generally. The replica exchange method was originally devised by Robert Swendsen and J. S. Wang, then extended by Charles J. Geyer, and later developed further by Giorgio Parisi, Koji Hukushima and Koji Nemoto, and others. Y. Sugita and Y. Okamoto also formulated a molecular dynamics version of parallel tempering, usually known as replica-exchange molecular dynamics or REMD.

Essentially, one runs N copies of the system, randomly initialized, at different temperatures. Then, based on the Metropolis criterion, one exchanges configurations between different temperatures. The idea of this method is to make configurations at high temperatures available to the simulations at low temperatures and vice versa. This results in a very robust ensemble that is able to sample both low-energy and high-energy configurations. In this way, thermodynamic properties such as the specific heat, which is in general not well computed in the canonical ensemble, can be computed with great precision.

Background
Typically, a Monte Carlo simulation using a Metropolis–Hastings update consists of a single stochastic process that evaluates the energy of the system and accepts or rejects updates based on the temperature T. At high temperatures, updates that change the energy of the system are comparatively more likely to be accepted. When the system is highly correlated, most updates are rejected and the simulation is said to suffer from critical slowing down.
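The single-chain update described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the one-dimensional double-well `energy` function, the uniform proposal with `step_size`, and the helper `acceptance_rate` are all assumptions introduced here, and the Boltzmann constant `k` is set to 1.

```python
import math
import random

def energy(x):
    """Hypothetical double-well energy with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2

def metropolis_step(x, temperature, rng, step_size=0.5, k=1.0):
    """Propose a uniform random displacement and apply the Metropolis rule."""
    x_new = x + rng.uniform(-step_size, step_size)
    delta_e = energy(x_new) - energy(x)
    # Downhill moves are always accepted; uphill moves are accepted
    # with probability exp(-delta_e / (k * T)).
    if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / (k * temperature)):
        return x_new, True
    return x, False

def acceptance_rate(temperature, n_steps=20000, seed=0):
    """Fraction of accepted proposals for one chain at a fixed temperature."""
    rng = random.Random(seed)
    x, accepted = 1.0, 0
    for _ in range(n_steps):
        x, ok = metropolis_step(x, temperature, rng)
        accepted += ok
    return accepted / n_steps
```

Comparing, for example, `acceptance_rate(5.0)` with `acceptance_rate(0.05)` should show the higher acceptance at the higher temperature that the paragraph describes.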

If we were to run two simulations at temperatures separated by ΔT, we would find that, if ΔT is small enough, the energy histograms obtained by collecting the energies over a set of N Monte Carlo steps form two distributions that partially overlap. The overlap can be defined as the area of the histograms that falls over the same interval of energy values, normalized by the total number of samples. As ΔT approaches 0, the overlap approaches 1.
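One possible way to compute such an overlap from two lists of sampled energies is sketched below. The function name `histogram_overlap`, the shared binning, and the choice of summing the bin-wise minimum of the two normalized histograms are assumptions made for illustration; other definitions of the overlap are possible.

```python
def histogram_overlap(energies_a, energies_b, n_bins=50):
    """Overlap of two energy histograms, between 0 (disjoint) and 1 (identical)."""
    lo = min(min(energies_a), min(energies_b))
    hi = max(max(energies_a), max(energies_b))
    # Both histograms share the same bins; guard against a zero-width range.
    width = (hi - lo) / n_bins or 1.0

    def normalized_hist(samples):
        counts = [0] * n_bins
        for e in samples:
            idx = min(int((e - lo) / width), n_bins - 1)
            counts[idx] += 1
        # Normalize by the number of samples so each histogram sums to 1.
        return [c / len(samples) for c in counts]

    hist_a = normalized_hist(energies_a)
    hist_b = normalized_hist(energies_b)
    # The shared area is the bin-wise minimum of the two histograms.
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))
```

With this definition, two identical sample sets give an overlap of 1 and two sample sets with no common energy bin give 0.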

Another way to interpret this overlap is to say that system configurations sampled at temperature T1 are likely to appear during a simulation at T2. Because the Markov chain should have no memory of its past, we can create a new update for the system composed of the two systems at T1 and T2. At a given Monte Carlo step we can update the global system by swapping the configuration of the two systems, or alternatively trading the two temperatures. The update is accepted according to the Metropolis–Hastings criterion with probability


$$ p = \min \left( 1, \frac{ \exp \left( -\frac{E_j}{kT_i} - \frac{E_i}{kT_j} \right) }{ \exp \left( -\frac{E_i}{kT_i} - \frac{E_j}{kT_j} \right) } \right) = \min \left( 1, e^{(E_i - E_j) \left( \frac{1}{kT_i} - \frac{1}{kT_j} \right)} \right) ,$$

and otherwise the update is rejected. Detailed balance must be satisfied by ensuring that the reverse update is equally likely, all else being equal. This can be ensured by choosing between regular Monte Carlo updates and parallel tempering updates with probabilities that are independent of the configurations of the two systems and of the Monte Carlo step.
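The acceptance probability given above translates directly into code. The sketch below assumes units in which the Boltzmann constant `k` defaults to 1; the function name `swap_probability` is chosen here for illustration.

```python
import math

def swap_probability(e_i, e_j, t_i, t_j, k=1.0):
    """Probability of accepting a swap between replica i (energy e_i,
    temperature t_i) and replica j (energy e_j, temperature t_j)."""
    # Exponent from the formula: (E_i - E_j) * (1/(k T_i) - 1/(k T_j)).
    exponent = (e_i - e_j) * (1.0 / (k * t_i) - 1.0 / (k * t_j))
    # min(1, exp(exponent)), written so that exp is never called on a
    # large positive argument.
    if exponent >= 0.0:
        return 1.0
    return math.exp(exponent)
```

For instance, when the colder replica holds the higher-energy configuration the exponent is positive and the swap is always accepted; in the opposite case the swap is accepted only with probability `exp(exponent)`.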

This update can be generalized to more than two systems.
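A minimal sketch of the generalization to several replicas is given below, under stated assumptions: the one-dimensional double-well `energy` function (with a barrier of height 8 at x = 0) is hypothetical, swaps are attempted only between neighbouring temperatures, one randomly chosen neighbouring pair is tried per sweep, and `k` is set to 1. It is meant to illustrate the scheme, not to serve as a production implementation.

```python
import math
import random

def energy(x):
    """Hypothetical double-well energy with a barrier of height 8 at x = 0."""
    return 8.0 * (x * x - 1.0) ** 2

def parallel_tempering(temperatures, n_sweeps=20000, step_size=0.5, seed=1, k=1.0):
    """Run one replica per temperature; return the samples of the first
    (coldest) temperature and the fraction of accepted swap attempts."""
    rng = random.Random(seed)
    n = len(temperatures)
    xs = [-1.0] * n  # every replica starts in the left well
    cold_samples = []
    swaps_tried = swaps_done = 0
    for _ in range(n_sweeps):
        # Regular Metropolis update of each replica at its own temperature.
        for r in range(n):
            x_new = xs[r] + rng.uniform(-step_size, step_size)
            d_e = energy(x_new) - energy(xs[r])
            if d_e <= 0.0 or rng.random() < math.exp(-d_e / (k * temperatures[r])):
                xs[r] = x_new
        # Swap attempt on a neighbouring pair chosen independently of the
        # configurations and of the sweep index.
        if n > 1:
            r = rng.randrange(n - 1)
            exponent = (energy(xs[r]) - energy(xs[r + 1])) * (
                1.0 / (k * temperatures[r]) - 1.0 / (k * temperatures[r + 1]))
            swaps_tried += 1
            if exponent >= 0.0 or rng.random() < math.exp(exponent):
                xs[r], xs[r + 1] = xs[r + 1], xs[r]
                swaps_done += 1
        cold_samples.append(xs[0])
    return cold_samples, swaps_done / max(swaps_tried, 1)
```

Called with a ladder such as `[0.2, 0.5, 1.2, 3.0, 8.0]`, the coldest replica can reach both wells through swaps, whereas a single chain run as `parallel_tempering([0.2])` is expected to stay in the well where it started.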

By a careful choice of temperatures and number of systems one can achieve an improvement in the mixing properties of a set of Monte Carlo simulations that exceeds the extra computational cost of running parallel simulations.

Two further considerations apply. Increasing the number of temperatures can have a detrimental effect, because the 'lateral' movement of a given system across temperatures behaves like a diffusion process and takes longer as the number of temperatures grows. The setup is also important: neighbouring temperatures must have enough histogram overlap in practice to give a reasonable probability of lateral moves.
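As an illustration of how a set of temperatures might be laid out, the sketch below builds a geometric ladder, in which neighbouring temperatures have a constant ratio. This spacing is a common starting heuristic and is an assumption here, not something prescribed by the text above; the function name `geometric_ladder` is likewise hypothetical. In practice the ladder would be checked against the observed histogram overlap or swap acceptance and adjusted.

```python
def geometric_ladder(t_min, t_max, n_replicas):
    """Temperatures from t_min to t_max with a constant ratio between neighbours."""
    if n_replicas < 2 or not 0.0 < t_min < t_max:
        raise ValueError("need n_replicas >= 2 and 0 < t_min < t_max")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** i for i in range(n_replicas)]
```

Using fewer replicas widens the gap between neighbours and reduces their overlap, while using more replicas lengthens the diffusion of a configuration from the highest to the lowest temperature.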

The parallel tempering method can be used as a kind of super simulated annealing that does not need restarts, since a system at high temperature can feed new local optimizers to a system at low temperature, allowing tunneling between metastable states and improving convergence to a global optimum.

Implementations
• Abalone
• ACEMD
• AMBER
• CHARMM
• Desmond
• GROMACS
• LAMMPS
• RASPA-2.0
• Orac