User:Christoph n plus one/sandbox/Direct Coupling Analysis

Direct coupling analysis (DCA) is an umbrella term for several methods for analyzing sequence data in computational biology. The common idea of these methods is to use statistical modeling to quantify the mutual compatibility between two positions of a biological sequence. This mutual compatibility is meant to represent a direct relationship between the two positions, independent of the other positions in the sequence. This is in contrast to measures of correlation, which can be large even if there is no direct relationship between the positions (hence the name direct coupling analysis). Since this mutual compatibility links the positions in the process of evolution, it can effectively be seen as quantifying the molecular coevolution between them. Applications of DCA include the inference of protein residue contacts, RNA structure prediction, and the inference of protein-protein interaction networks.

The common basis of these methods is the statistical modeling of the variability within a set of related sequences of length $N$, written $a = (a_1, \ldots, a_N)$, using the following probability mass function:

$$ p\left(a \mid J,h\right) = \frac{1}{Z} \exp\left( \sum\limits_{i=1}^{N-1} \sum\limits_{j=i+1}^{N} J_{ij}(a_i,a_j) + \sum\limits_{i=1}^{N} h_i(a_i) \right) $$
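As an illustration, the following minimal Python sketch evaluates this probability mass function by brute force on a toy example. It rests on several assumptions that are not part of the text above: symbols are encoded as integers `0..q-1`, the couplings are stored as nested lists `J[i][j][x][y]`, the fields as `h[i][x]`, and the normalization constant $Z$ is taken to be the sum of the exponential over all $q^N$ sequences. The function names `log_weight` and `distribution` are hypothetical. Enumerating all sequences is only feasible for very small $N$; practical DCA methods use approximations instead.

```python
import itertools
import math


def log_weight(a, J, h):
    """Exponent of the model: pair terms over i < j plus one field term per position."""
    N = len(a)
    pair_term = sum(
        J[i][j][a[i]][a[j]] for i in range(N - 1) for j in range(i + 1, N)
    )
    field_term = sum(h[i][a[i]] for i in range(N))
    return pair_term + field_term


def distribution(J, h, N, q):
    """Brute-force p(a | J, h) over all q**N sequences (toy sizes only)."""
    seqs = list(itertools.product(range(q), repeat=N))
    weights = [math.exp(log_weight(a, J, h)) for a in seqs]
    Z = sum(weights)  # assumed normalization: sum over all sequences
    return {a: w / Z for a, w in zip(seqs, weights)}


# Toy chain 0 - 1 - 2: positions 0 and 2 have NO direct coupling.
N, q = 3, 2
zero = [[0.0, 0.0], [0.0, 0.0]]
favor_equal = [[1.0, 0.0], [0.0, 1.0]]  # rewards identical symbols
J = [[zero for _ in range(N)] for _ in range(N)]
J[0][1] = favor_equal
J[1][2] = favor_equal
h = [[0.0] * q for _ in range(N)]

p = distribution(J, h, N, q)


def marginal(i, x):
    """Probability that position i carries symbol x."""
    return sum(prob for a, prob in p.items() if a[i] == x)


def joint(i, j, x, y):
    """Probability that positions i and j carry symbols x and y."""
    return sum(prob for a, prob in p.items() if a[i] == x and a[j] == y)


# Covariance between positions 0 and 2 for symbol 0 at both positions.
cov_02 = joint(0, 2, 0, 0) - marginal(0, 0) * marginal(2, 0)
print("J_02 is all zeros:", J[0][2] == zero)
print("cov(0, 2) is positive:", cov_02 > 0)
```

In this toy chain, positions 0 and 2 are statistically correlated only through position 1, although their coupling $J_{02}$ is zero. This is the distinction between correlation and direct coupling that gives the method its name.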