User:Manudouz/sandbox/Models of codon evolution

A first model of codon evolution.

A second model of codon evolution, either restricted (a single nucleotide substitution between codons) or unrestricted (up to three nucleotide substitutions between codons).
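To illustrate the difference between the two variants, here is a minimal Python sketch. It assumes the standard genetic code, with the stop codons TAA, TAG and TGA excluded. A direct change counts as allowed when the two codons differ at exactly one position (restricted) or at one to three positions (unrestricted). The helper names `n_differences`, `allowed` and `neighbours` are hypothetical and are not taken from any particular package.

```python
from itertools import product

# Sense codons of the standard genetic code (assumption: stop codons TAA, TAG, TGA removed).
STOPS = {"TAA", "TAG", "TGA"}
SENSE = ["".join(c) for c in product("TCAG", repeat=3) if "".join(c) not in STOPS]


def n_differences(codon_i: str, codon_j: str) -> int:
    """Number of nucleotide positions at which two codons differ (0 to 3)."""
    return sum(a != b for a, b in zip(codon_i, codon_j))


def allowed(codon_i: str, codon_j: str, restricted: bool = True) -> bool:
    """Whether a direct (instantaneous) change i -> j is permitted.

    restricted=True: only single-nucleotide changes are allowed.
    restricted=False: changes at one, two or three positions are allowed.
    """
    d = n_differences(codon_i, codon_j)
    if d == 0:
        return False  # identical codons: not a change
    return d == 1 if restricted else d <= 3


def neighbours(codon: str, restricted: bool = True) -> list[str]:
    """Sense codons reachable from `codon` in a single instantaneous step."""
    return [c for c in SENSE if allowed(codon, c, restricted)]


print(len(neighbours("TTT")))                    # 9: all single-step neighbours are sense codons
print(len(neighbours("TGG")))                    # 7: TAG and TGA are stop codons
print(len(neighbours("TTT", restricted=False)))  # 60: every other sense codon
```

In the restricted variant a codon such as TTT therefore has nine possible direct destinations. TGG has only seven, because two of its single-step neighbours are stop codons.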

More general models.

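Codon models require accurate codon-based alignments. These are obtained by aligning the DNA sequences in an amino acid-aware manner, typically by aligning the translated protein sequences and mapping that alignment back onto the codons.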
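One common way to obtain such an alignment is to align the translated protein sequences first. Each original DNA sequence is then threaded back onto its aligned protein: every amino acid is replaced by its codon and every gap by three gap characters, so codons are never split. The sketch below assumes the coding sequence has no terminal stop codon. It also does not check that each codon actually translates to the residue it is placed under. `back_translate` is a hypothetical helper, not the interface of an existing tool.

```python
def back_translate(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Thread an unaligned coding sequence onto its aligned protein sequence.

    Each residue consumes the next codon of `cds`; each gap becomes '---',
    so the resulting DNA alignment keeps codons intact.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[k:k + 3] for k in range(0, len(cds), 3)]
    out, n = [], 0
    for residue in aligned_protein:
        if residue == gap:
            out.append("---")
        else:
            if n >= len(codons):
                raise ValueError("protein has more residues than the CDS has codons")
            out.append(codons[n])
            n += 1
    if n != len(codons):
        raise ValueError("CDS has codons left over (stop codon or mismatch?)")
    return "".join(out)


# Hypothetical pair whose protein alignment places a gap in the second sequence.
print(back_translate("MKV", "ATGAAAGTT"))  # ATGAAAGTT
print(back_translate("M-V", "ATGGTT"))     # ATG---GTT
```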

The $$ P(t) $$ matrix
If $$ i $$ and $$ j $$ are the codons containing the nucleotide triplets $$ i_{1}i_{2}i_{3} $$ and $$ j_{1}j_{2}j_{3} $$ respectively, then the probabilities of change between codons over a time $$ t $$ can be expressed by the transition probability matrix

 * $$ P(t) = \big(p_{ij}(t)\big) $$, where each entry $$ p_{ij}(t) $$ is the probability that codon $$ i $$ will have changed to codon $$ j $$ after time $$ t $$.
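In a continuous-time Markov model, $$ P(t) $$ is obtained from the instantaneous rate matrix $$ Q $$ as the matrix exponential $$ P(t) = e^{Qt} $$. The following pure-Python sketch computes it by scaling and squaring a truncated Taylor series. It checks the result on a hypothetical two-state rate matrix, for which the exact answer is known. For a real $$ 61 \times 61 $$ codon matrix, one would more typically use a library routine such as `scipy.linalg.expm` or an eigendecomposition of $$ Q $$.

```python
import math

Matrix = list[list[float]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def expm(q: Matrix, t: float, terms: int = 20) -> Matrix:
    """Approximate P(t) = exp(Qt) by scaling and squaring a truncated Taylor series."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    norm = max(sum(abs(x) for x in row) for row in a)  # infinity norm of Qt
    # Choose s so that ||Qt / 2^s|| <= 1/2, where the Taylor series converges quickly.
    s = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    a = [[x / 2 ** s for x in row] for row in a]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # After this step, term holds A^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[r + x for r, x in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    for _ in range(s):
        result = mat_mul(result, result)  # exp(A) = exp(A / 2^s) ** (2^s)
    return result


# Hypothetical two-state rate matrix: rate a from state 0 to 1, rate b back.
a_rate, b_rate, t = 1.0, 2.0, 0.5
Q = [[-a_rate, a_rate], [b_rate, -b_rate]]
P = expm(Q, t)
# Closed form for this toy case: p_00(t) = b/(a+b) + a/(a+b) * exp(-(a+b) t)
exact = b_rate / (a_rate + b_rate) + a_rate / (a_rate + b_rate) * math.exp(-(a_rate + b_rate) * t)
print(abs(P[0][0] - exact) < 1e-10)       # True
print([round(sum(row), 12) for row in P])  # [1.0, 1.0]: rows of P(t) sum to 1
```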

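Example: to model the substitution process between codons in continuous time under the standard genetic code, which has 61 sense codons (the 64 possible triplets minus the three stop codons TAA, TAG and TGA), the corresponding $$ 61 \times 61 $$ transition probability matrix is: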



$$ P(t) = \begin{pmatrix}
p_{TTT \to TTT}(t) & p_{TTT \to TTC}(t) & \cdots & p_{TTT \to j_{1}j_{2}j_{3}}(t) & \cdots & p_{TTT \to GGA}(t) & p_{TTT \to GGG}(t) \\
p_{TTC \to TTT}(t) & p_{TTC \to TTC}(t) & \cdots & p_{TTC \to j_{1}j_{2}j_{3}}(t) & \cdots & p_{TTC \to GGA}(t) & p_{TTC \to GGG}(t) \\
\vdots & \vdots & \ddots & \vdots & & \vdots & \vdots \\
p_{i_{1}i_{2}i_{3} \to TTT}(t) & p_{i_{1}i_{2}i_{3} \to TTC}(t) & \cdots & p_{i_{1}i_{2}i_{3} \to j_{1}j_{2}j_{3}}(t) & \cdots & p_{i_{1}i_{2}i_{3} \to GGA}(t) & p_{i_{1}i_{2}i_{3} \to GGG}(t) \\
\vdots & \vdots & & \vdots & \ddots & \vdots & \vdots \\
p_{GGA \to TTT}(t) & p_{GGA \to TTC}(t) & \cdots & p_{GGA \to j_{1}j_{2}j_{3}}(t) & \cdots & p_{GGA \to GGA}(t) & p_{GGA \to GGG}(t) \\
p_{GGG \to TTT}(t) & p_{GGG \to TTC}(t) & \cdots & p_{GGG \to j_{1}j_{2}j_{3}}(t) & \cdots & p_{GGG \to GGA}(t) & p_{GGG \to GGG}(t)
\end{pmatrix} $$

Here each codon $$ i_{1}i_{2}i_{3} $$ is a triplet in which each of $$ i_{1} $$, $$ i_{2} $$ and $$ i_{3} $$ is one of the nucleotides $$A$$, $$C$$, $$G$$ or $$T$$.
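Under the standard genetic code, the 61 sense codons can be enumerated in the row and column order used in the matrix above (TTT, TTC, ..., GGA, GGG). This is done by iterating over the nucleotides in the order T, C, A, G and discarding the stop codons. The following is a minimal Python sketch; the names `SENSE_CODONS` and `INDEX` are illustrative only.

```python
from itertools import product

NUCLEOTIDES = "TCAG"                 # this order yields TTT, TTC, ..., GGA, GGG
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard (universal) genetic code

ALL_CODONS = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]
SENSE_CODONS = [c for c in ALL_CODONS if c not in STOP_CODONS]
INDEX = {codon: k for k, codon in enumerate(SENSE_CODONS)}  # row/column of each codon

print(len(ALL_CODONS), len(SENSE_CODONS))   # 64 61
print(SENSE_CODONS[:2], SENSE_CODONS[-2:])  # ['TTT', 'TTC'] ['GGA', 'GGG']
```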

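The $$ Q $$ matrix
The transition probabilities are generated by the instantaneous rate matrix $$ Q $$, through $$ P(t) = e^{Qt} $$. In this model, "The rate at which each particular allowed substitution occurs is proportional to the (equilibrium) frequency $$ \pi_{j_{1}j_{2}j_{3}} $$ of the codon (j) being changed to."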

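Writing $$ q_{ij} $$ as shorthand for $$ q_{i_{1}i_{2}i_{3} \to j_{1}j_{2}j_{3}} $$, a short worked step shows one consequence of making rates proportional to $$ \pi_j $$. Assume (notation introduced here for illustration) that every off-diagonal rate has the form $$ q_{ij} = s_{ij}\,\pi_j $$, with a factor $$ s_{ij} = s_{ji} $$ that depends only on the pair of codons. Assume also that the diagonal entries make each row of $$ Q $$ sum to zero. Then the process is time-reversible, and $$ \pi $$ is its stationary distribution:

$$ \pi_i\, q_{ij} = \pi_i\, s_{ij}\, \pi_j = \pi_j\, s_{ji}\, \pi_i = \pi_j\, q_{ji}, \qquad \sum_{i} \pi_i\, q_{ij} = \pi_j\, q_{jj} + \sum_{i \neq j} \pi_j\, q_{ji} = \pi_j \Big( q_{jj} + \sum_{i \neq j} q_{ji} \Big) = 0. $$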


$$ Q = \begin{pmatrix}
q_{TTT \to TTT} & q_{TTT \to TTC} & \cdots & q_{TTT \to j_{1}j_{2}j_{3}} & \cdots & q_{TTT \to GGA} & q_{TTT \to GGG} \\
q_{TTC \to TTT} & q_{TTC \to TTC} & \cdots & q_{TTC \to j_{1}j_{2}j_{3}} & \cdots & q_{TTC \to GGA} & q_{TTC \to GGG} \\
\vdots & \vdots & \ddots & \vdots & & \vdots & \vdots \\
q_{i_{1}i_{2}i_{3} \to TTT} & q_{i_{1}i_{2}i_{3} \to TTC} & \cdots & q_{i_{1}i_{2}i_{3} \to j_{1}j_{2}j_{3}} & \cdots & q_{i_{1}i_{2}i_{3} \to GGA} & q_{i_{1}i_{2}i_{3} \to GGG} \\
\vdots & \vdots & & \vdots & \ddots & \vdots & \vdots \\
q_{GGA \to TTT} & q_{GGA \to TTC} & \cdots & q_{GGA \to j_{1}j_{2}j_{3}} & \cdots & q_{GGA \to GGA} & q_{GGA \to GGG} \\
q_{GGG \to TTT} & q_{GGG \to TTC} & \cdots & q_{GGG \to j_{1}j_{2}j_{3}} & \cdots & q_{GGG \to GGA} & q_{GGG \to GGG}
\end{pmatrix} $$


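 * $$ q_{i_{1}i_{2}i_{3} \to j_{1}j_{2}j_{3}} = \left\{ \begin{array}{ll} 0 & \mbox{if } i \mbox{ and } j \mbox{ differ at two or more positions} \\ \pi_j & \mbox{if } i \mbox{ and } j \mbox{ differ by one synonymous transversion} \\ \pi_j \kappa & \mbox{if } i \mbox{ and } j \mbox{ differ by one synonymous transition} \\ \pi_j \omega & \mbox{if } i \mbox{ and } j \mbox{ differ by one non-synonymous transversion} \\ \pi_j \kappa \omega & \mbox{if } i \mbox{ and } j \mbox{ differ by one non-synonymous transition} \end{array} \right. $$ for $$ i \neq j $$, where $$ \kappa $$ is the transition/transversion rate ratio and $$ \omega $$ is the non-synonymous/synonymous rate ratio.

The diagonal entries are $$ q_{i_{1}i_{2}i_{3} \to i_{1}i_{2}i_{3}} = -\sum_{j \neq i} q_{i_{1}i_{2}i_{3} \to j_{1}j_{2}j_{3}} $$, so that each row of $$ Q $$ sums to zero.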
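The case definition translates almost line by line into code. The following is a minimal Python sketch under these assumptions:

 * the standard genetic code (NCBI translation table 1) applies, with stop codons excluded;
 * $$ \kappa $$, $$ \omega $$ and the codon frequencies $$ \pi $$ are supplied as given inputs;
 * diagonal entries are set so that each row sums to zero.

The function names are hypothetical. The common step of rescaling $$ Q $$ so that the mean substitution rate is 1 is omitted.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons in TCAG order; '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SENSE = [c for c, aa in CODE.items() if aa != "*"]  # the 61 sense codons
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """A <-> G or C <-> T (both purines or both pyrimidines)."""
    return x != y and (x in PURINES) == (y in PURINES)


def rate(i: str, j: str, pi: dict[str, float], kappa: float, omega: float) -> float:
    """Off-diagonal rate q_{i -> j}; 0 unless i and j differ at exactly one position."""
    diffs = [(a, b) for a, b in zip(i, j) if a != b]
    if len(diffs) != 1:
        return 0.0
    q = pi[j]           # proportional to the frequency of the target codon
    if is_transition(*diffs[0]):
        q *= kappa      # transition
    if CODE[i] != CODE[j]:
        q *= omega      # non-synonymous change
    return q


def build_q(pi: dict[str, float], kappa: float, omega: float) -> dict[str, dict[str, float]]:
    """Full rate matrix as nested dicts, with each row summing to zero."""
    q = {i: {j: rate(i, j, pi, kappa, omega) for j in SENSE} for i in SENSE}
    for i in SENSE:
        q[i][i] = -sum(v for j, v in q[i].items() if j != i)
    return q


pi = {c: 1 / len(SENSE) for c in SENSE}  # hypothetical: equal codon frequencies
Q = build_q(pi, kappa=2.0, omega=0.25)   # hypothetical parameter values
print(Q["TTT"]["TTC"])  # Phe -> Phe, T <-> C: synonymous transition, pi_j * kappa
print(Q["TTT"]["CTT"])  # Phe -> Leu, T <-> C: non-synonymous transition, pi_j * kappa * omega
```

Storing $$ Q $$ as nested dictionaries keyed by codon keeps the sketch readable. A numerical implementation would instead use a $$ 61 \times 61 $$ array indexed in the same TCAG order.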

Transitions / transversions.
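A transition exchanges two purines (A and G) or two pyrimidines (C and T); a transversion exchanges a purine for a pyrimidine or vice versa. Of the 12 possible ordered nucleotide changes, only 4 are transitions and 8 are transversions. If all changes occurred at equal rates, with equal base frequencies, transversions would therefore outnumber transitions two to one. A short Python sketch follows; `substitution_type` is a hypothetical helper.

```python
from itertools import permutations

PURINES, PYRIMIDINES = {"A", "G"}, {"C", "T"}


def substitution_type(x: str, y: str) -> str:
    """Classify the nucleotide change x -> y."""
    if x == y:
        raise ValueError("not a substitution")
    both_purines = {x, y} <= PURINES
    both_pyrimidines = {x, y} <= PYRIMIDINES
    return "transition" if both_purines or both_pyrimidines else "transversion"


changes = list(permutations("ACGT", 2))  # the 12 ordered changes x -> y
kinds = [substitution_type(x, y) for x, y in changes]
print(kinds.count("transition"), kinds.count("transversion"))  # 4 8
```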

A codon model with three layers.