Codon Adaptation Index

The Codon Adaptation Index (CAI) is the most widespread technique for analyzing codon usage bias. Unlike other measures of codon usage bias, such as the 'effective number of codons' (Nc), which measure deviation from uniform codon usage (the null hypothesis), CAI measures the deviation of a given protein-coding gene sequence from the codon usage of a reference set of genes. CAI is used as a quantitative predictor of a gene's expression level based on its codon sequence.

Rationale
Ideally, the reference set in CAI is composed of highly expressed genes, so that CAI provides an indication of gene expression level under the assumption that translational selection optimizes gene sequences according to their expression levels. The rationale for this is twofold: highly expressed genes must compete for resources (i.e., ribosomes) in fast-growing organisms, and it also makes sense for them to be translated more accurately. Both hypotheses predict that highly expressed genes will mostly use codons whose tRNA species are abundant in the cell.

Implementation
For each amino acid in a gene, the weight of each of its codons, represented by a parameter termed relative adaptiveness ($w_{i}$), is computed from a reference sequence set as the ratio between the observed frequency of the codon, $f_{i}$, and the frequency of the most frequent synonymous codon for that amino acid, $\max(f_{j})$.


 * $$w_i=\frac{f_i}{\max(f_j)} \qquad i,j \in [\text{synonymous codons for amino acid}]$$
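
As a rough illustration of the weight calculation, the following Python sketch counts codons in a reference set and divides each count by the count of the most frequent synonymous codon. Because the ratio is taken within one amino acid, using raw counts gives the same result as using frequencies. The function name `relative_adaptiveness`, the two-amino-acid table `SYNONYMOUS_CODONS`, and the toy reference sequences are all hypothetical and chosen only for this example; a real calculation would use the full genetic code and a curated set of highly expressed genes.

```python
from collections import Counter

# Hypothetical, deliberately tiny synonymous-codon table (two amino acids only).
SYNONYMOUS_CODONS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["TTT", "TTC"],
}


def relative_adaptiveness(reference_genes, synonymous_codons=SYNONYMOUS_CODONS):
    """Return {codon: w} with w = count(codon) / max count among its synonyms."""
    counts = Counter()
    for gene in reference_genes:
        # Read in-frame triplets; a trailing partial codon is ignored.
        for pos in range(0, len(gene) - len(gene) % 3, 3):
            counts[gene[pos:pos + 3]] += 1

    weights = {}
    for codons in synonymous_codons.values():
        max_count = max(counts[c] for c in codons)
        if max_count == 0:
            continue  # amino acid absent from the reference set: no weights
        for codon in codons:
            weights[codon] = counts[codon] / max_count
    return weights


# Toy reference set (made up for illustration, not real genes).
reference = ["AAGAAGAAGAAATTC", "AAGTTCTTCTTT"]
weights = relative_adaptiveness(reference)
```

In this toy set AAG occurs four times and AAA once, so AAG receives weight 1 and AAA receives 0.25. Note that a synonymous codon never seen in the reference set gets weight 0 under this sketch; how to treat such codons is a design choice left open here.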

The CAI of a gene is simply defined as the geometric mean of the weights associated with its codons, taken over the length ($L$) of the gene sequence (measured in codons).


 * $$\text{CAI}= \left(\prod_{i=1}^{L} w_{i}\right)^{\frac{1}{L}}$$
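
The geometric mean can equivalently be written as $\exp\left(\frac{1}{L}\sum_{i=1}^{L} \ln w_{i}\right)$, which avoids multiplying many small numbers together. The Python sketch below computes CAI this way. The function name `cai` and the weight table `example_weights` are hypothetical, and two behaviours are assumptions of this sketch rather than part of the definition above: codons without an entry in the weight table are skipped (so $L$ counts only scored codons), and any zero weight makes the result 0.

```python
import math


def cai(gene, weights):
    """Geometric mean of codon weights, computed via the mean of their logs."""
    log_sum = 0.0
    scored = 0
    # Read in-frame triplets; a trailing partial codon is ignored.
    for pos in range(0, len(gene) - len(gene) % 3, 3):
        w = weights.get(gene[pos:pos + 3])
        if w is None:
            continue  # assumption: codons with no weight are not scored
        if w == 0:
            return 0.0  # a zero factor makes the whole product zero
        log_sum += math.log(w)
        scored += 1
    if scored == 0:
        raise ValueError("gene contains no codons with a defined weight")
    return math.exp(log_sum / scored)


# Hypothetical weights, made up for illustration only.
example_weights = {"AAG": 1.0, "AAA": 0.25, "TTC": 1.0, "TTT": 0.5}
score = cai("AAAAAGTTTTTC", example_weights)
```

Here the four codons have weights 0.25, 1, 0.5 and 1, so the product is 0.125 and the CAI is its fourth root, roughly 0.59. A gene built only from the most frequent codons would score exactly 1.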