Histone code

The histone code is a hypothesis that the transcription of genetic information encoded in DNA is in part regulated by chemical modifications (known as histone marks) to histone proteins, primarily on their unstructured ends. Together with similar modifications such as DNA methylation, it is part of the epigenetic code. Histones associate with DNA to form nucleosomes, which themselves bundle to form chromatin fibers, which in turn make up the more familiar chromosome. Histones are globular proteins with a flexible N-terminus (the histone tail) that protrudes from the nucleosome. Many histone tail modifications correlate well with chromatin structure, and both histone modification state and chromatin structure correlate well with gene expression levels. The critical concept of the histone code hypothesis is that histone modifications serve to recruit other proteins, which specifically recognize the modified histone via protein domains specialized for that purpose, rather than simply stabilizing or destabilizing the interaction between the histone and the underlying DNA. These recruited proteins then act to alter chromatin structure actively or to promote transcription. For details of gene expression regulation by histone modifications, see the summary list below.

The hypothesis
The hypothesis is that chromatin-DNA interactions are guided by combinations of histone modifications. While it is accepted that modifications to histone tails (such as methylation, acetylation, ADP-ribosylation, ubiquitination, citrullination, SUMOylation and phosphorylation) alter chromatin structure, a complete understanding of the precise mechanisms by which these alterations influence DNA-histone interactions remains elusive. However, some specific examples have been worked out in detail. For example, phosphorylation of serine residues 10 and 28 on histone H3 is a marker for chromosomal condensation. Similarly, the combination of phosphorylation of serine residue 10 and acetylation of lysine residue 14 on histone H3 is a tell-tale sign of active transcription.
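The idea that combinations of marks, rather than single marks, carry meaning can be illustrated with a small lookup. The following is a minimal sketch that encodes only the two H3 examples given above. The mark names (`H3S10ph`, `H3S28ph`, `H3K14ac`), the `RULES` table and the `interpret` function are illustrative assumptions, not an established tool or a complete model.

```python
# Each rule pairs a set of marks that must all be present with the
# chromatin state the text associates with that combination.
# Only the two examples from the paragraph above are encoded.
RULES = [
    (frozenset({"H3S10ph", "H3S28ph"}), "chromosomal condensation"),
    (frozenset({"H3S10ph", "H3K14ac"}), "active transcription"),
]


def interpret(marks):
    """Return the labels of every rule whose marks are all present."""
    present = set(marks)
    return [label for required, label in RULES if required <= present]


if __name__ == "__main__":
    print(interpret({"H3S10ph", "H3K14ac"}))  # one matching rule
    print(interpret({"H3S10ph"}))             # no rule is fully satisfied
```

Note that H3S10ph appears in both rules, so the same mark contributes to different readouts depending on what accompanies it.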

Modifications
Well characterized modifications to histones include:
 * Methylation: Both lysine and arginine residues are known to be methylated. Methylated lysines are the best understood marks of the histone code, as specific methylated lysines correlate well with gene expression states. Methylation of lysines H3K4 and H3K36 is correlated with transcriptional activation, while demethylation of H3K4 is correlated with silencing of the genomic region. Methylation of lysines H3K9 and H3K27 is correlated with transcriptional repression. In particular, H3K9me3 is highly correlated with constitutive heterochromatin. Methylation of histone lysines also has a role in DNA repair. For instance, H3K36me3 is required for homologous recombinational repair of DNA double-strand breaks, and H4K20me2 facilitates repair of such breaks by non-homologous end joining.
 * Acetylation (by histone acetyltransferases, HATs) and deacetylation (by histone deacetylases, HDACs): Acetylation tends to define the 'openness' of chromatin, as acetylated histones cannot pack together as tightly as deacetylated histones.
 * Phosphorylation
 * Ubiquitination
 * SUMOylation

However, there are many more histone modifications, and sensitive mass spectrometry approaches have recently greatly expanded the catalog.

A very basic summary of the histone code for gene expression status is given below. In this nomenclature, a name such as H3K4me3 gives the histone (H3), the modified residue (lysine 4) and the modification (trimethylation):

Histone H2B

 * H2BK5ac

Histone H3

 * H3K4me1 is associated with primed enhancers.
 * H3K4me3 is enriched in transcriptionally active promoters.
 * H3K9me2 is associated with repression.
 * H3K9me3 is found in constitutively repressed genes.
 * H3K27me3 is found in facultatively repressed genes.
 * H3K36me
 * H3K36me2
 * H3K36me3 is found in actively transcribed gene bodies.
 * H3K79me2
 * H3K9ac is found in actively transcribed promoters.
 * H3K14ac is found in actively transcribed promoters.
 * H3K23ac
 * H3K27ac distinguishes active enhancers from poised enhancers.
 * H3K36ac
 * H3K56ac is a proxy for de novo histone assembly.
 * H3K122ac is enriched in poised promoters and also found in a different type of putative enhancer that lacks H3K27ac.

Histone H4

 * H4K5ac
 * H4K8ac
 * H4K12ac
 * H4K16ac
 * H4K20me
 * H4K91ac
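
The mark names in the summary above follow a regular pattern (histone, residue letter, residue position, modification, optional degree), so they can be split mechanically. The following is a minimal sketch, not a standard library: `HistoneMark`, `parse_mark` and `ANNOTATIONS` are hypothetical names, the `ph` and `ub` abbreviations are assumptions that go beyond the `me` and `ac` marks listed above, and the annotations are copied only from list entries that carry a description.

```python
import re
from typing import NamedTuple, Optional


class HistoneMark(NamedTuple):
    histone: str            # e.g. "H3"
    residue: str            # one-letter amino acid code, e.g. "K" for lysine
    position: int           # residue number, e.g. 4
    modification: str       # e.g. "me" (methylation) or "ac" (acetylation)
    degree: Optional[int]   # 1-3 for methylation; None if not stated


# "ph" and "ub" are assumed abbreviations; the list above uses only me/ac.
_MARK_RE = re.compile(r"^(H2A|H2B|H3|H4)([A-Z])(\d+)(me|ac|ph|ub)([123])?$")


def parse_mark(name: str) -> HistoneMark:
    """Split a mark name such as 'H3K4me3' into its components."""
    m = _MARK_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized histone mark: {name!r}")
    histone, residue, position, modification, degree = m.groups()
    return HistoneMark(histone, residue, int(position), modification,
                       int(degree) if degree else None)


# Descriptions taken from the summary list; marks listed without a
# description are deliberately left out rather than guessed.
ANNOTATIONS = {
    "H3K4me1": "primed enhancers",
    "H3K4me3": "transcriptionally active promoters",
    "H3K9me2": "repression",
    "H3K9me3": "constitutively repressed genes",
    "H3K27me3": "facultatively repressed genes",
    "H3K36me3": "actively transcribed gene bodies",
    "H3K9ac": "actively transcribed promoters",
    "H3K14ac": "actively transcribed promoters",
    "H3K27ac": "active (as opposed to poised) enhancers",
    "H3K56ac": "proxy for de novo histone assembly",
    "H3K122ac": "poised promoters; putative enhancers lacking H3K27ac",
}

if __name__ == "__main__":
    mark = parse_mark("H3K27me3")
    print(mark, "->", ANNOTATIONS.get("H3K27me3", "no annotation listed"))
```

Looking up a mark that is absent from `ANNOTATIONS` (for example `H4K16ac`) returns the fallback string, which mirrors the fact that several entries in the list carry no stated function.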

Complexity
Unlike this simplified model, any real histone code has the potential to be massively complex; each of the four standard histones can be simultaneously modified at multiple different sites with multiple different modifications. To give an idea of this complexity, histone H3 contains nineteen lysines known to be methylated, and each can be un-, mono-, di- or tri-methylated. If modifications are independent, this allows a potential 4^19, or about 275 billion, different lysine methylation patterns, far more than the maximum number of nucleosomes in a human genome (6.4 Gb / ~150 bp ≈ 43 million if they are very tightly packed, each carrying two copies of H3). This count does not include lysine acetylation (known for H3 at nine residues), arginine methylation (known for H3 at three residues) or threonine/serine/tyrosine phosphorylation (known for H3 at eight residues), not to mention modifications of other histones.
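
The estimate above can be checked with a few lines of arithmetic. This sketch uses only the figures quoted in the paragraph (four methylation states, nineteen lysines, a 6.4 Gb genome, one nucleosome per ~150 bp); the variable names are illustrative, and the independence of sites is the same simplifying assumption the text makes.

```python
# Four possible states per lysine: un-, mono-, di- or tri-methylated.
METHYLATION_STATES = 4
# Number of H3 lysines known to be methylated, as quoted in the text.
H3_METHYLATABLE_LYSINES = 19

# Independent sites multiply: 4 * 4 * ... * 4 (nineteen times).
patterns = METHYLATION_STATES ** H3_METHYLATABLE_LYSINES

# Upper bound on nucleosome count: genome length divided by the
# approximate DNA footprint of one tightly packed nucleosome.
GENOME_BP = 6_400_000_000
BP_PER_NUCLEOSOME = 150
max_nucleosomes = GENOME_BP // BP_PER_NUCLEOSOME

print(f"lysine methylation patterns: {patterns:,}")        # 274,877,906,944
print(f"maximum nucleosomes:         {max_nucleosomes:,}")  # 42,666,666
print(f"patterns per nucleosome:     {patterns / max_nucleosomes:,.0f}")
```

Even with two copies of H3 per nucleosome, the number of possible patterns exceeds the number of H3 molecules by several thousand-fold, so a single cell can realize only a tiny fraction of them.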

Every nucleosome in a cell can therefore have a different set of modifications, raising the question of whether common patterns of histone modifications exist. A study of about 40 histone modifications across human gene promoters found over 4000 different combinations in use, of which over 3000 occurred at only a single promoter. However, recurring patterns were also discovered, including a set of 17 histone modifications that are present together at over 3000 genes. Mass spectrometry-based top-down proteomics has provided more insight into these patterns because it can discriminate co-occurrence on a single molecule from co-localization in the genome or on the same nucleosome. A variety of approaches have been used to study the detailed biochemical mechanisms that demonstrate the importance of interplay between histone modifications. Thus, some patterns of histone modifications are more common than others. These patterns are functionally important, but they are intricate and challenging to study. The best biochemical understanding currently available covers a relatively small number of discrete modifications and a few combinations.
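
The kind of tally described above (how many distinct combinations occur, and how many occur at only one promoter) can be sketched as follows. The data here are invented toy values for five hypothetical genes, not results from the study, and `count_combinations` is an illustrative helper rather than the method the study used.

```python
from collections import Counter

# Toy input: the set of marks detected at each promoter.
# Gene names and mark assignments are made up for illustration.
promoter_marks = {
    "geneA": {"H3K4me3", "H3K9ac", "H3K27ac"},
    "geneB": {"H3K4me3", "H3K9ac", "H3K27ac"},
    "geneC": {"H3K27me3"},
    "geneD": {"H3K4me3", "H3K27me3"},
    "geneE": {"H3K27me3"},
}


def count_combinations(marks_by_promoter):
    """Count how many promoters carry each exact combination of marks."""
    # frozenset makes each combination hashable and order-independent.
    return Counter(frozenset(marks) for marks in marks_by_promoter.values())


combos = count_combinations(promoter_marks)
singletons = [combo for combo, n in combos.items() if n == 1]

print(f"distinct combinations: {len(combos)}")
print(f"seen at only one promoter: {len(singletons)}")
for combo, n in combos.most_common():
    print(n, sorted(combo))
```

This counts co-localization at a promoter only; as the text notes, distinguishing marks that sit on the same histone molecule from marks that merely share a genomic location requires approaches such as top-down proteomics.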

Structural determinants of histone recognition by readers, writers, and erasers of the histone code are revealed by a growing body of experimental data.