5-Hydroxymethylcytosine

5-Hydroxymethylcytosine (5hmC) is a DNA pyrimidine nitrogen base derived from cytosine. It is potentially important in epigenetics, because the hydroxymethyl group on the cytosine may switch a gene on or off. It was first seen in bacteriophages in 1952, and in 2009 it was found to be abundant in human and mouse brains, as well as in embryonic stem cells. In mammals, it can be generated by oxidation of 5-methylcytosine, a reaction mediated by TET enzymes. Its molecular formula is C5H7N3O2.
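As a rough illustration of the formula, the molar mass of 5hmC can be worked out from standard atomic weights. The sketch below is only an arithmetic aid: the helper name `molar_mass`, the rounded atomic weights, and the formula used for 5-methylcytosine (C5H7N3O, which does not appear in the text above) are assumptions added for the example. It also shows that oxidising 5-methylcytosine to 5hmC adds the mass of one oxygen atom.

```python
import re

# Rounded standard atomic weights in g/mol (assumed values for illustration).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Sum atomic weights for a simple formula such as 'C5H7N3O2'.

    Hypothetical helper: handles only element symbols followed by an
    optional count, with no brackets or charges.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHT[symbol] * (int(count) if count else 1)
    return total


mass_5hmc = molar_mass("C5H7N3O2")  # 5-hydroxymethylcytosine
mass_5mc = molar_mass("C5H7N3O")    # 5-methylcytosine (assumed formula)

print(f"5hmC: {mass_5hmc:.2f} g/mol")
print(f"5mC:  {mass_5mc:.2f} g/mol")
print(f"Mass added by oxidation: {mass_5hmc - mass_5mc:.3f} g/mol")
```

The difference between the two masses equals the atomic weight of oxygen, consistent with 5hmC arising from 5-methylcytosine by addition of a single oxygen atom.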

Localization
Every mammalian cell seems to contain 5-hydroxymethylcytosine, but the levels vary significantly depending on the cell type. The highest levels are found in neuronal cells of the central nervous system. The amount of 5-hydroxymethylcytosine increases with age, as shown in mouse hippocampus and cerebellum.

Function
The exact function of this nitrogen base is still not fully elucidated, but it is thought that it may regulate gene expression or prompt DNA demethylation. This hypothesis is supported by the fact that 5-hydroxymethylcytosine (5hmC) in artificial DNA can be converted into unmodified cytosine once the DNA is introduced into mammalian cells. Moreover, 5hmC is highly enriched in primordial germ cells, where it apparently plays a role in global DNA demethylation. Additionally, 5-formylcytosine, an oxidation product of 5-hydroxymethylcytosine and a possible intermediate of an oxidative demethylation pathway, was detected in DNA from embryonic stem cells, although no significant amounts of these putative demethylation intermediates could be detected in mouse tissue. 5-Hydroxymethylcytosine may be especially important in the central nervous system, as it is found at very high levels there. A reduction in 5-hydroxymethylcytosine levels has been found to be associated with impaired self-renewal in embryonic stem cells. 5-Hydroxymethylcytosine is also associated with labile, unstable nucleosomes, which are frequently repositioned during cell differentiation.

The accumulation of 5-hydroxymethylcytosine (5hmC) in post-mitotic neurons is associated with “functional demethylation” that facilitates transcription and gene expression. The term “demethylation,” as applied to neurons, ordinarily refers to the replacement of 5-methylcytosine (5mC) by cytosine in DNA that can occur through a series of reactions involving a TET enzyme as well as enzymes of the DNA base excision repair pathway (see Epigenetics in learning and memory). “Demethylation” of 5mC in DNA most often results in the promotion of expression of genes with neuronal activities. “Functional demethylation” refers to the replacement of 5mC by 5hmC, ordinarily a single-step TET-mediated reaction, that also facilitates gene expression, an effect similar to that of “demethylation.”

Bacteria and phages
Phages probably evolved to use 5hmC to avoid recognition by most restriction enzymes in bacteria. The T4 phage uses 5hmC exclusively during replication and glycosylates its hydroxyl group, which modifies the base further. Some bacteria have in turn evolved restriction enzymes specific for sites containing 5hmC. One prominent example is PvuRts1I, originally identified in 1994.

5hmC in T4 is produced by gene product 42 (gp42), deoxycytidylate 5-hydroxymethyltransferase (EC 2.1.2.8). The glycosylation reactions are classified as EC 2.4.1.26, EC 2.4.1.27, and EC 2.4.1.28.

History
5-Hydroxymethylcytosine was observed by Skirmantas Kriaucionis, an associate at the Heintz lab, who was measuring levels of 5-methylcytosine in two different neuron types. He instead found a significant amount of an unknown substance and, after conducting several tests, identified it as 5-hydroxymethylcytosine.

The lab of L. Aravind used bioinformatic tools to predict that the Tet family of enzymes would likely oxidize 5-methylcytosine to 5-hydroxymethylcytosine. This was demonstrated in vitro and in live human and mouse cells by scientists working in the labs of Anjana Rao and David R. Liu.

5-Hydroxymethylcytosine was originally observed in mammals in 1972 by R. Yura, but this initial finding is dubious. Yura found 5hmC present at extremely high levels in rat brain and liver, completely supplanting 5-methylcytosine. This contradicts all research on mammalian DNA composition conducted before and since, including the Heintz and Rao papers, and another group was unable to reproduce Yura's result.

With the discovery of 5-hydroxymethylcytosine, some concerns have been raised regarding DNA methylation studies that use the bisulfite sequencing technique. 5-Hydroxymethylcytosine has been shown to behave like its precursor, 5-methylcytosine, in bisulfite conversion experiments. Therefore, bisulfite sequencing data may need to be revisited to verify whether the detected modified base is 5-methylcytosine or 5-hydroxymethylcytosine. In 2012 the lab of Chuan He developed a method, termed TAB-seq, that uses the oxidative properties of the Tet family of enzymes to solve the problem of 5-hydroxymethylcytosine being detected as 5-methylcytosine in normal bisulfite conversion experiments.
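The ambiguity described above can be sketched as a toy model. The example below assumes idealised chemistry (complete conversion, no sequencing error) and an invented one-letter encoding in which "m" stands for 5mC and "h" for 5hmC. The readout tables and the function names `read_strand` and `call_cytosine` are hypothetical illustrations, not part of any published pipeline. The TAB-seq table reflects the general principle that, in that method, 5hmC is protected while 5mC is oxidised by a Tet enzyme and then reads as T after bisulfite treatment.

```python
# Idealised readouts: which base the sequencer reports for each input base.
# Encoding (hypothetical): "C" = unmodified cytosine, "m" = 5mC, "h" = 5hmC.
BISULFITE_READOUT = {"C": "T", "m": "C", "h": "C"}  # 5mC and 5hmC look alike
TAB_SEQ_READOUT = {"C": "T", "m": "T", "h": "C"}    # only 5hmC still reads as C


def read_strand(strand, readout):
    """Return the sequence reported after treatment; other bases pass through."""
    return "".join(readout.get(base, base) for base in strand)


def call_cytosine(bs_base, tab_base):
    """Combine both readouts at one cytosine position into a modification call."""
    if bs_base == "T":
        return "C"      # converted by bisulfite, so unmodified
    if tab_base == "C":
        return "5hmC"   # protected in both experiments
    return "5mC"        # protected in bisulfite only


strand = "ACmGThAC"
bs_read = read_strand(strand, BISULFITE_READOUT)
tab_read = read_strand(strand, TAB_SEQ_READOUT)

# Report a call for every cytosine position in the original strand.
for i, base in enumerate(strand):
    if base in "Cmh":
        print(i, call_cytosine(bs_read[i], tab_read[i]))
```

In this model the bisulfite read alone cannot separate "m" from "h", since both appear as C. Only the comparison with the TAB-seq read resolves them, which is why existing bisulfite data cannot be reinterpreted without an additional experiment.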

In June 2020, Oxford Nanopore added a hydroxymethyl cytosine detection model to their research basecaller, rerio, allowing old signal-level data from any R9+ nanopore runs to be re-called to identify 5hmC.