RNA-directed DNA methylation

RNA-directed DNA methylation (RdDM) is a biological process in which non-coding RNA molecules direct the addition of DNA methylation to specific DNA sequences. The RdDM pathway is unique to plants, although other mechanisms of RNA-directed chromatin modification have also been described in fungi and animals. To date, the RdDM pathway is best characterized within angiosperms (flowering plants), and particularly within the model plant Arabidopsis thaliana. However, conserved RdDM pathway components and associated small RNAs (sRNAs) have also been found in other groups of plants, such as gymnosperms and ferns. The RdDM pathway closely resembles other sRNA pathways, particularly the highly conserved RNAi pathway found in fungi, plants, and animals. Both the RdDM and RNAi pathways produce sRNAs and involve conserved Argonaute, Dicer and RNA-dependent RNA polymerase proteins.

RdDM has been implicated in a number of regulatory processes in plants. The DNA methylation added by RdDM is generally associated with transcriptional repression of the genetic sequences targeted by the pathway. Since DNA methylation patterns in plants are heritable, these changes can often be stably transmitted to progeny. As a result, one prominent role of RdDM is the stable, transgenerational suppression of transposable element (TE) activity. RdDM has also been linked to pathogen defense, abiotic stress responses, and the regulation of several key developmental transitions. Although the RdDM pathway has a number of important functions, RdDM-defective mutants in Arabidopsis thaliana are viable and can reproduce, which has enabled detailed genetic studies of the pathway. However, RdDM mutants can have a range of defects in different plant species, including lethality, altered reproductive phenotypes, TE upregulation and genome instability, and increased pathogen sensitivity. Overall, RdDM is an important pathway in plants that regulates a number of processes by establishing and reinforcing specific DNA methylation patterns, which can lead to transgenerational epigenetic effects on gene expression and phenotype.

Biological functions
RdDM is involved in a number of biological processes in the plant, including stress responses, cell-to-cell communication, and the maintenance of genome stability through TE silencing.

Transposable element silencing and genome stability
TEs are pieces of DNA that, when expressed, can move around the genome through a copy-and-paste or cut-and-paste mechanism. New TE insertions can disrupt protein coding or gene regulatory sequences, which can harm or kill the host cell or organism. As a result, most organisms have mechanisms for preventing TE expression. This is particularly key in plant genomes, which are often TE-rich. Some plant species, including important crops like maize and wheat, have genomes consisting of upwards of 80% TEs. RdDM plays a key role in silencing these mobile DNA elements in plants by adding DNA methylation over new TE insertions and constantly reinforcing DNA methylation over existing TEs, inhibiting transposition and maintaining long-term genome stability. Although the RdDM mechanism itself is unique to plants, using DNA methylation to silence TEs is a common strategy among eukaryotes.

RdDM primarily targets small TEs and TE fragments near genes, which are usually in open, accessible euchromatic regions of the genome that are permissive for gene expression. In these regions, the ‘active’ chromatin state has a tendency to spread from expressed genes to nearby repressed regions, like TEs, which can cause these TEs to become activated and transpose. Continuous activity by RdDM opposes the spread of active chromatin, maintaining a silent, repressive heterochromatic state over TEs in these otherwise euchromatic regions. In turn, RdDM activity recruits other pathways that help establish and propagate the silent, heterochromatic state (see 'Interactions between RdDM and other chromatin modifying pathways'). Because of the self-reinforcing nature of these silencing pathways, excessive RdDM activity can also cause the silent, heterochromatic chromatin state over TEs to spread to nearby genes and repress them, with potentially harmful consequences for the organism. Therefore, RdDM activity must be finely tuned to maintain a balance between repressing TEs and allowing expression of nearby genes.

In addition to maintaining stable silencing of TEs, RdDM can also initiate transcriptional silencing of foreign DNA, including novel TE insertions, virus-derived sequences, and transgenes (also see 'Biotic stresses' and 'Transgene silencing' below). When TEs integrate near genes, RdDM-mediated silencing of the TEs often affects gene expression. However, this is not always deleterious, and can sometimes be overcome by other processes, or alter gene expression in ways beneficial to the plant. Over evolutionary time, beneficial TEs can become an important part of the mechanism by which a gene is regulated. In one example, the gene ROS1 lies adjacent to a small helitron TE that is normally methylated by RdDM. While DNA methylation is normally associated with transcriptional repression, this is not the case at the ROS1 locus. Instead, methylation of the helitron TE promotes ROS1 expression, so ROS1 expression is lost in mutants of the RdDM pathway that cannot methylate the TE. Interestingly, ROS1 encodes a DNA glycosylase that functions to remove DNA methylation from the genome. The link between ROS1 expression and RdDM activity at this TE ensures that DNA methylation and demethylation activities remain in balance, helping to maintain DNA methylation homeostasis genome-wide. Thus, RdDM-mediated regulation of TEs can lead to beneficial regulatory outcomes.

Some TEs have evolved mechanisms to suppress or escape RdDM-based silencing in order to facilitate their own proliferation, leading to an evolutionary arms race between TEs and their host genomes. In one example, a TE-derived sequence was found to produce sRNAs that trigger post-transcriptional repression of a component of the RdDM pathway, inhibiting RdDM. This sequence may have helped the original TE escape RdDM-based silencing and insert itself into the host genome.

Studying how RdDM targets and represses different types of TEs has led to many major insights into how the RdDM mechanism works. The retrotransposon EVADÉ (EVD) was one of the first TEs specifically shown to be repressed by RdDM-derived sRNAs. Later work used EVD to trace the mechanism by which a novel TE insertion became silenced, revealing an important mechanistic link between post-transcriptional gene silencing and RdDM. Studies of other retrotransposons, including ONSEN, which is regulated by both RdDM and heat stress, and Athila family TEs, among many others, have also provided valuable insights into RdDM-mediated TE silencing.

Development and reproduction
A number of epigenetic changes required for normal development and reproduction in flowering plants involve RdDM. In a well-studied example, RdDM is required for repression of the FWA gene, which allows for proper timing of flowering in Arabidopsis. The FWA promoter contains tandem repeats that are usually methylated by RdDM, leading to transcriptional repression. Loss of this methylation re-activates FWA expression, causing a late-flowering phenotype. The loss of DNA methylation and associated late-flowering phenotype can be stably transmitted to progeny. Since the demethylated fwa allele leads to a stable, heritable change in the expression of FWA without any change to the DNA sequence, it is a classic example of an epiallele.

Mutations in the RdDM pathway can strongly affect gamete formation and seed viability, particularly in plant species with high TE content like maize and Brassica rapa, highlighting the importance of this pathway in plant reproduction. During gamete formation, it has been hypothesized, and in some cases shown, that RdDM helps reinforce TE silencing in the germ cells. In both pollen and ovules, a support cell undergoes epigenetic reprogramming, losing DNA methylation and other epigenetic marks at a number of loci, including TEs. This causes TE re-activation and encourages the production of RdDM-derived sRNAs against these TEs in the support cells. The sRNAs are then thought to move from the support cell to the germ cell in order to reinforce TE silencing in the next generation. This phenomenon has been observed in pollen, but has yet to be shown definitively in the ovule. This role for sRNAs in plants resembles the role of piRNAs in germline development in Drosophila and some other animals. A similar phenomenon may also occur in roots to preserve TE silencing in important stem cell populations. The RdDM pathway is also involved in regulating imprinted expression at some genes. This unusual parent-of-origin-specific expression pattern occurs at several loci in the endosperm during seed development in flowering plants. A few factors involved in the RdDM pathway are themselves imprinted (favoring expression from the paternal allele) in diverse species, including A. thaliana, A. lyrata, C. rubella, and maize. RdDM also plays a role in mediating the gene dosage effects seen in seeds derived from interploid crosses, though the mechanism for this remains largely unknown.

There is also evidence that RdDM plays a role in several other aspects of plant development, including seed dormancy, fruit ripening, and other pathways involved in flowering. However, most of these data are correlative, and further study is necessary to understand the role of RdDM in these processes.

Abiotic stresses
RdDM helps plants respond to a number of abiotic stresses, such as heat stress, drought, phosphate starvation, salt stress, and others. Many TEs become upregulated under abiotic stress conditions, and thus one function of RdDM in stress response is to help counter this activation. In one example, the retrotransposon ONSEN is upregulated by heat stress, but normally remains suppressed by RdDM-associated sRNAs and can only transpose efficiently in heat-stressed plants that are also deficient in RdDM. More generally, in plants exposed to heat stress, several components of the RdDM pathway become upregulated, and mutations in some components of the RdDM machinery reduce heat tolerance, suggesting RdDM plays an important role during heat stress. In addition to regulating TEs under stress conditions, RdDM can also regulate genes in order to trigger appropriate stress responses. Under low humidity, leaves produce fewer stomata due to RdDM-mediated downregulation of two genes involved in stomatal development. Similarly, RdDM becomes downregulated in response to salt stress, and this has been shown to trigger the expression of a transcription factor important in salt stress resistance.

Biotic stresses
RdDM was initially discovered as a response to infection by viroids, and along with RNAi plays an important role in defending the plant against viroids and viruses. The RdDM and RNAi machinery recognize viral RNAs and process them into sRNAs, which can then be used both pathways to degrade viral RNA (RNAi) and silence viral DNA (RdDM). However, little is known about how the RdDM and RNAi machinery distinguish between viral RNAs and RNAs produced by the host plant. Mutants defective in RdDM and other methylation-deficient mutants are often hypersensitive to viral infection. Virus-host interactions are another example of an evolutionary arms race, and many plant viruses encode suppressors of both RdDM and RNAi in an attempt to evade the host plant's defenses.

RdDM is also involved in protecting the plant from other biotic stresses, including bacterial infections, fungal infections, and predation. Loss of RdDM can have opposing effects on resistance for different pathogens. For example, some RdDM mutants have increased susceptibility to the bacterium Agrobacterium tumefaciens, but those same mutants have decreased susceptibility to the bacterium Pseudomonas syringae, highlighting the complexity of the different pathogen defense pathways and their interactions with RdDM.

Transgene silencing
In addition to naturally-occurring foreign nucleic acid stressors like TEs and viruses, artificially introduced DNA sequences, like transgenes, are also targeted for repression by RdDM. Transgenes are widely used in genetics research to study gene function and regulation, and in plant breeding to introduce novel and desirable properties into a plant. Transgene silencing by RdDM and other mechanisms has therefore proved problematic for plant researchers. Efforts to understand how transgenes become silenced have ultimately helped reveal much of what we now know about the RdDM pathway (see 'History and discovery of RdDM'). In one early example, researchers sequentially transformed plants with two different transgenes that shared some of their DNA sequence. They found that transforming the second transgene into the plants led to the first transgene gaining DNA methylation and becoming inactivated. This provided an early clue that there existed a trans-acting, sequence-based mechanism for transcriptional silencing of foreign DNA, later shown to be RdDM.

Stress and RdDM-mediated epigenetic ‘memory’
Due to the heritability of DNA methylation patterns in plants, and the self-reinforcing nature of RdDM and other DNA methylation pathways, any DNA methylation changes caused by environmental stressors have the potential to be maintained and transmitted to future generations. This can allow stress-induced DNA methylation changes to act as a ‘memory’ of the stressor and help prime the plant or its progeny to respond more efficiently to the stress if re-exposed. For example, RdDM-derived sRNAs against TEs or viruses that have already integrated into the genome and been silenced serve as a 'memory' of those prior infections, protecting against future invasions by similar sequences. There is also evidence that DNA methylation changes due to other stressors, such as salt or heat stress, can persist in the progeny of stressed plants even in the absence of the original stressor. In this study, the persistence of the stress-induced DNA methylation changes required several RdDM-related proteins, suggesting that RdDM was involved in maintaining the stress-altered DNA methylation patterns. In another example, resistance to insect attack was transmitted to progeny via DNA methylation changes, and this inheritance was also dependent on functional sRNA biogenesis pathways. Thus, RdDM can potentially alter the plant epigenome in response to stress, and helps maintain these changes to modulate future stress responses in the affected plant and its descendants.

Short and long-range signaling
The sRNA molecules produced by RdDM and other pathways are able to move between cells via plasmodesmata, and can also move systemically through the plant via the vasculature. They therefore have the potential to act as signaling molecules. This has been demonstrated in plants engineered to express green fluorescent protein (GFP). The GFP protein produced by these plants caused them to glow green under certain light conditions. When tissue from a second plant expressing a sRNA construct complementary to GFP was grafted onto the GFP-expressing plant, the GFP fluorescence was lost: after grafting, the sRNAs being produced in the second plant's tissues were moving into the tissues of the first, GFP-expressing plant, and triggering silencing of GFP. The same study showed that a subset of these mobile sRNAs were triggering the addition of DNA methylation to the GFP locus via RdDM. Therefore, sRNAs involved in RdDM can act as signaling molecules and trigger the addition of DNA methylation at complementary loci in cells far away from where the sRNAs were originally generated. Since then, studies have shown that sRNAs can move and direct RdDM both from shoot to root and root to shoot, though the silencing effect is more robust when sRNAs move from shoot to root.

Movement of sRNAs that drive RdDM activity plays an important role in plant development, including during reproduction  and root development. In both cases, sRNA movement seems to function primarily as a way to reinforce DNA methylation and silencing of TEs in developmentally important cell types, like germ cells and stem cells. Silencing TEs and maintaining genome integrity in these cells is particularly important because they give rise to many other cells, all of which will inherit any defects or mutations in the original stem cell or germ cell. sRNA movement is also involved in plant-pathogen interactions: sRNAs can move from infected cells to distal uninfected tissues in order to prime a defense response, though to date this has only been shown for RNAi, not RdDM.

Pathways and mechanisms
This section focuses on the pathways and mechanisms by which RdDM leads to sequence-specific DNA methylation. The pathways presented here were characterized primarily in the model plant Arabidopsis thaliana, but are likely similar in other angiosperms. Conservation of RdDM in other plant species is discussed in more detail in 'Evolutionary conservation' below.

DNA methylation context


RdDM is the only mechanism in plants that can add DNA methylation to cytosines regardless of sequence context. DNA methylation in plants is typically divided into three categories based on the sequence context of the methylated cytosine: CG, CHG, and CHH, where H is any nucleotide except G. These reflect the different sequence contexts targeted by several DNA methylation pathways in plants. These context-specific pathways are primarily involved in maintaining existing DNA methylation patterns. The highly conserved methyltransferase MET1 (homolog of mammalian DNMT1) maintains DNA methylation in the CG context, while two conserved plant-specific methyltransferases, Chromomethylase 3 (CMT3) and CMT2, help maintain CHG and CHH methylation, respectively. Unlike these pathways, RdDM leads to the addition of DNA methylation at all cytosines regardless of their sequence context. Like MET1, CMT2 and CMT3, RdDM is primarily involved in maintaining existing DNA methylation patterns. However, RdDM is also the only pathway capable of adding DNA methylation de novo to previously unmethylated regions in plants.

Mechanism
The RdDM pathway can be split up into two main processes: the production of sRNAs, and the recruitment of DNA methylation machinery by those sRNAs to specific target loci in the DNA. These two activities together constitute RdDM, and ultimately lead to DNA methylation being added to cytosines at specific target loci.



Canonical RdDM
The canonical RdDM pathway is, as its name suggests, the most well-characterized RdDM pathway to date. Canonical RdDM is preferentially recruited to regions that are already DNA-methylated and heterochromatic, and acts to reinforce existing DNA methylation patterns at these loci, forming a positive feedback loop. Canonical RdDM makes up the majority of RdDM activity in a cell.

sRNA production
The first part of the RdDM pathway revolves around the biogenesis of sRNAs. A plant-specific RNA polymerase complex, RNA Polymerase IV (Pol IV), is first recruited to silent heterochromatin via its interaction with CLASSY (CLSY) proteins and SAWADEE homeodomain homolog 1 (SHH1) (also see 'Interactions between RdDM and other chromatin modifying pathways' below). Pol IV transcribes these regions to produce short single-stranded RNAs (ssRNAs) roughly 30 to 45 nucleotides in length, each of which is the precursor for a single sRNA. These ssRNAs are converted into double-stranded RNAs (dsRNAs) co-transcriptionally by RNA-directed RNA polymerase 2 (RDR2), which physically associates with Pol IV. The dsRNAs are then cleaved by the endoribonuclease Dicer-like 3 (DCL3) into 24 nucleotide (nt) sRNAs. Pol IV, RDR2, and DCL3 alone are sufficient for the production of 24 nt sRNAs in vitro, suggesting that while other factors involved in this part of the pathway may help increase efficiency or specificity, they are not required for Pol IV-mediated sRNA production.

While nearly all 24 nt sRNAs involved in RdDM are produced through the Pol IV-RDR2-DCL3 pathway, a small proportion are produced through other pathways. For example, some RNA Polymerase II (Pol II) transcripts that contain an inverted repeat sequence form double-stranded hairpin structures that can be directly cleaved by DCL3 to form 24 nt sRNAs.

DNA methylation of target loci
In the second part of the pathway, the RdDM DNA methylation machinery is guided to DNA sequences complementary to the sRNAs generated in the first part of the pathway. One strand from each 24 nt double-stranded sRNA is loaded into Argonaute (AGO) proteins AGO4, AGO6, or AGO9. AGO3 may also be able to function in this pathway. Argonautes are a large, highly conserved family of proteins that can bind sRNAs, forming a protein-sRNA duplex that enables them to recognize and bind other RNA sequences complementary to their sRNA partner. Once formed, the AGO-sRNA duplex finds and binds complementary sequences along an RNA ‘scaffold’ produced by the plant-specific RNA polymerase V (Pol V), with the help of interactions with Suppressor of Ty insertion 5-like (SPT5L), the Involved in de novo 2 - IDN2 Paralog (IDN2-IDP) complex, and the Pol V subunit NRPE1. This leads to the recruitment of the DNA methyltransferase enzyme Domains Rearranged Methyltransferase 2 (DRM2), which methylates nearby DNA. The mechanism by which the AGO-sRNA duplex recruits DRM2 is not yet well understood.

Non-canonical RdDM
Recent work has revealed a number of variations of the RdDM pathway, collectively referred to as non-canonical RdDM. Unlike canonical RdDM, the non-canonical pathways are generally involved in establishing initial DNA methylation at new target loci, like novel TE insertions, rather than maintaining existing heterochromatin. Actively expressing elements like new TE insertions are normally strongly targeted by post-transcriptional gene silencing (PTGS/RNAi) pathways. Non-canonical RdDM occurs primarily as a byproduct of these PTGS pathways, leading to the initial establishment of a silent, heterochromatic state over the new TE or other target locus. Once that initial silent state is established, Pol IV can be recruited to the locus by CLSY and SHH1, and the canonical RdDM pathway takes over the long-term maintenance of silencing. Therefore, the non-canonical RdDM pathways often act as a temporary bridge between initial post-transcriptional silencing of novel elements by RNAi, and long-term transgenerational transcriptional silencing via canonical RdDM. Consistent with this role in initiation of novel silencing, non-canonical RdDM targets relatively few loci in comparison to canonical RdDM.

The primary difference between the canonical and non-canonical RdDM pathways lies in the origin and biogenesis of the sRNAs involved. The canonical RdDM pathway involves 24 nt sRNAs, which are specific to that pathway and come predominantly from a single source (the Pol IV-RDR2 complex). In contrast, the non-canonical RdDM pathways involve 21-22 nt sRNAs from a variety of sources, allowing de novo DNA methylation to be initiated at many different types of loci. These 21-22 nt sRNAs are not specific to non-canonical RdDM, and also function in other PTGS pathways. In fact, only a small fraction of 21-22nt sRNAs are involved in RdDM, with the majority instead driving a positive feedback loop amplifying the PTGS response. The functional outcome of a specific 21-22 nt sRNA depends on the AGO protein it ultimately associates with: sRNAs that associate with AGO4, AGO6 or AGO9 result in RdDM and DNA methylation, while sRNAs that associate with other AGOs, like AGO1, primarily result in PTGS.

By using 21-22 nt sRNAs derived from a variety of sources, non-canonical RdDM can flexibly induce de novo DNA methylation and silencing at many different types of loci. One of the primary sources of 21-22 nt sRNAs is Pol II transcripts. Some of these transcripts, particularly those produced from TEs, viruses, or certain non-protein-coding transcripts, are targeted by PTGS pathways like miRNAs or RNAi, leading to cleavage of the transcript. The resulting fragments can be converted into dsRNA by RDR6 and then processed into 21-22 nt sRNAs by DCL2 or DCL4. Most of these 21-22 nt sRNAs are loaded into AGO1 and feed back into PTGS, amplifying PTGS efficiency. However, some will instead associate with AGO6, leading to RdDM. dsRNAs resulting from RDR6 activity can also sometimes processed by DCL3 instead of DCL2/4 and trigger RdDM. Additionally, some Pol II transcripts contain inverted repeat sequences, which can form double-stranded hairpin-like structures. These can be cleaved by DCL proteins independent of RDRs to produce either 21-22 nt or 24 nt sRNAs that can participate in RdDM. Similarly, miRNA precursors, which also form hairpin structures and are normally cleaved by DCL1 to produce miRNAs, can instead be cleaved by other DCLs to form sRNAs for RdDM. While most non-canonical RdDM occurs via AGO6 or AGO4, there is also a version of the pathway where sRNAs instead associate with AGO2, which together with the NERD complex (Needed for RDR2-independent DNA methylation) recruits DRM2 to target loci and triggers DNA methylation. Since the non-canonical pathways are not yet as well characterized as the canonical RdDM pathway, there likely remain additional sources of sRNAs used for RdDM that have not yet been uncovered.

Factors involved
A number of factors involved in RdDM are listed below, along with additional details about their function and corresponding references. Several factors primarily involved in PTGS that sometimes participate in RdDM are also listed.

Interactions with other chromatin modifying pathways
Different chromatin states, like active euchromatin or silent heterochromatin, are defined by a combination of specific histone modification and DNA methylation patterns. Repressive chromatin modifications, like DNA methylation, help promote DNA compaction and reduce DNA accessibility, while other modifications help open chromatin and increase accessibility. Methylation of the 9th lysine of histone H3 (H3K9), primarily in the form of H3K9 trimethylation (H3K9me3) in animals and H3K9 dimethylation (H3K9me2) in plants, is a highly conserved repressive modification. Lack of H3K4 methylation (H3K4me0) is also associated with repression, along with several other histone modifications and variants. The combination of DNA methylation, H3K9me2, and H3K4me0 is strongly associated with heterochromatin in plants.

Since DNA methylation and repressive histone modifications together define heterochromatin, most DNA methylation pathways in plants recognize and interact with repressive histone marks and vice versa, forming positive feedback loops that help maintain the repressive chromatin state. The RdDM-associated protein SHH1 recognizes H3K4me0 and H3K9me2 at heterochromatic loci and recruits Pol IV to these loci to trigger additional DNA methylation at these regions. Similarly, SUVH2 and SUVH9 help recruit Pol V to loci with DNA methylation. Thus, both major parts of the canonical RdDM pathway are preferentially recruited to regions that are already in the silent, heterochromatic state marked by DNA methylation, H3K9me2, and H3K4me0. DNA methylation at these same heterochromatic loci is also recognized by the histone methyltransferases SUVH4/KYP, SUVH5, and SUVH6, which bind to non-CG methylation and add H3K9me2 to nearby histones, closing the positive feedback loop. Similarly, CMT3 and CMT2, the two DNA methyltransferases involved in the maintenance of CHG and CHH methylation respectively, both bind and add DNA methylation to H3K9me2-marked heterochromatin, forming their own feedback loop with SUVH4/5/6. These interactions help strongly reinforce silencing at TEs and other heterochromatic regions.

A similar feedback loop occurs in animals. HP1 plays a vital role in maintaining heterochromatin by propagating H3K9 methylation through a positive feedback loop with the H3K9 methyltransferase SUV39H. H3K9 methylation recruits HP1, which recruits SUV39H to deposit more H3K9 methylation. Though HP1 is conserved in plants, its function in this feedback loop is not. Instead, the positive feedback loops between H3K9me2 and the RdDM and CMT2/3 DNA methylation pathways fulfill a similar function in propagating H3K9me2. More recently, a plant-specific protein, Agenet Domain Containing Protein 1 (ADCP1), was also identified that may function analogously to HP1 in maintaining H3K9me2 levels in heterochromatin, facilitating heterochromatin formation.

Ultimately, the constant reinforcement of silencing chromatin modifications at heterochromatic loci creates a repressive chromatin state wherein the DNA and histones (nucleosomes) become tightly packed together. This helps silence gene expression by physically inhibiting access to the DNA, preventing RNA Polymerase II, transcription factors and other proteins from initiating transcription. However, this same compaction also prevents factors involved in heterochromatin maintenance from accessing the DNA, which could lead to the silent, compact state being lost. This is particularly true in the dense constitutive heterochromatin surrounding the centromere. In these regions, the chromatin remodeler DDM1 plays a crucial role in DNA methylation maintenance by displacing nucleosomes temporarily to allow methyltransferases and other factors access the DNA. However, since most RdDM targets are small TEs in open, accessible and gene-rich regions (see “TE silencing and genome stability”), few RdDM sites require DDM1. In fact, dense heterochromatin inhibits RdDM. By contrast, CMT2 and CMT3 preferentially function in constitutive heterochromatin and depend strongly on DDM1 to maintain silencing over these regions. Similarly, MET1, which maintains DNA methylation at CG sites after replication, requires DDM1 to access heterochromatin and maintain CG methylation in those regions. Thus, DDM1 is a key regulator of DNA methylation in dense heterochromatin, but regulates sites mostly independently from RdDM.

Interactions between RdDM and the other three maintenance DNA methylation pathways are limited and predominantly indirect. The DNA methyltransferase MET1 robustly maintains CG methylation genome-wide, including at RdDM target sites. In RdDM mutants, non-CG methylation at RdDM target sites is lost, but CG methylation is still maintained, suggesting that MET1 activity is independent of RdDM. However, although met1 mutants lose CG methylation as expected, they also lose much of their non-CG methylation, including at RdDM target loci. At these sites, silencing can still be initiated by RdDM in met1 mutants, but it is not maintained or transmitted to progeny, suggesting that MET1 is important for the maintenance, but not initiation, of silencing at a subset of RdDM target loci. This effect is likely indirect: loss of MET1 leads to loss of H3K9me2 at some sites, which inhibits the recruitment of Pol IV and therefore prevents maintenance of DNA methylation via canonical RdDM, although the non-canonical pathways (which do not involve Pol IV) are not affected. Loss of the histone deacetylase HDA6, which facilitates maintenance methylation by MET1 at some loci, has a similar effect, suggesting that multiple different factors involved in maintaining heterochromatin likely facilitate RdDM-mediated DNA methylation maintenance.

Loss of RdDM leads to strong loss of non-CG methylation at TEs in gene-rich regions in the chromosome arms, but has little effect on DNA methylation levels in the constitutive heterochromatin around the centromere. This suggests that CMT2 and CMT3, which function primarily to maintain CHG and CHH methylation in dense constitutive heterochromatin, do not depend on RdDM activity. Similarly, in cmt2,cmt3 double mutants, many TEs in the chromosome arms remain methylated, presumably due to the persistent activity of RdDM, indicating that loss of CMT2/3 has little effect on RdDM activity. This suggests that RdDM and CMT2/3 function mostly independently and at distinct loci: RdDM is the main pathway responsible for maintaining non-CG DNA methylation in euchromatic, gene rich regions, while CMT2 and CMT3 maintain non-CG DNA methylation in constitutive heterochromatin. In mutants defective in both RdDM and CMT2/CMT3, all non-CG methylation in the genome is eliminated, demonstrating that together RdDM and CMT2/CMT3 account for all non-CG methylation in the genome.

Balance between DNA methylation and demethylation
Most DNA methylation mechanisms in plants are self-reinforcing (see above), including RdDM: Pol IV and Pol V are both recruited to heterochromatic regions that already have DNA methylation, encouraging additional DNA methylation via canonical RdDM. Positive feedback loops like these can cause DNA methylation activity to spread out from the intended methylated target sites into genes or other regulatory elements, which can negatively affect gene expression. To prevent this spreading, DNA methylation pathways are opposed by passive and active DNA demethylation. DNA methylation can be lost passively with each cell division, because newly synthesized strands of DNA lack DNA methylation until it is re-added by one of the maintenance DNA methylation pathways. DNA methylation can also be actively removed in plants by DNA glycosylases, which remove methylated cytosines via the base excision repair pathway. In Arabidopsis, there are four proteins responsible for removing DNA methylation: Repressor of silencing 1 (ROS1), Demeter (DME), Demeter-like 2 (DML2), and Demeter-like 3 (DML3). These DNA glycosylases help prevent the spread of DNA methylation from RdDM targets to active genes. Loss of active DNA demethylation in ros1;dml2;dml3 triple mutants leads to a widespread increase in DNA methylation levels, whereas ectopic expression of ROS1 leads to progressive loss of DNA methylation at many loci, highlighting the importance of balancing DNA methylation and demethylation activity.

Interestingly, expression of the DNA demethylase ROS1 is directly tied to RdDM activity: DNA methylation over a TE targeted by RdDM in the ROS1 promoter is required for ROS1 expression, though other factors are also involved in regulating ROS1. Since ROS1 expression is tied to DNA methylation at a specific TE, ROS1 expression is also strongly reduced in plants with defective RdDM that lose the ability to methylate that TE and others. This general mechanism helps maintain DNA methylation homeostasis by tuning DNA demethylation activity to DNA methylation activity, helping to ensure that DNA methylation patterns can be stably maintained over time.

Origins of RdDM pathway members
While all eukaryotes share three RNA polymerases (RNA Pol I, II and III), plants have two additional polymerases, Pol IV and Pol V. Both Pol IV and V share an evolutionary origin, deriving from Pol II. In other eukaryotic kingdoms that lack these two specialized RNA polymerases, Pol II transcribes the precursors of small RNAs used in silencing pathways – in fact, Pol II transcripts are also sometimes processed into sRNAs in plants. It has been hypothesized that the origin of both Pol IV and Pol V is rooted in “escape from adaptive conflict”. The idea is that potential tensions between the “traditional” function of Pol II and the small RNA biogenesis function could be relieved by duplication of Pol II and subfunctionalization of the resulting multiple RNA polymerases.

Analyses of evolutionary lineage for Pol IV and Pol V are complicated to some extent by the fact that each enzyme is actually composed of at least 12 subunits. In Arabidopsis thaliana, some subunits are shared between Pol IV and Pol V, some are unique to each polymerase, and some are shared between Pol II, IV, and V. Orthologs of certain Pol IV and V subunits have been found in all lineages of land plants, including ferns, liverworts, and mosses. These findings argue for a shared origin of Pol IV and V dating back to early land / vascular plants.

Much of the work done to elucidate the genes and proteins involved in the RdDM pathway has been performed in Arabidopsis thaliana, a model angiosperm. However, studies of Pol IV and V conducted in maize show some key differences with Arabidopsis. Maize Pol IV and V differ from each other in terms of only one subunit (the largest one). In Arabidopsis, Pol IV and V differ from each other in terms of three subunits. However, maize utilizes a set of interchangeable catalytic subunits – two in the case of Pol IV and three in the case of Pol V – that provide additional specialization of polymerase functionality. While differences exist, overall there is a broad overlap in RdDM functions and components between the different angiosperm species studied to date.

Outside of Pol IV and Pol V, a large proportion of key RdDM component proteins (for example, DCL3 and AGO4) have orthologs found within each class of land plants, which provides support for the hypothesis that some form of the RdDM pathway evolved early within the plant lineage. However, RdDM pathway functionality does appear to change to an appreciable extent between different plant species and lineages. For example, while gymnosperms have functional Pol IV and produce 24 nt small RNAs, the biogenesis of sRNAs within gymnosperms is much more heavily skewed towards 21 nt than 24 nt sRNAs. This suggests that canonical RdDM may be rarer or less pronounced in gymnosperms than in angiosperms. Similarly, while orthologs of DRM2 are found in various angiosperms, there are no known DRM2 orthologs in other plant lineages. One possibility is that angiosperms have the “most complete” version of the RdDM pathway, with all other plant lineages possessing robust and functional subsets of the pathway. However, since nearly all of the work on RdDM has been done in angiosperms, it is also possible that alternative versions of RdDM in other lineages have simply not yet been uncovered, particularly if these alternative versions include different proteins or proteins without clear homologs in angiosperms.

Relationships with sRNA silencing pathways in other kingdoms
All eukaryotic kingdoms host some form of small RNAs. One such class of sRNAs is the Piwi-interacting RNAs (piRNAs). Much like in RdDM, piRNAs primarily function to target and silence transposons, particularly in the germline. However, piRNAs are only found in animals, are longer than the small RNAs functioning in RdDM (24-32 nucleotides), and mediate their functions through interactions with a different subclass of AGO proteins, the PIWI subfamily, which are absent from plants. MicroRNAs (miRNAs) are another class of small RNA with silencing properties. While miRNAs are in a similar size range as RdDM sRNAs (~21 nt), miRNAs associate with a distinct set of Argonaute proteins that silence target RNAs by initiating their degradation or blocking their downstream translation into proteins, rather than recruiting DRM2 to add DNA methylation to nearby DNA. Both RdDM and the miRNA pathways involve related proteins from the Argonaute and Dicer families.

Perhaps the most analogous pathways to RdDM in another eukaryotic kingdom are the sRNA directed transcriptional gene silencing (TGS) and co-transcriptional gene silencing (CTGS) pathways in Schizosaccharomyces pombe. In S. pombe, TGS directs methylation of H3K9, leading to heterochromatin formation, and is directed by sRNAs produced from the targeted regions. Similar to canonical RdDM, this pathway is a positive feedback loop: sRNAs are generated preferentially from heterochromatin-rich areas of the genome, and these sRNAs direct the addition of K3K9 methylation to maintain/spread heterochromatin. Meanwhile, CTGS is directed by AGO1-bound sRNAs, similar to PTGS within plants, and results in the inhibition of transcription by Pol II, as well as to Pol II release. Unlike RdDM, TGS and CTGS in S. pombe do not rely on transcription from non-Pol II sources or lead to the addition of DNA methylation. However, the S. pombe pathways and RdDM share many of the same components, like RNA-directed RNA polymerases and sRNAs, and have similar functions in maintaining heterochromatin.

History
Introducing transgenes into organisms has been a widely used tool in plant genetics research for decades. However, researchers often find that their introduced transgenes are not expressed as strongly as expected, or sometimes even at all, a phenomenon called transgene silencing. The discovery of transgene silencing in the 1990s spurred a great deal of interest in understanding the mechanisms behind this silencing. Researchers found that transgene silencing was ubiquitous, occurring in multiple species (including Arabidopsis, Tobacco, and Petunia), and was associated with increased DNA methylation over and around the silenced transgene.

Around the same time in 1994, work in tobacco plants had revealed a new pathway involving RNAs that resulted in DNA methylation. Researchers found that when viroids were introduced into the plant and integrated into the plant genome, the viroid sequences, but not the host genome, gained DNA methylation. The deposition of methylation over these foreign viroid sequences helped inhibit viroid replication, and was therefore thought to represent a plant pathogen defense mechanism. The evidence suggested that the viroid RNAs produced during viroid replication were being used by the plant as a template to help target DNA methylation to the viroid sequences. This mechanism was therefore named RNA-directed DNA methylation, or RdDM.

RdDM turned out to be the solution to the transgene mystery: like viroids and viruses, transgenes are foreign sequences, and as a result they are often recognized as foreign invaders and targeted for silencing by RdDM and PTGS. Since transgene silencing was a reliable marker of RdDM activity, researchers were able to design genetic screens to identify mutants that failed to trigger silencing at transgenes, reasoning that these genes were likely to be involved in the RdDM pathway. These experiments revealed many parts of the pathway, including RNA Pol IV and V, Dicer-like proteins, Argonautes, and others.

The involvement of sRNAs in RdDM was initially suspected due to the similarity between RdDM and RNAi, the latter of which had recently been shown to involve small RNAs. To test whether sRNAs were involved in RdDM, RNA hairpin structures complementary to a specific gene promoter were introduced into Arabidopsis and Tobacco. The hairpin RNAs were processed into sRNAs, which were able to trigger the addition of DNA methylation to the targeted promoter and silence the gene. This demonstrated that sRNAs could direct DNA methylation to specific loci. Later efforts showed that the sRNAs involved in RdDM were approximately 24-26 nt long, while the sRNAs associated with RNAi were only about 21-22 nt in length. Soon after, the identification of AGO4 and characterization of its role in RdDM led to predictions, later confirmed, that 24 nt sRNAs were associating with AGO4 and directing DNA methylation to complementary loci.

Early work on transgene silencing and RdDM also identified SDE4 as required for the production of most sRNAs involved in RdDM. SDE4 would later be identified as the largest subunit of Pol IV, and renamed NRPD1. A number of studies published in quick succession from multiple research groups, utilizing both forward and reverse genetic approaches, went on to identify and characterize Pol IV and Pol V as highly specialized plant RNA polymerases involved in RdDM. The Pol IV / Pol V naming convention was adopted shortly thereafter.

Potential biotechnology applications
Since the mechanism underlying the sequence-specificity of RdDM is well known, RdDM can be ‘tricked’ into targeting and silencing endogenous genes in a highly specific manner, which has a number of potential biotechnological and bioengineering applications. Several different methods can be used to trigger RdDM-based DNA methylation and silencing of specific genes. One method, called virus-induced gene silencing (VIGS), involves inserting part of the promoter sequence of the desired target gene into a virus. The virus will reproduce the chunk of promoter sequence as part of its own RNA, which is otherwise foreign to the plant. Because the viral RNA is foreign, it will be targeted for PTGS and processed into sRNAs, some of which will be complementary to the original target gene's promoter. A subset of these sRNAs will recruit the RdDM machinery to the target gene to add DNA methylation. In one study, researchers used this method with an engineered Cucumber Mosaic Virus to recruit RdDM to silence a gene that affected flower pigmentation in petunia, and another that affected fruit ripening in tomato. In both cases, they showed that DNA methylation was added to the locus as expected. In petunia, both the gain of DNA methylation and changes in flower coloration were heritable, while only partial silencing and heritability were observed in tomato. VIGS has also been used to silence the FLOWERING WAGENINGEN (FWA) locus in Arabidopsis, which resulted in plants that flowered later than normal. The same study also showed that the inhibitory effect of VIGS on FWA and flowering can become stronger over the course of successful generations.

Another method to target RdDM to a desired target gene involves introducing a hairpin RNA construct that is complementary to the target locus. Hairpin RNAs contain an inverted repeat, which causes the RNA molecule to form a double-stranded RNA (dsRNA) structure called an RNA hairpin. The dsRNA hairpin can be processed by DCL proteins into sRNAs which are complementary to the target locus, triggering RdDM at that locus. This method has been used in several studies.

Changes induced by RdDM can sometimes be maintained and inherited over multiple generations without outside intervention or manipulation, suggesting that RdDM can be a valuable tool for targeted epigenome editing. Recent work has even bypassed RdDM altogether by artificially tethering DRM2 (or other components of the RdDM pathway) directly to specific target loci, using either zinc finger nucleases or CRISPR. In these experiments, tethering the RdDM machinery to a specific locus led to gain of DNA methylation at the target site that was often heritable for multiple generations, even once the artificial construct was removed through crossing. For all of these methods, however, more work on minimizing off-target effects and increasing DNA methylation efficiency is needed.

Genetically Modified Organisms (GMOs) have played a large role in recent agricultural research and practice, but have proven controversial, and face regulatory barriers to implementation in some jurisdictions. GMOs are defined by the inclusion of “foreign” genetic material into the genome. The treatment of plants with engineered RNAs or viruses intended to trigger RdDM does not change the underlying DNA sequence of the treated plant's genome; only the epigenetic state of portions of the DNA sequence already present are altered. As a result, these plants are not considered GMOs. This has led to efforts to utilize RdDM and other RNA-mediated effects to induce agriculturally-beneficial traits, like altering pathogen or herbicide susceptibility, or speeding up plant breeding by quickly inducing favorable traits. However, while this is an area of active interest, there are few broadly implemented applications as of now.