Epitranscriptome

Within the field of molecular biology, the epitranscriptome includes all the biochemical modifications of the RNA (the transcriptome) within a cell. In analogy to epigenetics that describes "functionally relevant changes to the genome that do not involve a change in the nucleotide sequence", epitranscriptomics involves all functionally relevant changes to the transcriptome that do not involve a change in the ribonucleotide sequence. Thus, the epitranscriptome can be defined as the ensemble of such functionally relevant changes.

There are several types of RNA modifications that impact gene expression. These modifications happen to many types of cellular RNA including, but not limited to, ribosomal RNA (rRNA), transfer RNA (tRNA), messenger RNA (mRNA), and small nuclear RNA (snRNA). The most common and well-understood mRNA modification at present is N6-Methyladenosine (m6A), which has been observed to occur an average of three times in every mRNA molecule.

Currently, work is focused on determining the types and locations of RNA modifications, whether these modifications have a function and, if so, what their mechanism of action is. Similar to the epigenome, the epitranscriptome has "writers" and "erasers" that mark RNA and "readers" that translate those marks into function. One function that has been elucidated involves the enzyme adenosine deaminase acting on RNA (ADAR). ADAR affects a series of cellular processes and components, including alternative splicing, microRNAs and the innate immune system, and leads to protein recoding, especially for important receptors in the central nervous system.

N6-Methyladenosine (m6A)
m6A describes the methylation of the nitrogen at position 6 in the adenosine base within mRNA. Discovered in 1974, m6A is the most abundant eukaryotic mRNA modification; most mRNAs contain approximately three m6A residues. However, some mRNA transcripts do not contain any m6A at all, while others may have 10 or more. The term "epitranscriptome" was coined following transcriptome-wide mappings of m6A sites, but does not necessarily exclude other post-transcriptional mRNA modifications. How, and in response to what stimulus, the cell endogenously regulates the level of m6A methylation remains unclear at present. However, it is known that the levels of this epitranscriptomic mark are dynamically altered during embryonic development. Moreover, environmental stimuli such as stress can also alter the levels of m6A.

The m6A mRNA methylomes of different eukaryotic organisms have two common characteristics. First, the mark is usually found in the R[G>A]m6AC[U>A>C] or RRm6ACH sequence context (R = A or G; H = A, C or U). Second, this mark is enriched in specific regions of the transcriptome; it is mostly found close to stop codons, in 3’-UTRs and in long internal exons. Nevertheless, m6A levels vary between different RNAs within a cell and between different cell types of the same organism. The mechanisms controlling the addition of m6A to some types of RNA have been described, but others remain unknown.
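
The RRm6ACH consensus can be illustrated with a short motif scan. The following is a minimal sketch, assuming the IUPAC codes R = A/G and H = A/C/U; the function name find_rrach_sites and the example sequence are hypothetical. A match indicates only that an adenosine sits in a consensus context, not that it is actually methylated.

```python
import re

# IUPAC codes: R = A or G, H = A, C or U.
# The lookahead lets overlapping motif occurrences be reported.
RRACH = re.compile(r"(?=([AG][AG]AC[ACU]))")

def find_rrach_sites(rna):
    """Return 0-based positions of the central A in each RRACH match."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() + 2 for m in RRACH.finditer(rna)]

example = "CCGGACUAAGAACAUU"  # hypothetical sequence
sites = find_rrach_sites(example)
print(sites)  # [4, 11]
```

Because the motif is short, it occurs far more often in a transcriptome than m6A does, so a scan of this kind can only nominate candidate sites.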

Writers, Readers, and Erasers
In eukaryotes, the use of m6A on mRNA involves a methyltransferase complex, commonly termed the 'Writer', that installs the methyl group. This m6A modification is recognized by special proteins known as 'Readers'. The number of readers varies across different organisms. Notably, in vertebrates, the presence of proteins categorized as 'Erasers' is suggested to facilitate the removal of m6A, which enables a dynamic regulation of m6A deposition on mRNAs.

The m6A mark is added by an m6A methyltransferase writer complex post-transcriptionally. This writer complex is composed of METTL3, METTL14, Wilms tumor 1-associated protein (WTAP), KIAA1429 and RBM15. METTL3 is the catalytic subunit, whereas METTL14 is involved in the stability of the complex and RNA recruitment. WTAP also aids the recruitment of mRNA, whereas RBM15 and its paralog RBM15B are only involved in the recruitment of lncRNAs. The role RBM15 and RBM15B may have in recruiting other types of RNA to the methyltransferase complex remains unknown. The specific recognition sites of the writers are not known, but the minimal sequence required is 5’-Rm6AC-3’. METTL3 has been proposed to also be a "reader" of the m6A mark. This function is localized in the cytoplasm, where it promotes the recruitment of eIF3. Discovery of the METTL3 complex indicated that m6A installation might be a regulated process, which was pivotal for the advancement of and interest in the field of epitranscriptomics.

Members of the YTH domain protein family act as "readers" of m6A. The study of these proteins has been key in understanding the functions and effects of mRNA methylation. It has been shown that three members of the human YTH domain family of proteins have higher binding affinities to methylated mRNA. The YTH protein YTHDF2 affects mRNA by directing methylated mRNA from the translational pool to mRNA decay sites. As a result, presence of m6A on mRNA is correlated with a shorter half-life than unmethylated mRNA.

So far, two "erasers" of the m6A mark have been identified. ALKBH5 is a demethylase found in mammals that removes the methyl group of m6A. The second one is the fat mass and obesity associated protein (FTO), a demethylase that converts m6A back to adenosine. FTO preferentially demethylates the m6A found closer to the mRNA cap. This oxidative process has three steps and two intermediates: N6-hydroxymethyladenosine (hm6A) and N6-formyladenosine (f6A). FTO is most commonly found in nuclear speckles; however, in some species low levels of FTO can also be found in the cytoplasm. Dysfunctional FTO correlates with alterations in body weight and disease, while Alkbh5 knockout mice have impaired fertility. These two facts reflect how important the proper regulation of the m6A modification is for normal body function. Moreover, mutations in FTO can lead to developmental failures, brain atrophy and physiological disorders in adulthood.

Role in the life-cycle of mRNA
mRNA methylation is important throughout the entire life-cycle of the mRNA, starting with the alternative polyadenylation (APA) of some transcripts. m6A sites are often located in the last exon, mostly in the 3’ untranslated region (3'-UTR). The presence of m6A in the 3’-UTR promotes the use of the proximal APA site, resulting in a shorter 3’-UTR. Splicing of the pre-mRNA transcripts may be influenced by m6A, although this effect can vary across different biological systems. Furthermore, nuclear export of mature mRNAs depends on m6A; when the m6A "writers" are inhibited, there is a delay in the export of the mature mRNAs. However, normal nuclear export does not solely depend on m6A; other mRNA marks such as 5-methylcytosine (m5C) are also involved.

The m6A mark has a notable effect on translational dynamics. There are various ways in which m6A is involved in translational efficiency. For instance, this modification modulates multiple steps in the process of tRNA incorporation. It slows down GTP hydrolysis by EF-Tu by 12-fold and the peptidyl transfer reaction by two-fold. It also causes a 1.5-fold increase in the amount of GTP hydrolyzed per peptidyl transfer, which indicates that additional proofreading is required. Moreover, because it is just a modified adenosine base, m6A base-pairs with uridine during decoding. However, the adenosine's methylation hinders tRNA accommodation and translation elongation. When an m6A-modified codon interacts with its cognate tRNA (the tRNA with the anticodon that is complementary to a particular codon), it acts more like a near-cognate codon interaction than a cognate one. This can be seen in the delay in tRNA accommodation, which depends both on the position of the m6A within the codon and on how accurate the translation is. Overall, this m6A modification leads to a kinetic loss of a factor of 18. To summarize, translation-elongation dynamics are slower for codons with m6A, and different locations of these modified nucleotides in the mRNA codons affect decoding dynamics in different ways.
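
The fold changes quoted above can be combined in a short calculation. This is only a sketch: it assumes, as a reading that the text does not state explicitly, that the overall factor of 18 is the product of the 12-fold slowdown in GTP hydrolysis and the 1.5-fold increase in GTP consumed per peptidyl transfer. The variable names are illustrative.

```python
# Fold changes quoted in the text for an m6A-modified codon.
gtp_hydrolysis_slowdown = 12.0  # EF-Tu GTP hydrolysis is 12-fold slower
gtp_per_peptidyl_transfer = 1.5  # 1.5-fold more GTP hydrolyzed per peptidyl transfer

# Assumption: the two effects multiply to give the overall kinetic loss.
overall_kinetic_loss = gtp_hydrolysis_slowdown * gtp_per_peptidyl_transfer
print(overall_kinetic_loss)  # 18.0
```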

However, this mark can also increase translational efficiency. The m6A "reader" YTHDF1 induces the association of the modified mRNA with the ribosome. Furthermore, it also recruits the translation initiation factor eIF3 to the mRNA independently of METTL3. Additionally, eIF3 also acts as a "reader" of an m6A located in the 5’-UTR of the mRNA, which results in recruitment of the 40S translational preinitiation complex. This interaction is involved in cap-independent translation, which happens during the cellular response to heat shock stress.

m6A methylation also modulates mRNA stability. The "reader" YTHDF2 binds to m6A-containing mRNAs and decreases their stability by recruiting them to P-bodies, in a process called methylation-dependent mRNA decay. This process is needed to rapidly degrade pluripotency transcription factor transcripts, to enable the commitment of a pluripotent stem cell to a specific cell lineage. Reduced levels of m6A in mouse embryos lead to embryonic lethality during the early stages of development.

Role of N6-Methyladenosine (m6A) in alternative splicing


Stem-loop structures can sometimes be found in introns. m6A residues located in these stem-loops weaken base-pairing interactions within the stem, thus altering the structure of the mRNA. This phenomenon is known as the m6A-switch. The m6A mark has an important role in alternative splicing, since it increases the accessibility of hnRNPC to its binding site. The heterogeneous nuclear ribonucleoprotein C (hnRNPC) is an RNA-binding protein that complexes with both heterogeneous nuclear RNA (hnRNA) and pre-mRNA to participate in pre-mRNA processing. hnRNPC binds to a uridine-rich region in introns that can usually form stem-loops. The destabilization of the stem-loop exposes the hnRNPC binding site, which increases the accessibility of the region to the protein. Because hnRNPC must be bound to pre-mRNA in order to fulfill its function, increased accessibility means higher activity of hnRNPC. Therefore, m6A residues located in stem-loops of introns enhance the activity of hnRNPC, which results in increased alternative splicing. In support of this, decreased m6A levels in the transcriptome lead to significantly reduced hnRNPC binding.

m6A also has additional roles in alternative splicing by acting as the binding site for YTHDC1 (YTHDC1 binds to m6A residues located in alternative exons). YTHDC1 has a double role in alternative splicing. First of all, it recruits the serine and arginine-rich splicing factor 3 (SRSF3), which promotes exon inclusion. In addition, YTHDC1 blocks binding of SRSF10, a protein involved in exon-skipping.

Due to the role of m6A in alternative splicing, pre-mRNAs have higher levels of m6A than mature mRNAs. Moreover, m6A is more abundant in mRNAs that undergo alternative splicing than in mRNAs from genes that code for a single isoform. This is because alternatively spliced mRNAs are enriched in METTL3 binding sites. Splicing is affected in Mettl3 knock-out mice, resulting in increased frequency of exon skipping and intron retention. However, m6A is not a general, nonspecific splicing factor; it only participates in the alternative splicing of certain mRNAs and lncRNAs.

Other roles of m6A
m6A is not only found on mRNAs; various non-coding RNAs also contain this mark. For instance, XIST, the lncRNA that initiates X-inactivation, is enriched in m6A. These m6A residues are recognized and bound by the YTH domain protein YTHDC1. XIST-mediated silencing of the X chromosome is negatively affected when XIST is not modified with m6A.

RNA molecules containing m6A are involved in UV-induced DNA damage repair mechanisms. When DNA is damaged, poly(A)+ transcripts containing numerous m6A residues accumulate in the region. This facilitates the accessibility of DNA-repairing proteins, such as DNA polymerase kappa (Pol κ), so that they can fulfil their function.

Disease
Alterations in the pathways leading to the addition or the removal of the m6A mark result in impaired gene expression and cellular function, which can lead to disease.

Normal m6A levels are altered in a number of cancers. Reduced m6A levels due to downregulation of METTL3 and/or METTL14 lead to the activation of a number of oncogenes, such as the gene encoding ADAM metallopeptidase domain 19 (ADAM19). Moreover, loss of m6A also results in the downregulation of tumor suppressors like cyclin-dependent kinase inhibitor 2A (CDKN2A) and breast cancer 2 (BRCA2). On the other hand, increased m6A levels inhibit tumor progression in certain types of cancer. In addition, single nucleotide polymorphisms (SNPs) in the gene encoding FTO have been associated with increased risk of breast and pancreatic cancer. Altered m6A levels also contribute to hypoxia-induced enrichment of the breast cancer stem cell phenotype. All things considered, "writers" and "erasers" of the m6A mark may be good potential drug targets in cancer therapy.

Metabolic disorders are also affected by the m6A mark due to the role of FTO. Overexpression of FTO results in increased body and fat mass, whereas loss of FTO leads to a reduction in lean body mass. However, the mechanisms by which changes in FTO expression affect body and fat mass are not understood.

Current research on the m6A epitranscriptome continues to uncover the role of m6A in ischemic stroke and its aftermath. Microglia-mediated responses and demethylases, including FTO and ALKBH5, appear to contribute to alterations of the cerebral m6A epitranscriptome. Mood disorders, such as major depressive disorder, have also been identified as disease processes associated with changes in the m6A epitranscriptome.

N1-methyladenosine (m1A)
N1-methyladenosine is a modified nucleoside in which a methyl group is added to N1 of the adenosine base. This modification introduces a positive charge on the nitrogen atom to which the methyl group is added, because the modified nitrogen donates its lone pair to the carbon atom of the methyl group in order to form a bond. N1-methyladenosine modification is thought to regulate tRNA and rRNA stability, as well as potentially alter protein-RNA interactions or RNA secondary structures. This modification results in the melting of double-stranded RNA, due to alterations in the RNA structure. The N1-methyladenosine modification is less common than the m6A modification, with modified transcripts usually only containing a single m1A modification, whereas they may contain several m6A residues.

Studies of these modifications have been slow to advance due to a lack of sound methodology to locate and identify them. A few methods, such as MeRIP-seq and m1A-ID-seq, have been developed, but the particular adenosine that is modified still cannot be identified. A computational tool called RAMPred, based on the data generated from these methods, has been developed to try to identify these particular modifications.

Disease

The m1A modification is of interest because of its considerable correlation with cancer biology and tumorigenesis. The involvement of m1A can be organized under the categories of proliferation, invasion, cell death, tumor microenvironment, and cancer metabolism. Cancer cell proliferation has been found to be promoted by specific m1A "writers". For example, the regulator TRMT6 has been found to be overexpressed in individuals with glioma, a cancer marked by the inappropriate proliferation of glial cells of the brain or spinal cord. Additionally, ALKBH3 has been found to support and bolster cancer cell invasiveness in certain breast and ovarian cancers.

5-methylcytosine (m5C)
5-methylcytosine, commonly abbreviated as "m5C", is a chemical modification first identified in tRNA. Since its initial identification, 5-methylcytosine has been found in a variety of RNAs and also in DNA. Two different kinds of RNA m5C "writers" have been identified: NOP2/SUN RNA methyltransferase (NSUN) and DNA methyltransferase-2 (DNMT-2). DNMT-2 belongs to the DNMT family, which contains three other DNMTs (1, 3a, and 3b) known to methylate the genome. DNMT-2 is the only DNMT that has been confirmed to methylate both DNA and RNA, although its DNA methylation activity is significantly lower than that of its counterparts. While these writers have been identified, there are currently no known m5C "erasers"; that is, demethylation, or the conversion of 5-methylcytosine back into cytosine, has not been observed in RNA. 5-methylcytosine modifications are typically found approximately 100 nucleotides downstream of translation initiation sites. This may provide some insight into the purpose of these modifications; for instance, it may indicate that they are important for controlling the fate of the RNA, such as whether an mRNA will be translated or not. However, the exact purpose of the methylation at specific cytosines in RNA is currently unknown. One possibility is that m5C is associated with RNA transport, since the Aly/REF export factor is a known m5C-binding protein. Alternatively, m5C modifications could be associated with the regulation of genes involved in energy and lipid metabolism, through modulation of the overall RNA translational fate.

Adenosine-to-Inosine
Adenosine-to-Inosine (A-to-I) modifications were described well before the conception of epitranscriptomics. These modifications are very common in tissues and cells of the nervous system, and malfunctions in this deamination can result in a variety of different human diseases. A-to-I deamination has been shown to cause changes in the overall RNA structure or cause changes to the protein-coding mRNAs, although changes in codons and the amino acid they code for are not commonly seen. A-to-I RNA editing is described in more detail on the RNA editing page.
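
How A-to-I editing can recode a protein can be illustrated with a small translation example. The sketch below relies on the fact that inosine is read as guanosine during translation; the helper name decode and the codon chosen (CAG edited to CIG) are for illustration only.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def decode(codon):
    """Translate one codon, reading inosine (I) as guanosine (G)."""
    return CODON_TABLE[codon.upper().replace("I", "G")]

unedited = decode("CAG")  # glutamine
edited = decode("CIG")    # read as CGG, arginine
print(unedited, edited)   # Q R
```

Not every edit changes the encoded amino acid: an edit in the third codon position often yields a synonymous codon.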

Queuosine
Queuosine (Q) is a modified nucleoside at position 34 in tRNA (queuosine is the name of the nucleoside, while queuine is the name of the corresponding nucleobase). Nucleotide modifications in tRNA are not uncommon, as tRNA is one of the most heavily modified types of RNA, and nearly 80 types of modified nucleotides have been identified. Queuosine is a very heavily modified version of guanosine (G). Modifications in tRNA have the well-known ability to control and modulate gene expression. The regulation of gene expression typically comes from structural changes to the stem-loop structure of the tRNA. The editing that tRNA undergoes may have developed as a response to rare codons, and tRNA counteracts frameshifts by utilizing the modified bases. Other similar modifications to nucleotides impact the ability of tRNA to initiate translation, thus impeding gene expression.

This modification is particularly widespread and found amongst a variety of organisms, indicating that perhaps convergent evolution took place in the development of this nucleoside. Eukaryotic cells cannot synthesize queuosine, so they must rely on prokaryotes of the microbiome to produce it and make it available within the body. Depleted levels of Q34 (queuosine at position 34) are associated with the development of tumors.

2′-O-methylation
2'-O-methylation refers to the methylation of the 2' hydroxyl group of the ribose within an RNA nucleotide. 2'-O-methylation is found in the five-prime cap of mRNAs in higher eukaryotes. It is involved in differentiating between self and non-self mRNA. Without the 2′-O-methylation mark, the immune system triggers higher levels of type I interferon activity. This modification is not currently known to be a response to any particular phenomenon, and its mechanisms are not fully understood because small RNA molecules are difficult to study. However, the effect this modification has on RNA stability could be regulated to modulate transcript levels.

Pseudouridylation
Pseudouridine (Ψ, 5-ribosyluracil) is the most abundant RNA modification; in fact, at one time it was considered the "fifth nucleotide". This isomer of uridine is found in various types of RNA, such as snRNA, tRNA, small nucleolar RNA (snoRNA) and many others. Pseudouridine increases the stability of the modified RNA by making the sugar-phosphate backbone more rigid and by facilitating base stacking interactions (pseudouridine contains an extra hydrogen bond donor). In Watson-Crick base pairing, the pseudouridine-adenosine base pair is more stable than the uridine-adenosine base pair, which further increases stability. Apart from increasing RNA stability, this modification is also involved in regulation of translation. All eukaryotic stop codons contain one uridine (UAA, UAG and UGA); conversion of this uridine to pseudouridine results in suppression of translational termination and generation of unexpected sense codons. The artificial process of pseudouridylation has an effect on the function of mRNA: it changes the genetic code by making non-canonical base pairing possible in the ribosome decoding center.
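
The statement about stop codons can be checked directly. This is a minimal sketch that uses the character Ψ as an ad hoc symbol for pseudouridine in a string, which is an assumption made purely for illustration; the function name pseudouridylate is hypothetical.

```python
# The three stop codons of the standard genetic code.
STOP_CODONS = ("UAA", "UAG", "UGA")

# Each stop codon contains exactly one uridine, at the first position.
assert all(c.count("U") == 1 and c[0] == "U" for c in STOP_CODONS)

def pseudouridylate(codon):
    """Replace the uridine of a codon with Ψ (symbolic representation only)."""
    return codon.replace("U", "Ψ")

modified = [pseudouridylate(c) for c in STOP_CODONS]
print(modified)  # ['ΨAA', 'ΨAG', 'ΨGA']
```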

Pseudouridylation reactions are catalyzed by enzymes that contain the pseudouridine synthase domain; 13 such enzymes, called pseudouridine synthases (PUS), have been identified in humans. These enzymes can be either RNA-dependent or RNA-independent, depending on whether a small RNA is needed to guide the enzyme to its target. Additionally, different PUS enzymes work in different cell compartments. For instance, PUS4 (also known as TruB pseudouridine synthase family member 1, TRUB1) and PUS7, which are responsible for most of the mRNA pseudouridylation, are located in the nucleus or the cytoplasm. On the other hand, several PUS enzymes, such as PUS1 and TRUB2, are located in the mitochondria, where they modify a number of mitochondrial mRNAs (mt-mRNAs). In tRNA, PUS1 and PUS7 modify the second uridine in the UGUAR consensus sequence, as long as this sequence is located in a very structured region of the tRNA.

To date, no pseudouridine erasers or readers have been identified. It is thought that pseudouridylation is most probably an irreversible process.

Pseudouridine is most commonly found in tRNAs, with almost all tRNA molecules having at least one pseudouridine. Because the addition of pseudouridine happens during the normal processing of tRNA, it is not considered an epitranscriptomic mark there. However, pseudouridine acts as an epitranscriptomic mark in mRNAs and ncRNAs of the brain, since pseudouridylation in these two RNAs responds dynamically to stress and differentiation in the cell, giving reason to believe that pseudouridylation may act as an important regulatory mechanism for RNA function. Pseudouridylation in mRNA can be conserved, tissue-specific or inducible, which reflects plasticity and regulatory function. Furthermore, expression of TRUB1, which is mostly expressed in the brain, increases after fear conditioning. In addition, expression of the ncRNAs needed to guide RNA-dependent PUS enzymes also increases in response to fear.

Pseudouridine detection and sequencing methods
There are three major techniques for the site-specific mapping of pseudouridine in RNA, called Pseudo-seq, Ψ-seq and PSI-seq. All these methods are based on the unique reaction between pseudouridine and N-cyclohexyl-N'-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMCT). The RNA to be analyzed is fragmented and incubated with CMCT. Although CMCT can form covalent bonds with U, G and Ψ residues, only Ψ-CMC is resistant to alkaline hydrolysis (U-CMC and G-CMC get hydrolyzed). Next, reverse transcription is done to obtain a cDNA library, with the cDNAs terminating one nucleotide downstream of the pseudouridine residue. Next-generation sequencing of the cDNA library indicates where the pseudouridine residue is located in the RNA. In order to do this, two cDNA libraries are prepared, one in which the RNA has undergone CMC treatment and one without CMC treatment. Differences in the length of the reads between the two libraries indicate where the Ψ residues are. Another method is called CeU-Seq, which uses a biotinylated derivative of CMCT. This enables the purification and enrichment of biotinylated transcripts (transcripts modified with pseudouridine) with streptavidin columns, thereby reducing the library size and increasing sensitivity.
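
The comparison between the CMC-treated and untreated libraries can be sketched as follows. This is an illustrative outline, not the published Pseudo-seq, Ψ-seq or PSI-seq pipeline: the function name, the input format (per-position counts of reverse-transcription stops and of read coverage) and the thresholds are all hypothetical.

```python
def call_psi_sites(stops_cmc, cov_cmc, stops_ctrl, cov_ctrl,
                   min_cov=10, min_diff=0.2):
    """Call candidate pseudouridine positions from RT stop counts.

    Each argument is a list indexed by RNA position (0-based, 5' to 3').
    stops_* holds the number of cDNAs terminating at that position and
    cov_* the number of reads covering it. Because cDNAs terminate one
    nucleotide downstream (3') of the CMC-modified residue, a stop at
    position p implicates position p - 1.
    """
    candidates = []
    for p in range(1, len(stops_cmc)):
        if cov_cmc[p] < min_cov or cov_ctrl[p] < min_cov:
            continue  # too few reads to estimate a stop rate
        rate_cmc = stops_cmc[p] / cov_cmc[p]
        rate_ctrl = stops_ctrl[p] / cov_ctrl[p]
        # Keep positions whose stop rate rises with CMC treatment.
        if rate_cmc - rate_ctrl >= min_diff:
            candidates.append(p - 1)
    return candidates

# Toy data: a CMC-dependent stop at position 4 implicates position 3.
stops_cmc = [0, 1, 0, 2, 30, 1]
cov_cmc = [50, 50, 50, 50, 50, 50]
stops_ctrl = [0, 1, 0, 1, 2, 1]
cov_ctrl = [50, 50, 50, 50, 50, 50]
print(call_psi_sites(stops_cmc, cov_cmc, stops_ctrl, cov_ctrl))  # [3]
```

Using the untreated library as a control filters out stops caused by RNA structure or fragment ends, which appear in both libraries.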

Other pseudouridine detection methods include site-specific cleavage and radioactive-labeling followed by ligation-assisted extraction and thin-layer chromatography (SCARLET) and mass spectrometry.

Ribosomal RNA (rRNA)
Ribosomal RNA, or rRNA, forms the nucleic acid component of ribosomes. rRNA modifications take place in and around the peptidyl transferase center, the active site of the ribosome. Some modifications include pseudouridines, 2′-O-methylations on backbone sugars, and methylated bases. It is not well known what the biological effects of these modifications are on the rRNA molecule, but one hypothesis is that they help stabilize the structure and enhance the function of the ribosome, especially during ribosome formation. Moreover, these modifications may alter the chemical properties of the rRNA such that the correct tertiary structure is favored. 2'-O-methylation prevents backbone hydrolysis; other noted modifications also seem to help with stabilizing rRNA secondary structures and preventing damage to rRNA strands. 2'-O-methylation also helps to increase base stacking forces, stabilizing the secondary and tertiary structure of rRNA even further. Collectively, these modifications in rRNA are indispensable to ribosomal function.

Transfer RNA (tRNA)
Transfer RNAs, which are RNAs that participate in translation, contain the greatest number of modifications of any type of RNA, with up to one-fourth of the nucleosides in these molecules containing some sort of modification in eukaryotes. There are several known reasons for the wide variety of modifications found in tRNA. First of all, such modifications allow for easier differentiation between different tRNA molecules, such as separating the initiator tRNAMet from elongator tRNAMet. Moreover, they increase overall tRNA stability. Some studies have shown that the modifications of tRNA can be dynamic and adaptive to changes in the environment. Examples include methylation of cytosines by the tRNA methyltransferase Trm4 in response to the depletion of nutrients. The tRNA's cruciform structure is essential to its overall function, and such a complicated structure is maintained by post-transcriptional modifications. A primary example of this is the methylation of guanosine at junctions within the tRNA structure. These methylguanosines impact the overall tertiary structure by disrupting any potential canonical hydrogen bonding (hydrogen bonds that are conventional Watson-Crick base pairs), thus creating a loop at the core of the tRNA. Other modifications are integral for creating and maintaining the extreme bends in the structure.

Messenger RNA (mRNA)
Messenger RNA is the bridge between the genetic code and the resulting proteins, as it carries the information that gets translated into proteins. Modifications to the actual, physical genetic code are likely to be deleterious; therefore, minor modifications to mRNA, such as methylation, are preferable (nevertheless, modifications are still seen throughout the genome). The four major types of modifications done to mRNA are N7-methylguanosine (at the 5′ cap), N6-methyladenosine, 5-methylcytosine, and 2′-O-methylation. The modification seen at the 5' cap demonstrates how modifications to mRNA can impact its function, as the 5' cap is necessary to initiate translation. Therefore, modifications to the 5' cap, such as the addition of N7-methylguanosine during RNA processing, may affect the ability of the ribosome to initiate translation. It is important to note that not all modifications happening to the mRNA are epigenetic; some, like the N7-methylguanosine cap, are RNA editing.

mRNA molecules demonstrate something known as "modification stoichiometry". Modification stoichiometry describes the situation in which only a portion of transcripts have a specific modification at a particular modification site. Typically, under normal cell conditions, the modification stoichiometry is very low: very few transcripts carry a specific modification. However, as cell conditions change, the fraction of modified transcripts can change as well. As with other types of RNA, modifications impact the overall structure of the mRNA. Altering its structure may cause the mRNA to take different paths. For example, a normal transcript might be fated to be translated; however, the introduction of a modified base can disrupt its structure and send it down a different path, and that particular transcript may now be targeted for degradation.
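
Modification stoichiometry at a site is simply the modified fraction of transcripts, which can be sketched as below. The function name and the read counts are hypothetical; how modified and unmodified transcripts are actually counted depends on the detection method and is not addressed here.

```python
def modification_stoichiometry(modified_reads, total_reads):
    """Fraction of transcripts carrying the modification at one site."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= modified_reads <= total_reads:
        raise ValueError("modified_reads must lie between 0 and total_reads")
    return modified_reads / total_reads

# Hypothetical counts for the same site under two cell conditions.
baseline = modification_stoichiometry(5, 100)
stressed = modification_stoichiometry(40, 100)
print(baseline, stressed)  # 0.05 0.4
```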

Short non-coding RNA (sncRNA) modifications
Modifications can also happen in short non-coding RNAs, including small nuclear RNA (snRNA) and microRNA (miRNA). However, these modifications are less common than those in mRNA, tRNA, and rRNA.

Small nuclear RNA (snRNA)
Some trans-spliced snRNAs have been observed to have an N2,N2,7-trimethylguanosine cap. This particular modification to the guanosine cap is rare in snRNAs. Trans-splicing is a phenomenon in which exons from two different primary RNA transcripts are ligated together. These rare variants have been seen during development in C. elegans and are associated with polysomes. How this modification is regulated in certain cell types and the exact function of this modification remain largely unknown, although it has been speculated that this modification may help define a special subset of trimethylguanosine-regulated RNAs.

MicroRNA (miRNA)
Some miRNAs in plants have been seen to contain 2'-O-methylation, a modification to the ribose sugar that is added by the methyltransferase HEN1. This modification is thought to protect the miRNA against polyuridylation, which would result in its subsequent degradation.

In addition, pri-miRNAs have been shown to contain m6A. This reversible modification may affect their cellular localization and function during miRNA processing.

Long non-coding RNA (lncRNA)
The family of long non-coding RNAs includes a variety of different kinds of RNA, including, but not limited to, circular RNA (circRNA), nuclear lncRNA, long intergenic non-coding RNA, and enhancer RNA. The development of next-generation sequencing has made the study of lncRNA more accessible (because lncRNA is not very common in the cell relative to other types of RNA).

Editing and modifications to lncRNA have been demonstrated to result in changes in RNA expression and rate of mutation. 5-methylcytosine (m5C), N6-methyladenosine (m6A), and pseudouridine are the three most common and most studied modifications occurring in lncRNA. Modifications to the nucleotide structure are likely to impact the structure of lncRNAs and modulate their overall function. The study of the reversibility of these modifications is an active area of research. These modifications impact a variety of different qualities, including the lncRNA's function and the initiation of translation. Modifications to lncRNAs have been demonstrated to impact where they localize within the cell. While complicated structures, such as the cruciform structure of tRNA, are not typically found in lncRNA, modifications may alter their structure and impact the overall function and pathway the lncRNA takes.

Viral epitranscriptomics
Viral epitranscriptomics is the field that studies RNA modifications in viral transcripts that do not affect the sequence of the transcript but that are functionally relevant. So far, studies have focused on transcripts of mammalian viruses. Mammalian viral transcripts must function in a mammalian cell, so they must acquire the same epitranscriptomic marks as the host cell's transcripts. For this, viruses make use of the numerous mRNA-modifying enzymes found in the host cells.

m6A in viral transcripts
The most widely described RNA modification in mammalian viruses is m6A, which was first identified in influenza virus mRNAs in 1976. Epitranscriptomic analysis of viral transcripts has revealed that m6A levels in viral and cellular transcripts are similar. Nevertheless, in some viruses, such as adenovirus-2, m6A levels are higher in viral mRNAs. As with cellular RNAs, m6A is predominantly added in the nucleus by METTL3, with the assistance of several cofactors such as METTL14, WTAP, KIAA1429 and RBM15/RBM15B. A recent study demonstrated the presence of m6A in the small T antigen transcript of Merkel cell polyomavirus (MCPyV) in Merkel cell carcinoma, an aggressive skin cancer.

Studies of the viral m6A mark have mostly been conducted with HIV. Despite the high mutation rate of this virus, its m6A sites have been evolutionarily conserved. This is because m6A is involved in regulating multiple stages of the HIV life cycle. In addition to the normal functions m6A has in pre-mRNA splicing, nuclear export, mRNA stability and translation, this mark also inhibits the recognition of viral transcripts by Toll-like receptors and RIG-I receptors. As a result, m6A positively influences viral replication. HIV, in turn, regulates the addition of the m6A mark to a number of cellular mRNAs. For instance, 56 cellular transcripts that contain m6A only during HIV infection have been identified. The effect this mark has on cellular transcripts during the course of the viral infection remains unknown.

Although m6A-marked viral transcripts are involved in regulating the gene expression of a number of different viruses, the mechanisms by which this happens have not been identified. To date, three possible models have been proposed.

Although METTL3 and METTL14 are mostly localized in the nucleus, they can also be found in the cytoplasm, where they methylate the genomes and transcripts of cytoplasmic RNA viruses. In contrast to what is observed for nuclear viruses, loss of m6A on hepatitis C virus (HCV, a cytoplasmic RNA virus) increases the production of infectious HCV virions, which indicates that in this particular virus the m6A mark has a negative effect on virus production. Nevertheless, in other cytoplasmic RNA viruses, such as dengue virus and yellow fever virus, m6A sites have been selected for during evolution, suggesting that the m6A mark is beneficial for these viruses.

Because m6A enhances the replication of several viruses, it is a potential target for antiviral therapy. The major challenge is to target this mark in viral transcripts without causing major effects on the host cells, since normally occurring cellular m6A marks would also be depleted. The S-adenosylhomocysteine (SAH) hydrolase inhibitor 3-deazaadenosine (DAA) can be used as an antiviral drug because it inhibits the addition of m6A. However, it is yet to be determined whether this drug has any off-target effects.

Other viral transcript modifications
m6A is not the only RNA modification found in viral RNAs. For instance, N6,2'-O-dimethyladenosine (m6Am) is found in influenza virus and herpes simplex virus type 1, although the effect this mark has on the life cycle of these viruses remains unknown. NAT10-mediated acetylation of cytidines on HIV-1 RNA was recently reported. Another modification commonly found in coronaviruses, flaviviruses and poxviruses (all of which are cytoplasmic viruses) is the 2'-O-methylation of ribose moieties. The addition of this mark is catalyzed by a viral methyltransferase. RNA carrying 2'-O-methylation binds to and inhibits Toll-like receptor 7 (TLR-7), which is involved in activating the production of inflammatory cytokines. Moreover, this modification enables viral RNAs to evade the antiviral actions of the IFIT proteins, a family of interferon-induced proteins that limit viral replication.

MODOMICS
MODOMICS is a comprehensive database of RNA modifications. It provides the chemical structures of the modified RNAs, the RNA-modifying pathways, the locations of the modifications in RNA sequences, the enzymes responsible for the modifications, and liquid chromatography/mass spectrometry (LC/MS) data for the modified RNAs. As of November 2017, the database contained 163 different RNA modifications, as well as 340 different enzymes and cofactors involved in the modifications. The database classifies RNA-modifying pathways according to their starting point. The LC/MS data have been very useful in determining the specific mass of the modified RNAs, which facilitates the identification of the modification.

RMBase, ENCORE
The ENCyclOpedia of Rna Epitranscriptome (ENCORE) is an upgraded version of RMBase. It is a comprehensive epitranscriptome platform with dozens of new software tools for decoding the distribution patterns, metagene profiles, biogenesis mechanisms, regulatory functions, interactomes, evolutionary conservation and novel reader proteins of more than 70 different types of RNA modifications by analyzing thousands of high-throughput sequencing datasets.