RNA editing

RNA editing (also RNA modification) is a molecular process through which some cells can make discrete changes to specific nucleotide sequences within an RNA molecule after it has been generated by RNA polymerase. It occurs in all living organisms and is one of the most evolutionarily conserved properties of RNAs. RNA editing may include the insertion, deletion, and base substitution of nucleotides within the RNA molecule. RNA editing is relatively rare, and common forms of RNA processing (e.g. splicing, 5'-capping, and 3'-polyadenylation) are not usually considered editing. It can affect the activity, localization, and stability of RNAs, and has been linked with human diseases.

RNA editing has been observed in some tRNA, rRNA, mRNA, or miRNA molecules of eukaryotes and their viruses, archaea, and bacteria. RNA editing occurs in the cell nucleus, as well as within mitochondria and plastids. In vertebrates, editing is rare and usually consists of a small number of changes to the sequence of the affected molecules. In other organisms, such as squids, extensive editing (pan-editing) can occur; in some cases the majority of nucleotides in an mRNA sequence may result from editing. More than 160 types of RNA modifications have been described so far.

RNA-editing processes show great molecular diversity, and some appear to be evolutionarily recent acquisitions that arose independently. The diversity of RNA editing phenomena includes nucleobase modifications such as cytidine (C) to uridine (U) and adenosine (A) to inosine (I) deaminations, as well as non-template nucleotide additions and insertions. RNA editing in mRNAs effectively alters the amino acid sequence of the encoded protein so that it differs from that predicted by the genomic DNA sequence.

Next generation sequencing
Many recent studies have developed conventional or specialised sequencing methods to identify diverse post-transcriptional modifications of RNA molecules and to determine the transcriptome-wide landscape of RNA modifications by next-generation RNA sequencing. Examples of specialised methods are MeRIP-seq, m6A-seq, PA-m5C-seq, methylation-iCLIP, m6A-CLIP, Pseudo-seq, Ψ-seq, CeU-seq, Aza-IP and RiboMeth-seq. Many of these methods are based on specific capture of the RNA species containing the modification of interest, for example through antibody binding, coupled with sequencing of the captured reads. After sequencing, the reads are mapped against the whole transcriptome to determine where they originate. This kind of approach generally reveals the location of the modifications and can identify consensus sequences that help with further identification and mapping. One example of the specialised methods is PA-m5C-seq, which was developed from the PA-m6A-seq method to identify m5C modifications on mRNA instead of the original target, N6-methyladenosine. Switching between target modifications requires only a change of the capturing antibody from an m6A-specific to an m5C-specific one. Application of these methods has identified various modifications (e.g. pseudouridine, m6A, m5C, 2′-O-Me) within coding genes and non-coding genes (e.g. tRNA, lncRNAs, microRNAs) at single-nucleotide or very high resolution.
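The mapping step can be illustrated with a minimal sketch. It assumes that captured reads have already been aligned to a single transcript and are given as hypothetical half-open (start, end) intervals; the function name and the read coordinates are invented for illustration. Real pipelines are considerably more involved and typically compare the captured sample with an input control rather than looking at raw depth alone.

```python
def coverage(transcript_length, reads):
    """Per-base read depth from half-open (start, end) intervals.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth at each position.
    """
    diff = [0] * (transcript_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:-1]:
        running += delta
        depth.append(running)
    return depth


# Hypothetical reads captured by a modification-specific antibody.
reads = [(2, 8), (4, 10), (5, 9), (12, 15)]
depth = coverage(16, reads)

# First position with the highest depth: a crude candidate modification site.
peak = max(range(len(depth)), key=depth.__getitem__)
print(depth, peak)
```

Positions where many captured reads pile up are candidate modification sites; the sequences around such peaks are what would be searched for a consensus motif.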

Mass Spectrometry
Mass spectrometry is a way to quantify RNA modifications. More often than not, modifications cause an increase in mass for a given nucleoside. This gives a characteristic readout for the nucleoside and the modified counterpart. Moreover, mass spectrometry allows the investigation of modification dynamics by labelling RNA molecules with stable (non-radioactive) heavy isotopes in vivo. Because heavy-isotope-labelled nucleosides have a defined mass increase, they can be distinguished from their unlabelled isotopologues by mass spectrometry. This method, called NAIL-MS (nucleic acid isotope labelling coupled mass spectrometry), enables a variety of approaches to investigate RNA modification dynamics.
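As a worked illustration of these mass shifts, the sketch below computes monoisotopic masses from elemental formulas. The choice of nucleosides is ours: adenosine versus N6-methyladenosine shows a typical mass increase, uridine versus pseudouridine shows an isomerization that leaves the mass unchanged, and the fully 13C-labelled adenosine is a hypothetical labelling scheme, not one taken from a specific NAIL-MS protocol.

```python
# Monoisotopic masses (Da) of the most abundant isotope of each element.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
C13_MASS = 13.0033548  # mass of the heavy carbon isotope


def monoisotopic_mass(formula):
    """Sum isotope masses for an elemental composition such as {"C": 10, "H": 13}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


adenosine = {"C": 10, "H": 13, "N": 5, "O": 4}
m6a = {"C": 11, "H": 15, "N": 5, "O": 4}   # adenosine plus one CH2 unit
uridine = {"C": 9, "H": 12, "N": 2, "O": 6}
pseudouridine = dict(uridine)              # isomer of uridine, same formula

methyl_shift = monoisotopic_mass(m6a) - monoisotopic_mass(adenosine)
isomer_shift = monoisotopic_mass(pseudouridine) - monoisotopic_mass(uridine)

# Assumed labelling: all ten carbons of adenosine replaced by carbon-13.
label_shift = adenosine["C"] * (C13_MASS - MONOISOTOPIC["C"])

print(f"methylation: +{methyl_shift:.4f} Da")
print(f"pseudouridylation: +{isomer_shift:.4f} Da")
print(f"full 13C labelling of adenosine: +{label_shift:.4f} Da")
```

The zero shift for pseudouridine is one reason the text says modifications increase the mass "more often than not" rather than always.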

Messenger RNA modification
Recently, functional experiments have revealed many novel functional roles of RNA modifications. Most RNA modifications are found on transfer RNA and ribosomal RNA, but eukaryotic mRNA has also been shown to carry multiple different modifications. Seventeen naturally occurring modifications have been identified on mRNA, of which N6-methyladenosine (m6A) is the most abundant and best studied. mRNA modifications are linked to many functions in the cell. They ensure the correct maturation and function of the mRNA, and at the same time act as part of the cell's immune system. Certain modifications, such as 2'-O-methylated nucleotides, have been associated with the cell's ability to distinguish its own mRNA from foreign RNA. For example, m6A has been predicted to affect protein translation and localization, mRNA stability, alternative polyA choice and stem cell pluripotency. Pseudouridylation of nonsense codons suppresses translation termination both in vitro and in vivo, suggesting that RNA modification may provide a new way to expand the genetic code. 5-methylcytosine (m5C), on the other hand, has been associated with mRNA transport from the nucleus to the cytoplasm and with enhancement of translation. These functions of m5C are not fully established, but one strong argument for them is the observed localization of m5C to the translation initiation site. Importantly, many modification enzymes are dysregulated or genetically mutated in many disease types. For example, genetic mutations in pseudouridine synthases cause mitochondrial myopathy with lactic acidosis and sideroblastic anemia (MLASA) and dyskeratosis congenita.

Compared to the modifications identified in other RNA species such as tRNA and rRNA, the number of modifications identified on mRNA is very small. One of the main reasons mRNA modifications are less well known is the lack of suitable research techniques. In addition to the small number of identified modifications, knowledge of the associated proteins also lags behind that for other RNA species. Modifications result from specific enzyme interactions with the RNA molecule. For mRNA modifications, most of the known related enzymes are writer enzymes, which add the modification to the mRNA. The other groups of enzymes, readers and erasers, are for most modifications either poorly known or not known at all. For these reasons, there has been great interest during the past decade in studying these modifications and their functions.

Transfer RNA modifications
Transfer RNA or tRNA is the most abundantly modified type of RNA. Modifications in tRNA play crucial roles in maintaining translation efficiency through supporting structure, anticodon-codon interactions, and interactions with enzymes.

Because the genetic code is degenerate, anticodon modifications are necessary for proper decoding of mRNA. In particular, the wobble position of the anticodon determines how codons are read. For example, in eukaryotes an adenosine at position 34 of the anticodon can be converted to inosine, a modified nucleoside that is able to base-pair with cytosine, adenine, and uridine.
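The effect of inosine at position 34 can be sketched by listing the codons an anticodon can pair with. This is a simplified model under stated assumptions: only Watson-Crick pairs plus the inosine pairings named above are allowed (G-U wobble and other modified bases are ignored), and the anticodons AGC and IGC are chosen purely as an illustration.

```python
from itertools import product

# Allowed partners for each anticodon base: Watson-Crick pairs, plus
# inosine pairing with U, C, or A as stated in the text.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G", "I": "UCA"}


def codons_read(anticodon):
    """Codons (5'->3') that an anticodon (5'->3', positions 34-36) can pair with.

    Pairing is antiparallel: codon base 1 faces position 36, base 2 faces
    position 35, and base 3 (the wobble base) faces position 34.
    """
    pos34, pos35, pos36 = anticodon
    return ["".join(c) for c in product(PAIRS[pos36], PAIRS[pos35], PAIRS[pos34])]


print(codons_read("AGC"))  # unmodified A at position 34
print(codons_read("IGC"))  # A34 deaminated to inosine
```

In this model the unmodified anticodon reads a single codon, while the inosine-containing anticodon reads three codons that differ only in their third base.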

Another commonly modified base in tRNA is the position adjacent to the anticodon. Position 37 is often hypermodified with bulky chemical modifications. These modifications prevent frameshifting and increase anticodon-codon binding stability through stacking interactions.

Ribosomal RNA modification
Ribosomal RNA (rRNA) is essential to the makeup of ribosomes and to peptide transfer during translation. Ribosomal RNA modifications are made throughout ribosome synthesis, and often occur during and/or after transcription. Modifications primarily play a role in the structure of the rRNA in order to protect translational efficiency. Chemical modification in rRNA consists of methylation of ribose sugars, isomerization of uridines, and methylation and acetylation of individual bases.

Methylation
Methylation of rRNA upholds structural rigidity by blocking base pair stacking and surrounds the 2'-OH group to block hydrolysis. It occurs at specific parts of eukaryotic rRNA. The template for methylation consists of 10-21 nucleotides. 2'-O-methylation of the ribose sugar is one of the most common rRNA modifications and contributes to helical stabilization. Methylation is primarily introduced by small nucleolar ribonucleoproteins (snoRNPs). Two classes of snoRNPs target modification sites, referred to as box C/D and box H/ACA.

Isomerization
The isomerization of uridine to pseudouridine is the second most common rRNA modification. These pseudouridines are also introduced by the same classes of snoRNPs that participate in methylation. Pseudouridine synthases are the major participating enzymes in the reaction. The H/ACA box snoRNPs introduce guide sequences that are about 14-15 nucleotides long. Pseudouridylation is triggered in numerous places of rRNAs at once to preserve the thermal stability of RNA. Pseudouridine allows for increased hydrogen bonding and alters translation in rRNA and tRNA. It alters translation by increasing the affinity of the ribosome subunit to specific mRNAs.

Base editing
Base editing is the third major class of rRNA modification, specifically in eukaryotes. There are 8 categories of base edits that can occur at the gap between the small and large ribosomal subunits. RNA methyltransferases are the enzymes that introduce base methylation. Acetyltransferases are the enzymes responsible for acetylation of cytosine in rRNA. Base methylation plays a role in translation. These base modifications all work in conjunction with the two other main classes of modification to contribute to RNA structural stability. An example of this occurs in N7-methylation, which increases the nucleotide's charge to increase ionic interactions of proteins attaching to the RNA before translation.

Editing by insertion or deletion
RNA editing through the addition and deletion of uracil has been found in kinetoplasts from the mitochondria of Trypanosoma brucei. Because this may involve a large fraction of the sites in a gene, it is sometimes called "pan-editing" to distinguish it from topical editing of one or a few sites.

Pan-editing starts with the base-pairing of the unedited primary transcript with a guide RNA (gRNA), which contains complementary sequences to the regions around the insertion/deletion points. The newly formed double-stranded region is then enveloped by an editosome, a large multi-protein complex that catalyzes the editing. The editosome opens the transcript at the first mismatched nucleotide and starts inserting uridines. The inserted uridines will base-pair with the guide RNA, and insertion will continue as long as A or G is present in the guide RNA and will stop when a C or U is encountered. The inserted nucleotides cause a frameshift, and result in a translated protein that differs from its gene.

The mechanism of the editosome involves an endonucleolytic cut at the mismatch point between the guide RNA and the unedited transcript. The next step is catalyzed by one of the enzymes in the complex, a terminal U-transferase, which adds Us from UTP at the 3' end of the mRNA. The opened ends are held in place by other proteins in the complex. Another enzyme, a U-specific exoribonuclease, removes the unpaired Us. After editing has made the mRNA complementary to the gRNA, an RNA ligase rejoins the ends of the edited mRNA transcript. As a consequence, the editosome can edit only in a 3' to 5' direction along the primary RNA transcript. The complex can act on only a single guide RNA at a time. Therefore, an RNA transcript requiring extensive editing will need more than one guide RNA and editosome complex.
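The logic of guide-directed U insertion and deletion can be sketched as a toy model. It is a deliberate simplification, not a description of the enzymatic mechanism: the guide is supplied as a hypothetical list of bases in the order they face the transcript read 5' to 3' (a real gRNA anneals antiparallel and editing proceeds 3' to 5'), and cleavage, U addition, trimming and ligation are collapsed into a single pass. The sequences are invented.

```python
# Bases each guide nucleotide can pair with (Watson-Crick plus G-U wobble).
PAIRS = {"A": {"U"}, "G": {"C", "U"}, "C": {"G"}, "U": {"A", "G"}}


def edit_transcript(pre_mrna, guide):
    """Toy U-insertion/deletion editing.

    pre_mrna: unedited transcript, 5'->3'.
    guide: guide bases in the order they face the transcript 5'->3'
           (hypothetical simplification of the antiparallel duplex).
    Returns (edited transcript, number of Us inserted, number of Us deleted).
    """
    edited, inserted, deleted, i = [], 0, 0, 0
    for g in guide:
        # A transcript U that cannot pair with this guide base is removed.
        while i < len(pre_mrna) and pre_mrna[i] == "U" and "U" not in PAIRS[g]:
            i += 1
            deleted += 1
        if i < len(pre_mrna) and pre_mrna[i] in PAIRS[g]:
            edited.append(pre_mrna[i])  # already complementary, keep as is
            i += 1
        elif "U" in PAIRS[g]:
            edited.append("U")          # guide A or G with no partner: insert U
            inserted += 1
        else:
            raise ValueError(f"guide base {g} cannot be satisfied by U editing")
    if i != len(pre_mrna):
        raise ValueError("guide does not cover the whole transcript")
    return "".join(edited), inserted, deleted


print(edit_transcript("GAGUAC", "CAAUCUG"))
```

In this example two Us are inserted opposite the two guide As and one unpaired U is deleted, so the edited transcript becomes fully complementary to the guide, mirroring the rule that insertion continues while the guide presents A or G.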

C-to-U editing
C-to-U editing involves a cytidine deaminase that deaminates a cytidine base into a uridine base. An example is the apolipoprotein B gene in humans: apo B100 is expressed in the liver and apo B48 in the intestines. In the intestines, a CAA codon in the mRNA is edited to UAA, a stop codon, thus producing the shorter B48 form. C-to-U editing often occurs in the mitochondrial RNA of flowering plants. Different plants have different degrees of C-to-U editing; for example, eight editing events occur in mitochondria of the moss Funaria hygrometrica, whereas over 1,700 editing events occur in the lycophyte Isoetes engelmannii. In plant organelles, C-to-U editing is performed by members of the pentatricopeptide repeat (PPR) protein family. Angiosperms have large PPR families, acting as trans-factors for cis-elements lacking a consensus sequence; Arabidopsis has around 450 members in its PPR family. There have been a number of discoveries of PPR proteins in both plastids and mitochondria.
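The consequence of a CAA-to-UAA edit can be shown with a short translation sketch. The mRNA fragment below is invented for illustration and is not the apolipoprotein B sequence; the standard genetic code is assumed.

```python
BASES = "UCAG"
# Standard genetic code in UCAG order; "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def c_to_u(rna, position):
    """Return the RNA with the cytidine at `position` (0-based) changed to U."""
    if rna[position] != "C":
        raise ValueError("C-to-U editing requires a cytidine at the target site")
    return rna[:position] + "U" + rna[position + 1:]


def translate(rna):
    """Translate from the first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


unedited = "AUGGAACAAGGUUGG"    # hypothetical fragment: Met-Glu-Gln-Gly-Trp
edited = c_to_u(unedited, 6)    # CAA (Gln) becomes UAA (stop)

print(translate(unedited))
print(translate(edited))
```

A single-base edit truncates the product here from five residues to two, which is the same principle by which the edited transcript yields the shorter B48 form.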

A-to-I editing
Adenosine-to-inosine (A-to-I) modifications contribute to nearly 90% of all editing events in RNA. The deamination of adenosine is catalyzed by the double-stranded RNA-specific adenosine deaminase (ADAR), which typically acts on pre-mRNAs. The deamination of adenosine to inosine disrupts and destabilizes the dsRNA base pairing, therefore rendering that particular dsRNA less able to produce siRNA, which interferes with the RNAi pathway.

The wobble base pairing gives deaminated RNA a distinct structure, which may be related to the inhibition of the initiation step of RNA translation. Studies have shown that I-RNA (RNA with many repeats of the I-U base pair) recruits methylases that are involved in the formation of heterochromatin and that this chemical modification heavily interferes with miRNA target sites. There is active research into the importance of A-to-I modifications and their purpose in the novel concept of epitranscriptomics, in which modifications are made to RNA that alter their function. A long-established consequence of A-to-I editing in mRNA is the interpretation of I as G, leading to a functional A-to-G substitution, e.g. in the interpretation of the genetic code by ribosomes. Newer studies, however, have weakened this correlation by showing that inosines can also be decoded by the ribosome (although to a lesser extent) as adenosines or uracils. Furthermore, it was shown that inosines lead to the stalling of ribosomes on I-rich mRNA.

The development of high-throughput sequencing in recent years has allowed for the development of extensive databases for different modifications and edits of RNA. RADAR (Rigorously Annotated Database of A-to-I RNA editing) was developed in 2013 to catalog the vast variety of A-to-I sites and tissue-specific levels present in humans, mice, and flies. The addition of novel sites and overall edits to the database is ongoing. The level of editing for specific editing sites, e.g. in the filamin A transcript, is tissue-specific. The efficiency of mRNA splicing is a major factor controlling the level of A-to-I RNA editing. Interestingly, ADAR1 and ADAR2 also affect alternative splicing via both A-to-I editing ability and dsRNA binding ability.

Alternative mRNA editing
Alternative U-to-C mRNA editing was first reported in WT1 (Wilms Tumor-1) transcripts, and non-classic G-to-A mRNA changes were first observed in HNRNPK (heterogeneous nuclear ribonucleoprotein K) transcripts in both malignant and normal colorectal samples. The latter changes were also later seen alongside non-classic U-to-C alterations in brain cell TPH2 (tryptophan hydroxylase 2) transcripts. Although reverse amination might be the simplest explanation for U-to-C changes, transamination and transglycosylation mechanisms have been proposed for plant U-to-C editing events in mitochondrial transcripts. A recent study reported novel G-to-A mRNA changes in WT1 transcripts at two hotspots, proposing APOBEC3A (apolipoprotein B mRNA editing enzyme, catalytic polypeptide 3A) as the enzyme implicated in this class of alternative mRNA editing. It was also shown that alternative mRNA changes were associated with canonical WT1 splicing variants, indicating their functional significance.

RNA editing in plant mitochondria and plastids
Previous studies have shown that the only types of RNA editing seen in plant mitochondria and plastids are C-to-U conversion and, very rarely, U-to-C conversion. RNA-editing sites are found mainly in the coding regions of mRNA, introns, and other non-translated regions. In fact, RNA editing can restore the functionality of tRNA molecules. The editing sites are found primarily upstream of mitochondrial or plastid RNAs. While the specific positions for C-to-U RNA editing events have been fairly well studied in both the mitochondrion and plastid, the identity and organization of all proteins comprising the editosome have yet to be established. Members of the expansive PPR protein family have been shown to function as trans-acting factors for RNA sequence recognition. Specific members of the MORF (Multiple Organellar RNA editing Factor) family are also required for proper editing at several sites. As some of these MORF proteins have been shown to interact with members of the PPR family, it is possible that MORF proteins are components of the editosome complex. An enzyme responsible for the trans- or deamination of the RNA transcript remains elusive, though it has been proposed that the PPR proteins may serve this function as well.

RNA editing is essential for the normal functioning of the plant's translation and respiration activity. Editing can restore the essential base-pairing sequences of tRNAs, restoring functionality. It has also been linked to the production of RNA-edited proteins that are incorporated into the polypeptide complexes of the respiration pathway. Therefore, it is highly probable that polypeptides synthesized from unedited RNAs would not function properly and hinder the activity of both mitochondria and plastids.

C-to-U RNA editing can create start and stop codons, but it cannot destroy existing start and stop codons. A cryptic start codon is created when the codon ACG is edited to be AUG.
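This claim can be checked by brute force over all 64 codons. The sketch below assumes the standard start codon AUG and the standard stop codons UAA, UAG and UGA, and considers only single C-to-U changes within a codon.

```python
from itertools import product

START = "AUG"
STOPS = {"UAA", "UAG", "UGA"}
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]


def single_c_to_u(codon):
    """All codons reachable by changing exactly one C in `codon` to U."""
    return [codon[:i] + "U" + codon[i + 1:] for i, base in enumerate(codon) if base == "C"]


# Codons that become a start or stop codon after one C-to-U edit.
creates_start = sorted(c for c in CODONS if c != START and START in single_c_to_u(c))
creates_stop = sorted(c for c in CODONS if c not in STOPS and STOPS & set(single_c_to_u(c)))

# Start or stop codons that could be altered at all by C-to-U editing.
destroyed = sorted(c for c in [START, *STOPS] if single_c_to_u(c))

print(creates_start)
print(creates_stop)
print(destroyed)
```

Because neither AUG nor any of the three stop codons contains a C, none of them can be changed by this type of editing, whereas ACG, CAA, CAG and CGA can each be converted into a start or stop codon.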



RNA editing in viruses
Viruses (e.g., measles, mumps, or parainfluenza), especially those with an RNA genome, have evolved to utilize RNA modifications in many ways when taking over the host cell. They are known to use RNA modifications at different stages of their infection cycle, from immune evasion to enhancement of protein translation. RNA editing is used for stability and generation of protein variants. Viral RNAs are transcribed by a virus-encoded RNA-dependent RNA polymerase, which is prone to pausing and "stuttering" at certain nucleotide combinations. In addition, up to several hundred non-templated As are added by the polymerase at the 3' end of nascent mRNA. These As help stabilize the mRNA. Furthermore, the pausing and stuttering of the RNA polymerase allows the incorporation of one or two Gs or As upstream of the translational codon. The addition of the non-templated nucleotides shifts the reading frame, which generates a different protein.

Additionally, the RNA modifications are shown to have both positive and negative effects on the replication and translation efficiency depending on the virus. For example, Courtney et al. showed that an RNA modification called 5-methylcytosine is added to the viral mRNA in infected host cells in order to enhance the protein translation of HIV-1 virus. The inhibition of the m5C modification on viral mRNA results in significant reduction in viral protein translation, but interestingly it has no effect on the expression of viral mRNAs in the cell. On the other hand, Lichinchi et al. showed that the N6-methyladenosine modification on ZIKV mRNA inhibits the viral replication.

Origin and Evolution of RNA editing
The RNA-editing system seen in animals may have evolved from mononucleotide deaminases, which gave rise to larger gene families that include the apobec-1 and adar genes. These genes share close identity with the bacterial deaminases involved in nucleotide metabolism. The adenosine deaminase of E. coli cannot deaminate a nucleoside in RNA; the enzyme's reaction pocket is too small for the RNA strand to bind to. However, this active site is widened by amino acid changes in the corresponding human genes, APOBEC1 and ADAR, allowing deamination. The gRNA-mediated pan-editing in trypanosome mitochondria, involving templated insertion of U residues, is an entirely different biochemical reaction. The enzymes involved have been shown in other studies to be recruited and adapted from different sources. But the specificity of nucleotide insertion via the interaction between the gRNA and mRNA is similar to the tRNA editing processes in animal and Acanthamoeba mitochondria. Eukaryotic ribose methylation of rRNAs by guide RNA molecules is a similar form of modification.

Thus, RNA editing evolved more than once. Several adaptive rationales for editing have been suggested. Editing is often described as a mechanism of correction or repair to compensate for defects in gene sequences. However, in the case of gRNA-mediated editing, this explanation does not seem possible because if a defect happens first, there is no way to generate an error-free gRNA-encoding region, which presumably arises by duplication of the original gene region. A more plausible alternative for the evolutionary origins of this system is through constructive neutral evolution, where the order of steps is reversed, with the gratuitous capacity for editing preceding the "defect".

Therapeutic mRNA Editing
Directing edits to correct mutated sequences was first proposed and demonstrated in 1995. This initial work used synthetic RNA antisense oligonucleotides complementary to a premature stop codon mutation in a dystrophin sequence to activate A-to-I editing of the stop codon to a read-through codon in a model Xenopus cell system. Although this also led to inadvertent A-to-I transitions nearby, A-to-I (read as G) transitions can correct all three stop codons but cannot create a stop codon. The changes therefore led to >25% correction of the targeted stop codon, with read-through to a downstream luciferase reporter sequence. Follow-on work by Rosenthal achieved editing of a mutated mRNA sequence in mammalian cell culture by directing an oligonucleotide linked to an adenosine deaminase to correct a mutated cystic fibrosis sequence. More recently, CRISPR-Cas13 fused to deaminases has been employed to direct mRNA editing.
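The statement about stop codons can be verified by enumeration. The sketch below assumes that every inosine is read as G by the ribosome and that any non-empty subset of the adenosines in a codon may be edited; both are simplifications made for illustration.

```python
from itertools import combinations, product

STOPS = {"UAA", "UAG", "UGA"}
CODONS = ["".join(p) for p in product("UCAG", repeat=3)]


def a_to_i_readouts(codon):
    """Codons as read by the ribosome (I read as G) after editing any
    non-empty subset of the adenosines in `codon`."""
    a_positions = [i for i, base in enumerate(codon) if base == "A"]
    readouts = set()
    for n in range(1, len(a_positions) + 1):
        for subset in combinations(a_positions, n):
            readouts.add("".join("G" if i in subset else b for i, b in enumerate(codon)))
    return readouts


# Stop codons that can be converted into a sense codon.
correctable = {stop: sorted(a_to_i_readouts(stop) - STOPS) for stop in sorted(STOPS)}

# Sense codons that A-to-I editing could turn into a stop codon.
new_stops = [c for c in CODONS if c not in STOPS and a_to_i_readouts(c) & STOPS]

print(correctable)
print(new_stops)
```

Under these assumptions every stop codon can be read through as UGG (tryptophan), although UAA requires both adenosines to be edited, and no sense codon can be turned into a stop codon.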

In 2022, therapeutic RNA editing with Cas7-11 was reported. It enables sufficiently targeted cuts, and an early version of it was used for in vitro editing in 2021.

Comparison to DNA editing
Unlike DNA editing, which is permanent, the effects of RNA editing, including potential off-target changes in RNA, are transient and are not inherited. RNA editing is therefore considered to be less risky. Furthermore, it may require only a guide RNA, by using the ADAR protein already found in the cells of humans and many other eukaryotes, instead of needing to introduce a foreign protein into the body.