Non-coding RNA

A non-coding RNA (ncRNA) is a functional RNA molecule that is not translated into a protein. The DNA sequence from which a functional non-coding RNA is transcribed is often called an RNA gene. Abundant and functionally important types of non-coding RNAs include transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs), as well as small RNAs such as microRNAs, siRNAs, piRNAs, snoRNAs, snRNAs, exRNAs, scaRNAs and the long ncRNAs such as Xist and HOTAIR.

The number of non-coding RNAs within the human genome is unknown; however, recent transcriptomic and bioinformatic studies suggest that there are thousands of non-coding transcripts. Many of the newly identified ncRNAs have unknown functions, if any. There is no consensus on how much non-coding transcription is functional: some believe most ncRNAs to be non-functional "junk RNA" arising from spurious transcription, while others expect that many non-coding transcripts have functions yet to be discovered.

History and discovery
Nucleic acids were first discovered in 1868 by Friedrich Miescher, and by 1939, RNA had been implicated in protein synthesis. Two decades later, Francis Crick predicted a functional RNA component which mediated translation; he reasoned that RNA is better suited to base-pair with an mRNA transcript than a pure polypeptide.

The first non-coding RNA to be characterised was an alanine tRNA found in baker's yeast; its structure was published in 1965. To produce a purified alanine tRNA sample, Robert W. Holley et al. used 140 kg of commercial baker's yeast to give just 1 g of purified tRNA(Ala) for analysis. The 80-nucleotide tRNA was sequenced by first being digested with pancreatic ribonuclease (producing fragments ending in cytosine or uridine) and then with takadiastase ribonuclease T1 (producing fragments ending in guanosine). Chromatography and identification of the 5' and 3' ends then helped arrange the fragments to establish the RNA sequence. Of the three structures originally proposed for this tRNA, the 'cloverleaf' structure was independently proposed in several subsequent publications. The cloverleaf secondary structure was finalised following X-ray crystallography analysis performed by two independent research groups in 1974.
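The two-enzyme digestion described above can be illustrated with a small sketch. It assumes idealised cleavage, in which pancreatic ribonuclease cuts after every C or U and ribonuclease T1 cuts after every G. The function name `digest` and the 12-nucleotide example sequence are hypothetical and are not taken from Holley's work.

```python
def digest(rna: str, cut_after: str) -> list[str]:
    """Split an RNA string after every residue listed in cut_after."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in a cut residue
        fragments.append(current)
    return fragments


# Hypothetical sequence, for illustration only.
rna = "GGAUCCGAAGCA"
pancreatic = digest(rna, "CU")  # fragments end in C or U (except the 3' end)
t1 = digest(rna, "G")           # fragments end in G (except the 3' end)

print(pancreatic)
print(t1)
```

Because the two digests cut at different residues, their fragments overlap, which is what allows the pieces to be ordered into a single sequence.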

Ribosomal RNA was next to be discovered, followed by URNA in the early 1980s. Since then, the discovery of new non-coding RNAs has continued with snoRNAs, Xist, CRISPR and many more. Recent notable additions include riboswitches and miRNA; the discovery of the RNAi mechanism associated with the latter earned Craig C. Mello and Andrew Fire the 2006 Nobel Prize in Physiology or Medicine.

Recent discoveries of ncRNAs have been achieved through both experimental and bioinformatic methods.

Biological roles
Non-coding RNAs belong to several groups and are involved in many cellular processes. These range from ncRNAs of central importance that are conserved across all or most cellular life to more transient ncRNAs specific to one or a few closely related species. The more conserved ncRNAs are thought to be molecular fossils or relics from the last universal common ancestor and the RNA world, and their current roles remain mostly in the regulation of information flow from DNA to protein.

In translation


Many of the conserved, essential and abundant ncRNAs are involved in translation. Ribonucleoprotein (RNP) particles called ribosomes are the 'factories' where translation takes place in the cell. The ribosome consists of more than 60% ribosomal RNA, made up of 3 ncRNAs in prokaryotes and 4 ncRNAs in eukaryotes. Ribosomal RNAs catalyse the translation of nucleotide sequences to protein. Another set of ncRNAs, transfer RNAs, form an 'adaptor molecule' between mRNA and protein. The H/ACA box and C/D box snoRNAs are ncRNAs found in archaea and eukaryotes, whereas RNase MRP is restricted to eukaryotes. Both groups of ncRNA are involved in the maturation of rRNA: the snoRNAs guide covalent modifications of rRNA, tRNA and snRNAs, and RNase MRP cleaves the internal transcribed spacer 1 between the 18S and 5.8S rRNAs. The ubiquitous ncRNA RNase P is an evolutionary relative of RNase MRP. RNase P generates the mature 5'-ends of tRNAs by cleaving the 5'-leader elements of precursor tRNAs. Another ubiquitous RNP, the signal recognition particle (SRP), recognises and transports specific nascent proteins to the endoplasmic reticulum in eukaryotes and the plasma membrane in prokaryotes. In bacteria, transfer-messenger RNA (tmRNA) forms an RNP involved in rescuing stalled ribosomes, tagging incomplete polypeptides and promoting the degradation of aberrant mRNA.

In RNA splicing


In eukaryotes, the spliceosome performs the splicing reactions essential for removing intron sequences; this process is required for the formation of mature mRNA. The spliceosome is another RNP, often known as the snRNP or tri-snRNP. There are two different forms of the spliceosome, the major and minor forms. The ncRNA components of the major spliceosome are U1, U2, U4, U5, and U6. The ncRNA components of the minor spliceosome are U11, U12, U5, U4atac and U6atac.

Another group of introns can catalyse their own removal from host transcripts; these are called self-splicing RNAs. There are two main groups of self-splicing RNAs: group I and group II catalytic introns. These ncRNAs catalyse their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms.

In mammals it has been found that snoRNAs can also regulate the alternative splicing of mRNA; for example, snoRNA HBII-52 regulates the splicing of serotonin receptor 2C.

In nematodes, the SmY ncRNA appears to be involved in mRNA trans-splicing.

In DNA replication


Y RNAs are stem loops, necessary for DNA replication through interactions with chromatin and initiation proteins (including the origin recognition complex). They are also components of the Ro60 ribonucleoprotein particle which is a target of autoimmune antibodies in patients with systemic lupus erythematosus.

In gene regulation
The expression of many thousands of genes is regulated by ncRNAs. This regulation can occur in trans or in cis. There is increasing evidence that a special type of ncRNA called enhancer RNA, transcribed from the enhancer region of a gene, acts to promote gene expression.

Trans-acting
In higher eukaryotes microRNAs regulate gene expression. A single miRNA can reduce the expression levels of hundreds of genes. The mechanism by which mature miRNA molecules act is through partial complementarity to one or more messenger RNA (mRNA) molecules, generally in 3' UTRs. The main function of miRNAs is to down-regulate gene expression.
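Target recognition by partial complementarity is often approximated computationally by searching a 3' UTR for sites complementary to the miRNA "seed", commonly taken as nucleotides 2 to 8 from the 5' end. The sketch below makes that simplifying assumption. The function names and the example sequences are illustrative only, and real target prediction relies on additional criteria.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA string (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna: str, utr: str) -> list[int]:
    """Return 0-based UTR positions that pair perfectly with the seed."""
    seed = mirna[1:8]                 # nucleotides 2-8 of the miRNA
    site = reverse_complement(seed)   # sequence expected in the mRNA
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


# Illustrative inputs (assumptions, not data from the text).
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAGGGCUACCUCA"
sites = seed_sites(mirna, utr)
print(sites)
```

Only the short seed has to match exactly in this model, which is consistent with a single miRNA being able to find sites in a large number of different mRNAs.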

The ncRNA RNase P has also been shown to influence gene expression. In the human nucleus, RNase P is required for the normal and efficient transcription of various ncRNAs transcribed by RNA polymerase III. These include tRNA, 5S rRNA, SRP RNA, and U6 snRNA genes. RNase P exerts its role in transcription through association with Pol III and chromatin of active tRNA and 5S rRNA genes.

It has been shown that 7SK RNA, a metazoan ncRNA, acts as a negative regulator of the RNA polymerase II elongation factor P-TEFb, and that this activity is influenced by stress response pathways.

The bacterial ncRNA, 6S RNA, specifically associates with RNA polymerase holoenzyme containing the sigma70 specificity factor. This interaction represses expression from a sigma70-dependent promoter during stationary phase.

Another bacterial ncRNA, OxyS RNA, represses translation by binding to Shine-Dalgarno sequences, thereby occluding ribosome binding. OxyS RNA is induced in response to oxidative stress in Escherichia coli.

The B2 RNA is a small noncoding RNA polymerase III transcript that represses mRNA transcription in response to heat shock in mouse cells. B2 RNA inhibits transcription by binding to core Pol II. Through this interaction, B2 RNA assembles into preinitiation complexes at the promoter and blocks RNA synthesis.

A recent study has shown that the act of transcribing an ncRNA sequence can itself influence gene expression. RNA polymerase II transcription of ncRNAs is required for chromatin remodelling in Schizosaccharomyces pombe. Chromatin is progressively converted to an open configuration as several species of ncRNAs are transcribed.

Cis-acting
A number of ncRNAs are embedded in the 5' UTRs (Untranslated Regions) of protein coding genes and influence their expression in various ways. For example, a riboswitch can directly bind a small target molecule; the binding of the target affects the gene's activity.

RNA leader sequences are found upstream of the first gene of amino acid biosynthetic operons. These RNA elements form one of two possible structures in regions encoding very short peptide sequences that are rich in the end-product amino acid of the operon. A terminator structure forms when there is an excess of the regulatory amino acid and ribosome movement over the leader transcript is not impeded. When there is a deficiency of the charged tRNA of the regulatory amino acid, the ribosome translating the leader peptide stalls and the antiterminator structure forms. This allows RNA polymerase to transcribe the operon. Known RNA leaders include the histidine operon leader, leucine operon leader, threonine operon leader and tryptophan operon leader.

Iron response elements (IREs) are bound by iron response proteins (IRPs). The IRE is found in UTRs of various mRNAs whose products are involved in iron metabolism. When iron concentration is low, IRPs bind the ferritin mRNA IRE, leading to translational repression.

Internal ribosome entry sites (IRES) are RNA structures that allow translation initiation in the middle of an mRNA sequence as part of the process of protein synthesis.

In genome defense
Piwi-interacting RNAs (piRNAs) expressed in mammalian testes and somatic cells form RNA-protein complexes with Piwi proteins. These piRNA complexes (piRCs) have been linked to transcriptional gene silencing of retrotransposons and other genetic elements in germline cells, particularly those in spermatogenesis.

Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are repeats found in the DNA of many bacteria and archaea. The repeats are separated by spacers of similar length. It has been demonstrated that these spacers can be derived from phage and subsequently help protect the cell from infection.
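The repeat-spacer layout can be illustrated by splitting an array on its repeat sequence. This is a minimal sketch that assumes identical, exact repeats. The repeat, spacer, leader and trailer strings are made up for illustration, and `extract_spacers` is a hypothetical helper rather than part of any CRISPR analysis tool.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive copies of repeat."""
    parts = array.split(repeat)
    # The first and last parts flank the array and are not spacers.
    return parts[1:-1]


# Made-up sequences, for illustration only.
repeat = "GTTTTAGAGCTA"
known_spacers = ["ACGTACGTAC", "TTGACCATGA", "CCGATTAGCA"]
array = "ATATAT" + repeat + repeat.join(known_spacers) + repeat + "CGCG"

spacers = extract_spacers(array, repeat)
print(spacers)
print([len(s) for s in spacers])
```

In a real genome the repeats may differ slightly from one another, so exact string splitting like this would usually need to be replaced by approximate matching.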

Chromosome structure
Telomerase is an RNP enzyme that adds specific DNA sequence repeats ("TTAGGG" in vertebrates) to telomeric regions, which are found at the ends of eukaryotic chromosomes. The telomeres contain condensed DNA material, giving stability to the chromosomes. The enzyme is a reverse transcriptase that carries telomerase RNA, which it uses as a template when elongating telomeres. Telomeres are shortened after each replication cycle.
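Template-directed telomere extension can be sketched as reverse transcription of a short RNA template. The sketch assumes a minimal hypothetical template, 5'-CCCUAA-3', chosen only because its DNA copy is the vertebrate repeat; the template region of real telomerase RNA is longer, and the function names and the starting DNA end are illustrative.

```python
# Base-pairing rules for copying RNA into DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_template: str) -> str:
    """Return the DNA strand (5'->3') complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_template))


def extend_telomere(dna_end: str, rna_template: str, rounds: int) -> str:
    """Append one templated DNA repeat per round to the 3' end of dna_end."""
    return dna_end + reverse_transcribe(rna_template) * rounds


# Hypothetical minimal template and chromosome end (assumptions).
repeat = reverse_transcribe("CCCUAA")
extended = extend_telomere("GATCTTAGGG", "CCCUAA", rounds=3)
print(repeat)
print(extended)
```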

Xist (X-inactive-specific transcript) is a long ncRNA gene on the X chromosome of placental mammals that acts as a major effector of the X chromosome inactivation process, forming Barr bodies. An antisense RNA, Tsix, is a negative regulator of Xist. X chromosomes lacking Tsix expression (and thus having high levels of Xist transcription) are inactivated more frequently than normal chromosomes. In drosophilids, which also use an XY sex-determination system, the roX (RNA on the X) RNAs are involved in dosage compensation. Both Xist and roX operate by epigenetic regulation of transcription through the recruitment of histone-modifying enzymes.

Bifunctional RNA
Bifunctional RNAs, or dual-function RNAs, are RNAs that have two distinct functions. The majority of the known bifunctional RNAs are mRNAs that encode both a protein and ncRNAs. However, a growing number of ncRNAs fall into two different ncRNA categories; e.g., H/ACA box snoRNA and miRNA.

Two well-known examples of bifunctional RNAs are SgrS RNA and RNAIII. A handful of other bifunctional RNAs are also known to exist (e.g., steroid receptor activator/SRA, VegT RNA, Oskar RNA, ENOD40, p53 RNA, SR1 RNA, and Spot 42 RNA). Bifunctional RNAs were the subject of a 2011 special issue of Biochimie.

As a hormone
There is an important link between certain non-coding RNAs and the control of hormone-regulated pathways. In Drosophila, hormones such as ecdysone and juvenile hormone can promote the expression of certain miRNAs. Furthermore, this regulation occurs at distinct temporal points within Caenorhabditis elegans development. In mammals, miR-206 is a crucial regulator of estrogen-receptor-alpha.

Non-coding RNAs are crucial in the development of several endocrine organs, as well as in endocrine diseases such as diabetes mellitus. Specifically, in the MCF-7 cell line, addition of 17β-estradiol increased global transcription of long non-coding RNAs (lncRNAs) near estrogen-activated coding genes.

In pathogen avoidance
C. elegans was shown to learn and inherit pathogen avoidance after exposure to a single non-coding RNA of a bacterial pathogen.

Roles in disease
As with proteins, mutations or imbalances in the ncRNA repertoire within the body can cause a variety of diseases.

Cancer
Many ncRNAs show abnormal expression patterns in cancerous tissues. These include miRNAs, long mRNA-like ncRNAs, GAS5, SNORD50, telomerase RNA and Y RNAs. The miRNAs are involved in the large-scale regulation of many protein-coding genes, the Y RNAs are important for the initiation of DNA replication, and telomerase RNA serves as the template for telomerase, an RNP that extends telomeric regions at chromosome ends (see telomeres and disease for more information). The direct function of the long mRNA-like ncRNAs is less clear.

Germline mutations in miR-16-1 and miR-15 primary precursors have been shown to be much more frequent in patients with chronic lymphocytic leukemia compared to control populations.

A rare SNP (rs11614913) that overlaps hsa-mir-196a-2 has been reported to be associated with non-small cell lung carcinoma. Likewise, a screen of 17 miRNAs that have been predicted to regulate a number of breast cancer-associated genes found variations in the microRNAs miR-17 and miR-30c-1 of patients; these patients were noncarriers of BRCA1 or BRCA2 mutations, raising the possibility that familial breast cancer may be caused by variation in these miRNAs.

The p53 tumor suppressor is arguably the most important agent in preventing tumor formation and progression. The p53 protein functions as a transcription factor with a crucial role in orchestrating the cellular stress response. In addition to its crucial role in cancer, p53 has been implicated in other diseases including diabetes, cell death after ischemia, and various neurodegenerative diseases such as Huntington's, Parkinson's, and Alzheimer's disease. Studies have suggested that p53 expression is subject to regulation by non-coding RNA.

Another example of non-coding RNA dysregulated in cancer cells is the long non-coding RNA Linc00707. Linc00707 is upregulated and sponges miRNAs in human bone marrow-derived mesenchymal stem cells, in hepatocellular carcinoma, gastric cancer or breast cancer, and thus promotes osteogenesis, contributes to hepatocellular carcinoma progression, promotes proliferation and metastasis, or indirectly regulates expression of proteins involved in cancer aggressiveness, respectively.

Prader–Willi syndrome
The deletion of the 48 copies of the C/D box snoRNA SNORD116 has been shown to be the primary cause of Prader–Willi syndrome. Prader–Willi is a developmental disorder associated with over-eating and learning difficulties. SNORD116 has potential target sites within a number of protein-coding genes, and could have a role in regulating alternative splicing.

Autism
The chromosomal locus containing the small nucleolar RNA SNORD115 gene cluster has been duplicated in approximately 5% of individuals with autistic traits. A mouse model engineered to have a duplication of the SNORD115 cluster displays autistic-like behaviour. A recent small study of post-mortem brain tissue demonstrated altered expression of long non-coding RNAs in the prefrontal cortex and cerebellum of autistic brains as compared to controls.

Cartilage–hair hypoplasia
Mutations within RNase MRP have been shown to cause cartilage–hair hypoplasia (CHH), a disease associated with an array of symptoms such as short stature, sparse hair, skeletal abnormalities and a suppressed immune system. The disease is frequent among the Amish and Finns. The best-characterised variant is an A-to-G transition at nucleotide 70, which lies in a loop region two bases 5' of a conserved pseudoknot. However, many other mutations within RNase MRP also cause CHH.

Alzheimer's disease
The antisense RNA BACE1-AS is transcribed from the opposite strand to BACE1 and is upregulated in patients with Alzheimer's disease. BACE1-AS regulates the expression of BACE1 by increasing BACE1 mRNA stability and generating additional BACE1 through a post-transcriptional feed-forward mechanism. By the same mechanism it also raises concentrations of beta amyloid, the main constituent of senile plaques. BACE1-AS concentrations are elevated in subjects with Alzheimer's disease and in amyloid precursor protein transgenic mice.

miR-96 and hearing loss
Variation within the seed region of mature miR-96 has been associated with autosomal dominant, progressive hearing loss in humans and mice. The homozygous mutant mice were profoundly deaf, showing no cochlear responses. Heterozygous mice and humans progressively lose the ability to hear.

Mitochondrial transfer RNAs
A number of mutations within mitochondrial tRNAs have been linked to diseases such as MELAS syndrome, MERRF syndrome, and chronic progressive external ophthalmoplegia.

Distinction between functional RNA (fRNA) and ncRNA
Scientists have started to distinguish functional RNA (fRNA) from ncRNA, to describe regions functional at the RNA level that may or may not be stand-alone RNA transcripts. This implies that fRNA (such as riboswitches, SECIS elements, and other cis-regulatory regions) is not ncRNA. Yet fRNA could also include mRNA, as this is RNA coding for protein, and hence is functional. Additionally, artificially evolved RNAs also fall under the fRNA umbrella term. Some publications state that ncRNA and fRNA are nearly synonymous; however, others have pointed out that a large proportion of annotated ncRNAs likely have no function. It has also been suggested to simply use the term RNA, since the distinction from a protein-coding RNA (messenger RNA) is already given by the qualifier mRNA. This eliminates the ambiguity when addressing a gene "encoding a non-coding" RNA. In addition, a number of ncRNAs may be misannotated in published literature and datasets.