History of RNA biology

Numerous key discoveries in biology have emerged from studies of RNA (ribonucleic acid), including seminal work in the fields of biochemistry, genetics, microbiology, molecular biology, molecular evolution, and structural biology. As of 2010, 30 scientists had been awarded Nobel Prizes for experimental work that included studies of RNA. Specific discoveries of high biological significance are discussed in this article.

For related information, see the articles on History of molecular biology and History of genetics. For background information, see the articles on RNA and nucleic acids.

RNA and DNA have distinct chemical properties
When first studied in the early 1900s, the chemical and biological differences between RNA and DNA were not apparent, and they were named after the materials from which they were isolated; RNA was initially known as "yeast nucleic acid" and DNA was "thymus nucleic acid". Using diagnostic chemical tests, carbohydrate chemists showed that the two nucleic acids contained different sugars, whereupon the common name for RNA became "ribose nucleic acid". Other early biochemical studies showed that RNA was readily broken down at high pH, while DNA was stable (although denatured) in alkali. Nucleoside composition analysis first showed that RNA contained nucleobases similar to those of DNA, with uracil in place of thymine, and that RNA also contained a number of minor nucleobase components, e.g. small amounts of pseudouridine and dimethylguanine.

Localization in cell and morphogenetic role
In 1933, while studying unfertilized sea urchin eggs, Jean Brachet suggested that DNA is found in the cell nucleus and that RNA is present exclusively in the cytoplasm. At the time, "yeast nucleic acid" (RNA) was thought to occur only in plants, and "thymus nucleic acid" (DNA) only in animals. DNA was thought to be a tetramer whose function was to buffer cellular pH. During the 1930s, Joachim Hämmerling conducted experiments with Acetabularia in which he began to distinguish the contributions of the nucleus and of cytoplasmic substances (later discovered to be DNA and mRNA, respectively) to cell morphogenesis and development.

Messenger RNA (mRNA) carries genetic information that directs protein synthesis
The concept of messenger RNA emerged during the late 1950s, and is associated with Crick's description of his "Central Dogma of Molecular Biology", which asserted that DNA led to the formation of RNA, which in turn led to the synthesis of proteins. During the early 1960s, sophisticated genetic analysis of mutations in the lac operon of E. coli and in the rII locus of bacteriophage T4 was instrumental in defining the nature of both messenger RNA and the genetic code. The short-lived nature of bacterial RNAs, together with the great complexity of the cellular mRNA population, made the biochemical isolation of mRNA very challenging. This problem was overcome in the 1960s by using vertebrate reticulocytes, which produce large quantities of mRNA that is highly enriched in transcripts encoding alpha- and beta-globin (the two major protein chains of hemoglobin). The first direct experimental evidence for the existence of mRNA was provided by such a hemoglobin-synthesizing system.

Ribosomes make proteins
In the 1950s, labeling experiments in rat liver showed that radioactive amino acids became associated with "microsomes" (later redefined as ribosomes) very rapidly after administration, before they were widely incorporated into cellular proteins. Ribosomes were first visualized using electron microscopy, and their ribonucleoprotein components were identified by biophysical methods, chiefly sedimentation analysis in ultracentrifuges capable of generating very high accelerations (equivalent to hundreds of thousands of times gravity). Polysomes (multiple ribosomes moving along a single mRNA molecule) were identified in the early 1960s, and their study led to an understanding of how ribosomes read the mRNA in the 5′ to 3′ direction, processively generating proteins as they do so.

Transfer RNA (tRNA) is the physical link between RNA and protein
Biochemical fractionation experiments showed that radioactive amino acids were rapidly incorporated into small RNA molecules that remained soluble under conditions where larger RNA-containing particles would precipitate. These molecules were termed soluble RNA (sRNA) and were later renamed transfer RNA (tRNA). Subsequent studies showed that (i) every cell has multiple species of tRNA, each of which is associated with a single specific amino acid, (ii) there is a matching set of enzymes (aminoacyl-tRNA synthetases) responsible for linking tRNAs with the correct amino acids, and (iii) tRNA anticodon sequences form a specific decoding interaction with mRNA codons.

The genetic code is solved
The genetic code is the set of rules by which particular nucleotide sequences in mRNA are translated into specific amino acid sequences in proteins (polypeptides). The ability to work out the genetic code emerged from the convergence of three different areas of study: (i) new methods to generate synthetic RNA molecules of defined composition to serve as artificial mRNAs, (ii) development of in vitro translation systems that could be used to translate the synthetic mRNAs into protein, and (iii) experimental and theoretical genetic work which established that the code was written in three-letter "words" (codons). Today, our understanding of the genetic code permits the prediction of the amino acid sequences of the protein products of the tens of thousands of genes whose sequences are being determined in genome studies.
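
The codon-reading logic described above can be illustrated with a short sketch. It assumes a deliberately partial codon table (only a handful of the 64 codons, written as mRNA) and reads the message 5′ to 3′ in non-overlapping triplets, stopping at the first stop codon. The function name translate is illustrative only. The first usage line mirrors the classic use of synthetic poly(U) as an artificial mRNA, which directs the synthesis of polyphenylalanine.

```python
# Illustrative subset of the standard genetic code (mRNA codon -> one-letter amino acid).
# This is NOT the full table; codons outside the subset raise KeyError.
CODON_TABLE = {
    "UUU": "F", "UUC": "F",              # phenylalanine
    "AAA": "K", "AAG": "K",              # lysine
    "CCC": "P", "CCU": "P",              # proline
    "AUG": "M",                          # methionine (usual start codon)
    "GGC": "G",                          # glycine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna: str) -> str:
    """Translate mRNA read 5'->3' in non-overlapping triplets.

    Translation halts at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("U" * 12))           # FFFF: poly(U) encodes polyphenylalanine
print(translate("AUGAAACCCUAAGGC"))  # MKP: the GGC after the stop codon is not read
```

Reading in fixed triplets from a defined starting point is what makes the reading frame matter: shifting the start by one base would produce an entirely different set of codons.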

RNA polymerase is purified
The biochemical purification and characterization of RNA polymerase from the bacterium Escherichia coli enabled the understanding of the mechanisms through which RNA polymerase initiates and terminates transcription, and of how those processes are controlled to regulate gene expression (i.e. turn genes on and off). Following the isolation of E. coli RNA polymerase, the three RNA polymerases of the eukaryotic nucleus were identified, as well as those associated with viruses and organelles. Studies of transcription also led to the identification of many protein factors that influence transcription, including repressors and activators, as well as regulatory DNA sequences such as enhancers. The availability of purified preparations of RNA polymerase permitted investigators to develop a wide range of novel methods for studying RNA in the test tube, and led directly to many of the subsequent key discoveries in RNA biology.

First complete nucleotide sequence of a biological nucleic acid molecule
Although determining the sequence of proteins was becoming somewhat routine, methods for sequencing nucleic acids were not available until the mid-1960s. In the seminal work that produced the first such sequence, a specific tRNA was purified in substantial quantities and then cleaved into overlapping fragments using a variety of ribonucleases. Analysis of the detailed nucleotide composition of each fragment provided the information necessary to deduce the sequence of the tRNA. Today, the sequence analysis of much larger nucleic acid molecules is highly automated and enormously faster.
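
Why cutting with more than one ribonuclease helps can be shown with a simplified sketch. It assumes complete digestion and textbook cleavage specificities (RNase T1 cutting on the 3′ side of G; pancreatic RNase A cutting on the 3′ side of C and U), and it uses a short hypothetical sequence rather than a real tRNA. The historical work required considerably more analysis than this; the functions digest and consistent are illustrative names only.

```python
def digest(rna: str, cut_after: set[str]) -> list[str]:
    """Simulate complete digestion: cleave on the 3' side of every base in cut_after."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # the 3'-terminal fragment need not end with a cleavable base
        fragments.append(current)
    return fragments


def consistent(candidate: str, t1_frags: list[str], a_frags: list[str]) -> bool:
    """A candidate sequence is consistent if it yields both observed fragment sets."""
    return (sorted(digest(candidate, {"G"})) == sorted(t1_frags)
            and sorted(digest(candidate, {"C", "U"})) == sorted(a_frags))


toy = "GCAUGGACU"  # hypothetical sequence, not a real tRNA
t1 = digest(toy, {"G"})            # ['G', 'CAUG', 'G', 'ACU']
rnase_a = digest(toy, {"C", "U"})  # ['GC', 'AU', 'GGAC', 'U']

# Reordering the T1 fragments gives the same T1 digest but a different RNase A
# digest, so the second enzyme's overlapping fragments rule this candidate out.
print(consistent(toy, t1, rnase_a))          # True
print(consistent("CAUGGGACU", t1, rnase_a))  # False
```

A single digest fixes the fragments but not their order; fragments from a second enzyme span the junctions between them, which is the overlap information used to deduce the full sequence.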

Evolutionary variation of homologous RNA sequences reveals folding patterns
Additional tRNA molecules were purified and sequenced. The first comparative sequence analysis revealed that the sequences varied through evolution in such a way that all of the tRNAs could fold into very similar secondary structures (two-dimensional structures) and had identical sequences at numerous positions (e.g. CCA at the 3′ end). The radial four-arm structure of tRNA molecules is termed the 'cloverleaf structure', and results from the evolution of sequences with common ancestry and common biological function. Since the discovery of the tRNA cloverleaf, comparative analysis of numerous other homologous RNA molecules has led to the identification of common sequences and folding patterns.
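
The comparative reasoning above, in which different sequences can still fold into the same structure, can be sketched for a single helix (stem). This sketch assumes that a stem forms when position i+k pairs with position j−k by Watson-Crick pairing or G·U wobble pairing; the two hairpins are hypothetical, not real tRNA arms, and forms_stem is an illustrative name.

```python
# Allowed base pairs: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def forms_stem(seq: str, i: int, j: int, length: int) -> bool:
    """True if seq[i:i+length] pairs antiparallel with the segment ending at seq[j].

    Positions are 0-based; position i+k must pair with position j-k.
    """
    return all((seq[i + k], seq[j - k]) in PAIRS for k in range(length))


# Hypothetical homologous hairpins: 5-bp stem, UUCG loop, 5-bp stem.
hairpin_1 = "GCAUC" + "UUCG" + "GAUGC"
hairpin_2 = "GGAUC" + "UUCG" + "GAUCC"  # C-G pair swapped to G-C (compensatory change)
hairpin_3 = "GGAUC" + "UUCG" + "GAUGC"  # only one side changed: G faces G

for name, hp in [("1", hairpin_1), ("2", hairpin_2), ("3", hairpin_3)]:
    print(name, forms_stem(hp, 0, len(hp) - 1, 5))
```

Hairpins 1 and 2 differ in sequence but both form the stem, because the substitution on one strand is matched by a compensating substitution on the other. Conserved pairing despite sequence change is the kind of covariation that comparative analysis uses to infer shared secondary structure.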

First complete genomic nucleotide sequence
The 3,569-nucleotide sequence of all of the genes of the RNA bacteriophage MS2 was determined by a large team of researchers over several years, and was reported in a series of scientific papers. These results enabled the analysis of the first complete genome, albeit an extremely tiny one by modern standards. Several surprising features were identified, including genes that partially overlap one another and the first clues that different organisms might have slightly different codon usage patterns.

Reverse transcriptase can copy RNA into DNA
Retroviruses were shown to have a single-stranded RNA genome and to replicate via a DNA intermediate, the reverse of the usual DNA-to-RNA transcription pathway. They encode an RNA-dependent DNA polymerase (reverse transcriptase) that is essential for this process. Some retroviruses cause diseases, including several that are associated with cancer, and HIV-1, which causes AIDS. Reverse transcriptase has been widely used as an experimental tool for the analysis of RNA molecules in the laboratory, in particular to convert RNA molecules into complementary DNA (cDNA) prior to molecular cloning and/or the polymerase chain reaction (PCR).
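
The base-pairing logic of copying RNA into DNA can be sketched in a few lines. This is a simplified model that assumes perfect Watson-Crick copying with no errors, and it writes every strand 5′ to 3′. Because the new strand is antiparallel to its template, the complement is read in reverse order. The function names are illustrative only.

```python
# DNA base paired with each RNA base (RNA uracil pairs with DNA adenine).
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna_5to3: str) -> str:
    """First-strand cDNA (5'->3') complementary to an RNA template (5'->3')."""
    return "".join(RNA_TO_DNA[base] for base in reversed(rna_5to3))


def second_strand(cdna_5to3: str) -> str:
    """DNA strand complementary to the cDNA; it matches the RNA with T in place of U."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna_5to3))


mrna = "AUGGCCUUA"
cdna = reverse_transcribe(mrna)
print(cdna)                 # TAAGGCCAT
print(second_strand(cdna))  # ATGGCCTTA
```

The second strand reproduces the original RNA sequence in DNA form, which is why a cDNA copy can stand in for an RNA molecule in cloning or PCR.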

RNA replicons evolve rapidly
Biochemical and genetic analyses showed that the enzyme systems that replicate viral RNA molecules (reverse transcriptases and RNA replicases) lack molecular proofreading (3′ to 5′ exonuclease) activity, and that RNA sequences do not benefit from extensive repair systems analogous to those that maintain and repair DNA sequences. Consequently, RNA genomes appear to be subject to significantly higher mutation rates than DNA genomes. For example, HIV-1 mutants that are insensitive to antiviral drugs emerge frequently and constitute a major clinical challenge.

Ribosomal RNA (rRNA) sequences provide a record of the evolutionary history of all life forms
Analysis of ribosomal RNA sequences from a large number of organisms demonstrated that all extant forms of life on Earth share common structural and sequence features of ribosomal RNA, reflecting a common ancestry. Mapping the similarities and differences among rRNA molecules from different sources provides clear and quantitative information about the phylogenetic (i.e. evolutionary) relationships among organisms. Analysis of rRNA molecules led to the identification of a third major domain of life, the Archaea, in addition to the Bacteria and the Eukarya (eukaryotes); the archaea had previously been grouped with bacteria as prokaryotes.
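
The "quantitative information" mentioned above can be illustrated with the simplest possible measure, the p-distance: the fraction of aligned positions at which two sequences differ. The sketch assumes short, hypothetical, gap-free pre-aligned fragments. Real rRNA phylogenetics uses full-length sequences, careful alignment, corrected distance models and tree-building methods, none of which are shown here.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two sequences differ (gap-free alignment)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical aligned rRNA fragments; taxon names are placeholders.
aligned = {
    "taxon_A": "GGAUCCGAUA",
    "taxon_B": "GGAUCCGACA",
    "taxon_C": "GCAUGCGUCA",
}

for (name1, seq1), (name2, seq2) in combinations(aligned.items(), 2):
    print(name1, name2, p_distance(seq1, seq2))
```

In this toy example taxon_A and taxon_B differ at one position in ten, while taxon_C differs from both at three or four positions, so A and B would be grouped as the closest relatives.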

Non-encoded nucleotides are added to the ends of RNA molecules
Molecular analysis of mRNA molecules showed that, following transcription, mRNAs have non-DNA-encoded nucleotides added to both their 5′ and 3′ ends (guanosine caps and poly-A, respectively). Enzymes were also identified that add and maintain the universal CCA sequence on the 3′ end of tRNA molecules. These events are among the first discovered examples of RNA processing, a complex series of reactions that are needed to convert RNA primary transcripts into biologically active RNA molecules.

Small RNA molecules are abundant in the eukaryotic nucleus
Small nuclear RNA molecules (snRNAs) were identified in the eukaryotic nucleus using immunological studies with autoimmune antibodies, which bind to small nuclear ribonucleoprotein complexes (snRNPs; complexes of the snRNA and protein). Subsequent biochemical, genetic, and phylogenetic studies established that many of these molecules play key roles in essential RNA processing reactions within the nucleus and nucleolus, including RNA splicing, polyadenylation, and the maturation of ribosomal RNAs.

RNA molecules require a specific, complex three-dimensional structure for activity
The detailed three-dimensional structure of tRNA molecules was determined using X-ray crystallography, and revealed highly complex, compact three-dimensional structures consisting of tertiary interactions laid upon the basic cloverleaf secondary structure. Key features of tRNA tertiary structure include the coaxial stacking of adjacent helices and non-Watson-Crick interactions among nucleotides within the apical loops. Additional crystallographic studies showed that a wide range of RNA molecules (including ribozymes, riboswitches and ribosomal RNA) also fold into specific structures containing a variety of 3D structural motifs. The ability of RNA molecules to adopt specific tertiary structures is essential for their biological activity, and results from the single-stranded nature of RNA. In many ways, RNA folding is more analogous to protein folding than to the highly regular, repetitive structure of the DNA double helix.

Genes are commonly interrupted by introns that must be removed by RNA splicing
Analysis of mature eukaryotic messenger RNA molecules showed that they are often much shorter than the DNA sequences that encode them. The genes were shown to be discontinuous, composed of sequences that are not present in the final mature RNA (introns), located between sequences that are retained in the mature RNA (exons). Introns were shown to be removed after transcription through a process termed RNA splicing. Splicing of RNA transcripts requires a highly precise and coordinated sequence of molecular events, consisting of (a) definition of the boundaries between exons and introns, (b) RNA strand cleavage at exactly those sites, and (c) covalent linking (ligation) of the RNA exons in the correct order. The discovery of discontinuous genes and RNA splicing was entirely unexpected by the community of RNA biologists, and remains one of the most surprising findings in molecular biology research.
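
The three steps listed above (boundary definition, cleavage, ligation) can be mirrored in a simple string sketch. It assumes the intron boundaries are already known and are given as 0-based, half-open coordinates. It models only the outcome of splicing, not its chemistry. The example intron begins with GU and ends with AG, as most nuclear pre-mRNA introns do; the sequences and the function name splice are hypothetical.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove introns given as 0-based, half-open (start, end) coordinates.

    (a) The coordinates define the exon/intron boundaries, (b) the chain is cut
    at exactly those positions, and (c) the retained exons are joined in order.
    """
    exons, prev_end = [], 0
    for start, end in sorted(introns):
        if start < prev_end:
            raise ValueError("introns must not overlap")
        exons.append(pre_mrna[prev_end:start])
        prev_end = end
    exons.append(pre_mrna[prev_end:])
    return "".join(exons)


# Hypothetical pre-mRNA: exon 1 + intron (GU...AG) + exon 2.
pre = "AUGGCC" + "GUAAGUUUUCAG" + "AAAUAA"
print(splice(pre, [(6, 18)]))  # AUGGCCAAAUAA
```

Misplacing a boundary by even one nucleotide would shift the reading frame of everything downstream, which is why the text stresses cleavage at exactly the defined sites.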

Alternative pre-mRNA splicing generates multiple proteins from a single gene
The great majority of protein-coding genes in the nuclei of metazoan cells contain multiple introns. In many cases, these introns were shown to be processed in more than one pattern, thus generating a family of related mRNAs that differ, for example, by the inclusion or exclusion of particular exons. As a result of alternative splicing, a single gene can encode a number of different protein isoforms that can exhibit a variety of (usually related) biological functions. Indeed, the great majority of human multi-exon genes are now known to undergo alternative splicing.
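
The combinatorial effect of exon inclusion and exclusion can be sketched as follows. The sketch assumes independent "cassette" exons, each of which is either kept or skipped, while constitutive exons are always kept. Real splicing is regulated, so not every combination necessarily occurs. The exon sequences and the function name isoforms are hypothetical.

```python
from itertools import product


def isoforms(exons: list[tuple[str, bool]]) -> list[str]:
    """Enumerate mRNAs from (sequence, is_optional) exons listed in 5'->3' order.

    Constitutive exons are always included; each optional exon is included or skipped.
    """
    optional_idx = [i for i, (_, optional) in enumerate(exons) if optional]
    results = []
    for choice in product([True, False], repeat=len(optional_idx)):
        keep = dict(zip(optional_idx, choice))
        results.append("".join(seq for i, (seq, _) in enumerate(exons) if keep.get(i, True)))
    return results


# Hypothetical gene: constitutive first and last exons, two optional middle exons.
gene = [("AUG", False), ("CCC", True), ("GGG", True), ("UAA", False)]
for mrna in isoforms(gene):
    print(mrna)
```

With n independent optional exons, up to 2^n distinct mRNAs can arise from one gene; here two optional exons give four isoforms.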

Discovery of catalytic RNA (ribozymes)
An experimental system was developed in which an intron-containing rRNA precursor from the nucleus of the ciliated protozoan Tetrahymena could be spliced in vitro. Subsequent biochemical analysis showed that this group I intron was self-splicing; that is, the precursor RNA was capable of carrying out the complete splicing reaction in the absence of proteins. In separate work, the RNA component of the bacterial enzyme ribonuclease P (a ribonucleoprotein complex) was shown to catalyze its tRNA-processing reaction in the absence of proteins. These experiments represented landmarks in RNA biology, since they revealed that RNA could play an active role in cellular processes by catalyzing specific biochemical reactions. Before these discoveries, it was believed that biological catalysis was solely the realm of protein enzymes.

RNA was likely critical for prebiotic evolution
The discovery of catalytic RNA (ribozymes) showed that RNA could both encode genetic information (like DNA) and catalyze specific biochemical reactions (like protein enzymes). This realization led to the RNA World Hypothesis, a proposal that RNA may have played a critical role in prebiotic evolution at a time before molecules with more specialized functions (DNA and proteins) came to dominate biological information coding and catalysis. Although it is not possible to know the course of prebiotic evolution with any certainty, the presence of functional RNA molecules with common ancestry in all modern-day life forms is a strong argument that RNA was widely present at the time of the last common ancestor.

Introns can be mobile genetic elements
Some self-splicing introns can spread through a population of organisms by "homing", inserting copies of themselves into genes at sites that previously lacked an intron. Because they are self-splicing (that is, they remove themselves at the RNA level from genes into which they have inserted), these sequences represent transposons that are genetically silent, i.e. they do not interfere with the expression of the gene into which they become inserted. These introns can be regarded as examples of selfish DNA. Some mobile introns encode homing endonucleases, enzymes that initiate the homing process by specifically cleaving double-stranded DNA at or near the intron-insertion site of alleles lacking an intron. Mobile introns are frequently members of either the group I or group II families of self-splicing introns.

Spliceosomes mediate nuclear pre-mRNA splicing
Introns are removed from nuclear pre-mRNAs by spliceosomes, large ribonucleoprotein complexes made up of snRNA and protein molecules whose composition and molecular interactions change during the course of the RNA splicing reactions. Spliceosomes assemble on and around splice sites (the boundaries between introns and exons in the unspliced pre-mRNA) and use RNA-RNA interactions to identify critical nucleotide sequences and, probably, to catalyze the splicing reactions. Nuclear pre-mRNA introns and spliceosome-associated snRNAs show structural features similar to those of self-splicing group II introns. In addition, nuclear pre-mRNA introns and group II introns are removed by a similar reaction pathway. These similarities have led to the hypothesis that these molecules may share a common ancestor.

RNA sequences can be edited within cells
Messenger RNA precursors from a wide range of organisms can be edited before being translated into protein. In this process, non-encoded nucleotides may be inserted at specific sites in the RNA, and encoded nucleotides may be removed or replaced. RNA editing was first discovered within the mitochondria of kinetoplastid protozoans, where it has been shown to be extensive. For example, some protein-coding genes encode fewer than 50% of the nucleotides found within the mature, translated mRNA. Other RNA editing events are found in mammals, plants, bacteria and viruses. These latter editing events involve fewer nucleotide modifications, insertions and deletions than the editing events in kinetoplastid mitochondria, but still have high biological significance for gene expression and its regulation.

Telomerase uses a built-in RNA template to maintain chromosome ends
Telomerase is an enzyme, present in the nuclei of most eukaryotes, that maintains the ends of the linear chromosomes of the eukaryotic nucleus by adding terminal DNA sequences that would otherwise be lost in each round of DNA replication. Before telomerase was identified, its activity was predicted on the basis of a molecular understanding of DNA replication, which indicated that the DNA polymerases known at that time could not replicate the 3′ end of a linear chromosome, due to the absence of a template strand. Telomerase was shown to be a ribonucleoprotein enzyme that contains an RNA component that serves as a template strand, and a protein component that has reverse transcriptase activity and adds nucleotides to the chromosome ends using the internal RNA template.

Ribosomal RNA catalyzes peptide bond formation
For years, scientists had worked to identify which protein(s) within the ribosome were responsible for peptidyl transferase function during translation, because the covalent linking of amino acids represents one of the most central chemical reactions in all of biology. Careful biochemical studies showed that extensively deproteinized large ribosomal subunits could still catalyze peptide bond formation, thereby implying that the sought-after activity might lie within ribosomal RNA rather than ribosomal proteins. Structural biologists, using X-ray crystallography, localized the peptidyl transferase center of the ribosome to a highly conserved region of the large subunit ribosomal RNA (rRNA) that is located at the place within the ribosome where the amino-acid-bearing ends of tRNA bind, and where no proteins are present. These studies led to the conclusion that the ribosome is a ribozyme. The rRNA sequences that make up the ribosomal active site represent some of the most highly conserved sequences in the biological world. Together, these observations indicate that peptide bond formation catalyzed by RNA was a feature of the last common ancestor of all known forms of life.

Combinatorial selection of RNA molecules enables in vitro evolution
Experimental methods were invented that allow investigators to apply the powerful selective replication strategies used by geneticists to large, diverse populations of RNA molecules in vitro, in experiments that amount to evolution in the test tube. These experiments have been described using different names, the most common of which are "combinatorial selection", "in vitro selection", and SELEX (for Systematic Evolution of Ligands by Exponential Enrichment). They have been used to isolate RNA molecules with a wide range of properties, from binding to particular proteins, to catalyzing particular reactions, to binding low-molecular-weight organic ligands. They are equally applicable to elucidating known interactions and mechanisms of naturally occurring RNA molecules and to isolating RNA molecules with biochemical properties that are not known in nature. In developing in vitro selection technology for RNA, investigators established laboratory systems for synthesizing complex populations of RNA molecules, and used them in conjunction with the selection of molecules with user-specified biochemical activities and with in vitro schemes for RNA replication. These steps can be viewed as (a) mutation, (b) selection, and (c) replication. Together, these three processes enable in vitro molecular evolution.
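
The mutation, selection and replication cycle described above can be simulated in a toy model. This is a sketch under clearly artificial assumptions. The "activity" of each molecule is a hypothetical score (the number of positions matching an arbitrary target motif) standing in for a real binding or catalytic assay. Replication is modeled as copying with random substitutions, and the best molecules are carried forward unchanged each round. A real SELEX experiment instead partitions molecules physically (for example, by binding to an immobilized target) and amplifies the survivors enzymatically. All names and parameter values here are illustrative.

```python
import random


def score(seq: str, target: str) -> int:
    """Hypothetical stand-in for an activity assay: positions matching a target motif."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Error-prone copy: each base is replaced by a random base with probability `rate`."""
    return "".join(rng.choice("ACGU") if rng.random() < rate else b for b in seq)


def evolve(target: str, pool_size: int = 200, keep: int = 20,
           rounds: int = 15, rate: float = 0.05, seed: int = 1) -> list[int]:
    """Run a toy selection-amplification loop; return the best score after each selection."""
    rng = random.Random(seed)
    # Starting diversity: a pool of random sequences the same length as the target.
    pool = ["".join(rng.choice("ACGU") for _ in target) for _ in range(pool_size)]
    history = []
    for _ in range(rounds):
        # Selection: retain only the highest-scoring molecules.
        survivors = sorted(pool, key=lambda s: score(s, target), reverse=True)[:keep]
        history.append(score(survivors[0], target))
        # Replication with mutation: refill the pool with imperfect copies of survivors.
        offspring = [mutate(rng.choice(survivors), rate, rng)
                     for _ in range(pool_size - keep)]
        pool = survivors + offspring
    return history


print(evolve("GGAUCCUAGCAUGC"))
```

Because the survivors are retained unchanged, the best score per round can never decrease in this model. Random substitutions during copying supply the new variants on which the next round of selection acts.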

Many mobile DNA elements use an RNA intermediate
Some transposable genetic elements (retrotransposons) replicate via transcription into an RNA intermediate, which is subsequently converted back to DNA by reverse transcriptase. These sequences, many of which are likely related to retroviruses, constitute much of the DNA of the eukaryotic nucleus, especially in plants. Genomic sequencing shows that retrotransposons make up 36% of the human genome and over half of the genome of major cereal crops (wheat and maize).

Riboswitches bind cellular metabolites and control gene expression
Riboswitches are segments of RNA, typically embedded within the 5′-untranslated regions of a vast number of bacterial mRNA molecules, that have a profound effect on gene expression through a previously undiscovered mechanism that does not involve the participation of proteins. In many cases, riboswitches change their folded structure in response to environmental conditions (e.g. ambient temperature or concentrations of specific metabolites), and the structural change controls the translation or stability of the mRNA in which the riboswitch is embedded. In this way, gene expression can be dramatically regulated at the post-transcriptional level.

Small RNA molecules regulate gene expression by post-transcriptional gene silencing
Another previously unknown mechanism by which RNA molecules are involved in genetic regulation was discovered in the 1990s. Small RNA molecules termed microRNA (miRNA) and small interfering RNA (siRNA) are abundant in eukaryotic cells and exert post-transcriptional control over mRNA expression. They function by base-pairing with specific sites within target mRNAs and, as part of the RNA-induced silencing complex (RISC), triggering cleavage or degradation of the mRNA or repression of its translation.

Noncoding RNA controls epigenetic phenomena
In addition to their well-established roles in translation and splicing, members of noncoding RNA (ncRNA) families have recently been found to function in genome defense and chromosome inactivation. For example, piwi-interacting RNAs (piRNAs) prevent genome instability in germ line cells, while Xist (X-inactive-specific-transcript) is essential for X-chromosome inactivation in mammals.