User:MuanN/Principles of molecular genetics

The role of genomic information in the practice of clinical medicine is increasing at a rapid pace. Examples include prenatal screening, diagnosis and molecular classification of rare genetic disease, tumor classification by gene expression analysis, and pharmacogenetic applications for medication dosing. Gene identification for complex traits is also progressing rapidly. Findings from these investigations will undoubtedly translate into the clinical realm in due time. A collection of online tutorials and instructional videos is also available at http://learn.genetics.utah.edu.

Physicians require knowledge of the basic principles of genetics to adequately incorporate these applications into clinical practice, with an understanding of the origins of these forms of data and their significance.

The basic principles of molecular genetics are reviewed here. The material summarized here is essential to understand topics related to the basic science and clinical applications of genetics addressed elsewhere within UpToDate. A glossary of genetic terms is available separately.

Readers are encouraged to consult introductory texts in molecular genetics and biology for more detailed reviews of these concepts.

Central dogma of molecular biology
The fundamental processes by which heritable information is stored, transmitted from generation to generation, and translated from code to function are common to all eukaryotes. These processes together are referred to as the Central Dogma of Molecular Biology (figure 1).

Heritable genetic information is stored as DNA, a stable molecule that is faithfully replicated and transmitted from one generation to the next. Segments of DNA that code for functional elements, such as genes, are transcribed into messenger RNA (mRNA). Through the process of translation, mRNA serves as the template for protein synthesis.

The Central Dogma implies a unidirectional flow of information from DNA to RNA to protein. Although it is increasingly evident that the interactions between DNA, RNA and protein are far more complex, the basic principle is a useful paradigm for understanding how the genetic code translates to function, and how disruption of the normal DNA sequence can impact health and disease.

Nucleic acids
Deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) are molecular chains composed of a repeating modular structure (the nucleotide) that consists of a five-carbon ringed sugar (deoxyribose and ribose, respectively) bound to a phosphate group and to one of four nitrogenous bases. These modules are connected to one another by the formation of phosphodiester bonds between the 3' and 5' carbon positions of adjacent modules (figure 2).

In RNA, the 2' carbon is hydroxylated, making RNA species susceptible to environmental insult. DNA lacks this 2' hydroxyl group, conferring significantly greater stability. As such, DNA is better suited for long-term information storage. The relative instability of RNA, in contrast, makes it well suited to serve as an intermediate in gene activation. Expressed genes (that is, genes transcribed from DNA to RNA for protein synthesis) can be readily "turned off" by RNA degradation, mediated by ribonuclease (RNase) enzymes.

A nitrogenous base is bound to the 1' carbon position of either ribose or deoxyribose. The four bases in DNA are:


 * Adenine (A)
 * Guanine (G)
 * Cytosine (C)
 * Thymine (T)

Adenine and guanine are purines; cytosine and thymine are pyrimidines. In RNA, the pyrimidine uracil (U) replaces thymine (figure 3).

The double helix and template-directed nucleic acid synthesis
Eukaryotic DNA exists, in its resting state, as two strands running antiparallel to each other in a double helix (figure 4). The two strands are held together by specific hydrogen bonds that form between A and T (2 hydrogen bonds) or between G and C (3 hydrogen bonds).

Because the A:T and C:G pairings are specific, the two strands are complementary to each other; opposing strands carry redundant information. These rules of sequence complementarity provide the basis for faithful DNA replication and DNA to RNA transcription. Single-stranded DNA serves as a template for synthesis of the complementary strand (figure 5).
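The pairing rules above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names are hypothetical, and the input is assumed to contain only the unambiguous bases A, C, G, and T.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(PAIRS[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count hydrogen bonds in the duplex: 2 per A:T pair, 3 per G:C pair."""
    return sum(3 if base in "GC" else 2 for base in strand.upper())


template = "ATGC"
print(reverse_complement(template))  # GCAT
print(hydrogen_bonds(template))      # 10
```

Applying `reverse_complement` twice returns the original sequence, which is the sense in which opposing strands carry redundant information.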

Mitosis and DNA replication
Mitosis is the stage of the somatic cell cycle during which cells divide. The goal of mitosis is to produce clonal daughter cells from the dividing mother cell; that is, to produce two progeny of identical structure and function (figure 6).

Essential to this process is the need to maintain normal DNA content and sequence among daughter cells. This is facilitated by DNA synthesis during the S phase of cell growth (figure 7).

DNA replication begins with the separation of DNA strands (denaturation), a process facilitated by DNA helicase (figure 5). The weakness of the hydrogen bonds between strands allows denaturation to occur at physiologic temperatures. DNA polymerase binds to the 3' end of single-stranded template DNA and proceeds 3' to 5' along the strand, adding nucleotides complementary to the template. The nascent DNA chain grows in a 5' to 3' manner, with DNA polymerase catalyzing the formation of the phosphodiester bonds between the nucleotides.

Both strands of the unwinding double helix serve as template. The orientation of one strand (3' to 5') is amenable to continuous DNA replication in a 5' to 3' direction. This strand is referred to as the leading strand. The other strand is termed the lagging strand. Its replication is discontinuous as the replication apparatus must await exposure of the 3' portion of the strand. Okazaki fragments are the newly synthesized discontinuous DNA segments that are then joined together by DNA ligase.

In general, DNA replication is semi-conservative, in that each daughter molecule contains one old and one newly synthesized strand.

DNA replication occurs with high fidelity, with estimated error rates of 10^-9 to 10^-11 per incorporated nucleotide in eukaryotes. These low rates reflect the low intrinsic DNA polymerase error rate of 10^-5, combined with 3' to 5' exonuclease activity and cellular proofreading machinery.

Despite this high degree of accuracy, both germline and somatic sequence errors (random mutations) are inevitable due to the magnitude of DNA sequence replicated over a lifetime (6 x 10^26 bases). The most common error type is slip-mispairing, resulting in single nucleotide substitution or insertion-deletion polymorphism. (See "Overview of genetic variation".)
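To give a sense of scale, the per-nucleotide error rates quoted above can be converted into an expected number of errors per round of replication. The sketch below assumes a diploid genome of roughly 6 x 10^9 base pairs, a figure that is not stated in this review, and the function name is hypothetical.

```python
# Assumption (not stated in the text): a diploid human genome is
# roughly 6e9 base pairs.
DIPLOID_GENOME_BP = 6e9


def expected_errors_per_division(error_rate, genome_bp=DIPLOID_GENOME_BP):
    """Expected number of misincorporated bases in one round of replication."""
    return error_rate * genome_bp


rates = [
    ("polymerase alone", 1e-5),
    ("with proofreading (upper estimate)", 1e-9),
    ("with proofreading (lower estimate)", 1e-11),
]
for label, rate in rates:
    errors = expected_errors_per_division(rate)
    print(f"{label}: about {errors:g} errors per division")
```

Under these assumptions, polymerase alone would introduce tens of thousands of errors per division, whereas proofreading reduces the expectation to a handful or fewer.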

Meiosis and sustained genetic diversity
Meiosis is the process of cell division for the purpose of gamete formation (figure 8).

The goals of meiosis are distinctly different from those of mitosis. The goal of mitosis is to generate cellular replicates (clones) with identical genetic information in diploid form and identical function. In contrast, the goal of meiosis is to generate four genetically distinct haploid cells that can participate in fertilization.

To achieve these goals, meiosis differs from mitosis in the following ways:


 * Two sets of cell division occur, resulting in four haploid cells.
 * Mandatory exchange of chromosomal information occurs between chromosome pairs (maternal and paternal chromosomes) through the process of recombination (figure 9). This recombination results in swapping of homologous chromosomal segments and the creation of novel haplotypes that are a chimera of genetic information from both parental chromosomes. A haplotype refers to the specific physical combination of alleles on a chromosome.
 * Chromosomes segregate independently from each other during both sets of cell division. As a result, this independent assortment yields four haploid genomes that each contain a mix of maternal and paternal genomes.
 * The combined processes of recombination and independent assortment ensure that each gamete carries a unique haploid genome, different from that of the individual in which it is formed, and different from those of the individual's parents.

Chromosomal recombination and independent assortment are the primary forces of increasing genetic diversity during reproduction.
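The contribution of independent assortment alone can be illustrated with a short calculation. The sketch below assumes 23 chromosome pairs (the human count, which is not stated in this review) and deliberately ignores recombination, which increases diversity much further. The function names are hypothetical.

```python
import random

# Assumption (not stated in the text): humans have 23 chromosome pairs.
HUMAN_CHROMOSOME_PAIRS = 23


def assortment_combinations(pairs=HUMAN_CHROMOSOME_PAIRS):
    """Number of maternal/paternal combinations from independent assortment alone."""
    return 2 ** pairs


def random_gamete(pairs=HUMAN_CHROMOSOME_PAIRS, rng=None):
    """Pick one homolog per pair: 'M' (maternal) or 'P' (paternal).

    Recombination is ignored, so each chromosome is inherited intact.
    """
    rng = rng or random.Random()
    return "".join(rng.choice("MP") for _ in range(pairs))


print(assortment_combinations())  # 8388608
print(random_gamete())            # one random string of 23 letters, M or P
```

Each chromosome pair contributes an independent two-way choice, so the count doubles with every pair.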

Errors during meiosis can introduce structural genetic variation. (See "Overview of genetic variation".)


 * Cross-over errors due to nonhomologous recombination can lead to large-scale gene duplications or deletions, or to more complex chromosomal rearrangements. Nonhomologous recombination refers to recombination that occurs between two DNA segments that are not perfectly aligned (as an example, the exchange between chromosome 9 and chromosome 22 that results in the t(9;22) translocation of chronic myeloid leukemia).
 * Segregation errors due to nondisjunction can lead to numerical aberrations, such as monosomies or trisomies. Nondisjunction describes an error in meiosis in which homologous chromosomes or sister chromatids fail to separate and segregate together into one gamete, rather than each segregating into its own gamete.

RNA transcription
Template-directed synthesis also facilitates RNA synthesis, though RNA transcription differs from DNA replication in several fundamental ways:


 * Unlike DNA replication, which is global and simultaneous across the genome, occurring during the S phase of the cell cycle, RNA transcription is a local and contextual process, occurring at distinct genomic positions at defined times, in response to various triggers.
 * RNA transcription is regulated through the coordinated actions of cellular machinery that control DNA unfolding and chromatin activation, transcription factor binding, and regulation of mRNA metabolism.
 * RNA transcription is initiated by the binding of specific transcription factors to DNA sequence motifs flanking the gene of interest. The binding of these factors initiates a cascade of programmed protein recruitment ultimately leading to the binding of RNA polymerase to the 5' end of the gene DNA sequence. (See "Overview of transcription factors".)
 * Like DNA polymerase, RNA polymerase catalyzes chain elongation by reading the DNA template and driving the formation of 3' to 5' phosphodiester bonds. However, RNA polymerase incorporates uracil in place of thymine.
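Template-directed RNA synthesis can be sketched in the same way as DNA complementarity, with uracil taking the place of thymine in the product. This is a minimal illustration: the function name is hypothetical, the template strand is assumed to be written 5' to 3', and regulation, promoters, and termination signals are ignored.

```python
# Pairing rules for transcription: each DNA template base specifies an RNA base.
# Adenine in the template is paired with uracil rather than thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_strand: str) -> str:
    """Return the mRNA (5' to 3') specified by a DNA template strand (5' to 3').

    RNA polymerase reads the template 3' to 5', so the template is reversed
    before pairing.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_strand.upper()))


print(transcribe("CAT"))  # AUG
```

The resulting mRNA matches the non-template (coding) DNA strand, except that U appears wherever the coding strand has T.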

Post-transcription modification of mRNA
Prior to the process of translation (formation of specific proteins from an RNA template), precursor mRNA (pre-mRNA) undergoes several modifications: splicing, capping, and polyadenylation (figure 10).

Splicing
RNA transcription is a continuous process, yet the DNA sequence that codes for protein (exon sequence) is often interrupted by intervening noncoding sequences (introns). These intron segments must be removed (spliced) from the RNA prior to translation.

The splicing process can lead to production of several different proteins from one DNA locus, and also may result in functional mutations. The following factors are involved:


 * Spliceosomes are enzymatic ribonucleoprotein complexes that remove introns from the primary RNA transcript (pre-mRNA) to produce mature mRNA.
 * Intron boundaries are marked by conserved splice donor and splice acceptor sites. These provide sequence recognition sites for the spliceosomes. Sequence mutations that alter splice sites can impair normal splicing and are a common form of functional variation implicated in disease.
 * Tissue-specific splicing is regulated by binding of enhancer and suppressor proteins to exonic sequences. Differential exon splicing leads to formation of similar but unique mRNA sequences known as isoforms.
 * Differential splicing of coding exons results in differences in the ultimate protein structure. Hence, discrete DNA loci often encode more than one protein (figure 11).
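The way one locus can yield several isoforms can be sketched with a toy transcript. Everything in the example is hypothetical: the sequences are invented, the introns simply begin with GU and end with AG to mimic donor and acceptor sites, and exon skipping is the only form of alternative splicing modeled.

```python
# Hypothetical pre-mRNA, written as ordered (kind, sequence) segments.
# The invented introns begin with GU and end with AG.
PRE_MRNA = [
    ("exon", "AUGGCC"),
    ("intron", "GUAAGUUUUCAG"),
    ("exon", "AAAUUU"),
    ("intron", "GUAAGUCCCCAG"),
    ("exon", "GGGUAA"),
]


def splice(segments, skipped_exons=()):
    """Join exons in order, dropping all introns.

    Any exon whose 0-based index appears in skipped_exons is also dropped,
    which models exon skipping.
    """
    exons = [seq for kind, seq in segments if kind == "exon"]
    return "".join(seq for i, seq in enumerate(exons) if i not in skipped_exons)


print(splice(PRE_MRNA))                     # AUGGCCAAAUUUGGGUAA
print(splice(PRE_MRNA, skipped_exons={1}))  # AUGGCCGGGUAA
```

The two outputs are distinct mature mRNAs derived from the same primary transcript, and they would encode proteins of different length.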

Capping
To improve transcript stability, the 5' end of mRNA is modified by the addition of an inverted 7-methylguanosine (m7G). The cap prevents 5' binding to other nucleic acid chains. Other functions include protection from exonucleases, initiation of translation through ribosomal binding, and mRNA translocation from nucleus to cytoplasm.

Polyadenylation
A long tail of adenosine residues is added to the 3' end of virtually all transcripts. In addition to increasing transcript stability, the polyA tail is needed for proper tertiary structure formation and subsequent initiation of translation.

Translation
Mature messenger RNA serves as a template to direct the synthesis of proteins. Like DNA and RNA, protein is composed of modular units (amino acids) linked together in a chain. The protein backbone is composed of amino and carboxyl groups joined by peptide bonds, with the chemical characteristics of each amino acid determined by 1 of 20 standard side chains (figure 12). The mature mRNA encodes the linear sequence of amino acids.

Proteins differ from each other by their polypeptide sequence. The collective interactions of the various side chains confer tertiary polypeptide structure, which in turn confers distinct functions.

mRNA is transported from the nucleus to the cytoplasm, where it is translated by ribosomes that are either free in the cytosol or bound to the rough endoplasmic reticulum. Ribosomes are complex ribonucleoprotein structures that include the enzymatic machinery for protein synthesis. The ribosome binds near the 5' end of mRNA to initiate translation.

Protein synthesis is template directed. Base sequences in mRNA specify amino acid sequence in the protein. The specific amino acid sequence is decoded by reading three consecutive bases. These base triplets are known as codons.


 * There are 64 codons (3 base positions, 4 possible bases at each; 4 x 4 x 4 = 64). Of these, 61 specify the 20 standard amino acids. Most amino acids can be specified by more than one codon (figure 13).
 * Codons differing only at the third base position typically code for the same amino acid or amino acids of similar chemical characteristic.
 * Three codons (UGA, UAA, and UAG) do not code for amino acids.
 * By convention, the codons are given as the sequence of mRNA, not the sequence of the complementary DNA strand. The nucleic acid sequence is given 5' to 3' and the protein sequence is given N-terminal to C-terminal. These directions correspond to the direction of synthesis.

RNA translation is initiated when the ribosomal unit recognizes the initiation codon (typically AUG, coding for methionine). The ribosomal unit exposes the AUG codon, enabling binding of transfer RNA (tRNA). tRNAs are RNAs that recognize specific codons and carry the corresponding amino acid.

Each tRNA has a complementary 3-base sequence (the anti-codon) corresponding to the 3-base codon encoding a specific amino acid. The methionine-bound tRNA binds to the initiation codon, facilitating 3' advancement of the ribosomal machinery, which in turn exposes the adjacent 3-base codon for recognition by the appropriate anti-codon tRNA and brings the corresponding amino acid residue next to the methionine. This process is repeated with sequential amino acid incorporation into the growing peptide chain (chain elongation) (figure 14). Translation is terminated once one of the three stop codons (UGA, UAA, or UAG) is reached, whereby no additional amino acid is added.
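The decoding steps described above (find the initiation codon, read successive triplets, stop at a termination codon) can be sketched as follows. This is a minimal illustration using the standard genetic code with one-letter amino acid abbreviations. The function name is hypothetical, the first AUG is assumed to be the start site, and ribosome scanning, tRNA charging, and the mitochondrial code are ignored.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G order
# for each of the three codon positions; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5' to 3') into a peptide (N-terminal to C-terminal).

    Translation starts at the first AUG and ends at the first in-frame
    stop codon, or at the end of the sequence if none is found.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no initiation codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "*":
            break
        peptide.append(residue)
    return "".join(peptide)


print(translate("AUGGCCAAAUUUGGGUAA"))  # MAKFG
```

Building the table from the conventional U, C, A, G ordering keeps the example compact; a lookup of `CODON_TABLE["UGA"]` returns `"*"`, reflecting its role as a termination codon in the standard code.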

The processive movement of the ribosome along the mRNA molecule allows multiple ribosomes to simultaneously synthesize multiple copies of a protein from a single mRNA molecule. This can be observed microscopically by the presence of polyribosomes or polysomes, which represent tight spatial arrays of ribosomes translating a single mRNA molecule.

Post-translation modification
Proteins often undergo additional modifications following translation. Possible protein modifications include:
 * Activation through protease cleavage of upstream leader sequences (eg, the cleavage of proinsulin to active insulin)
 * Regulatory modification (eg, glycosylation, carboxylation, phosphorylation, and oxidation)
 * Structural modifications (eg, attachment of membrane anchoring molecules).

Beyond the Central Dogma
The general principles regarding the flow of biologic information are reflected in most biologic processes. However, exceptions to the central dogma of molecular biology abound.
 * Not all functional RNA codes for protein. Many DNA sequences that are actively transcribed to RNA do not code for protein. In addition to mRNA and tRNA discussed above, other types of RNA include:


 * Ribosomal RNA (rRNA). rRNA may combine with ribosomal proteins to facilitate translation.
 * Small RNA segments combine with splicing proteins to form the spliceosome.
 * Small nuclear RNA (snRNA) molecules regulate diverse gene regulatory machinery.
 * Some double-stranded RNAs, known as short-interfering RNAs (siRNAs), direct targeted degradation of homologous mRNA molecules, inhibiting their translation into proteins.


 * Many viruses store their genetic information as RNA rather than as DNA. Several different mechanisms have been incorporated into viral life cycles to accommodate this difference from the biology of their host cells.
 * Retroviruses utilize reverse transcription to integrate into the host genome following infection. In these instances, the flow of genetic information is from RNA to DNA, then back to RNA prior to translation.
 * The mitochondrial genetic code differs from the universal genetic code in that UGA becomes a tryptophan codon, rather than a termination codon.
 * Epigenetic modifications, including DNA methylation and genomic imprinting effects, are modifications that are transmissible from parent to offspring but are not reflected in DNA sequence. These modifications impart non-sex-linked parent-of-origin effects, non-genetic familial clustering, and age-related or environmentally induced, non-mutation-based changes in gene function.

Implications of the Central Dogma for medicine
The importance of specific base pairing for the development of a normal organism and/or the maintenance of health cannot be overstated. The mechanisms of template-directed replication and transcription allow preservation of genetic information that encodes for functional proteins.

Errors in these processes, as well as intrinsic properties of this molecular machinery, have direct implications for the practice of medicine. As examples:


 * Errors in replication account for mutations that cause a wide array of diseases, including inherited disorders and malignancies.
 * Divergence between humans and bacteria in the transcriptional and translational enzymatic machinery provides the molecular targets for an array of antibiotics that are lethal to bacteria but harmless to humans.
 * Sequence-specific DNA and RNA hybridization methods, predicated on the specificity of base pairing, are central to many routine laboratory methods. Such methods include the polymerase chain reaction (PCR), microarray hybridization for measurement of gene expression profiles, genome-wide genotyping using single nucleotide polymorphisms (SNPs), and characterization of structural genetic variation. These techniques are being integrated into clinical laboratory practice at an ever-increasing pace.

Summary

 * The Central Dogma implies a unidirectional flow of information from DNA to RNA to protein. Heritable genetic information, stored as DNA, is faithfully replicated and transmitted from one generation to the next. (See 'Central dogma of molecular biology' above.)
 * DNA replication occurs prior to initiation of cell division (mitosis and meiosis). Complementarity of base pairing allows a single strand of DNA to serve as a template for copying a new complementary strand, and preserves the encoded genetic information. (See 'The double helix and template-directed nucleic acid synthesis' above.)
 * Meiosis is the process of generating gametes through two sets of successive cell divisions. The goal of generating diverse haploid genomes is achieved through the processes of recombination and independent assortment. (See 'Meiosis and sustained genetic diversity' above.)
 * Copying DNA into RNA, through the process of transcription, activates genes. There are several important structural differences between DNA and RNA:


 * In DNA, the sugar is deoxyribose, while it is ribose in RNA. DNA is therefore more stable.
 * In DNA, thymine is the pyrimidine complementary to adenine, but uracil replaces thymine in RNA.
 * DNA is double-stranded and RNA is typically single-stranded.

(See 'RNA transcription' above.)


 * Post-transcriptional modifications include intron splicing, mRNA capping, and polyadenylation. (See 'Post-transcription modification of mRNA' above.)
 * Translation is the process of template-directed peptide synthesis. Post-translational modifications confer additional functional properties to mature peptide sequences. (See 'Translation' above and 'Post-translation modification' above.)