Trinucleotide repeat expansion

A trinucleotide repeat expansion, also known as a triplet repeat expansion, is the DNA mutation responsible for causing any type of disorder categorized as a trinucleotide repeat disorder. In dynamical genetics these are labelled dynamic mutations. Triplet expansion is caused by slippage during DNA replication, also known as "copy choice" DNA replication. Because the DNA sequence in these regions is repetitive, 'loop out' structures may form during DNA replication while complementary base pairing is maintained between the parent strand and the daughter strand being synthesized. If the loop out structure is formed from the sequence on the daughter strand, the number of repeats increases; if it is formed on the parent strand, the number of repeats decreases. Expansion of these repeats appears to be more common than reduction. Generally, the larger the expansion, the more likely it is to cause disease or to increase the severity of disease. Other proposed mechanisms for expansion and reduction involve the interaction of RNA and DNA molecules.
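The loop-out rule above amounts to simple bookkeeping on the repeat count. The following is a minimal Python sketch, not a biophysical model: the function `repeats_after_slippage` and its parameters are hypothetical, and it assumes that every repeat unit in the loop is either gained (loop on the daughter strand) or lost (loop on the parent strand) in a single round of replication.

```python
def repeats_after_slippage(parent_repeats: int, loop_repeats: int, loop_strand: str) -> int:
    """Return the daughter repeat count after one slipped replication event.

    Hypothetical helper: a loop-out on the daughter (nascent) strand adds the
    looped-out units, a loop-out on the parent (template) strand skips them.
    """
    if parent_repeats < 0 or loop_repeats < 0:
        raise ValueError("repeat counts must be non-negative")
    if loop_strand == "daughter":
        # Units looped out of the nascent strand are re-synthesized after realignment,
        # so they are added on top of the template length (expansion).
        return parent_repeats + loop_repeats
    if loop_strand == "parent":
        # Looped-out template units are never copied (contraction).
        if loop_repeats > parent_repeats:
            raise ValueError("a template loop cannot exceed the tract length")
        return parent_repeats - loop_repeats
    raise ValueError("loop_strand must be 'daughter' or 'parent'")


print(repeats_after_slippage(40, 3, "daughter"))  # 43
print(repeats_after_slippage(40, 3, "parent"))    # 37
```

The sketch deliberately ignores the observed bias toward expansion; it only encodes the direction of change for each strand.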

In addition to occurring during DNA replication, trinucleotide repeat expansion can also occur during DNA repair. When a DNA trinucleotide repeat sequence is damaged, it may be repaired by processes such as homologous recombination, non-homologous end joining, mismatch repair or base excision repair. Each of these processes involves a DNA synthesis step in which strand slippage might occur, leading to trinucleotide repeat expansion.

The number of trinucleotide repeats appears to predict the progression, severity, and age of onset of Huntington's disease and similar trinucleotide repeat disorders. Other human diseases in which triplet repeat expansion occurs are fragile X syndrome, several spinocerebellar ataxias, myotonic dystrophy and Friedreich's ataxia.

History
The first documentation of anticipation in genetic disorders was in the 1800s. However, geneticists disregarded this relationship and attributed it to ascertainment bias; because of this, it took almost 200 years for a link between onset of disease and trinucleotide repeats (TNRs) to be acknowledged.

The following findings supported the link between TNRs and onset of disease; the detection of various repeats within these diseases demonstrated this relationship.


 * In 1991, for fragile X syndrome, the fragile X mental retardation 1 (FMR-1) gene was found to contain a CGG expansion in its 5' untranslated region (UTR). In addition, a CAG expansion was located in X-linked spinal and bulbar muscular atrophy (SBMA) sequences. SBMA was the first identified "CAG/polyglutamine" disease, a subcategory of repeat disorders.
 * In 1992, for myotonic dystrophy type 1 (DM1), CTG expansion was found in the myotonic dystrophy protein kinase (DMPK) 3' UTR.
 * In 1993, for Huntington's disease (HD), a longer-than-usual CAG repeat was found in the exon 1 coding sequence.

Following these discoveries, ideas about anticipation in disease began to develop, along with curiosity about how its causes could be related to TNRs. After these breakthroughs, four mechanisms for TNR expansion were determined, and more types of repeats were identified as well. Repeat composition and location are used to determine the mechanism of a given expansion. From 1995 onwards, it also became possible to observe the formation of hairpins in triplet repeats, which consist of repeating CG pairs and a mismatch.

In the decade after evidence linking TNRs to onset of disease was found, research focused on repeat length and repeat dynamics in disease, as well as on the mechanism behind parent-to-child disease inheritance. Research has shown a clear inverse relationship between the length of the repeats in parents and the age of disease onset in children; therefore, TNR length is used in clinical diagnosis to predict age of onset as well as outcome. Research also revealed another aspect of these diseases: the high variability of onset. Although the onset of HD could be predicted from inherited TNR length, onset could vary up to fourfold between patients, suggesting that other factors modify the age of onset; there were notable efforts to find them. Currently, CAG repeat length is considered the biggest modifier of onset age in TNR diseases.

Early on, detection of TNRs was hampered by limited technology and methods, and years passed before sufficient ways to measure the repeats were developed. When PCR was first used to detect TNRs, multiple-band artifacts were prevalent in the results, which made TNRs difficult to recognize; at the time, debate centered on whether disease was brought on by smaller amounts of short expansions or a small amount of long expansions. Accurate methods have since been established. Together, the following clinically necessary protocols measure TNRs with 99% accuracy.


 * Small-pool polymerase chain reaction (SP-PCR) allows for recognition of repeat changes, and originated from the growing need for a method that would measure TNRs more accurately. It has been useful for examining how TNRs vary in the blood, sperm, and somatic cells of humans and mice.
 * Southern blots are used to measure CGG repeats because CG-rich regions limit polymerase movement in PCR.
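Both methods ultimately infer repeat number from the size of a DNA fragment. As an illustration of that arithmetic only, here is a minimal Python sketch; the function name and the 200 bp flanking length in the example are hypothetical, and it does not model the dilution step of SP-PCR or the hybridization step of a Southern blot.

```python
def estimate_repeat_count(fragment_bp: int, flanking_bp: int, unit_bp: int = 3) -> int:
    """Estimate repeat number from a sized fragment (hypothetical assay design).

    fragment_bp: measured fragment length
    flanking_bp: total non-repeat sequence in the fragment (known from primer/probe positions)
    unit_bp:     repeat unit length (3 for a trinucleotide repeat)
    """
    repeat_bp = fragment_bp - flanking_bp
    if repeat_bp < 0:
        raise ValueError("fragment is shorter than its flanking sequence")
    # Fragment sizing has a few bp of error, so round to the nearest whole repeat unit.
    return round(repeat_bp / unit_bp)


print(estimate_repeat_count(290, 200))  # 30
```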

Overall structure
Once they exceed a certain threshold number of repeats, these repetitive sequences make the DNA strands unstable, which can result in DNA slippage during replication. The most common and well-known triplet repeats are CAG, GCG, CTG, CGG, and GAA. During DNA replication, the strand being synthesized can misalign with its template strand because of the dynamic nature and flexibility of these triplet repeats. This slippage allows the strand to find a stable intermediate by base pairing with itself, forming a secondary structure other than a duplex.
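Because instability depends on the number of uninterrupted repeats, a basic computational task is counting the longest run of a given triplet in a sequence. The Python sketch below is an illustration under simple assumptions (exact matches only, no tolerance for sequencing errors); the function name is hypothetical. It scans each of the three reading frames separately, since consecutive copies of a triplet always start in the same frame.

```python
def longest_repeat_run(seq: str, unit: str) -> int:
    """Return the largest number of consecutive, uninterrupted copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    best = 0
    for frame in range(k):  # adjacent copies of a k-mer repeat share a reading frame
        run = 0
        for i in range(frame, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # an interruption (e.g. CAA inside a CAG tract) resets the run
    return best


print(longest_repeat_run("GGCAGCAGCAACAGCAGCAGCAGTT", "CAG"))  # 4
```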

Location
These triplet repeats can be found in both coding and non-coding regions. CAG and GCN repeats, which encode polyglutamine and polyalanine tracts respectively, are normally found in coding regions. In the 5' untranslated region, CGG and CAG repeats are found and are responsible for fragile X syndrome and spinocerebellar ataxia 12, respectively. In the 3' untranslated region, CTG repeats are found, while GAA repeats are located in introns. Other disease-causing repeats, which are not triplet repeats, have been located in promoter regions. Once the number of repeats exceeds normal levels, triplet repeat expansions (TREs) become more likely, and the number of triplet repeats can typically increase to around 100 in coding regions and up to thousands in non-coding regions. This difference is due to selection against long glutamine and alanine tracts, which are toxic to cells.
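The link between coding repeats and polyglutamine or polyalanine tracts follows directly from the genetic code: CAG (like CAA) encodes glutamine, and every GCN codon encodes alanine. The Python sketch below uses a deliberately partial codon table covering only those codons and marks anything else as "X"; the table and function name are illustrative assumptions, not a general-purpose translator.

```python
# Deliberately partial codon table: only the codons relevant to the repeats discussed here.
CODON_TABLE = {"CAG": "Q", "CAA": "Q"}               # glutamine (Q)
CODON_TABLE.update({"GC" + n: "A" for n in "ACGT"})  # GCN -> alanine (A)


def translate_in_frame(seq: str) -> str:
    """Translate an in-frame coding sequence; codons not in the table become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


print(translate_in_frame("CAGCAGCAG"))     # QQQ (polyglutamine)
print(translate_in_frame("GCAGCCGCGGCT"))  # AAAA (polyalanine)
```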

Intermediates
Depending on the sequence of the repeat, at least three intermediates with different secondary structures are known to form. A CGG repeat will form a G-quadruplex due to Hoogsteen base pairing, while a GAA repeat forms a triplex due to negative supercoiling. CAG, CTG, and CGG repeats form a hairpin. After the hairpin forms, the primer realigns with the 3' end of the newly synthesized strand and continues synthesis, leading to triplet repeat expansion. The hairpin consists of a stem and a loop; the stem contains both Watson-Crick base pairs and mismatched pairs. In CTG and CAG repeats, the number of nucleotides in the loop depends on whether the number of triplet repeats is odd or even. An even number of repeats forms a tetraloop structure, while an odd number leads to the formation of a triloop.
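The odd/even rule has a simple arithmetic reason: a tract of n triplets has 3n nucleotides, and the stem pairs them two at a time, so 3n minus the loop size must be even. For odd n that favors a 3-nucleotide loop, and for even n a 4-nucleotide loop. The Python sketch below folds an idealized (CAG)n or (CTG)n hairpin under strong simplifying assumptions (one perfectly aligned stem, no overhangs or alternative registers); the function name is hypothetical. It also shows the stem pattern described above: Watson-Crick C-G and G-C pairs flanking a mismatch every third position.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def fold_hairpin(unit: str, n_repeats: int):
    """Fold (unit)n into an idealized hairpin; return (watson_crick, mismatches, loop)."""
    seq = unit.upper() * n_repeats
    # The stem uses nucleotides in pairs, so len(seq) - loop_len must be even:
    # odd n -> 3-nt triloop, even n -> 4-nt tetraloop (as stated in the text).
    loop_len = 3 if n_repeats % 2 else 4
    if len(seq) < loop_len + 2:
        raise ValueError("tract too short to form a stem")
    stem_len = (len(seq) - loop_len) // 2
    wc = mismatches = 0
    for i in range(stem_len):
        # Pair the i-th nucleotide from the 5' end with the i-th from the 3' end.
        if (seq[i], seq[len(seq) - 1 - i]) in WATSON_CRICK:
            wc += 1
        else:
            mismatches += 1  # e.g. A-A in CAG hairpins, T-T in CTG hairpins
    loop = seq[stem_len:stem_len + loop_len]
    return wc, mismatches, loop


print(fold_hairpin("CAG", 5))  # (4, 2, 'CAG')  odd n: triloop
print(fold_hairpin("CAG", 4))  # (3, 1, 'AGCA') even n: tetraloop
```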

Threshold
In trinucleotide repeat expansion there is a threshold number of repeats above which a sequence becomes unstable. Once this threshold is reached, the repeats start to expand rapidly, causing longer and longer expansions in future generations. Once a repeat reaches this minimum allele size, normally around 30-40 repeats, instability and disease can arise; if the number of repeats within a sequence is below the threshold, it remains relatively stable. The molecular basis of thresholds is still not well understood, but researchers are investigating whether it lies in the formation of secondary structures by these repeats. Diseases associated with trinucleotide repeat expansions have been found to involve secondary structures such as hairpins, triplexes, and slipped-strand duplexes. These observations have led to the hypothesis that the threshold is the number of repeats needed to stabilize these unwanted secondary structures, because once such structures form, more mutations arise in the sequence, resulting in further trinucleotide expansion.

Parental influence
Research suggests a direct, important correlation between the sex of the parent who transmits the mutation and the degree and phenotype of the disorder in the child. Both whether an expansion occurs and how large it is have been linked to the sex of the transmitting parent in both non-coding and coding trinucleotide repeat disorders. For example, research on the Huntington's disease CAG trinucleotide repeat has found a strong correlation between repeat change and parental transmission, with differences between maternal and paternal transmission. Maternal transmission has been observed to add only 1 repeat unit, while paternal transmission typically adds anywhere from 3 to 9 repeats. Paternal transmission is almost always responsible for large repeat transmission resulting in early onset of Huntington's disease, while individuals who inherit the repeat maternally experience symptom onset mirroring that of their mother. While this transmission of trinucleotide repeat expansions is regarded as a result of "meiotic instability", the extent to which meiosis plays a role and the underlying mechanism are not clear, and numerous other processes are predicted to play a role simultaneously.

Unequal homologous exchange
One proposed but highly unlikely mechanism of trinucleotide repeat expansion transmission operates during meiotic or mitotic recombination. It is suggested that during these processes a homologous repeat misalignment, of the kind known to cause alpha-globin locus deletions, could cause the meiotic instability of a trinucleotide repeat expansion. This process is unlikely to contribute to the transmission and presence of trinucleotide repeat expansions because the expansion mechanisms differ. Trinucleotide repeat expansions typically favor expansion of the CAG region, but for unequal homologous exchange to be plausible, these repeats would have to undergo expansion and contraction events at the same time. In addition, numerous diseases that result from transmitted trinucleotide repeat expansions, such as fragile X syndrome, involve unstable trinucleotide repeats on the X chromosome that cannot be explained by meiotic recombination. Research has shown that although unequal homologous recombination is unlikely to be the sole cause of transmitted trinucleotide repeat expansions, it likely plays a minor role in determining the length of some expansions.

DNA replication
Many models predict that DNA replication errors are the main cause of trinucleotide repeat expansion (TRE) transmission, reflecting the difficulty of replicating through repeat tracts. TREs have been shown to occur during DNA replication in both in vitro and in vivo studies, allowing long tracts of triplet repeats to assemble rapidly through different mechanisms that can result in either small scale or large scale expansions.

Small scale expansions
These expansions can occur through either strand slippage or flap ligation. Okazaki fragments are a key element of the proposed replication error. It is suggested that the small size of Okazaki fragments, typically between 150 and 200 nucleotides long, makes them more likely to fall or "slip" off the lagging strand, which creates room for trinucleotide repeats to attach to the lagging strand copy. Besides slippage of Okazaki fragments, this model is supported by the ability of CG-rich trinucleotide repeat sequences to form special hairpin, toroid, and triplex DNA structures, which suggests that the error occurs during DNA replication. Hairpin structures can form because the lagging strand is relatively free during DNA replication, and they are typically observed in extremely long trinucleotide repeat sequences. Research has found that hairpin formation depends on the orientation of the trinucleotide repeats within each CAG/CTG strand. Strands in which CTG repeats form duplexes in the leading strand are observed to gain extra repeats, while those without CTG repeats in the leading strand undergo repeat deletions. These intermediates can pause the replication fork through their interaction with DNA polymerases during strand slippage. Contractions occur when the replication fork skips over the intermediate on the Okazaki fragment. Expansions occur when the fork reverses and restarts, forming a chicken-foot structure. This structure causes the unstable intermediate to form on the nascent leading strand, leading to further TRE. Furthermore, this intermediate can escape mismatch repair because of its affinity for the MSH2-MSH3 complex, which stabilizes the hairpin instead of repairing it.

In non-dividing cells, a process called flap ligation can be responsible for TRE. 8-oxoguanine DNA glycosylase removes an oxidized guanine (8-oxoguanine) and forms a nick in the sequence. The coding strand then forms a flap due to displacement, which prevents removal by an endonuclease. When the repair process finishes for either mechanism, the length of the expansion equals the number of triplet repeats involved in forming the hairpin intermediate.

Large scale expansions
Two mechanisms have been proposed for large scale repeats: template switching and break-induced replication.

Template switching has been proposed as a mechanism for large scale GAA repeat expansion that can double the number of triplet repeats. GAA repeats expand when the repeat tract is longer than an Okazaki fragment. These repeats stall the replication fork by forming a triplex when the 5' flap of TTC repeats folds back. Okazaki fragment synthesis continues when the template is switched to the nascent leading strand. The Okazaki fragment eventually ligates back to the 5' flap, which results in TRE.
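The two quantitative conditions in this model, a GAA tract longer than an Okazaki fragment and an outcome of at most a doubling, can be checked with simple arithmetic. The Python sketch below is illustrative only: the function name is hypothetical, the default of 150 nucleotides is the lower end of the 150-200 nucleotide Okazaki fragment size given in the small scale section, and returning exactly double the input is an upper bound, not a predicted outcome.

```python
def max_gaa_after_template_switch(gaa_repeats: int, okazaki_nt: int = 150) -> int:
    """Upper bound on GAA repeat count after one template-switching event (toy model)."""
    if gaa_repeats < 0 or okazaki_nt <= 0:
        raise ValueError("invalid input")
    tract_nt = 3 * gaa_repeats  # each GAA unit is 3 nucleotides
    if tract_nt <= okazaki_nt:
        # Tract fits within one Okazaki fragment: this mechanism is not expected to act.
        return gaa_repeats
    # Tract longer than an Okazaki fragment: the repeat can at most double.
    return 2 * gaa_repeats


print(max_gaa_after_template_switch(40))  # 40  (120 nt <= 150 nt)
print(max_gaa_after_template_switch(60))  # 120 (180 nt > 150 nt)
```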

A different mechanism, based on break-induced replication, has been proposed for large scale CAG repeats and can also occur in non-dividing cells. At first, this mechanism follows the same process as the small scale strand slippage mechanism until replication fork reversal. An endonuclease then cleaves the chicken-foot structure, which results in a one-ended double strand break. The CAG repeat of this broken daughter strand forms a hairpin and invades the CAG strand on the sister chromatid, which results in expansion of this repeat through migrating D-loop DNA synthesis. This synthesis continues until it reaches the replication fork and is cleaved, which results in an expanded sister chromatid.

Background
Fragile X syndrome is the second most common form of intellectual disability affecting 1 in 2,000-4,000 women and 1 in 4,000-8,000 men, women being twice as likely to inherit this disability due to their XX chromosomes. This disability arises from a mutation at the end of the X chromosome in the FMR1 gene (Fragile X Mental Retardation Gene), which produces a protein essential for brain development called FMRP. Individuals with fragile X syndrome experience a variety of symptoms, to degrees that depend on sex and the extent of the mutation, such as attention deficit disorders, irritability, stimuli sensitivity, various anxiety disorders, depression, and/or aggressive behavior. Some treatments for these symptoms include SSRIs, antipsychotic medications, stimulants, folic acid, and mood stabilizers.

Genetic causation
Sizable expansions of a CGG trinucleotide element are the main cause of the genetic disorder fragile X syndrome. In unaffected male carriers, the CGG repeat number ranges from 53 to 200, while affected males have more than 200 repeats of this trinucleotide sequence, located near the end of the X chromosome at band Xq27.3. Carriers with repeats in the 53 to 200 range are said to have "premutation alleles"; as these alleles approach 200 repeats, the likelihood of expansion to a full mutation increases, and mRNA levels are elevated five-fold. Research has shown that individuals with premutation alleles of 59-69 repeats have about a 30% risk of developing a full mutation, lower than the risk for those in the high range of ≥ 90 repeats. Fragile X syndrome carriers (those in the premutation range) typically have unmethylated alleles, a normal phenotype, and normal levels of FMR1 mRNA and FMRP protein. Men with fragile X syndrome have alleles in the full mutation range (>200 repeats), FMRP levels much lower than normal, and hypermethylation of the promoter region of the FMR1 gene. Some men with alleles in the full mutation range have partial or no methylation, which results in only slightly abnormal phenotypes because FMR1 transcription is only slightly down-regulated. Unmethylated and partially methylated alleles in the full mutation range show increased and normal levels of FMR1 mRNA when compared with normal controls. In contrast, when unmethylated alleles reach approximately 300 repeats, transcription levels are relatively unaffected and remain normal; transcription levels for repeats greater than 300 are currently unknown.
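The repeat ranges above can be encoded as a simple lookup. The Python sketch below uses only the cut-offs stated in this paragraph (premutation 53-200, full mutation above 200) and labels everything below 53 neutrally rather than as "normal", since the paragraph does not define that range. The function name is hypothetical, and this is an illustration of the thresholds, not a diagnostic tool.

```python
def classify_fmr1_cgg(repeats: int) -> str:
    """Map an FMR1 CGG repeat count to the categories used in the text."""
    if repeats < 0:
        raise ValueError("repeat count must be non-negative")
    if repeats > 200:
        return "full mutation"
    if repeats >= 53:
        return "premutation"
    return "below premutation range"


for n in (30, 60, 201):
    print(n, classify_fmr1_cgg(n))
```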

Promoter silencing
The CGG trinucleotide repeat expansion is present within the FMR1 mRNA, and its interactions are responsible for promoter silencing. The expansion resides within the 5' untranslated region of the mRNA, which hybridizes with the complementary genomic CGG repeat region. The binding of the mRNA to this genomic repeat silences the promoter. Beyond this point, the mechanism of promoter silencing is unknown and is still under investigation.

Background
Huntington's disease (HD) is an autosomal dominant neurological disorder that affects 1 in 15,000-20,000 people in many Western populations. HD involves the basal ganglia and the cerebral cortex and manifests as cognitive, motor, and/or psychiatric impairment.

Causation
This autosomal dominant disorder results from expansion of a CAG trinucleotide repeat in exon 1 of the IT15 gene. Most juvenile HD cases stem from transmission of a high CAG repeat number arising during paternal gametogenesis. While an individual without HD has between 9 and 37 CAG repeats, an individual with HD typically has between 37 and 102. Research has shown an inverse relationship between the number of trinucleotide repeats and age of onset; however, no relationship has been observed between repeat number and either the rate of HD progression or the affected individual's body weight. Severity of functional decline has been found to be similar across individuals with widely varying numbers of CAG repeats and differing ages of onset; therefore, it is suggested that the rate of disease progression is also linked to factors other than the CAG repeat, such as environmental and/or genetic factors.
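The two CAG ranges quoted above share the value 37, so any simple classifier has to decide what to do at that boundary. The Python sketch below encodes only the figures in this paragraph and reports 37 explicitly as a boundary rather than guessing; the function name and labels are hypothetical, and clinical cut-offs used elsewhere may differ from these ranges.

```python
def classify_htt_cag(repeats: int) -> str:
    """Place an IT15 (HTT) CAG repeat count within the ranges quoted in the text."""
    if 9 <= repeats < 37:
        return "range reported without HD"
    if repeats == 37:
        # 37 appears in both quoted ranges (9-37 and 37-102).
        return "boundary (listed in both ranges)"
    if 37 < repeats <= 102:
        return "range typically reported in HD"
    return "outside the ranges given here"


for n in (20, 37, 45):
    print(n, classify_htt_cag(n))
```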

Background
Myotonic dystrophy is a rare muscular disorder that affects numerous bodily systems. There are four forms of myotonic dystrophy: a mild, late-onset form; a form with onset in adolescence or young adulthood; an early childhood form featuring only learning disabilities; and a congenital form. Individuals with myotonic dystrophy experience severe, debilitating physical symptoms such as muscle weakness, heart rhythm problems, and difficulty breathing, which can be improved through treatment that maximizes patients' mobility and everyday activity and relieves some of the stress on their caretakers. The muscles of individuals with myotonic dystrophy show an increase in type 1 fibers as well as increased deterioration of these fibers. In addition to these physical ailments, individuals with myotonic dystrophy have been found to experience internalizing disorders such as anxiety and mood disorders, as well as cognitive delays, attention deficit disorders, autism spectrum disorders, lower IQs, and visual-spatial difficulties. Research has shown a direct correlation between expansion repeat number, IQ, and an individual's degree of visual-spatial impairment.

Causation
Myotonic dystrophy results from a (CTG)n trinucleotide repeat expansion in the 3' untranslated region of a transcript encoding a serine/threonine kinase. The length of this (CTG)n repeat, as measured in leukocytes, and the age of the individual have been found to be directly related to disease progression and type 1 muscle fiber predominance. Because age and (CTG)n length have only small correlation coefficients with disease progression, research suggests that various other factors play a role, such as changes in signal transduction pathways, somatic expression, and cell heterogeneity in (CTG)n repeats.

Background
Friedreich's ataxia is a progressive neurological disorder. Individuals experience gait and speech disturbances due to degeneration of the spinal cord and peripheral nerves. Other symptoms may include cardiac complications and diabetes. Typical age at symptom onset is 5–15 years, and symptoms progressively worsen over time.

Causation
Friedreich's ataxia is an autosomal recessive disorder caused by a GAA expansion in an intron of the FXN gene. This gene codes for the protein frataxin, a mitochondrial protein involved in iron homeostasis. The mutation impairs transcription of the gene, so affected cells produce only 5-10% of the frataxin of healthy cells. This leads to iron accumulation in the mitochondria and makes cells vulnerable to oxidative damage. Research shows that GAA repeat length is correlated with disease severity.

Fragile X syndrome
The precise timing of TNR expansion varies by disease. Although the exact timing for FXS is not certain, research has suggested that the earliest CGG expansions for this disorder are seen in primary oocytes. It has been proposed that the repeat expansion happens in the maternal oocyte during meiotic cell cycle arrest in prophase I, but the mechanism remains unclear. Maternally inherited premutation alleles may expand into full mutation alleles (greater than 200 repeats), resulting in decreased production of the FMR-1 gene product FMRP and causing fragile X mental retardation syndrome. In females, large repeat expansions are based on repair, while in males, long repeat expansions are shortened during replication; therefore, sperm lack these long repeats, and paternal inheritance of long repeat expansions does not occur. Between weeks 13 and 17 of human fetal development, the large CGG repeats are shortened.

Myotonic dystrophy type 1
Many similarities can be drawn between DM1 and FXS in terms of mutation. Full maternal inheritance is present in DM1, repeat expansion length is linked to maternal age, and the earliest instances of expansion are seen at the two-cell stage of preimplantation embryos. There is a positive correlation between male inheritance and allele length. A study in mice found that CTG repeat expansion occurs during development of spermatogonia. In DM1 and FXS, expansion of TNRs is hypothesized to occur through multiple missteps by DNA polymerase during replication. An inability of DNA polymerase to move properly across the TNR may cause transactivation of translesion polymerases (TLPs), which attempt to complete replication and overcome the block. It is understood that when DNA polymerase fails in this way, the resulting single-stranded loops left behind in the template strand undergo deletion, affecting TNR length. This process leaves the potential for TNR expansions to occur.

Huntington's disease
In Huntington's disease (HD), the exact timing has not been determined; however, there are a number of proposed points during germ cell development at which expansion is thought to occur.


 * In four HD samples examined, CAG repeat expansion lengths were more variable in mature sperm than in developing sperm in the testes, suggesting that repeat expansions are likely to occur later in sperm development.
 * Repeat expansions have been observed to occur before the completion of meiosis in humans, specifically the first division.
 * In germ cells undergoing differentiation, evidence suggests that expansions can also arise after the completion of meiosis, as larger HD mutations have been found in postmeiotic cells.

Spinocerebellar ataxia type 1
Spinocerebellar ataxia type 1 (SCA1) CAG repeats are most often passed down through paternal inheritance, and similarities can be seen with HD. The tract size in offspring of mothers with these repeats does not change. Because TNR instability is not present in young female mice, and female SCA1 patient age and instability are directly related, expansions are inferred to occur in inactive oocytes. A trend seems to have emerged of larger expansions occurring in cells inactive in division and smaller expansions occurring in actively dividing or nondividing cells.

Therapeutics
Trinucleotide repeat expansion is a DNA mutation responsible for causing any type of disorder classified as a trinucleotide repeat disorder. These disorders are progressive and affect sequences of the human genome, frequently within the nervous system. So far, the available therapeutics have had only modest results at best, and research emphasis is on genomic manipulation. The most advanced therapies aim to target mutated gene expression by using antisense oligonucleotides (ASOs) or RNA interference (RNAi) against the messenger RNA (mRNA). Although finding interventions for these diseases is a priority, RNAi and ASO therapies have only reached clinical trial stages.

RNA interference (RNAi)
RNA interference is a mechanism that can be used to silence gene expression. RNAi is a naturally occurring process that can be leveraged with synthetic small interfering RNAs (siRNAs), which are used to change the action and duration of the natural RNAi process. Another synthetic RNA, the short hairpin RNA (shRNA), can also be used to monitor the action and predictability of the RNAi process.

RNAi begins with the RNase Dicer cleaving long double-stranded RNA substrates into small fragments 21-25 nucleotides long. This process creates the siRNA duplexes used by the RNA-induced silencing complex (RISC). Within the RISC, the protein Argonaute 2 (Ago2) cleaves the passenger strand of the siRNA duplex, leaving a single-stranded antisense guide that gives the process its specificity. The guide binds complementary mRNA strands, which Ago2 then cleaves between bases 10 and 11 relative to the 5' end of the guide. One problem that may occur is that the single-stranded guide within the RISC becomes unstable and begins to unwind, resulting in binding to an unintended mRNA strand. Perfectly complementary targets are easily recognized and cleaved by the RISC, whereas only partial complementarity between the guide strand and the target mRNA may cause incorrect translation or destabilization at the target sites.
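The position of the Ago2 cut follows from antiparallel pairing: guide nucleotide k (counted from its 5' end) pairs with the target nucleotide at offset len(guide) - k within the matched site, so the cut between guide positions 10 and 11 falls immediately 5' of the target base paired with guide position 10. The Python sketch below locates a perfectly complementary site and splits the mRNA there. It is an illustration under stated assumptions: only perfect Watson-Crick matches are modelled (no G-U wobble, no partial complementarity), and the function names and example sequences are hypothetical.

```python
# RNA base pairing (Watson-Crick only; G-U wobble pairs are ignored in this sketch).
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the 5'->3' sequence that pairs antiparallel with `rna`."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna.upper()))


def ago2_cleave(mrna: str, guide: str):
    """Return (5' fragment, 3' fragment) of `mrna`, or None if no perfect target site exists.

    Guide nucleotide k (1-based from its 5' end) pairs with mRNA index
    start + len(guide) - k, so the cut between guide positions 10 and 11
    lies just before the mRNA base paired with guide position 10.
    """
    if len(guide) < 11:
        raise ValueError("guide must be at least 11 nt to define the 10/11 cut")
    mrna = mrna.upper()
    site = reverse_complement(guide)
    start = mrna.find(site)  # only perfect complementarity is modelled
    if start == -1:
        return None
    cut = start + len(guide) - 10
    return mrna[:cut], mrna[cut:]


guide = "UAGCUAGCUAGCUAGCUAGCU"  # hypothetical 21-nt guide strand
mrna = "GGG" + reverse_complement(guide) + "GGG"
print(ago2_cleave(mrna, guide))
```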

Antisense oligonucleotides
Antisense oligonucleotides (ASOs) are short, single-stranded oligodeoxynucleotides approximately 15-20 nucleotides in length that can alter the expression of a protein. The goal of using ASOs is to decrease protein expression of a specific target, usually by recruiting the endonuclease RNase H, as well as by inhibiting 5' cap formation or altering the splicing process. In their native state ASOs are rapidly digested, so they require chemical modification in order to pass through cell membranes.
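At the sequence level, an ASO is the reverse complement, written in DNA bases, of a window of the target mRNA. The Python sketch below derives such a sequence and enforces the 15-20 nucleotide length given in this paragraph. It is an illustration only: the function name and example mRNA are hypothetical, and it ignores everything that makes real ASO design hard, including chemical modifications, target accessibility, and off-target matches.

```python
# Pairing of each RNA base with the DNA base an antisense strand would carry.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def design_aso(mrna: str, start: int, length: int = 18) -> str:
    """Return a DNA oligo (5'->3') complementary to mrna[start:start + length]."""
    if not 15 <= length <= 20:
        raise ValueError("length outside the ~15-20 nt range given in the text")
    target = mrna.upper()[start:start + length]
    if start < 0 or len(target) != length:
        raise ValueError("target window falls outside the mRNA")
    # Antiparallel pairing: read the target 3'->5' and complement each base.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(target))


print(design_aso("AUGGCCAUUGUAAUGGGCCGCUGA", 0, 15))  # CATTACAATGGCCAT
```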

Despite the clear benefits that antisense therapeutics could bring through their ability to silence neural disease, there are many issues with the development of this therapy. One problem is that ASOs are highly susceptible to degradation by nucleases within the body, so extensive chemical modification is needed for these synthetic nucleic acids to resist nuclease degradation. Native ASOs have a very short half-life even before being filtered out of the body, especially by the kidney, and their high negative charge makes it very difficult for them to cross the vascular system or cell membranes to reach the targeted DNA or mRNA strands. Given all these barriers, the chemical modifications themselves may have harmful effects when introduced into the body, so that addressing each problem brings additional side effects.

Synthetic oligonucleotides are negatively charged molecules that are chemically modified so that they can regulate gene expression within the cell. Issues that arise from this process include the toxicity and variability that can accompany chemical modification. ASOs modulate gene expression in two main ways: (a) RNase H-dependent oligonucleotides, which induce degradation of mRNA, and (b) steric-blocker oligonucleotides, which physically prevent or inhibit the progression of the splicing or translational machinery. Most investigated ASOs use the first mechanism, in which the RNase H enzyme hydrolyzes the RNA strand of an RNA-DNA hybrid; with the oligonucleotide recruiting this enzyme, RNA expression is efficiently reduced by 80-95%, and expression can be inhibited by targeting any region of the mRNA.