Ribose-seq

Ribose-seq is a mapping technique used in genetics research to determine the full profile of embedded ribonucleotides, specifically ribonucleoside monophosphates (rNMPs), in genomic DNA. Embedded ribonucleotides are thought to be the most common alteration to DNA in cells, and their presence in genomic DNA can affect genome stability. As recent studies have suggested that ribonucleotides in mouse DNA may affect disease pathology, ribonucleotide incorporation in genomic DNA has become an important target of medical genetics research. Ribose-seq allows scientists to determine the precise location and type of ribonucleotides that have been incorporated into eukaryotic or prokaryotic DNA.

The technique exploits the extra hydroxyl (OH) group found at the 2′ position of the ribose sugar in ribonucleotides, which can distort and destabilize DNA. The technique was developed through a collaboration between a group of researchers at the Georgia Institute of Technology, including Francesca Storici and Kyung Duk Koh (now at the University of California San Francisco), and Jay Hesselberth at the University of Colorado Anschutz Medical School.



History
Nucleic acids are the essential macromolecules that carry genetic information in all life forms. These biopolymers consist of nucleotide monomers, which are organic molecules that consist of a phosphate group, a nitrogen-containing base, and a five-carbon sugar (ribose in ribonucleotides and deoxyribose in deoxyribonucleotides). While ribonucleotides are typically located in ribonucleic acid (RNA), they can also be found in deoxyribonucleic acid (DNA). The first discovery of ribonucleoside monophosphates (rNMPs) embedded into DNA was in mitochondrial DNA from mouse and human cells in 1973 by Lawrence Grossman, Robert Watson, and Jerome Vinograd. In 2006, the presence of rNMPs was confirmed in nuclear DNA of Schizosaccharomyces pombe ("fission yeast"), and exploration in other species has continued since.

Incorporation of ribonucleotides into DNA
Multiple studies have identified mechanisms by which ribonucleotides can be added to, removed from, or generated in DNA. Primarily, ribonucleotides are incorporated into DNA during DNA synthesis. To initiate DNA replication, short RNA primers are synthesized on the complementary DNA template to allow for the subsequent binding and replication action of the primary replication enzymes, DNA polymerases. Typically, these primers are then removed by nuclease enzymes, RNases H or Flap Structure-specific Endonuclease 1 (FEN1). As such, a fairly substantial volume of rNMPs is transiently present in DNA in the form of RNA primers and later removed. After DNA replication is initiated, rNMPs can be directly incorporated into DNA by DNA polymerases; this represents the major mechanism of ribonucleotide incorporation in DNA, with over 1 ribonucleotide incorporated per 1,000 deoxyribonucleotides. The embedded rNMPs are often targeted for removal by the Ribonucleotide Excision Repair (RER) mechanism. However, if RER is not working properly, these rNMPs can persist in DNA. Another mechanism by which rNMPs may arise in DNA is oxidative stress, which can convert the deoxyribose unit in DNA to a ribose. While the RER pathway is the most efficient process for removing rNMPs from DNA, ribonucleotides can also be removed through the 3′-5′ exonuclease (proofreading) activity of DNA polymerases during DNA replication. Additional rNMP removal mechanisms include incisions mediated by the isomerase enzyme topoisomerase I.
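To give a sense of scale for the incorporation rate quoted above, the arithmetic can be sketched as follows. The genome size used here is a hypothetical round number chosen for illustration, and the rate of 1 rNMP per 1,000 dNMPs is the lower bound mentioned in the text, not a measured value for any particular organism.

```python
def expected_rnmps(genome_size_bp: int, dnmps_per_rnmp: int = 1000) -> float:
    """Rough count of rNMPs incorporated when copying one strand once,
    assuming one rNMP per `dnmps_per_rnmp` incorporated nucleotides."""
    return genome_size_bp / dnmps_per_rnmp

# Hypothetical genome of 12 million base pairs (illustrative only).
print(expected_rnmps(12_000_000))  # 12000.0
```

Even at this lower-bound rate, a modest genome would carry thousands of embedded rNMPs per round of replication before repair.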

Consequences of ribonucleotides in DNA
If not removed and replaced, ribonucleotides in DNA can have structural, chemical, and functional impacts on cellular processes. As first recognized by T.N. Jaishree et al. in 1993, the presence of ribonucleotides can alter the helical shape of DNA from B form to A form. This, in addition to a number of other chemical consequences, can impact the binding of proteins to DNA and have implications for processes such as chromatin dynamics, DNA replication, transcription, repair, and meiosis. Additionally, some DNA polymerases are unable to bypass embedded rNMPs during DNA replication, a problem that is exacerbated at sites with longer stretches of rNMPs. This can cause replication stress, which can potentially interfere with transcription, chromosome segregation, and future replication. Furthermore, certain byproducts or failures of the RER process can potentially lead to DNA breaks, rearrangements, and genome instability.

Ribonucleotide mapping
By 2015, some of the mechanisms for, consequences of, and approximate rates for the incorporation of rNMPs into DNA were becoming increasingly clear. However, a technique for identifying the location of rNMPs across the genome was still lacking. The growing accessibility of next-generation sequencing technologies largely facilitated progress towards the goal of more precise identification of rNMPs throughout genomic DNA. In 2015, Kyung Duk Koh, Sathya Balachander, Jay R Hesselberth and Francesca Storici presented ribose-seq, a newly developed technique for the systematic profiling of rNMPs in DNA. They applied ribose-seq to the Saccharomyces cerevisiae genome to document the genome-wide identity, frequency, and localization of rNMPs throughout mitochondrial and nuclear DNA. These findings ultimately provided new insight into patterns of rNMP incorporation across the genome. In parallel with ribose-seq, three other techniques were developed to capture sites of rNMPs embedded in DNA: emRiboSeq, HydEn-seq, and Pu-seq.

The ribose-seq procedure utilizes a number of unique features to capture and tag ribonucleotides in preparation for high-throughput sequencing. To target the rNMP sites directly, the authors used an alkali treatment that denatures the DNA and cleaves it at the rNMP sites. The specificity of cleavage at these sites is driven by the additional hydroxyl group present on ribonucleotides, which makes the molecule sensitive to alkali at this position. To stabilize and enrich these small fragments and protect them from T5 exonuclease digestion ahead of PCR amplification, single-stranded DNA fragments, each containing the rNMP of interest and a unique molecular identifier (UMI), are self-ligated to form single-stranded DNA (ssDNA) circles. To facilitate ligation of an rNMP end and a DNA end, Koh et al. harnessed the distinctive ligation activity of Arabidopsis thaliana tRNA ligase (AtRNL), which joins the 2′-phosphate end (rNMP side) to the 5′-phosphate end (DNA side). Finally, high-throughput sequencing of the entire library generated by this technique identifies the genomic coordinates of all captured embedded rNMPs.

In 2019, the Storici lab presented an updated and further optimized protocol for ribose-seq. The modifications included using shorter molecular barcode adaptors, smaller fragments, altered PCR settings, and adjusted size ranges for cutting and purifying from the gel. Overall, it was suggested that the improved protocol may substantially increase the efficiency of the method for applications to large genomes.

Genomic DNA extraction
To extract genomic DNA from cells of interest, cells must first be lysed using mechanical or chemical methods. This releases the DNA from each cell’s nucleus, allowing it to be purified and quantified. A total of 10 µg of DNA is required for the ribose-seq protocol.

Preparation of ribose-seq adaptor
The ribose-seq adaptor is made up of one dT (deoxyribonucleotide thymine) followed by an annealing sequence, two primer sequences, and a unique molecular identifier (UMI). To prepare the ribose-seq adaptor, three oligonucleotides (oligos) are added to a solution, heated to 95-100 °C, and then allowed to cool. As the solution cools, the oligo sequences anneal to each other, creating a suspension of ribose-seq adaptors. The amount of desalted double-stranded ribose-seq adaptor is then quantified. A concentration of 10 µM is needed for ribose-seq.

Fragmentation of genomic DNA
Genomic DNA is fragmented by digestion with restriction enzymes that leave blunt ends. Following digestion, fragmented DNA is purified and quantified. The restriction enzyme set and combination optimization tools for rNMP capture techniques (RESCOT) were developed to maximize rNMP capture rate. The resulting solution contains a population of 500- to 3,000-bp genomic fragments with an average size of ~1.5 kb. A total of 10 µg of fragmented genomic DNA is required for subsequent dA-tailing and adaptor ligation.
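The fragmentation step can be illustrated with a small in-silico digest. This is a minimal sketch, not part of the published protocol: the three blunt-end cutters listed (DraI, EcoRV, SspI) are chosen for illustration only, the function name `digest` is hypothetical, and the toy sequence is far shorter than real genomic DNA.

```python
import re

# Recognition sites of a few blunt-end cutters and the cut offset within the site.
# The enzyme choice is illustrative, not the set used in the ribose-seq protocol.
BLUNT_CUTTERS = {
    "DraI": ("TTTAAA", 3),   # TTT^AAA
    "EcoRV": ("GATATC", 3),  # GAT^ATC
    "SspI": ("AATATT", 3),   # AAT^ATT
}

def digest(sequence: str, enzymes: list[str]) -> list[str]:
    """Cut `sequence` at every recognition site of the given enzymes."""
    cuts = set()
    for name in enzymes:
        site, offset = BLUNT_CUTTERS[name]
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", sequence):
            cuts.add(match.start() + offset)
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]

fragments = digest("CCGGTTTAAAGGCCGATATCGGAA", ["DraI", "EcoRV"])
print(fragments)                     # ['CCGGTTT', 'AAAGGCCGAT', 'ATCGGAA']
print([len(f) for f in fragments])   # [7, 10, 7]
```

Because the listed sites are palindromic, scanning one strand is enough to locate every cut.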

dA-Tailing and adaptor ligation
dA residues are added to the ends of each fragment, allowing the dT of the ribose-seq adaptor to pair with its corresponding dA tail in the subsequent step. Following this process, the dA-tailed DNA is purified and eluted with water. The ribose-seq adaptor, containing its randomized 8-base unique molecular identifier (UMI) and single dT overhang, is then ligated to the dA end of the target fragment by incubating the solution at 15 °C overnight. The sample is then purified and prepared for alkali treatment.

Alkali treatment
During alkali treatment, double-stranded DNA (dsDNA) is treated with sodium hydroxide, a strong base, to denature the dsDNA and cleave DNA at embedded ribonucleotide sites. This generates single-stranded DNA (ssDNA) fragments with two exposed ends: one 2′,3′-cyclic phosphate end and an opposite 5′-hydroxyl end. The alkaline solution is then neutralized, and the ssDNA is purified.
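The effect of alkali cleavage on a single strand can be modelled in a few lines. This is a toy sketch under stated assumptions: the strand is written 5′ to 3′, uppercase letters stand for deoxyribonucleotides and lowercase letters for embedded ribonucleotides (a notation invented here), and cleavage is taken to occur on the 3′ side of each rNMP, so the rNMP ends up as the 3′ terminus carrying the 2′,3′-cyclic phosphate.

```python
def alkali_cleave(strand: str) -> list[str]:
    """Split a 5'->3' strand immediately after each lowercase (rNMP) base."""
    fragments = []
    start = 0
    for i, base in enumerate(strand):
        # An rNMP at the very 3' end has no downstream bond to cleave.
        if base.islower() and i < len(strand) - 1:
            fragments.append(strand[start:i + 1])
            start = i + 1
    fragments.append(strand[start:])
    return fragments

print(alkali_cleave("ACGTgACCTaGG"))  # ['ACGTg', 'ACCTa', 'GG']
```

In this model every fragment except the last ends in an rNMP, which is the end that the later self-ligation step relies on.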

Self-ligation (circularization)
Next, the ssDNA-containing solution is incubated with Arabidopsis thaliana tRNA ligase (AtRNL). AtRNL aids in A. thaliana tRNA maturation by converting 2′,3′-cyclic phosphate ends of RNA to 2′-phosphate ends and ligating these to 5′-phosphate ends of RNA or DNA. Introducing AtRNL to the ssDNA solution catalyzes self-ligation of the ssDNA, as the 2′,3′-cyclic phosphate end ligates to the 5′-phosphate end of the same ssDNA strand. The resulting circular ssDNA structures contain the targeted embedded ribonucleotide sequence, a UMI, and oligo sequences which make up the rest of the circular ssDNA structure.

Removal of linear ssDNA
The circular ssDNA structures created by self-ligation are resistant to degradation by the enzyme T5 exonuclease. However, unligated, linear ssDNA is not. Therefore, treating the products and remaining DNA fragments with T5 exonuclease selectively degrades unligated, linear ssDNA strands, while leaving self-ligated ssDNA circles intact. In this way, the unwanted linear ssDNA fragments are degraded and only successfully ligated ssDNA circles remain.

Removal of 2′-phosphate
Before PCR is possible, the 2′-phosphate groups at the ligated junctions of the ssDNA circles must be removed. To accomplish this, the circular ssDNA solution is treated with the 2′-phosphotransferase Tpt1, which removes the 2′-phosphate remaining at the ligation junction.

Polymerase chain reaction (PCR)
Following the preceding steps, each ssDNA circle is amplified by PCR. In the first round of PCR, sequencing primers are introduced to the solution and the ssDNA circles are amplified. The sequencing primers selected must be compatible with the target library of choice. See the Applications section for details. In the second round of PCR, more specific sequencing primers are introduced, and the ssDNA circles are amplified further. Polyacrylamide gel electrophoresis (PAGE) is performed to confirm the presence of the ribose-seq library and to check the positive and negative controls.

Size selection and gel purification
Gel purification is used to purify the ribose-seq library before sequencing. At this stage, fragments are selected by length to eliminate fragments that fall outside the target size range of interest.

Use for downstream applications
After the sequences of interest are captured and purified, the samples can be sequenced and used for downstream applications. See the Applications section for details.

Applications
Ribose-seq can be applied in numerous circumstances and for many cell types of various eukaryotic species (see: Advantages and Limitations).

Initial characterization
Ribose-seq was first utilized by Koh et al. (2015) to characterize the location of rNMPs in the mitochondrial and nuclear DNA of RNase-H2-deficient (rnh201∆) budding yeast. They found that rNMPs were present at 0.449 rNMPs per kilobase (kb) in the nuclear genome and 19.5 rNMPs per kb in the mitochondrial genome. Additionally, Koh et al. (2015) were able to identify an apparent bias that exists for the incorporation of rCs and rGs into DNA. They found that in yeast cells, rCs and rGs are retained in genomic DNA at higher frequencies than rAs and rUs are, relative to the respective expected frequencies of all bases. Regardless of base type, rNMPs were often present at sites that were immediately downstream of a dA in both nuclear and mitochondrial DNA. Many additional "hotspots" for rNMP incorporation were also identified with respect to specific genomic features (e.g. at particular gene sites, strands, or chromosomal regions).
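The two kinds of summary statistic described above, rNMPs per kilobase and base composition relative to the expected background, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: the function names are hypothetical, rU is compared against the frequency of T in the reference, and all counts in the example are made up.

```python
def rnmps_per_kb(rnmp_count: int, sequence_length_bp: int) -> float:
    """Density of mapped rNMPs per kilobase of sequence."""
    return rnmp_count * 1000 / sequence_length_bp

def normalized_rnmp_frequencies(rnmp_counts: dict, background_counts: dict) -> dict:
    """Observed rNMP share of each base divided by the expected share of that
    base in the reference, rescaled so that the four values sum to 1."""
    reference_base = {"A": "A", "C": "C", "G": "G", "U": "T"}  # rU pairs with T
    total_rnmps = sum(rnmp_counts.values())
    total_background = sum(background_counts.values())
    ratios = {}
    for rnmp, dnmp in reference_base.items():
        observed = rnmp_counts[rnmp] / total_rnmps
        expected = background_counts[dnmp] / total_background
        ratios[rnmp] = observed / expected
    scale = sum(ratios.values())
    return {rnmp: ratio / scale for rnmp, ratio in ratios.items()}

# Made-up counts for illustration only.
rnmps = {"A": 10, "C": 40, "G": 40, "U": 10}
background = {"A": 300, "C": 200, "G": 200, "T": 300}
print(rnmps_per_kb(sum(rnmps.values()), 1000))
print(normalized_rnmp_frequencies(rnmps, background))
```

Dividing by the background frequency is what allows a statement such as "rC is retained more often than expected" even in an AT-rich genome, where raw counts alone would be misleading.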

Multi-species comparisons
Expanding upon the initial characterization, Balachander et al. (2020) optimized their ribose-seq protocol and used it in conjunction with a bioinformatics tool (Ribose-Map, available at https://github.com/agombolay/ribose-map) to map rNMPs throughout the mitochondrial and nuclear DNA of multiple strains of three yeast species: Saccharomyces cerevisiae, Saccharomyces paradoxus, and Schizosaccharomyces pombe. In mitochondrial DNA (mtDNA) from each of these species and strains, it was demonstrated that the type of rNMP that is incorporated into a given genomic position is most impacted by the dNMP that is located at the immediate upstream position (-1). By using ribose-seq across multiple genomes, Balachander et al. were also able to elucidate various similarities and differences between yeast strains and species. The rNMP patterns in mtDNA of all strains showed low frequencies of rUs; however, S. paradoxus and S. cerevisiae showed a preference for rC, whereas S. pombe showed a preference for rG. In nuclear DNA, the three yeast species were largely similar. Additionally, rNMPs were significantly enriched in short-nucleotide repeat sequences. This application of ribose-seq allowed for the characterization of rNMP incorporation patterns and the identification of potential drivers of this process within the sequence context.
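The upstream (-1) context analysis described above can be illustrated with a short counting routine. This is a simplified sketch and not the Ribose-Map implementation: it assumes that rNMP coordinates are 0-based positions on the forward strand of a reference string, ignores the reverse strand, and uses a hypothetical function name and toy data.

```python
from collections import Counter

def upstream_context(reference: str, rnmp_positions: list[int]) -> Counter:
    """Count (upstream dNMP, rNMP base) pairs for forward-strand rNMP sites.
    The rNMP base is read from the reference, so rU appears as 'T'."""
    counts = Counter()
    for pos in rnmp_positions:
        if pos == 0:
            continue  # no upstream neighbour at the start of the sequence
        counts[(reference[pos - 1], reference[pos])] += 1
    return counts

reference = "GACTAGCA"
print(upstream_context(reference, [1, 5, 7]))
# Counter({('G', 'A'): 1, ('A', 'G'): 1, ('C', 'A'): 1})
```

Comparing these pair counts with the background dinucleotide frequencies of the genome would indicate whether a particular -1 base is over-represented.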

Beyond the yeast genome
In the first ribonucleotide mapping study to be conducted in a non-yeast species, ribose-seq was used in conjunction with Ribose-Map to map rNMPs throughout the nuclear, mitochondrial, and chloroplast genomes of Chlamydomonas reinhardtii, a photosynthetic unicellular green alga. In mitochondrial and chloroplast DNA of this species, rAs appeared to be preferentially incorporated over other bases, whereas nuclear DNA showed bias towards rG and rC incorporation. In all C. reinhardtii organelles, rU was consistently the least represented rNMP, which is consistent with findings from yeast mitochondria. Similar to yeast profiles, the incorporation of rNMPs in C. reinhardtii is most impacted by the dNMP immediately upstream (position -1). In this setting, the application of ribose-seq allowed for the identification of rNMP incorporation distributions and patterns in this species for the first time.

Advantages of ribose-seq
Because ribose-seq specifically targets embedded ribonucleotides in DNA, and does not capture RNA primers or Okazaki fragments, it is highly transferable and can effectively target embedded ribonucleotides in a wide variety of genomic DNA. Therefore, ribose-seq allows for rNMP mapping in different types of genomes, including small genomic molecules or large nuclear genomes, without necessitating standardized procedures. Additionally, the specificity of ribose-seq allows for the mapping of rNMPs even in conditions in which the DNA is exposed to environmental stressors that damage the DNA by generating breaks and/or abasic sites.

Ribose-seq includes a UMI in the adaptors used for high-throughput sequencing, which allows deduplication of the sequencing reads and thus more accurate detection of rNMP hotspots.
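UMI-based deduplication can be sketched as follows. This is a minimal illustration, not the method of any particular pipeline: reads are represented as hypothetical `(chromosome, position, strand, umi)` tuples, and two reads are treated as PCR duplicates only when all four fields match exactly (real tools may also tolerate sequencing errors in the UMI).

```python
from collections import Counter

def deduplicate(reads: list[tuple[str, int, str, str]]) -> Counter:
    """Collapse reads sharing site and UMI, then count unique molecules per site."""
    unique_molecules = set(reads)
    return Counter((chrom, pos, strand) for chrom, pos, strand, _umi in unique_molecules)

reads = [
    ("chrI", 100, "+", "ACGTACGT"),
    ("chrI", 100, "+", "ACGTACGT"),  # PCR duplicate of the read above
    ("chrI", 100, "+", "TTGACCAA"),  # same site, different original molecule
    ("chrII", 50, "-", "ACGTACGT"),
]
counts = deduplicate(reads)
print(counts[("chrI", 100, "+")])  # 2
print(counts[("chrII", 50, "-")])  # 1
```

Without the UMI, the first site would appear to have three supporting reads, which could inflate an apparent hotspot that is really an amplification artifact.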

Ribose-seq is not specific to yeast cells; the technique could potentially be applied to any cell type from any species, making it a useful tool for elucidating the location and pattern of rNMPs in most organisms.

Ribose-seq uses common laboratory equipment and reagents that are fairly easy to source, which makes it an accessible technique for use in many molecular biology laboratories.

Limitations of ribose-seq
Because it targets single rNMPs, ribose-seq cannot capture longer ribonucleotide tracts. Therefore, ribose-seq would not be an appropriate technique for mapping RNA primers or Okazaki fragments, which can be formed during DNA replication or breaks.

Ribose-seq takes about 4 days to prepare the rNMP libraries from the genomic DNA. The protocol is also relatively complex, requiring multiple steps.

Ribose-Map
Ribose-Map is a bioinformatic tool for processing and analysis of large, complex rNMP raw sequencing datasets. Ribose-Map is compatible with ribose-seq, emRiboSeq, Pu-seq, Alk-HydEn-seq, and RHII-HydEn-seq (see: Other techniques for rNMP capture).

RESCOT (Restriction Enzyme Set and Combination Optimization Tools)
RESCOT is a computational method that calculates the genomic coverage of rNMP-captured regions for a given choice of restriction enzymes (REs), then optimizes over candidate RE sets and selects the set with the highest coverage to maximize the rNMP capture rate.
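The general idea of scoring enzyme sets by coverage can be sketched with a brute-force search. This is not the RESCOT algorithm, whose details are not given here: coverage is defined, as an assumption, as the fraction of bases that fall in fragments within a chosen size window; the enzyme table, size window, and function names are all illustrative.

```python
import re
from itertools import combinations

# Illustrative blunt-end cutters: recognition site and cut offset within the site.
SITES = {"DraI": ("TTTAAA", 3), "EcoRV": ("GATATC", 3), "SspI": ("AATATT", 3)}

def fragment_lengths(sequence: str, enzymes: tuple[str, ...]) -> list[int]:
    """Lengths of the fragments produced by digesting with the given enzymes."""
    cuts = {
        match.start() + SITES[name][1]
        for name in enzymes
        for match in re.finditer(f"(?={SITES[name][0]})", sequence)
    }
    bounds = [0] + sorted(cuts) + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def coverage(sequence: str, enzymes: tuple[str, ...], min_len: int, max_len: int) -> float:
    """Fraction of bases lying in fragments whose length is within the window."""
    usable = sum(n for n in fragment_lengths(sequence, enzymes) if min_len <= n <= max_len)
    return usable / len(sequence)

def best_enzyme_set(sequence: str, max_set_size: int, min_len: int, max_len: int) -> tuple[str, ...]:
    """Try every enzyme combination up to `max_set_size` and keep the best one."""
    candidates = [
        combo
        for size in range(1, max_set_size + 1)
        for combo in combinations(sorted(SITES), size)
    ]
    return max(candidates, key=lambda combo: coverage(sequence, combo, min_len, max_len))

toy_genome = "CCGGTTTAAAGGCCGATATCGGAA"
print(best_enzyme_set(toy_genome, max_set_size=2, min_len=5, max_len=10))  # ('DraI', 'EcoRV')
```

Exhaustive search is only practical for a small enzyme list; a real tool working with hundreds of enzymes would need a more selective search strategy.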

Other techniques for rNMP capture

 * emRiboSeq: Uses human RNase H2 and T4 Quick ligase to capture the deoxyribonucleotide upstream of the embedded rNMP from the 5′ side.
 * RHII-HydEn-seq: Uses Escherichia coli RNase HII and T4 RNA ligase to capture the embedded rNMP from its 5′ side. RHII-HydEn-seq can also capture the DNA sequence terminated with an rNMP, and Okazaki fragments.
 * Pu-seq: Uses alkaline conditions and T4 RNA ligase to capture the deoxyribonucleotide downstream of the embedded rNMP from the 3′ side. Can capture Okazaki fragments.
 * Alk-HydEn-seq: Uses alkaline conditions and T4 RNA ligase to capture the deoxyribonucleotide downstream of the embedded rNMP from the 3′ side. Can also capture Okazaki fragments.
 * RiSQ-seq (Ribonucleotide Scanning Quantification sequencing): Uses absolute quantification to capture the upstream neighbor deoxyribonucleotide of the rNMP.