Draft:BotSeqS

The Bottleneck Sequencing System (BotSeqS) is a next-generation sequencing (NGS) technology used to identify rare somatic mutations in nuclear and mitochondrial genomes. It aims to identify these mutations in an unbiased and specific manner. The technique can be used to achieve various objectives, such as:
 * Establishing the prevalence of mutations across the entire genome
 * Comparing mutation rates between mitochondrial and nuclear genomes
 * Determining mutation rates across different tissues of varying age, DNA repair capacity, and environmental exposures
 * Identifying rare mutations in normal tissue and comparing them to mutational profiles observed in disease tissue

Traditional next-generation sequencing methods have difficulty identifying rare somatic mutations because of their high error rates. Decreasing the error rate and increasing the sensitivity of NGS is critical for identifying mutations that may be present in only a small population of cells. By sequencing both strands of each DNA molecule, using molecular barcoding, and adding a dilution step during library preparation, BotSeqS decreases the errors generated by NGS, allowing for the identification of rare somatic mutations.

Improved detection of rare somatic mutations has implications for understanding the effects of environmental carcinogen exposure, the development of cancer, and the somatic mutations that accumulate with age.

Somatic Mutations
Somatic point mutations can occur throughout the genome at different rates. Depending on its site and downstream effect, a mutation can lead to physiological changes or disease in the tissue in which it arises, or it can have no effect. Continued cell division, and the DNA replication that accompanies it, increases somatic mutational burden over time, as does exposure to environmental mutagens. Identifying rare point mutations in different tissues can therefore provide insight into regions of genomic instability and into the mutational signatures associated with different mutagens. Additionally, searching for mutations in different tissues allows for the recognition of tissue-specific mutational spectra and provides insights into tissue-specific processes. Mutations identified in normal tissue can further be compared to the mutational profiles of diseased tissues to assess their contribution to disease risk and to the disruption of normal physiological functions.

Enhancing Detection of Rare Mutations with BotSeqS
Detection of rare mutations has been a challenge due to their low frequency and the error rate of the technologies used to detect them. BotSeqS aims to overcome these limitations in the following manner:

Dilution of Sequencing Library
Dilution of the sequencing library creates a bottleneck effect, in which mutations that would be considered rare within the entire sample population are found at a higher frequency, and are therefore more easily detectable, within the sequenced sample. Dilution furthermore increases the chance that both DNA strands get sequenced, as there is less overall template in the sample and therefore a greater likelihood that both strands will be detected. Duplex sequencing increases the specificity of the technique and decreases the false positive rate. This is because only mutations observed in both strands are considered true mutations and those only found in one are attributed to technical error.
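The effect of dilution on duplex recovery can be illustrated with a rough calculation. The sketch below is not part of the BotSeqS protocol: it assumes that reads are distributed evenly over all template strands and follow a Poisson distribution, and the function names, read counts, and template counts are hypothetical values chosen only for illustration.

```python
import math


def prob_at_least(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


def duplex_recovery(read_pairs: int, templates: int, min_reads: int = 1) -> float:
    """Chance that BOTH strands of one template reach min_reads reads.

    Assumption: reads are spread evenly (Poisson) over 2 * templates strands.
    """
    mean_per_strand = read_pairs / (2 * templates)
    return prob_at_least(min_reads, mean_per_strand) ** 2


# Hypothetical two-fold dilution series starting from 10 million templates,
# sequenced with a fixed budget of 1 million read pairs.
for fold in (1, 2, 4, 8, 16):
    templates = 10_000_000 // fold
    print(fold, round(duplex_recovery(1_000_000, templates, min_reads=2), 4))
```

Under these assumptions, each further dilution leaves fewer template molecules to share the same number of reads, so the estimated chance of covering both strands of a given molecule rises.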

Molecular Barcoding
Barcoding aids in the detection of rare mutations because it distinguishes unique DNA molecules from one another. A rare mutation is found in only a single uniquely barcoded DNA molecule, whereas a clonal mutation may be present in many DNA molecules with different barcodes. Barcoding therefore increases the sensitivity of the technique.
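The distinction between rare and clonal mutations can be sketched as a count of how many distinct barcodes support each variant. This is a simplified illustration rather than the published pipeline: the function name, the tuple layout of variants and barcodes, and the `clonal_min_uids` cutoff are all assumptions.

```python
from collections import defaultdict


def classify_by_uid_support(observations, clonal_min_uids=2):
    """Label each variant 'rare' or 'clonal' by how many distinct UIDs carry it.

    observations: iterable of (variant, uid) pairs, one per supporting read family.
    clonal_min_uids: hypothetical cutoff, not a value from the BotSeqS protocol.
    """
    uids_per_variant = defaultdict(set)
    for variant, uid in observations:
        uids_per_variant[variant].add(uid)
    return {
        variant: "clonal" if len(uids) >= clonal_min_uids else "rare"
        for variant, uids in uids_per_variant.items()
    }


# Variants are (chrom, position, ref, alt); UIDs are (chrom, start, end).
observations = [
    (("chr1", 100, "C", "T"), ("chr1", 50, 300)),
    (("chr1", 100, "C", "T"), ("chr1", 60, 310)),   # same variant, different molecule
    (("chr2", 500, "G", "A"), ("chr2", 450, 700)),
    (("chr2", 500, "G", "A"), ("chr2", 450, 700)),  # PCR duplicate of the same molecule
]
labels = classify_by_uid_support(observations)
```

Because barcodes are collected in a set, PCR duplicates of one molecule count once, so only variants seen in independent molecules are labelled clonal.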

Library Generation
The BotSeqS method builds on standard next-generation sequencing (NGS) technologies. In the original BotSeqS protocol, a standard Illumina TruSeq PCR-Free kit was used to generate BotSeqS libraries. The general steps are as follows:

1. Random Fragmentation of Genomic DNA


 * Genomic DNA is fragmented using a sonicator. This step produces DNA molecules with ends that align uniquely with the reference genome. The unique mapping coordinates are used as molecular barcodes, also known as endogenous barcodes or unique identifiers (UIDs).

2. Ligation of Sequencing Adaptors


 * Adaptors are ligated onto the ends of genomic DNA fragments in order to allow for sequencing in Step 6. The addition of adaptors allows fragments to effectively attach to the flow cell on which sequencing by synthesis takes place.

3. Dilution of Sequencing Library


 * Sequencing libraries are diluted prior to amplification to increase the relative frequency of rare mutations. This step is what allows BotSeqS to differentiate itself from other sequencing techniques and increases its ability to detect rare point mutations.


 * The dilution factor needs to be assessed on a case-by-case basis and typically depends on the desired number of reads per adapter-ligated fragment. A dilution series can be performed (e.g., two-fold, four-fold, etc.).

4. Polymerase Chain Reaction (PCR)


 * DNA fragments in the diluted sequencing library are amplified through PCR in order to increase their abundance and facilitate sequencing in Step 6.

5. Grouping of Sequencing Reads


 * Reads derived from each amplified fragment are grouped into a family based on their shared UID. This brings together sequences that align to the same coordinates of the reference genome and therefore originate from the same template molecule.

6. Next-Generation Sequencing (NGS)


 * Illumina NGS is performed to determine the sequences of the amplified DNA fragments. Illumina sequencing employs a sequencing-by-synthesis technique in which fragments are denatured and complementary strands are synthesized. Fluorescently tagged dNTPs are used during synthesis; the tag indicates which base has been added and in this way provides sequence information. Once the fluorescence has been recorded, the fluorescent group is removed to allow the next dNTP to be added to the growing DNA strand.
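The use of mapping coordinates as endogenous barcodes, and the grouping of reads into families, can be sketched as follows. This is a minimal illustration under stated assumptions, not the original implementation: the `ReadPair` record, its field names, and the "W"/"C" strand labels are hypothetical, and real pipelines work from aligned read files rather than in-memory tuples.

```python
from collections import defaultdict
from typing import NamedTuple


class ReadPair(NamedTuple):
    chrom: str
    start: int     # leftmost mapped coordinate of the fragment
    end: int       # rightmost mapped coordinate of the fragment
    strand: str    # "W" or "C": assumed label for the template strand of origin
    sequence: str


def group_into_families(read_pairs):
    """Group read pairs by endogenous UID (mapping coordinates), then by strand."""
    families = defaultdict(lambda: {"W": [], "C": []})
    for rp in read_pairs:
        uid = (rp.chrom, rp.start, rp.end)
        families[uid][rp.strand].append(rp)
    return dict(families)


def duplex_uids(families):
    """Return the UIDs for which reads from both strands were observed."""
    return [uid for uid, by_strand in families.items()
            if by_strand["W"] and by_strand["C"]]


reads = [
    ReadPair("chr1", 100, 350, "W", "ACGT"),
    ReadPair("chr1", 100, 350, "W", "ACGT"),
    ReadPair("chr1", 100, 350, "C", "ACGT"),
    ReadPair("chr2", 900, 1200, "W", "GGCA"),
]
families = group_into_families(reads)
both_strands = duplex_uids(families)
```

Keeping the two strands separate within each family is what later allows a mutation to be checked against both strands of the same original molecule.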

Data Processing Pipeline
Following library generation and sequencing, the generated reads are processed and analyzed. In the original BotSeqS protocol, the processing pipeline begins by mapping the reads back to a reference genome. This allows for the identification of sequences that contain variants within the sampled tissues. Of these variants, only rare mutations are retained; germline and clonal variants are removed. For the purposes of this technique, clonal variants are those that are not found within the reference genome but are observed within all samples of a population; because of their abundance within the sampled tissue, they are not considered rare. Artifacts and sequencing errors are also identified and removed so that only true rare variants are considered in subsequent analysis and applications. Examining both strands helps to identify such false positives: a mutation found in only one strand is more likely to be due to error, whereas a mutation present in both strands is taken to indicate that the mutation in fact exists within the sampled population.
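The filtering logic described above can be sketched as a single pass over candidate variants. This is an illustrative simplification, not the original pipeline: the function name, the input layout, and the "W"/"C" strand labels are assumptions, and the germline and clonal sets are taken as given rather than derived here.

```python
def filter_rare_variants(candidates, germline, clonal):
    """Keep variants seen on both strands and absent from the germline/clonal sets.

    candidates: dict mapping variant -> set of strand labels ("W", "C") on which
                the variant was observed within a single read family.
    germline, clonal: sets of variants to exclude (assumed to be precomputed).
    """
    kept = []
    for variant, strands in candidates.items():
        if variant in germline or variant in clonal:
            continue  # inherited or clonal variant, not a rare mutation
        if {"W", "C"} <= set(strands):
            kept.append(variant)  # supported by both strands of the molecule
        # otherwise: single-strand support, treated as a likely technical error
    return kept


v1 = ("chr1", 100, "C", "T")
v2 = ("chr1", 220, "G", "A")
v3 = ("chr2", 500, "A", "G")
v4 = ("chrM", 3000, "T", "C")
candidates = {v1: {"W", "C"}, v2: {"W"}, v3: {"W", "C"}, v4: {"C", "W"}}
rare = filter_rare_variants(candidates, germline={v3}, clonal={v4})
```

In this toy input only `v1` survives: `v2` has single-strand support, `v3` is germline, and `v4` is clonal.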

Applications
BotSeqS can be used for the detection of rare somatic mutations in numerous biological contexts. It can identify mutation prevalence in individuals that have been exposed to environmental carcinogens, measure random mutations that accumulate with age, and further has the capability to compare rare mutations in both mitochondrial and nuclear genomes.

Detecting mutations following exposure to environmental factors and carcinogens
BotSeqS has been used to compare the accumulation of mutations in tissues from individuals who have been exposed to environmental factors and carcinogens. For example, differences in mutation accumulation were identified by comparing kidney tissues from individuals with known exposure to either aristolochic acid (AA) or smoking to kidney tissues from individuals without known exposure to either carcinogen.

Detecting accumulation of mutations with age
Delineating the somatic mutations that accumulate throughout the human lifespan is critical for understanding the mechanisms and diseases associated with aging. BotSeqS has been used to determine the age-associated mutational burden of tissues including brain, kidney, and colon.

Identifying Mutational Burden in Both the Mitochondrial and Nuclear Genome
Mutations in both the nuclear and mitochondrial genomes can be analyzed simultaneously via BotSeqS because the dilution step allows for the random sampling of both genomes. This has allowed mutation rates in the nuclear and mitochondrial genomes to be compared across tissues of different ages and across tissues exposed to different carcinogens.

Advantages
Previous methods for detecting rare mutations were often limited in their sensitivity or restricted to specific loci. In particular, molecular barcoding could be used to detect rare mutations only at specific regions of the genome, resulting in a biased detection of somatic mutations. BotSeqS is an unbiased technique that randomly samples the genome to identify rare mutations. This is achieved through barcoding and a dilution step during library preparation. The dilution step creates a bottlenecked library, improving the random sampling of the genome and increasing the likelihood of sequencing both DNA strands, thus enhancing sequencing accuracy. Because the dilution step allows for efficient random sampling, BotSeqS also enables simultaneous sequencing and assessment of both the nuclear and mitochondrial genomes.

Limitations
BotSeqS is susceptible to end-repair artifacts. Fragmentation of the genome during library preparation produces fragments with uneven ends, which must be converted to blunt ends before adaptor ligation. This repair is carried out by a DNA polymerase, which can be error-prone and introduce new mutations. In addition, BotSeqS requires the removal of clonal mutations, which are found in more than one cell, to accurately identify rare somatic mutations; failing to identify and remove these clonal variants would produce false positives. Furthermore, BotSeqS generates mutational profiles from the heterogeneous population of cells found within a given tissue, and therefore cannot resolve the mutational profiles of individual cell types. As a result, mutations cannot be assigned to a specific cell type, and the method does not account for non-tissue cell types that may be present in the sample (e.g., cells of the immune system).

Duplex sequencing
Duplex sequencing (Duplex-Seq) is a next-generation sequencing technology used to detect rare mutations. As in BotSeqS, both strands of a DNA molecule are sequenced, allowing researchers to identify and filter out sequencing errors, amplification biases, and other artifacts, resulting in accurate mutation detection. Duplex-Seq also attaches unique molecular tags to DNA molecules so that they can be identified in downstream analyses.

Nanorate sequencing (NanoSeq)
NanoSeq is another sequencing technology used to identify rare mutations. It differs from BotSeqS in that it uses restriction enzymes, rather than sonication, to fragment the genome. The technique further uses non-A dideoxynucleotides (ddBTPs) to prevent the errors that arise during nick extension, which can occur in BotSeqS during library preparation following fragmentation of the genome.