Off-target genome editing

Off-target genome editing refers to nonspecific and unintended genetic modifications that can arise through the use of engineered nuclease technologies such as clustered regularly interspaced short palindromic repeats (CRISPR)-Cas9, transcription activator-like effector nucleases (TALENs), meganucleases, and zinc finger nucleases (ZFNs). These tools use different mechanisms to bind a predetermined sequence of DNA (the “target”), which they cleave (or "cut"), creating a double-stranded chromosomal break (DSB) that recruits the cell's DNA repair mechanisms (non-homologous end joining (NHEJ) and homologous recombination (HR)) and leads to site-specific modifications. If these complexes bind somewhere other than the target, often as a result of homologous sequences and/or mismatch tolerance, they will create off-target DSBs and cause non-specific genetic modifications. Specifically, off-target effects consist of unintended point mutations, deletions, insertions, inversions, and translocations.

Designer nuclease systems such as CRISPR-Cas9 are becoming increasingly popular research tools as a result of their simplicity, scalability and affordability. Nevertheless, off-target genetic modifications are frequent and can alter the function of otherwise intact genes. Multiple studies using early CRISPR-Cas9 agents found that greater than 50% of RNA-guided endonuclease-induced mutations were not occurring on-target. The Cas9 guide RNA (gRNA) recognizes a 20 bp target DNA sequence, which the Cas9-gRNA complex binds and cleaves to "edit" the DNA sequence. However, target sequence binding can tolerate mismatches of up to several base pairs, meaning there are often thousands of possible binding sites, which presents several experimental and safety concerns. In the research sphere, off-target effects can confound biological studies, leading to potentially misleading and non-reproducible results. In the clinical sphere, the major concerns surround the disruption of vital coding regions, leading to genotoxic effects such as cancer. Accordingly, improving the specificity of genome editing tools and detecting off-target effects are rapidly progressing research areas. Such research incorporates designer nuclease development and discovery, computational prediction programs and databases, and high-throughput sequencing to reduce and anticipate mutational occurrence. Many designer nuclease tools are still in their relative infancy, and as their molecular properties and in vivo behaviors become better understood they are expected to become increasingly precise and predictable.

Mechanisms
The CRISPR-Cas9 system works as the adaptive immune system in bacteria and archaea. When a virus infects a bacterium, this system incorporates segments of the viral DNA into the bacterial genome. Upon a second invasion, transcripts from these sequences direct nuclease activity to the complementary sequence in the invading virus so as to destroy it.

To adapt this system for gene editing in eukaryotes, a Cas9 protein, a recognition sequence RNA, and a transactivating RNA are required. A fusion of the sequence-specific CRISPR RNA (crRNA) and the transactivating RNA (tracrRNA), called a single guide RNA (sgRNA), is commonly used in experiments. It performs both functions: the first 20 nucleotides of the sgRNA are complementary to the DNA target sequence (cr function), while the nucleotides that follow form the scaffold bound by Cas9 (tracr function). Cleavage additionally requires a protospacer adjacent motif (PAM), a short sequence located in the target DNA immediately next to the region matched by the sgRNA, rather than in the sgRNA itself.

Off-target nuclease binding originates from a partial but sufficient match to the target sequence. Off-target binding mechanisms can be grouped into two main forms: base mismatch tolerance and bulge mismatch.

Base mismatch tolerance
While Cas9 specificity is believed to be controlled by the 20 nt guide portion of the sgRNA and the PAM, off-target mutations are still prevalent and can occur with as many as 3-5 base pair mismatches (out of 20) between the sgRNA and the target DNA sequence. Furthermore, sgRNA secondary structures can also affect cleavage of on-target and off-target sites. The sgRNA contains a sequence (~20 nt) that is complementary to the target sequence, and the matched DNA site must be followed by a PAM sequence, which activates the endonuclease activity. While it was shown that the 10-12 nt adjacent to the PAM (called the “seed sequence”) were enough for Cas9 specificity, Wu et al. showed that for a catalytically dead Cas9 only 1-5 base pairs of seed sequence are required for specificity. This was later supported by other studies. DNA methylation of CpG sites reduces the efficiency of binding of Cas9 and other factors in cells, so there is an epigenetic link that is relevant to the future of epigenome editing. Variations within the PAM sequence can also affect sgRNA activity. In commonly used Cas9 systems, the PAM motif is 5’ NGG 3’, where N represents any of the four DNA nucleotides. The requirement for a PAM sequence can limit targeting, as some regions will not have an available target sequence to make a desired genetic modification. A report stated that 99.96% of sites previously assumed to be unique Cas9 targets in human exons may have potential off-target effects at sites containing an NAG or NGG PAM and a single base mismatch in the seed sequence. Cas9 binding is further affected by a number of factors:
 * The seed sequence determines the frequency of a seed plus PAM in the genome and controls the effective concentration of Cas9 sgRNA complex.
 * Uracil-rich seeds are likely to have low sgRNA levels and increased specificity, since multiple uracils in the sequence can cause termination of sgRNA transcription.
 * Mismatches at the 5’ end of the crRNA are better tolerated, as the important region is the one adjacent to the PAM. Single and double mismatches are also tolerated depending on their position.
 * In a recent study, Ren et al. observed a link between mutagenesis efficiency and the GC content of the sgRNA. At least 4-6 bp adjacent to the PAM are required for a good edit.
 * When picking a gRNA, guanine is preferred over cytosine as the first base of the seed adjacent to the PAM, cytosine as the first base at the 5’ end, and adenine in the middle of the sequence. This design is based on stability linked to the formation of G quadruplexes.
 * A ChIP experiment performed by Kim et al. showed that adding purified Cas9 along with the sgRNA caused low off-target effects, which suggests that additional factors contribute to these effects.
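
The mismatch tolerance described in this section can be illustrated with a minimal Python sketch that scans a DNA string for candidate off-target sites. The function names, the 12 nt seed length, the default mismatch threshold and the accepted PAMs (NGG and NAG) are illustrative assumptions, not parameters of any published tool. Sequences are written as DNA (T rather than U) and only the forward strand is searched.

```python
# Illustrative sketch only: names, thresholds and seed length are assumptions.
SEED_LEN = 12          # assumed PAM-proximal seed length (the text cites 10-12 nt)
PAM_SUFFIXES = ("GG", "AG")  # last two bases of an NGG or NAG PAM


def mismatch_positions(guide, site):
    """Return 0-based positions (5' to 3') where guide and site differ."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have equal length")
    return [i for i, (g, s) in enumerate(zip(guide, site)) if g != s]


def scan_off_targets(guide, genome, max_mismatches=3):
    """Scan the forward strand of `genome` for sites resembling `guide`.

    Returns a list of tuples:
    (start, site, pam, total_mismatches, seed_mismatches).
    """
    n = len(guide)
    hits = []
    for start in range(len(genome) - n - 3 + 1):
        site = genome[start:start + n]
        pam = genome[start + n:start + n + 3]
        if pam[1:] not in PAM_SUFFIXES:
            continue  # no NGG/NAG PAM directly 3' of this window
        mm = mismatch_positions(guide, site)
        if len(mm) <= max_mismatches:
            # The seed is taken to be the SEED_LEN bases closest to the PAM.
            seed_mm = sum(1 for i in mm if i >= n - SEED_LEN)
            hits.append((start, site, pam, len(mm), seed_mm))
    return hits
```

A real search would also cover the reverse-complement strand and would usually weight seed mismatches more heavily than PAM-distal ones; both are omitted here for brevity.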

Bulge mismatch
Off-target sites with missing bases (deletions) and off-target sites with extra bases (insertions) relative to the sgRNA, called RNA bulges and DNA bulges respectively, both affect Cas9 specificity and cleavage activity. Lin et al. mimicked these bulges by adding and deleting bases from the sgRNA sequence, such that a base insertion in the sgRNA would yield an RNA bulge and a base deletion would yield a DNA bulge. By studying the mutation rates via NHEJ, they came to the following results:
 * In the case of pure DNA bulges, the bulges were well tolerated (i.e. Cas9 cleavage activity was still prevalent). The regions of bulge tolerance included seven bases from the PAM and the 5’ and 3’ ends of the seed sequence. This resulted in similar or, in some cases, slightly higher mutation rates compared to sites with no bulges.
 * In the case of pure RNA bulges, higher Cas9 activity was induced at many positions compared to DNA bulges. This was attributed to the fact that RNA is more flexible than DNA, so an RNA bulge incurs a smaller binding penalty, resulting in higher tolerance and more off-target mutations.
 * Higher GC content of the sgRNA sequence resulted in a higher tolerance and thus a higher off-target mutation rate.
 * Strikingly, bulges of 2-5 bp were better tolerated and more mutation-inducing than a single 2 bp bulge.
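
As a rough illustration of how bulges differ from plain mismatches, the sketch below aligns a guide to a candidate DNA site with a standard unit-cost global alignment and counts mismatches, RNA bulges (a guide base with no DNA partner) and DNA bulges (a DNA base with no guide partner). This is a generic edit-distance alignment used here as an assumption for illustration; it is not the scoring scheme used by Lin et al., and the function name is hypothetical.

```python
def align_guide_to_site(guide, site):
    """Align guide to site with unit costs and classify the differences.

    Returns (mismatches, rna_bulges, dna_bulges) for one minimum-cost
    alignment. Sequences are plain strings over the same alphabet.
    """
    n, m = len(guide), len(site)
    # cost[i][j] = minimum number of edits aligning guide[:i] with site[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitution = cost[i - 1][j - 1] + (guide[i - 1] != site[j - 1])
            cost[i][j] = min(substitution, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back through the table, preferring paired bases over bulges.
    i, j = n, m
    mismatches = rna_bulges = dna_bulges = 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (guide[i - 1] != site[j - 1])):
            mismatches += guide[i - 1] != site[j - 1]
            i -= 1
            j -= 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            rna_bulges += 1  # extra base in the guide: RNA bulge
            i -= 1
        else:
            dna_bulges += 1  # extra base in the DNA site: DNA bulge
            j -= 1
    return mismatches, rna_bulges, dna_bulges
```

When several alignments have the same cost, this traceback reports only one of them, so the split between mismatches and bulges is not always unique.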

Methods to increase specificity
The widely used Streptococcus pyogenes Cas9 (SpCas9) nuclease is effective; however, it induces unwanted off-target mutations at high frequencies. Several engineering and screening methods have been described in an effort to reduce genome-wide off-target mutations, including nuclease mutation, protospacer adjacent motif (PAM) sequence modification, guide RNA (gRNA) truncation and novel nuclease discovery. For example, in 2014, Fu et al. reported that truncating the gRNA from 20 nt in length to 17 or 18 nt increased the target specificity of the nuclease by up to 5,000-fold, and that off-target sites with more than 3 mismatched bases were rarely, if ever, cleaved.
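
Guide truncation is simple to express in code. The sketch below is a hypothetical helper, not code from Fu et al.; it shortens a 20 nt guide to 17 or 18 nt by removing bases from the 5’ (PAM-distal) end, which is assumed here to be the end that is trimmed, and reports the GC content of the result.

```python
def truncate_guide(guide, length=17):
    """Return the `length` PAM-proximal (3') bases of a guide sequence.

    Bases are removed from the 5' end; `length` must be 17 or 18 and must
    not exceed the guide length.
    """
    if length not in (17, 18):
        raise ValueError("length must be 17 or 18")
    if length > len(guide):
        raise ValueError("guide is shorter than the requested length")
    return guide[-length:]


def gc_content(sequence):
    """Fraction of G and C bases in a non-empty sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(1 for base in sequence.upper() if base in "GC") / len(sequence)
```

For example, `truncate_guide("GACGTTACCGGATCAAGCTT", 17)` drops the three 5’ bases and keeps the 17 bases nearest the PAM.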

Cas9 nickases
The SpCas9 nuclease can also be mutated in a variety of ways to improve specificity and control. Its nuclease domains can be mutated independently of each other to produce what are known as Cas9 nickases. These nucleases have one active and one inactive nuclease domain, which results in a complex that performs single-strand cleavage. Cas9 nickases can be employed in tandem (known as paired nickases) to make two single-strand 'cuts' on opposite strands. Using this strategy, both Cas9 nickases must co-localize, bind and cleave their targets, which drastically reduces the probability of off-target indels. In addition, the DSBs produced by paired nickases have long overhangs instead of blunt ends, which provides improved control of targeted insertions.

FokI-dCas9 and dimerization nucleases
As monomeric nucleases often involve high levels of off-target effects, dimerization is an attractive strategy. In a dimer system, both nucleases must bind to their individual targets or ‘half-sites’ and then interact and dimerize to initiate cleavage, which greatly decreases the probability of off-target effects. A method has been developed that combines the reliability of the dimerization-dependent FokI nuclease domains used in ZFNs and TALENs with the simplicity of CRISPR-Cas9. The FokI nuclease was originally found in Flavobacterium okeanokoites and will only cleave DNA upon dimerization. The researchers fused this nuclease to a CRISPR complex containing an inactive Cas9 nuclease (FokI-dCas9). The gRNA directs the CRISPR complex to the target site, but the 'cut' is made by dimerized FokI. It is estimated that the FokI-dCas9 strategy reduces detectable off-target effects by 10,000-fold, which makes it effective for applications requiring highly precise and specific genome editing.

Nuclease mutation
In addition to a gRNA target, Cas9 requires binding to a specific 2-6 nucleotide PAM sequence. In commonly used SpCas9 systems the PAM motif is 5’ NGG 3’, where N represents any of the four DNA nucleotides. The requirement for a PAM sequence can limit targeting, as some regions will not have an available target sequence to make a desired genetic modification. The PAM preference can be altered to recognize non-canonical NAG and NGA motifs, which has been reported not only to improve specificity but also to reduce off-target effects. A D1135E mutant appears to alter PAM specificity; it reduces off-target effects and increases the specificity of SpCas9. An additional variant, SpCas9-HF1, also results in favorable improvements to Cas9 specificity. Several combinations of substitutions at residues known to form non-specific DNA contacts (N497A, R661A, Q695A, and Q926A) have been identified. A quadruple substitution of these residues (later named SpCas9-HF1) has extremely low levels of off-target effects as detected by GUIDE-seq experiments. Variants such as SpCas9-HF1 and D1135E, and others like them, can be combined, tested and readily added to existing SpCas9 vectors to reduce the rates of off-target mutations. Additionally, many of the engineering strategies listed above can be combined to create increasingly robust and reliable RNA-guided nuclease editing tools. Directed evolution can also be used to reduce nuclease activity on particular target sequences, leading to variants such as SpartaCas (containing mutations D23A, T67L, Y128V, and D1251G relative to wild-type SpCas9).
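
The effect of the PAM requirement on which sites are targetable can be sketched by searching a sequence for positions where a protospacer of a given length is immediately followed by a chosen PAM motif. The helper names and the small IUPAC table below are assumptions for illustration; only the forward strand is considered.

```python
import re

# Minimal IUPAC subset; extend as needed (assumption: DNA alphabet only).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "[ACGT]", "R": "[AG]", "Y": "[CT]",
}


def pam_regex(motif):
    """Compile a PAM motif such as 'NGG' into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def find_pam_sites(sequence, motif="NGG", protospacer_len=20):
    """Return start positions of protospacers directly followed by `motif`."""
    pattern = pam_regex(motif)
    starts = []
    last_start = len(sequence) - protospacer_len - len(motif)
    for start in range(last_start + 1):
        pam_start = start + protospacer_len
        # fullmatch with pos/endpos tests exactly the PAM-sized window.
        if pattern.fullmatch(sequence, pam_start, pam_start + len(motif)):
            starts.append(start)
    return starts
```

Calling `find_pam_sites(seq, "NGG")` and `find_pam_sites(seq, "NGA")` on the same sequence gives a quick view of how a change in PAM preference shifts the set of addressable sites.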

CRISPRi and CRISPRa
CRISPR interference (CRISPRi) and CRISPR activation (CRISPRa) have also been developed. These systems can precisely alter gene transcription at the DNA level without inflicting irreversible genetic alterations. Furthermore, by acting directly on DNA they are generally more specific and predictable than RNAi. Although CRISPRi/a cannot replace genome editing in all experiments, they can act as effective alternatives in some cases. CRISPRi and CRISPRa use a deactivated Cas9 (dCas9) enzyme that cannot cut DNA but can deliver transcriptional activators and repressors to modulate the expression of desired genes with high precision. Currently, off-target effects of CRISPRi are minimal, and the system shows a reduced response and sensitivity to single-base mismatches. Importantly, when non-specific effects do occur they are reversible, time-dependent, and less damaging than DNA editing, making these systems effective alternatives that can limit the off-target burden when possible. CRISPR-Cas13b, which uses a type VI CRISPR-Cas system (as opposed to the commonly used type II), can target and edit specific RNA sequences. Such an RNA editing platform can specifically edit mRNA, and therefore alter protein translation, without altering the DNA. This represents a promising technology that, if successful, would reduce the burden of irreversible off-target mutations.

Detection
Even when careful measures are taken to avoid off-target mutations, a confirmatory screen is needed to check for unintended mutations. Currently there are many biased and unbiased methods for such a screen, and a smaller number of in vitro methods. These are described below:

Targeted, exome, and whole genome sequencing
In targeted sequencing, the biased approach yields results only for the intended area of capture, which hinders the search because unexpected mutations elsewhere will not appear in the screen. While it is easy and cheap for a few sites, it becomes time-consuming and expensive as more target sites are added. Exome sequencing uses exome capture to acquire the protein-coding regions of the genome. It is unbiased; however, it will not reveal off-target mutations in the non-coding regions of the genome. In whole genome sequencing, the entire genome is screened for off-target mutations. Currently, this method is expensive and, like exome sequencing, requires a reference genome to make inferences.

BLESS
BLESS is a straightforward way to detect and quantify off-target cleavage by screening for DSBs in the genome. The method relies on direct in situ breaks labeling, enrichment on streptavidin, and next-generation sequencing. Developed in 2013, BLESS is performed by labeling the DSB ends with biotin (biotinylation). This is followed by separation and collection of the labeled ends using streptavidin. A linker sequence is added to the biotinylated sequences, and this final mix is then sequenced to yield the positions of the off-target breaks. Being unbiased in nature, BLESS gives information about the site of the break within the genome rather than the proteins involved in or associated with the DSBs. However, BLESS can only detect DSBs present at the time of the experiment, not those that were formed earlier and have since been repaired.

LAM-HTGTS
Linear Amplification Mediated - High Throughput Genome Wide Translocation Sequencing, or LAM-HTGTS, is a method developed to track translocation events caused by joining between DSBs. Developed to detect off-target mutations from TALENs and CRISPR-Cas9, this technique is based on DNA repair by end joining of DSBs. Once the nuclease is added, it produces on- and off-target DSBs. In addition, a bait sequence is also cleaved. Therefore, if another DSB occurs on a chromosome other than the one carrying the bait sequence, the two breaks can be joined, leading to a translocation. Since the bait sequence is known, the translocated sequence can be amplified with primers. Where no translocation has occurred, a restriction site within the bait locus is cleaved in order to prevent amplification of the bait sequence alone. The amplified DNA is then sequenced to study large genomic rearrangements due to off-target mutations. One drawback is that the method relies on the simultaneous presence of the bait DSB and another DSB.

GUIDE-Seq
Another approach to finding off-target mutations due to nuclease activity is the GUIDE-seq method. GUIDE-seq, or Genome-wide Unbiased Identification of DSBs Enabled by Sequencing, is based on the incorporation of double-stranded oligodeoxynucleotides (dsODNs) into DSBs via NHEJ. The tagged sites are then amplified and sequenced. Since two primers are used to sequence from the dsODNs, the regions flanking the DSB are amplified along with the DSB itself, allowing the off-target mutation to be mapped. This technique has been applied to identify all previously known off-target sites as well as new ones, with frequencies as low as 0.03%. Just like BLESS, however, GUIDE-seq can only detect DSBs present at the time of study.

Digenome-Seq
One of the current in vitro methods, Digenome-seq uses Cas9's ability to cleave genomic DNA to obtain an unbiased profile of the entire genome. In this method, Cas9 is added to gDNA and the resulting cleavage is studied using high-throughput sequencing. Since the fragments are produced by the same nuclease, the ends of these fragments align when mapped to the genome. Two notable advantages are that it can be used to study up to 10 gRNAs at once and can identify targets at frequencies as low as 0.01%. The main advantage, however, is that this method is in vitro, i.e. the DSBs introduced by Cas9 will not be processed by the DNA repair machinery (unlike in BLESS and GUIDE-seq), and thus it will include all possible off-target sites. However, it might lead to a large number of false positives as well.
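
The idea that nuclease-generated fragments share aligned ends can be sketched as a simple count of read start positions. The sketch below is a greatly simplified illustration, not the published Digenome-seq scoring method: the function name and the `min_reads` threshold are assumptions, and reads are represented only by the (chromosome, position) of their 5’ end.

```python
from collections import Counter


def candidate_cleavage_sites(read_starts, min_reads=5):
    """Return positions where at least `min_reads` reads share a 5' end.

    `read_starts` is an iterable of (chromosome, position) tuples. Randomly
    sheared fragments start at scattered positions, whereas fragments cut by
    the nuclease pile up at the same coordinate.
    """
    counts = Counter(read_starts)
    return sorted(site for site, n in counts.items() if n >= min_reads)
```

A practical pipeline would also look at the opposite strand, normalize by local sequencing depth and filter sites against the guide sequence, which is one way the false positives mentioned above could be reduced.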

CIRCLE-Seq
The latest addition to the in vitro methods for detecting off-target mutations is CIRCLE-seq. Licensed by Beacon Genomics (along with GUIDE-seq), CIRCLE-seq aims to remove the drawbacks of Digenome-seq, such as the need for a large sample size and read depth (~400 million reads) and the high background that makes identification of low-frequency cleavage events harder. It adopts a restriction enzyme-independent strategy to circularize randomly sheared DNA and select for the circular molecules. In one approach, the cleaved target DNA forms a stem loop to which adaptors can be added for sequencing. While this proved possible, a second approach yielded markedly higher detection. In the second approach, the sequence is cleaved using Cas9 and, when it is cleaved again at the half site, a circular cut is available (which is the reason for the name CIRCLE-seq). The sites identified by circularization include nearly all of the sites detected with linear DNA as well as new ones, suggesting that CIRCLE-seq is not biased between breaks and also captures low-frequency breaks. It further allows the break site to be sequenced from both sides of the cleavage, whereas other methods read from only one side.

Barcoded libraries of targets
Nucleases such as Cas9 may also be challenged in vitro by randomized libraries of targets. Adapter ligation to quantify cleaved and uncleaved library members allows for unbiased measurement of a nuclease's specificity profile. Measurement of cleavage of barcoded libraries of targets (BLT) with SpCas9 indicated that specificity profiles were guide-specific and depended on the guide sequence as well as the nuclease itself. Unbiased specificity profiles based on each particular Cas9-gRNA complex may then be used to build guide-specific predictive models for in vitro cleavage.
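
A specificity profile of this kind can be summarized, in the simplest case, as the fraction of each library member that was cleaved. The sketch below assumes hypothetical per-target read counts for cleaved and uncleaved molecules keyed by barcode or target name; the function names are illustrative and no normalization for sequencing bias is attempted.

```python
def cleavage_fractions(cleaved_counts, uncleaved_counts):
    """Return {target: cleaved / (cleaved + uncleaved)} for observed targets.

    Both arguments map a target identifier to a read count. Targets with no
    reads in either category are skipped.
    """
    fractions = {}
    for target in set(cleaved_counts) | set(uncleaved_counts):
        cleaved = cleaved_counts.get(target, 0)
        uncleaved = uncleaved_counts.get(target, 0)
        total = cleaved + uncleaved
        if total > 0:
            fractions[target] = cleaved / total
    return fractions


def relative_activity(fractions, on_target):
    """Express each target's cleaved fraction relative to the on-target."""
    reference = fractions[on_target]
    if reference == 0:
        raise ValueError("on-target fraction is zero")
    return {target: value / reference for target, value in fractions.items()}
```

Because the profiles are guide-specific, such a table would be computed separately for each Cas9-gRNA complex before fitting any predictive model.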

Gene therapy
In order for gene editing technologies to make the leap towards safe and widespread use in the clinic, the rate of off-target modification needs to be reduced to a negligible level. The safety of gene therapy treatment is of utmost concern, especially during clinical trials, when off-target modifications can block the further development of a candidate product. Perhaps the most well-known example of modern gene therapy is CAR-T therapy, which is used for the treatment of B-cell lymphoma. To limit the rate of off-target cleavage, the therapy uses a highly specific and finely tuned TALEN, which has proven to have little-to-no background off-target interaction. CAR-T immunotherapy is an ex vivo procedure, which means that the patient's immune cells (in this case T cells) are extracted and edited using designer nucleases. While TALEN system development is expensive and time-consuming, research and engineering modifications have drastically limited their rate of off-target interaction. However, patients receiving the treatment are still monitored frequently, and will be for the next 15 years, so that off-target effects and immunogenic responses can be analyzed and taken into consideration as new gene therapies are brought to clinical trial.

CCR5 ZFN-modified autologous helper T cell trials
A phase I/II clinical trial enrolled 12 patients with acquired immune deficiency syndrome (AIDS) to test the safety and effectiveness of administering ZFN-modified autologous helper T cells. Through targeted deletions, the custom ZFN disables the C-C chemokine receptor 5 (CCR5) gene, which encodes a co-receptor that is used by HIV to enter the cell. As a result of the high degree of sequence homology between C-C chemokine receptors, this ZFN also cleaves CCR2, leading to off-target ~15 kb deletions and genomic rearrangements. The impacts of these CCR2 modifications are still not known, and to date there have been no reported side effects. However, CCR2 is known to have many critical roles in neural and metabolic systems.

Gene drives
Engineered gene drives using CRISPR-Cas9 are currently being tested and have been proposed as strategies to eliminate invasive species and disease vectors. By genetically modifying an organism to express an endogenous sequence-specific endonuclease, a target (such as a fertility gene) can be cleaved on the opposite chromosome. A DSB at the target leads to homologous repair, which effectively renders the organism homozygous for the desired target sequence. This strategy, known as a homing drive, can suppress a population by affecting a critical gene or inducing recessive sterility. However, if such a system were released into the wild, the CRISPR-Cas9 system would remain functional indefinitely. With every subsequent generation, off-target mutations would become increasingly likely, and the effects of these mutations on a species would be stochastic. Off-target mutations could disable the suppressive qualities of a gene drive while maintaining the endonuclease expression. In such a situation there would be an increased risk of gene flow between the target species and other species, likely leading to undesired outcomes.
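
Two of the points above, the spread of a homing drive and the growing chance of off-target mutations over generations, can be illustrated with a toy deterministic model. All assumptions here are simplifications introduced for illustration: random mating, no fitness cost, heterozygotes transmitting the drive allele with probability (1 + e)/2 for a homing efficiency e, and an independent, constant per-generation off-target probability. None of the names or parameter values come from the source.

```python
def next_drive_frequency(q, homing_efficiency):
    """Drive allele frequency after one generation of random mating.

    q' = q^2 + 2q(1 - q) * (1 + e) / 2, with e the homing efficiency.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not 0.0 <= homing_efficiency <= 1.0:
        raise ValueError("homing_efficiency must be between 0 and 1")
    return q * q + q * (1.0 - q) * (1.0 + homing_efficiency)


def drive_trajectory(q0, homing_efficiency, generations):
    """Return drive allele frequencies for generation 0..generations."""
    frequencies = [q0]
    for _ in range(generations):
        frequencies.append(next_drive_frequency(frequencies[-1], homing_efficiency))
    return frequencies


def cumulative_off_target_probability(per_generation_probability, generations):
    """Probability of at least one off-target event in a lineage after n
    generations, assuming independent events with constant probability."""
    return 1.0 - (1.0 - per_generation_probability) ** generations
```

With a homing efficiency of zero the model reduces to ordinary Mendelian inheritance and the frequency stays constant, which is a useful sanity check on the recursion.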

Controversy
The increased use of genome editing and its eventual translation towards clinical use has evoked controversy surrounding the true off-target burden of the technologies.

Schaefer et al. 2017
On May 30, 2017, a two-page correspondence article was published in Nature Methods that reported an unusually high number of off-target SNVs and indels after sequencing mice that had previously been involved in an in vivo gene repair experiment. The previous experiment, completed by the same group, successfully restored the vision of a blind mouse strain (rd1) by correcting the Y347X mutation in the Pde6b gene using a CRISPR-Cas9 system. After the experiment was completed, two genetically corrected mice were whole genome sequenced and compared to control and known mouse strain genomes. More than 1,600 SNVs and 128 indels were discovered, of which 1,397 SNVs and 117 indels were shared between the two edited mice, suggesting that the off-target effects were not random. Algorithms attempting to predict the location of these off-target mutations failed for an overwhelming majority of loci. In comparison, a 2016 whole exome sequencing study found 19 SNVs and 3 indels in 5 edited mice, while Schaefer et al. found 115 exonic SNVs and 9 indels in just 2 edited mice. Many experts disagreed with the paper and criticized it through journal articles and social media, suggesting that unusual CRISPR treatments were used in the initial paper and that the sample size was too low for significance (n=2). Nature Methods issued two editorial notes on the paper and later retracted it. Nonetheless, off-target mutations are consistently found to be more frequent in vivo than in cell culture experiments, and are thought to be particularly common in humans.