User:Cas9 wiki project/sandbox

Cas9 (CRISPR-associated protein 9) is an RNA-guided DNA endonuclease associated with the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) adaptive immunity system in Streptococcus pyogenes. S. pyogenes uses Cas9 to interrogate and cleave foreign DNA, such as invading bacteriophage DNA or plasmid DNA. Cas9 performs this interrogation by unwinding foreign DNA and checking for complementarity with the 20-nucleotide spacer region of the guide RNA. The guide RNA is composed of two separate RNAs that associate with each other: the CRISPR RNA (crRNA) and the trans-activating crRNA (tracrRNA). If a DNA substrate is complementary to the guide RNA, Cas9 cleaves the invading DNA. In this sense, the CRISPR-Cas9 mechanism has a number of parallels with the RNA interference (RNAi) mechanism in eukaryotes. Apart from its original function in bacterial immunity, the Cas9 protein has been heavily used as a genome engineering tool to induce site-directed double-strand breaks in DNA. In many laboratory model organisms, these breaks can lead to gene inactivation through non-homologous end joining or to the introduction of heterologous genes through homologous recombination. Alongside zinc finger nucleases and TALEN proteins, Cas9 is becoming a prominent tool in the field of genome editing.
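The complementarity check described above can be illustrated with a minimal Python sketch. It is only a toy model of Watson-Crick pairing between the spacer and the target DNA strand, not a description of how Cas9 itself evaluates a substrate; the function name pairs_with_target and the requirement of a perfect match along the full length are assumptions made for illustration.

```python
# Watson-Crick partner (DNA base) for each RNA base of the spacer.
RNA_TO_DNA_PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def pairs_with_target(spacer_rna: str, target_strand: str) -> bool:
    """Return True if the spacer base-pairs with the target strand at every position.

    Both sequences are written 5'->3'. Because the two strands of a duplex
    are antiparallel, the spacer is compared against the target strand
    read in reverse. (Hypothetical helper; perfect matching is assumed.)
    """
    spacer_rna = spacer_rna.upper()
    target_strand = target_strand.upper()
    if len(spacer_rna) != len(target_strand):
        return False
    return all(
        RNA_TO_DNA_PAIR.get(rna_base) == dna_base
        for rna_base, dna_base in zip(spacer_rna, reversed(target_strand))
    )
```

In this toy model, a spacer written 5'-GACGU-3' pairs with a target strand written 5'-ACGTC-3', and a single mismatch makes the function return False.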

Cas9 has gained traction in recent years because it can cleave nearly any sequence complementary to the guide RNA. Because the target specificity of Cas9 stems from guide RNA:DNA complementarity and not from modifications to the protein itself (as with TALENs and zinc fingers), engineering Cas9 to target new DNA is relatively easy. This design flexibility, coupled with versions of Cas9 that bind but do not cleave cognate DNA, also has potential for turning genes on and off by localizing transcriptional activators or repressors to specific DNA sequences. A seminal 2012 paper simplified the system further by combining the tracrRNA and crRNA moieties into a chimeric single guide RNA. Scientists have suggested that Cas9-based gene drives may be capable of editing the genomes of entire populations of organisms. Much like the restriction enzymes whose discovery in the 1970s revolutionized molecular biology, the “Cas9 toolbox” holds great potential.

Overview
Cas9 features a bi-lobed architecture with the guide RNA nestled between the alpha-helical lobe (blue; Fig. 1) and the nuclease lobe (cyan, orange and gray). These two lobes are connected through a single bridge helix. The multi-domain nuclease lobe contains two nuclease domains: the RuvC domain (gray), which cleaves the non-target DNA strand, and the HNH domain (cyan), which cleaves the target strand. Interestingly, the RuvC domain is encoded by sites that are far apart in the primary sequence and come together in the tertiary structure to form the RuvC cleavage domain (see Fig. 1).

A key feature of the target DNA is that it must contain a protospacer adjacent motif (PAM) consisting of the three-nucleotide sequence NGG, where N is any nucleotide. This PAM is recognized by the PAM-interacting domain (PI domain, orange) located near the C-terminal end of Cas9. Cas9 undergoes distinct conformational changes between the apo, guide RNA-bound, and guide RNA:DNA target-bound states, which are detailed below.
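As a rough illustration of the PAM requirement, the following Python sketch scans a DNA sequence for candidate target sites: a 20-nucleotide protospacer followed immediately on its 3' side by an NGG PAM, on either strand. The function names, the tuple layout of the results, and the restriction to the unambiguous bases A, C, G and T are assumptions made for this example; the sketch does not model mismatch tolerance or any other aspect of Cas9 binding.

```python
import re

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_candidate_sites(dna: str, spacer_len: int = 20):
    """List (strand, start, protospacer, pam) tuples for NGG-flanked sites.

    Hypothetical helper. 'start' is the 0-based index of the protospacer
    on the strand that was scanned; for strand "-" it refers to the
    reverse complement of the input, not to the input itself.
    """
    # A zero-width lookahead lets overlapping sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    sites = []
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Under these assumptions, the spacer of a matching guide RNA would have the same sequence as the reported protospacer with U in place of T, for example protospacer.replace("T", "U").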

In the type II CRISPR system, Cas9 recognizes the stem-loop structure formed by the repeat and mediates the maturation of the crRNA-tracrRNA complex. Cas9 in complex with CRISPR RNA (crRNA) and trans-activating crRNA (tracrRNA) then recognizes and degrades the target dsDNA. In the co-crystal structure shown here (Fig. 1), the crRNA-tracrRNA complex is replaced by a chimeric single-guide RNA (sgRNA), which has been shown to have the same function as the natural RNA complex. The sgRNA, base-paired with the target ssDNA, is anchored by Cas9 in a T-shaped architecture. The first structure of Cas9, shown here, reveals that the protein consists of a recognition lobe (REC) and a nuclease lobe (NUC). With additional sequence alignment, Cas9 can be further divided into six regions: REC1, REC2, RuvC, HNH, PAM-interacting (PI), and bridge helix (BH). All regions except HNH form tight interactions with each other and with the sgRNA-ssDNA complex, while the HNH domain forms few contacts with the rest of the protein. In another conformation of the Cas9 complex observed in the crystal, the HNH domain is not visible. These structures suggest that the HNH domain is conformationally flexible.

The interactions between sgRNA and Cas9
In the crystal structure of the sgRNA-Cas9 complex, the REC1, BH and PI domains make important contacts with the backbone or bases in both the repeat and spacer regions. Several Cas9 mutants have been tested, including deletions of the REC1 or REC2 domains and mutations of residues in BH. Mutants affecting REC1 and BH show reduced or no activity compared with the wild type, which indicates that these two domains are crucial for recognizing the sgRNA at the repeat sequence and for stabilizing the whole complex. Although the interactions between the spacer sequence and Cas9, and between the PI domain and the repeat region, need further study, the co-crystal structure shows a clear interface between Cas9 and the sgRNA.

Target digestion
Previous sequence analyses and biochemical studies suggested that Cas9 contains domains homologous to RNase H and to HNH endonucleases, each of which is responsible for cleaving one of the two target DNA strands. The structure confirms these results. Despite low sequence similarity, the region similar to RNase H adopts a RuvC fold (RuvC being a member of the RNase H family), and the HNH region folds like T4 Endo VII (a member of the HNH endonuclease family). Previous work on Cas9 has demonstrated that the HNH domain cleaves the complementary strand of the target DNA and that the RuvC domain cleaves the non-complementary strand (Westra, et al. 2012; Wiedenheft, et al. 2014).