Genome-wide CRISPR-Cas9 knockout screens

Genome-wide CRISPR-Cas9 knockout screens aim to elucidate the relationship between genotype and phenotype by ablating gene expression on a genome-wide scale and studying the resulting phenotypic alterations. The approach utilises the CRISPR-Cas9 gene editing system, coupled with libraries of single guide RNAs (sgRNAs), which are designed to target every gene in the genome. Over recent years, the genome-wide CRISPR screen has emerged as a powerful tool for performing large-scale loss-of-function screens, with low noise, high knockout efficiency and minimal off-target effects.



History
Early studies in Caenorhabditis elegans and Drosophila melanogaster saw large-scale, systematic loss of function (LOF) screens performed through saturation mutagenesis, demonstrating the potential of this approach to characterise genetic pathways and identify genes with unique and essential functions. The saturation mutagenesis technique was later applied in other organisms, for example zebrafish and mice.

Targeted approaches for gene knockdown emerged in the 1980s with techniques such as homologous recombination, trans-cleaving ribozymes, and antisense technologies.

By the year 2000, RNA interference (RNAi) technology had emerged as a fast, simple, and inexpensive technique for targeted gene knockdown, and was routinely being used to study in vivo gene function in C. elegans. Indeed, in the span of only a few years following its discovery by Fire et al. (1998), almost all of the ~19,000 genes in C. elegans had been analysed using RNAi-based knockdown.

The production of RNAi libraries facilitated the application of this technology on a genome-wide scale, and RNAi-based methods became the predominant approach for genome-wide knockdown screens.

Nevertheless, RNAi-based approaches to genome-wide knockdown screens have their limitations. For one, the high off-target effects cause issues with false-positive observations. Additionally, because RNAi reduces gene expression at the post-transcriptional level by targeting RNA, RNAi-based screens only result in partial and short-term suppression of genes. Whilst partial knockdown may be desirable in certain situations, a technology with improved targeting efficiency and fewer off-target effects was needed.

Since its initial identification as a prokaryotic adaptive immune system, the bacterial type II clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system has become a simple and efficient tool for generating targeted LOF mutations. It has been successfully applied to edit human genomes, and has started to displace RNAi as the dominant tool in mammalian studies. In the context of genome-wide knockout screens, recent studies have demonstrated that CRISPR/Cas9 screens are able to achieve highly efficient and complete protein depletion, and overcome the off-target issues seen with RNAi screens. In summary, the recent emergence of CRISPR-Cas9 has dramatically increased our ability to perform large-scale LOF screens. The versatility and programmability of Cas9, coupled with the low noise, high knockout efficiency and minimal off-target effects, have made CRISPR the platform of choice for many researchers engaging in gene targeting and editing.

CRISPR/Cas9 Loss of function


The clustered regularly interspaced short palindromic repeats (CRISPR)/Cas9 system is a gene-editing technology that can introduce double-strand breaks (DSBs) at a target genomic locus. By using a single guide RNA (sgRNA), the endonuclease Cas9 can be delivered to a specific DNA sequence where it cleaves the nucleotide chain. The specificity of the sgRNA is determined by a 20-nt sequence, homologous to the genomic locus of interest, and the binding to Cas9 is mediated by a constant scaffold region of the sgRNA. The desired target site must be immediately followed (5’ to 3’) by a conserved 3-nucleotide protospacer adjacent motif (PAM). In order to repair the DSBs, the cell may use the highly error-prone non-homologous end joining, or homologous recombination. By designing suitable sgRNAs, planned insertions or deletions can be introduced into the genome. In the context of genome-wide LOF screens, the aim is to cause gene disruption and knockout.
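The targeting rule described above (a 20-nt protospacer immediately followed by a PAM) can be illustrated with a short Python sketch. It assumes the NGG PAM of the commonly used S. pyogenes Cas9, scans only the strand it is given, and uses hypothetical function names; it is an illustration of the rule, not a published design tool.

```python
import re


def find_target_sites(sequence, protospacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM on the given strand.

    Assumes an NGG PAM; a lookahead is used so overlapping sites are all found.
    """
    seq = sequence.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


def reverse_complement(sequence):
    """Reverse-complement a DNA string so the opposite strand can be scanned."""
    return sequence.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Hypothetical toy locus: one 20-nt protospacer followed by the PAM "AGG".
toy_locus = "ACGTACGTACGTACGTACGT" + "AGG" + "TT"
forward_sites = find_target_sites(toy_locus)
reverse_sites = find_target_sites(reverse_complement(toy_locus))
```

Scanning the reverse complement as well covers target sites on the opposite strand; coordinates returned for that call are relative to the reverse-complemented string.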

Constructing a Library
To perform CRISPR knockouts on a genome-wide scale, collections of sgRNAs known as sgRNA libraries, or CRISPR knockout libraries, must be generated. The first step in creating an sgRNA library is to identify genomic regions of interest based on known sgRNA targeting rules. For example, sgRNAs are most efficient when targeting the coding regions of genes and not the 5’ and 3’ UTRs. Conserved exons present as attractive targets, and position relative to the transcription start site should be considered. Secondly, all the possible PAM sites are identified and selected for. On- and off-target activity should be analysed, as should GC content, and homopolymer stretches should be avoided. The most commonly used Cas9 endonuclease, derived from Streptococcus pyogenes, recognises a PAM sequence of NGG.

Furthermore, specific nucleotides appear to be favoured at specific locations. Guanine is strongly favoured over cytosine at position 20, directly adjacent to the PAM, and at position 16 cytosine is preferred over guanine. For the variable nucleotide in the NGG PAM, it has been shown that cytosine is preferred and thymine disfavoured. With such criteria taken into account, the sgRNA library is computationally designed around the selected PAM sites.
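A minimal sketch of how such design criteria might be applied to candidate guides is given below. The GC-content bounds and the homopolymer run length are illustrative assumptions rather than values from the text, and the simple +1/-1 tally of the positional preferences is a hypothetical stand-in for a real on-target scoring model.

```python
import re


def gc_fraction(protospacer):
    """Fraction of G or C bases in the protospacer."""
    return sum(base in "GC" for base in protospacer) / len(protospacer)


def has_homopolymer(protospacer, run_length=4):
    """True if any base repeats run_length or more times in a row (assumed cutoff)."""
    return re.search(r"(.)\1{%d,}" % (run_length - 1), protospacer) is not None


def passes_basic_filters(protospacer, gc_min=0.4, gc_max=0.8):
    """Apply assumed GC-content bounds and reject homopolymer stretches."""
    return gc_min <= gc_fraction(protospacer) <= gc_max and not has_homopolymer(protospacer)


def preference_tally(protospacer, pam):
    """Illustrative tally of the nucleotide preferences described in the text."""
    score = 0
    score += {"G": 1, "C": -1}.get(protospacer[19], 0)  # position 20: G favoured over C
    score += {"C": 1, "G": -1}.get(protospacer[15], 0)  # position 16: C preferred over G
    score += {"C": 1, "T": -1}.get(pam[0], 0)           # PAM "N": C preferred, T disfavoured
    return score


# Hypothetical candidate guide with C at position 16 and G at position 20.
candidate = "ACGTACGTACGTACGCACGG"
keep = passes_basic_filters(candidate)
tally = preference_tally(candidate, "CGG")
```

In practice, candidates that pass the filters would be ranked by a validated scoring model and by predicted off-target activity, which this sketch does not attempt.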

Multiple sgRNAs (at least 4–6) should be created against every single gene to limit false-positive detection, and negative control sgRNAs with no known targets should be included. The sgRNAs are then created by in situ synthesis, amplified by PCR, and cloned into a vector delivery system.

Existing libraries
Developing a new sgRNA library is a laborious and time-consuming process. In practice, researchers may select an existing library depending on their experimental purpose and cell lines of interest. As of February 2020, the most widely used resources for genome-wide CRISPR knockout screens have been the two Genome-Scale CRISPR Knock-Out (GeCKO) libraries created by the Zhang lab. Available through Addgene, these lentiviral libraries respectively target human and mouse exons, and both are available as a one-vector system (where the sgRNAs and Cas9 are present on the same plasmid) or as a two-vector system (where the sgRNAs and Cas9 are present on separate plasmids). Each library is delivered as two half-libraries, allowing researchers to screen with 3 or 6 sgRNAs/gene.

Aside from GeCKO, a number of other CRISPR libraries have been generated and made available through Addgene. The Sabatini & Lander labs currently have 7 separate human and mouse libraries, including targeted sublibraries for distinct subpools such as kinases and ribosomal genes (Addgene #51043–51048). Further, improvements to the specificity of sgRNAs have resulted in ‘second generation’ libraries, such as the Brie (Addgene #73632) and Brunello (Addgene #73178) libraries generated by the Doench and Root labs, and the Toronto knockout (TKO) library (Addgene #1000000069) generated by the Moffat lab.

Lentiviral vectors
Targeted gene knockout using CRISPR/Cas9 requires the use of a delivery system to introduce the sgRNA and Cas9 into the cell. Although a number of different delivery systems are potentially available for CRISPR, genome-wide loss-of-function screens are predominantly carried out using third generation lentiviral vectors. These lentiviral vectors are able to efficiently transduce a broad range of cell types and stably integrate into the genome of dividing and non-dividing cells. Third generation lentiviral particles are produced by co-transfecting 293T human embryonic kidney (HEK) cells with:


 * 1) two packaging plasmids, one encoding Rev and the other Gag and Pol;
 * 2) an interchangeable envelope plasmid that encodes for an envelope glycoprotein of another virus (most commonly the G protein of vesicular stomatitis virus (VSV-G));
 * 3) one or two (depending on the applied library) transfer plasmids, encoding for Cas9 and sgRNA, as well as selection markers.

The lentiviral particle-containing supernatant is harvested, concentrated and subsequently used to infect the target cells. The exact protocol for lentiviral production will vary depending on the research aim and applied library. If a two-vector system is used, for example, cells are sequentially transduced with Cas9 and sgRNA in a two-step procedure. Although more complex, this has the advantage of a higher titre for the sgRNA library virus.

Phenotypic selection
In general, there are two different formats of genome-wide CRISPR knockout screens: arrayed and pooled. In an arrayed screen, each well contains a specific and known sgRNA targeting a specific gene. Since the sgRNA responsible for each phenotype is known based on well location, phenotypes can be identified and analysed without requiring genetic sequencing. This format allows for the measurement of more specific cellular phenotypes, perhaps by fluorescence or luminescence, and allows researchers to use more library types and delivery methods. For large-scale LOF screens, however, arrayed formats are considered low-efficiency, and expensive in terms of financial and material resources because cell populations have to be isolated and cultured individually.

In a pooled screen, cells grown in a single vessel are transduced in bulk with viral vectors collectively containing the entire sgRNA library. To ensure that the number of cells infected by more than one sgRNA-containing particle is limited, a low multiplicity of infection (MOI) (typically 0.3–0.6) is used. Evidence so far has suggested that each sgRNA should be represented in a minimum of 200 cells. Transduced cells will be selected for, followed by positive or negative selection for the phenotype of interest, and genetic sequencing will be necessary to identify the integrated sgRNAs.
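The rationale for a low MOI can be made concrete with a small calculation. The sketch below assumes that the number of viral integrations per cell follows a Poisson distribution with mean equal to the MOI; this is a common modelling assumption that is not stated in the text, and the function names are hypothetical.

```python
import math


def poisson_pmf(k, moi):
    """Probability that a cell receives exactly k integrations (Poisson assumption)."""
    return math.exp(-moi) * moi ** k / math.factorial(k)


def infection_summary(moi):
    """Fractions of uninfected, singly and multiply infected cells at a given MOI."""
    uninfected = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    infected = 1.0 - uninfected
    return {
        "infected": infected,
        "single": single,
        "multiple": infected - single,
        "single_among_infected": single / infected,
    }


# Compare the two ends of the MOI range quoted in the text.
low = infection_summary(0.3)
high = infection_summary(0.6)
```

Under this assumption, an MOI of 0.3 infects roughly a quarter of cells, and about 86% of those infected cells carry a single integration; raising the MOI to 0.6 infects more cells but lowers the single-integration share.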

Next-generation sequencing & hit analysis
Following phenotypic selection, genomic DNA is extracted from the selected clones, alongside a control cell population. In the most common protocols for genome-wide knockouts, a 'Next-generation sequencing (NGS) library' is created by a two-step polymerase chain reaction (PCR). The first step amplifies the sgRNA region, using primers specific to the lentiviral integration sequence, and the second step adds Illumina i5 and i7 sequences. NGS of the PCR products allows the recovered sgRNAs to be identified, and a quantification step can be used to determine the relative abundance of each sgRNA.
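The quantification step can be sketched as a simple read-counting routine. This example assumes that reads have already been trimmed to the protospacer sequence and that only exact matches to the library are counted; the library contents and identifiers are hypothetical, and real pipelines typically also handle adapter trimming and mismatches.

```python
from collections import Counter


def count_sgrnas(reads, library):
    """Count exact-match reads per sgRNA.

    `library` maps protospacer sequence -> sgRNA identifier.
    Returns (counts per sgRNA including zeros, number of unmatched reads).
    """
    counts = Counter({sgrna_id: 0 for sgrna_id in library.values()})
    unmatched = 0
    for read in reads:
        sgrna_id = library.get(read.upper())
        if sgrna_id is None:
            unmatched += 1
        else:
            counts[sgrna_id] += 1
    return counts, unmatched


def relative_abundance(counts):
    """Convert raw counts to fractions of all matched reads."""
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Hypothetical two-guide library and a handful of trimmed reads.
toy_library = {"ACGTACGTACGTACGTACGT": "GENE1_sg1", "TTTTGGGGCCCCAAAATTGG": "GENE2_sg1"}
toy_reads = ["ACGTACGTACGTACGTACGT", "acgtacgtacgtacgtacgt", "TTTTGGGGCCCCAAAATTGG", "NNNN"]
toy_counts, toy_unmatched = count_sgrnas(toy_reads, toy_library)
toy_fractions = relative_abundance(toy_counts)
```

Keeping zero counts for guides that were never observed matters downstream, since a guide that drops out entirely is itself informative in a negative-selection screen.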

The final step in the screen is to computationally evaluate the significantly enriched or depleted sgRNAs, trace them back to their corresponding genes, and in turn determine which genes and pathways could be responsible for the observed phenotype. Several algorithms are currently available for this purpose, with the most popular being the Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method. Developed specifically for CRISPR/Cas9 knockout screens in 2014, MAGeCK demonstrated better performance compared with alternative algorithms at the time, and has since demonstrated robust results and high sensitivity across different experimental conditions. As of 2015, the MAGeCK algorithm has been extended to introduce quality control measurements, and account for the previously overlooked sgRNA knockout efficiency. A web-based visualisation tool (VISPR) was also integrated, allowing users to interactively explore the results, analysis, and quality controls.
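To illustrate the general idea of tracing enriched or depleted sgRNAs back to genes, the sketch below computes depth-normalised log2 fold changes per sgRNA and summarises them per gene by the median. This is a deliberately simplified stand-in and is not the MAGeCK algorithm; the normalisation to reads per million, the pseudocount and all names are assumptions made for illustration, and no statistical significance is estimated.

```python
import math
from statistics import median


def log2_fold_changes(control, selected, pseudocount=1.0):
    """Per-sgRNA log2 fold change after scaling each sample to reads per million."""
    control_total = sum(control.values())
    selected_total = sum(selected.values())
    lfc = {}
    for sgrna, control_count in control.items():
        selected_count = selected.get(sgrna, 0)
        control_rpm = control_count / control_total * 1e6 + pseudocount
        selected_rpm = selected_count / selected_total * 1e6 + pseudocount
        lfc[sgrna] = math.log2(selected_rpm / control_rpm)
    return lfc


def gene_scores(lfc, sgrna_to_gene):
    """Median log2 fold change across all sgRNAs targeting each gene."""
    per_gene = {}
    for sgrna, value in lfc.items():
        per_gene.setdefault(sgrna_to_gene[sgrna], []).append(value)
    return {gene: median(values) for gene, values in per_gene.items()}


# Hypothetical counts: guides against gene A drop out, guides against gene B do not.
control_counts = {"A_sg1": 100, "A_sg2": 100, "B_sg1": 100, "B_sg2": 100}
selected_counts = {"A_sg1": 10, "A_sg2": 20, "B_sg1": 185, "B_sg2": 185}
guide_to_gene = {"A_sg1": "A", "A_sg2": "A", "B_sg1": "B", "B_sg2": "B"}
scores = gene_scores(log2_fold_changes(control_counts, selected_counts), guide_to_gene)
```

Using the median across several guides per gene reduces the influence of a single inefficient or off-target guide, which is one reason libraries include multiple sgRNAs per gene.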

Cellular signaling mechanisms
Over recent years, the genome-wide CRISPR screen has emerged as a powerful tool for studying the intricate networks of cellular signaling. Cellular signaling is essential for a number of fundamental biological processes, including cell growth, proliferation, differentiation, and apoptosis.

One practical example is the identification of genes required for proliferative signaling in cancer cells. Cells are transduced with a CRISPR sgRNA library, and studied for growth over time. By comparing sgRNA abundance in selected cells to a control, one can identify which sgRNAs become depleted and in turn which genes may be responsible for the proliferation defect. Such screens have been used to identify cancer-essential genes in acute myeloid leukemia and neuroblastoma, and to describe tumor-specific differences between cancer cell lines.

Identifying synthetic lethal partners
Targeted cancer therapies are designed to target the specific genes, proteins, or environments contributing to tumor cell growth or survival. After a period of prolonged treatment with these therapies, however, tumor cells may develop resistance. Although the mechanisms behind cancer drug resistance are poorly understood, potential causes include: target alteration, drug degradation, apoptosis escape, and epigenetic alterations. Resistance is well-recognised and poses a serious problem in cancer management.

To overcome this problem, a synthetic lethal partner can be identified. Genome-wide LOF screens using CRISPR-Cas9 can be used to screen for synthetic lethal partners. For this, a wild-type cell line and a tumor cell line containing the resistance-causing mutation are transduced with a CRISPR sgRNA library. The two cell lines are cultivated, and any under-represented or dead cells are analyzed to identify potential synthetic lethal partner genes. A recent study by Hinze et al. (2019) used this method to identify a synthetic lethal interaction between the chemotherapy drug asparaginase and two genes in the Wnt signalling pathway, NKD2 and LGR6.

Host dependency factors for viral infection
Due to their small genomes and limited number of encoded proteins, viruses exploit host proteins for entry, replication, and transmission. Identification of such host proteins, also termed host dependency factors (HDFs), is particularly important for identifying therapeutic targets. Over recent years, many groups have successfully used genome-wide CRISPR/Cas9 as a screening strategy for HDFs in viral infections.

One example is provided by Marceau et al. (2017), who aimed to dissect the host factors associated with dengue and hepatitis C (HCV) infection (two viruses in the family Flaviviridae). ELAVL1, an RNA-binding protein encoded by the ELAVL1 gene, was found to be a critical receptor for HCV entry, and a remarkable divergence in host dependency factors was demonstrated between the two viruses.

Further applications
Additional reported applications of genome-wide CRISPR screens include the study of: mitochondrial metabolism, bacterial toxin resistance, genetic drivers of metastasis, cancer drug resistance, West Nile virus-induced cell death, and immune cell gene networks.

Limitations
This section will specifically address genome-wide CRISPR screens. For a review of CRISPR limitations see Lino et al. (2018).

The sgRNA library
Genome-wide CRISPR screens will ultimately be limited by the properties of the chosen sgRNA library. Each library will contain a different set of sgRNAs, and average coverage per gene may vary. Currently available libraries tend to be biased towards sgRNAs targeting early (5’) protein-coding exons, rather than those targeting the more functional protein domains. This problem was highlighted by Hinze et al. (2019), who noted that genes associated with asparaginase sensitivity failed to score in their genome-wide screen of asparaginase-resistant leukemia cells.

If an appropriate library is not available, creating and amplifying a new sgRNA library is a lengthy process which may take many months. Potential challenges include: (i) effective sgRNA design; (ii) ensuring comprehensive sgRNA coverage throughout the genome; (iii) lentiviral vector backbone design; (iv) producing sufficient amounts of high-quality lentivirus; (v) overcoming low transformation efficiency; (vi) proper scaling of the bacterial culture.

Maintaining cellular sgRNA coverage
One of the largest hurdles for genome-wide CRISPR screening is ensuring adequate coverage of the sgRNA library across the cell population. Evidence so far has suggested that each sgRNA should be represented and maintained in a minimum of 200-300 cells.

Considering that the standard protocol uses a multiplicity of infection of ~0.3 and a transduction efficiency of 30–40%, the number of cells required to produce and maintain suitable coverage becomes very large. By way of example, the most popular human sgRNA library is the GeCKO v2 library created by the Zhang lab; it contains 123,411 sgRNAs. Studies using this library commonly transduce more than 1 × 10^8 cells.
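The scale of the problem can be estimated with a back-of-envelope calculation. The sketch below assumes that the number of cells to transduce is simply the library size multiplied by the desired coverage and divided by the transduction efficiency; this simple formula and the function name are illustrative assumptions, and it ignores losses during selection and passaging.

```python
import math


def cells_to_transduce(n_sgrnas, coverage, transduction_efficiency):
    """Estimate the starting cell number needed for a target coverage per sgRNA."""
    if not 0 < transduction_efficiency <= 1:
        raise ValueError("transduction_efficiency must be in (0, 1]")
    return math.ceil(n_sgrnas * coverage / transduction_efficiency)


# GeCKO v2 library size from the text, with coverage and efficiency
# taken from the ranges quoted above.
gecko_v2_sgrnas = 123_411
upper_estimate = cells_to_transduce(gecko_v2_sgrnas, coverage=300, transduction_efficiency=0.3)
lower_estimate = cells_to_transduce(gecko_v2_sgrnas, coverage=200, transduction_efficiency=0.4)
```

With 300-fold coverage and 30% transduction efficiency, the estimate is about 1.2 × 10^8 cells, in line with the cell numbers reported for this library.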

As CRISPR continues to exhibit low noise and minimal off-target effects, an alternative strategy is to reduce the number of sgRNAs per gene for a primary screen. Less stringent cut-offs are used for hit selection, and additional sgRNAs are later used in a more specific secondary screen. This approach is demonstrated by Doench et al. (2016), who found that >92% of genes recovered using the standard protocol were also recovered using fewer sgRNAs per gene. They suggest that this strategy could be useful in studies where scale-up is prohibitively costly.

Lentiviral limitations
Lentiviral vectors have certain general limitations. For one, it is impossible to control where the viral genome integrates into the host genome, and this may affect important functions of the cell. Vannucci et al. provide an excellent review of viral vectors along with their general advantages and disadvantages. In the specific context of genome-wide CRISPR screens, producing and transducing the lentiviral particles is relatively laborious and time-consuming, taking about two weeks in total. Additionally, because the DNA integrates into the host genome, lentiviral delivery leads to long-term expression of Cas9, potentially leading to off-target effects.

Arrayed vs pooled screens
In an arrayed screen, each well contains a specific and known sgRNA targeting a specific gene. Arrayed screens therefore allow for detailed profiling of a single cell, but are limited by high costs and the labour required to isolate and culture the high number of individual cell populations. Conventional pooled CRISPR screens are relatively simple and cost-effective to perform, but are limited to the study of the entire cell population. This means that rare phenotypes may be more difficult to identify, and only crude phenotypes can be selected for, e.g. cell survival, proliferation, or reporter gene expression.

Culture media
The choice of culture medium might affect the physiological relevance of findings from cell culture experiments due to differences in nutrient composition and concentrations. A systematic bias in generated datasets was recently shown for CRISPR and RNAi gene silencing screens (especially for metabolic genes), and for metabolic profiling of cancer cell lines. For example, a stronger dependence on ASNS (asparagine synthetase) was found in cell lines cultured in DMEM, which lacks asparagine, compared to cell lines cultured in RPMI or F12 (containing asparagine). Such bias might be avoided by using a uniform medium for all screened cell lines and, ideally, a growth medium that better represents the physiological levels of nutrients. Recently, such media types, including Plasmax and Human Plasma-Like Medium (HPLM), have been developed.

CRISPR + single cell RNA-seq
Emerging technologies are aiming to combine pooled CRISPR screens with the detailed resolution of massively parallel single-cell RNA-sequencing (RNA-seq). Studies utilising “CRISP-seq”, “CROP-seq”, and “PERTURB-seq” have demonstrated rich genomic readouts, accurately identifying gene expression signatures for individual gene knockouts in a complex pool of cells. These methods have the added benefit of producing transcriptional profiles of the sgRNA-induced cells.