GESTALT

Genome editing of synthetic target arrays for lineage tracing (GESTALT) is a method used to determine the developmental lineages of cells in multicellular systems. GESTALT involves introducing a short DNA barcode containing regularly spaced CRISPR/Cas9 target sites into the genomes of progenitor cells. Cas9 and sgRNA are introduced into the cells alongside the barcode. Mutations accumulate in the barcode over successive cell divisions. The unique combination of mutations in a cell's barcode, read out by DNA or RNA sequencing, links that cell to a developmental lineage.

Background
Fate mapping is the process of identifying the embryonic origins of adult tissues. Lineage tracing is more specific, encompassing methods that examine the progeny arising from a single cell or a small number of cells. One of the first lineage tracing methods involved injecting dyes into specific cells of an early embryo, thereby labeling them and their progeny at each cell division. Later methods used retroviral labeling, in which retroviruses introduce a marker gene, such as a fluorescent protein or beta-galactosidase, into the genomes of the cells of interest. This results in constitutive expression of the marker in those cells and their progeny. These methods have two drawbacks: they are invasive, and they make it relatively difficult to target specific cells for labeling. Currently, the most widely used approach labels cells via genetic recombination systems. These methods use recombinases, the two main systems being Cre-loxP and Flp-FRT, in which the recombinase deletes DNA segments flanked by loxP or FRT sites, respectively. In this method, a transgenic model is created that expresses Cre recombinase and carries a reporter gene with an upstream STOP cassette flanked by loxP sites. Cre recombination deletes the STOP cassette, allowing expression of the reporter. Spatial control over the labeled cells is achieved by expressing Cre under the control elements of a chosen marker gene, and temporal control can be obtained by using inducible Cre alleles. For example, CreERT has recombination activity only upon administration of tamoxifen. Although powerful, this approach requires significant optimization to achieve single-cell lineage tracing and is low throughput. Sequencing-based methods of lineage tracing have begun to emerge because they provide significantly higher resolution and higher-throughput tracing of cell fate. Early approaches leveraged naturally occurring somatic mutations to identify cell lineage relationships.
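The Cre-loxP reporter logic can be illustrated with a toy model. This is a minimal Python sketch, not a model of real DNA. The construct is a list of hypothetical element names, and both loxP sites are assumed to lie in the same orientation, so Cre excises the intervening STOP cassette and leaves a single loxP. The `tamoxifen` flag is an assumed stand-in for CreERT induction.

```python
from typing import List, Optional

def cre_excise(construct: List[str]) -> List[str]:
    """Excise everything between the first two loxP sites, leaving one loxP.

    Assumes both loxP sites share the same orientation, the configuration
    that leads to excision of the flanked segment.
    """
    lox = [i for i, element in enumerate(construct) if element == "loxP"]
    if len(lox) < 2:
        return list(construct)
    first, second = lox[0], lox[1]
    return construct[:first] + construct[second:]

def reporter_expressed(construct: List[str], cre_present: bool,
                       tamoxifen: Optional[bool] = None) -> bool:
    """Return True if the reporter would be expressed.

    tamoxifen=None models a constitutively active Cre; True/False models an
    inducible CreERT allele that only recombines after tamoxifen.
    """
    cre_active = cre_present and (tamoxifen is None or tamoxifen)
    if cre_active:
        construct = cre_excise(construct)
    # The STOP cassette blocks expression of the downstream reporter.
    return "STOP" not in construct

# Hypothetical Cre-dependent reporter construct.
REPORTER = ["promoter", "loxP", "STOP", "loxP", "reporter"]
print(cre_excise(REPORTER))
print(reporter_expressed(REPORTER, cre_present=True, tamoxifen=False))
```

The same toy structure would describe Flp-FRT by renaming the recognition site.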

Principles
GESTALT takes advantage of the CRISPR-Cas9 system, in which the sgRNA sequence directs Cas9 to create double-stranded breaks at highly specific sites adjacent to PAM motifs. These breaks are then repaired by one of the endogenous cellular DNA repair pathways: non-homologous end joining (NHEJ) or homology-directed repair. Non-homologous end joining is the more active of the two pathways and introduces indels at the targeted site. The GESTALT system uses an array of ten CRISPR/Cas9 targets. The first site is perfectly matched to the designed sgRNA, while the other nine carry mismatches to the sgRNA and therefore have lower Cas9 activity. Introducing the CRISPR-Cas9 reagents into cells carrying this array causes indels to accumulate at potentially every target of the array. This marks each cell with a unique barcode sequence that can be used to identify it and its progeny by DNA or RNA sequencing.
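The effect of the mismatched sites can be illustrated with a simple probability model. This sketch makes three assumptions: each unedited site is cut and repaired into an indel with a fixed probability per round of Cas9 exposure, rounds are independent, and an edited site is no longer a target. The rates in `SITE_RATES` are illustrative placeholders, not measured GESTALT values. Under these assumptions, a site with per-round rate p is edited after t rounds with probability 1 - (1 - p)^t.

```python
from typing import List

# Illustrative per-round editing probabilities for a 10-site array (assumed values):
# site 1 matches the sgRNA perfectly; sites 2-10 carry mismatches and cut less often.
SITE_RATES: List[float] = [0.5] + [0.1] * 9

def prob_edited(rate: float, rounds: int) -> float:
    """Probability that a site has been edited after `rounds` independent exposures."""
    return 1.0 - (1.0 - rate) ** rounds

def expected_edited_sites(rates: List[float], rounds: int) -> float:
    """Expected number of edited sites: the sum of the per-site probabilities."""
    return sum(prob_edited(r, rounds) for r in rates)

for t in (1, 3, 5):
    print(f"rounds={t}: site 1 {prob_edited(SITE_RATES[0], t):.2f}, "
          f"site 2 {prob_edited(SITE_RATES[1], t):.2f}, "
          f"expected edited sites {expected_edited_sites(SITE_RATES, t):.2f}")
```

In this toy model, the perfect-match site saturates quickly while the mismatched sites remain available for editing in later rounds.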

Design of the barcode array
Each target sequence is 23 bp long, comprising a protospacer and a PAM sequence. The targets are placed in a contiguous array, separated by linker sequences of 3 to 5 bp. Each target sequence must be screened against the genome of the host organism to ensure that it is specific. Cas9 activity at each target site can be assessed using the GUIDE-seq assay.
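The design rules above can be expressed as a small validation and assembly routine. This sketch assumes SpCas9, so each 23-bp target is a 20-nt protospacer followed by an NGG PAM. The target and linker sequences are hypothetical. The specificity check is only a naive exact-match count on both strands; a real screen would also consider off-target sites with mismatches.

```python
import re
from typing import List

# SpCas9 is assumed here: a 20-nt protospacer followed by an NGG PAM = 23 bp.
TARGET_PATTERN = re.compile(r"[ACGT]{21}GG")
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def is_valid_target(target: str) -> bool:
    """True for a 23-bp sequence ending in an NGG PAM."""
    return TARGET_PATTERN.fullmatch(target) is not None

def build_array(targets: List[str], linkers: List[str]) -> str:
    """Join targets into a contiguous array with one 3-5 bp linker between neighbours."""
    if len(linkers) != len(targets) - 1:
        raise ValueError("need exactly one linker between each pair of targets")
    for target in targets:
        if not is_valid_target(target):
            raise ValueError(f"not a 23-bp NGG target: {target}")
    for linker in linkers:
        if not 3 <= len(linker) <= 5:
            raise ValueError(f"linker must be 3-5 bp: {linker}")
    parts = [targets[0]]
    for linker, target in zip(linkers, targets[1:]):
        parts.extend([linker, target])
    return "".join(parts)

def genome_hits(target: str, genome: str) -> int:
    """Count exact matches of `target` on both strands (a crude specificity check)."""
    k = len(target)
    return sum(
        1
        for strand in (genome, revcomp(genome))
        for i in range(len(strand) - k + 1)
        if strand[i:i + k] == target
    )

# Hypothetical targets and linker, for illustration only.
targets = ["ACGTTGCAAGCTTCGATCGAAGG", "TTGACCGATGCATGCTAGCTTGG"]
array = build_array(targets, ["GATC"])
print(len(array), [genome_hits(t, array) for t in targets])
```

For a well-designed target, `genome_hits` against the host genome would be zero before the array is integrated.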

Introducing the array into the target cell/organism
Two methods are used to introduce barcode arrays into the genomes of cells. The first transduces progenitor cells with a lentiviral construct containing the barcode array inserted into the 3' UTR of EGFP. This integrates the barcode array into the genome and marks barcoded cells through stable expression of EGFP. The second method involves creating transgenic animal lines. Such lines have been generated using a Tol2 transgenesis vector containing a barcode array cloned into the 3' UTR of DsRed under the control of the ubiquitin promoter.

Induction of the CRISPR-Cas9-mediated editing of cellular barcodes
Barcode editing and cell labeling are initiated by introducing Cas9 protein and sgRNAs into progenitor cells. The CRISPR-Cas9 complex produces double-stranded breaks at random among the barcode target sites. Subsequent NHEJ repair introduces random indels, giving each cell a unique DNA sequence in its barcode region at the time of labeling. There are multiple methods of delivering CRISPR-Cas9 reagents into cells, and this is an active field of research. The reagents can be introduced by transfection using lipid nanoparticles, or by microinjection into 1-cell embryos. Delivering them at different developmental times changes which populations are labeled. Barcode editing may persist for several hours after delivery.
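How inherited edits record a lineage can be sketched with a toy simulation. The model makes several simplifying assumptions. Cells divide synchronously, and every unedited site is cut with the same probability `rate` in each generation. Every NHEJ outcome receives a unique label, which rules out the chance recurrence of identical edits seen in real data. The editing window is counted in generations as a stand-in for the hours-long persistence of Cas9 activity. All names and parameter values are hypothetical.

```python
import random
from typing import List, Tuple

N_SITES = 10
UNEDITED = "."
Barcode = Tuple[str, ...]

def edit(barcode: Barcode, rate: float, rng: random.Random,
         next_id: int) -> Tuple[Barcode, int]:
    """Cut each unedited site with probability `rate`; model NHEJ repair as a new,
    unique indel label. Returns the edited barcode and the next unused label id."""
    sites = list(barcode)
    for i, state in enumerate(sites):
        if state == UNEDITED and rng.random() < rate:
            sites[i] = f"indel{next_id}"
            next_id += 1
    return tuple(sites), next_id

def simulate(generations: int, editing_window: int, rate: float = 0.2,
             seed: int = 0) -> List[Barcode]:
    """Divide one founder cell `generations` times. Cas9 is active only during the
    first `editing_window` generations; each daughter inherits its parent's barcode."""
    rng = random.Random(seed)
    cells: List[Barcode] = [tuple([UNEDITED] * N_SITES)]
    next_id = 0
    for gen in range(generations):
        daughters: List[Barcode] = []
        for barcode in cells:
            for _ in range(2):
                child = barcode
                if gen < editing_window:
                    child, next_id = edit(child, rate, rng, next_id)
                daughters.append(child)
        cells = daughters
    return cells

leaves = simulate(generations=6, editing_window=3)
print(len(leaves), "cells,", len(set(leaves)), "distinct barcodes")
```

Cells born after the window closes share their parent's barcode exactly. This is why relationships established after editing stops cannot be resolved.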

Sequencing of barcodes and reconstruction of cell lineage tree
Following delivery of the CRISPR-Cas9 reagents, time is allowed for barcode editing and further development. During this period the labeled populations expand, and their progeny inherit the unique marks. Genomic DNA or RNA is then extracted from the progeny cells or tissues of interest, and the barcodes are PCR-amplified. Unique molecular identifiers (UMIs) tag individual template molecules before amplification to correct for PCR bias. Each UMI-barcode combination therefore represents a single original template molecule rather than a PCR duplicate. All barcode alleles are then sequenced by NGS. The full set of identified alleles can be subjected to phylogenetic analysis, which infers cell lineage from barcode similarity. To control for sequencing error, only indels are considered, because most sequencing errors inherent to next-generation sequencing are base substitutions.
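The UMI collapsing and indel-only filtering steps can be sketched as follows. The sketch assumes that reads have already been aligned to the unedited reference array and that each alignment is summarized as a SAM-style CIGAR string using `=`/`X` for matches and mismatches. UMIs and CIGAR strings are hypothetical. Inserted bases are not recorded, so two different insertions of the same length at the same position would be merged. Each UMI is assigned its majority allele.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
Allele = Tuple[Tuple[int, str, int], ...]

def indel_allele(cigar: str) -> Allele:
    """Reduce an aligned read to its indel events as (ref_position, op, length).
    Matches (M, =) and substitutions (X) are skipped, since most NGS errors are
    substitutions. Inserted bases are not recorded in this simplification."""
    events = []
    ref_pos = 0
    for length_str, op in CIGAR_OP.findall(cigar):
        length = int(length_str)
        if op in "ID":
            events.append((ref_pos, op, length))
        if op in "M=XDN":  # operations that consume the reference
            ref_pos += length
    return tuple(events)

def collapse_by_umi(reads: Iterable[Tuple[str, str]]) -> Dict[str, Allele]:
    """Reads sharing a UMI come from one template molecule: keep its majority allele."""
    by_umi: Dict[str, Counter] = defaultdict(Counter)
    for umi, cigar in reads:
        by_umi[umi][indel_allele(cigar)] += 1
    return {umi: alleles.most_common(1)[0][0] for umi, alleles in by_umi.items()}

def allele_counts(reads: Iterable[Tuple[str, str]]) -> Counter:
    """Number of distinct molecules (UMIs) supporting each indel allele."""
    return Counter(collapse_by_umi(reads).values())

# Hypothetical reads: (UMI, CIGAR against the unedited reference array).
reads = [
    ("AAGT", "50=3D40="),
    ("AAGT", "50=3D40="),
    ("AAGT", "20=1X29=3D40="),  # same molecule; the substitution is ignored
    ("CTTA", "93="),
]
print(allele_counts(reads))
```

The resulting molecule-level allele table is the kind of input a phylogenetic method would take.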

scGESTALT
Single-cell GESTALT (scGESTALT) builds on GESTALT by capturing barcode and transcriptome information simultaneously using scRNA-seq. In scGESTALT, the barcode is placed downstream of an inducible promoter in the progenitor cells of interest. When the developmental period of interest is complete, barcode expression is induced, and the barcode mRNA is sequenced alongside the rest of the transcriptome by scRNA-seq. The transcriptomic data can be used to identify cell types and track differentiation, while the barcodes can be used to infer developmental relationships between cells. An additional improvement is the ability to induce labeling at two different time points. This is enabled by placing Cas9 and sgRNA expression under a heat shock promoter. The first labeling event is induced via microinjection, as in traditional GESTALT. A second labeling event is initiated later by heat shock-induced expression of Cas9 and sgRNAs. This enables lineage tracing during later stages of development than is possible with GESTALT.
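Because scGESTALT reads both the barcode and the transcriptome from the same cell, the two can be joined on the cell's identifier. This minimal sketch assumes each cell has already been assigned one lineage allele and one cell-type label from transcriptome clustering. All identifiers, alleles, and labels are hypothetical. An allele observed in several cell types suggests that those cell types share a progenitor.

```python
from collections import defaultdict
from typing import Dict, Set

def cell_types_per_allele(lineage_alleles: Dict[str, str],
                          cell_types: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lineage allele to the set of cell types in which it was observed.

    lineage_alleles: cell identifier -> barcode allele recovered for that cell.
    cell_types: cell identifier -> cell-type label from transcriptome clustering.
    Cells missing from either table (e.g. barcode drop-out) are skipped.
    """
    shared: Dict[str, Set[str]] = defaultdict(set)
    for cell_id, allele in lineage_alleles.items():
        if cell_id in cell_types:
            shared[allele].add(cell_types[cell_id])
    return dict(shared)

# Hypothetical cell identifiers, alleles and cluster labels.
alleles = {"cell1": "allele_A", "cell2": "allele_A", "cell3": "allele_B", "cell4": "allele_C"}
types = {"cell1": "neuron", "cell2": "oligodendrocyte", "cell3": "neuron"}
print(cell_types_per_allele(alleles, types))
```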

Limitations

 * GESTALT is restricted to early embryogenesis because microinjection of Cas9 and sgRNA is only feasible when performed on a small number of progenitor cells. As a result, barcode editing is confined to early development, and later lineage relationships cannot be deciphered. This limitation was partially addressed by scGESTALT and related methods, whose inducible Cas9 and sgRNA expression systems enable labeling at later developmental time points.
 * GESTALT barcode sequences alone do not provide any information about the cell type in which the barcode was identified. scGESTALT addresses this challenge by linking the barcodes to the transcriptome of the cell, allowing cell identity to be determined.
 * In some cells, a new deletion that overlaps earlier edits can remove previously accrued marks from the barcode region, resulting in the loss of lineage information.
 * A similar or identical edit can arise by chance in cells belonging to two separate lineages, leading to the erroneous association of those lineages.
 * scGESTALT suffers from the same drop-out issues observed in other single-cell methods: barcode sequences are captured in only 30% of the cells. Additionally, some cell types may silence expression of the barcode construct, resulting in loss of lineage information for those cell types.
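The loss of lineage information through overlapping deletions can be illustrated by representing a cell's edits as intervals on the reference array. This is a minimal sketch with hypothetical coordinates and labels. Insertions are modeled as zero-length intervals. Any earlier edit that overlaps a new deletion is discarded, because the DNA that carried it is gone.

```python
from typing import List, Tuple

Edit = Tuple[int, int, str]  # (start, end, label) on the reference array; end exclusive

def apply_deletion(edits: List[Edit], start: int, end: int, label: str) -> List[Edit]:
    """Apply a new deletion spanning [start, end) to a cell's recorded edits.

    The deleted DNA is gone, so any earlier edit overlapping that span is lost and
    only the new event remains there. Insertions are zero-length intervals (p, p).
    """
    kept = [e for e in edits if e[1] <= start or e[0] >= end]
    return sorted(kept + [(start, end, label)])

# Hypothetical coordinates: small independent edits followed by a large deletion.
history = [(30, 33, "del_a"), (48, 48, "ins_b"), (60, 62, "del_c"), (95, 98, "del_d")]
print(apply_deletion(history, 25, 70, "big_del"))
```

Three earlier events collapse into one in this example. Two cells that differed only in those events would become indistinguishable.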

Applications
GESTALT was initially developed to examine the contributions of embryonic progenitors to the adult organ systems of zebrafish. Barcodes were sequenced from bulk extractions of organ systems, and each organ was found to contain only a small number of barcode alleles. This indicated that organs arise from the clonal expansion of a small number of early progenitors. The experiment captured lineage information from thousands of differentiated cells, demonstrating the high-throughput lineage tracing capabilities of GESTALT.

scGESTALT has been used to refine the lineage tree of the zebrafish brain. In one scGESTALT experiment, some barcode sequences were captured in cell populations of the forebrain, midbrain, and hindbrain. This revealed the existence of multipotent progenitors that give rise to cells migrating across the brain. Pseudotime trajectories were generated from the scRNA-seq data for two transitions: oligodendrocyte progenitors to oligodendrocytes, and atoh1c+ progenitors to pax6b+ neurons. Both were found to be consistent with the barcode distribution across those cell types.

Related techniques

 * Memory by Engineered Mutagenesis with Optical In Situ Readout (MEMOIR) is a related lineage tracing method that relies on Cas9/sgRNA modification of a barcode. It differs from GESTALT in two major ways. First, instead of introducing indels into a barcode, MEMOIR uses Cas9-sgRNA to delete regions of the barcode. Second, instead of conventional sequencing, MEMOIR uses sequential multiplexed single-molecule RNA fluorescence in situ hybridization (seqFISH) to read the barcode in situ within single cells.
 * Lineage tracing by nuclease-activated editing of ubiquitous sequences (LINNAEUS) was designed to improve upon scGESTALT. In LINNAEUS, the barcode array is replaced by multiple transgenic reporter genes that are targeted by Cas9/sgRNA. Because the reporter genes are spread throughout the genome, subsequent Cas9 editing does not overwrite previous edits.
 * ScarTrace is a method based on the same principles as scGESTALT. It tracks cell lineages through Cas9/sgRNA editing of a barcode composed of eight in-tandem copies of a histone–green fluorescent protein (GFP) transgene. ScarTrace integrates scRNA-seq data for cell type analysis. However, scGESTALT sequences the barcode only from mRNA, whereas ScarTrace also uses nested PCR to amplify the barcode from gDNA. This is purported to be more reliable, because the mRNA barcode could be unstable or situationally silenced.