CITE-Seq

CITE-Seq (Cellular Indexing of Transcriptomes and Epitopes by Sequencing) is a method that performs RNA sequencing while also measuring, in the same single cells, the levels of surface proteins for which suitable antibodies are available. So far, the method has been demonstrated with only a small number of proteins per cell. By combining proteomic and transcriptomic data, it adds a further layer of information about each cell. The groups that developed the method reported that, for phenotyping, it is as accurate as flow cytometry, the gold standard. Along with REAP-Seq, it is currently one of the main methods for evaluating gene expression and protein levels simultaneously, and it has been applied in different species.

The method was established by the New York Genome Center in collaboration with the Satija lab, while a similar approach was shown earlier by AbVitro Inc.

Applications
Concurrent measurement of protein and transcript levels opens up opportunities to use CITE-Seq in various areas of biology, some of which were touched upon by its developers. For instance, it may be used to characterize tumor heterogeneity in different cancers, a major research field. As a high-throughput single-cell method, it also permits identifying rare subpopulations of cells and thus detecting information that is otherwise lost with bulk methods. It may also aid in tumor classification, for example by identifying novel subtypes. All of these applications rely on obtaining protein and transcript data from the same single cell at the same time, which also yields new information on the correlation between protein and RNA levels.

It also has potential in immunology. For example, it can be used to characterize immune cells; recent research on T cells has investigated their ability to maintain an effector state. Another study by one of the CITE-Seq coauthors suggested CITE-Seq as a method for studying the mechanisms of host-pathogen interactions.

Workflow
CITE-seq, like other sequencing techniques, has a wet lab portion, in which the antibody conjugates are prepared, cells are stained, cDNA is synthesized, and sequencing libraries are prepared and sequenced, and a dry lab portion, in which the resulting sequencing data are analyzed. The most crucial parts of the wet lab work are designing the antibody-oligonucleotide conjugates and titrating the amount of each conjugate in the pool to achieve the desired read-out and quantification.

Wet lab workflow
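The first step is preparing the antibody-oligonucleotide conjugates, also known as Antibody-Derived Tags (ADTs). To make an ADT, an antibody directed against a cell surface protein of interest is labeled with an oligonucleotide carrying a barcode that identifies that antibody. In the original design, the oligonucleotide also contains a PCR handle and a poly(A) tail, so that it can be captured and reverse-transcribed together with cellular mRNA.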
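Because each antibody is identified only by its barcode, the barcodes in a panel should differ enough that a sequencing error does not turn one into another. The following minimal sketch checks this by computing the smallest pairwise Hamming distance in a panel. It assumes hypothetical 12-nt barcodes, which are not real panel sequences. A minimum distance of 3 or more lets a single mismatch be corrected unambiguously.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes: dict[str, str]) -> int:
    """Smallest Hamming distance between any two antibody barcodes in a panel."""
    return min(hamming(a, b) for a, b in combinations(barcodes.values(), 2))


# Hypothetical barcodes for illustration only (not real ADT sequences).
panel = {
    "CD3": "ACGTACGTACGT",
    "CD4": "TGCATGCATGCA",
    "CD8": "ACGTTGCAACGT",
}

# CD3 and CD8 differ at 4 positions, the closest pair in this panel, so one
# sequencing error still leaves a read closest to its true barcode.
print(min_pairwise_distance(panel))  # 4
```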

Once the ADTs are prepared, cells are stained with the desired ADT pool. The scRNA-seq libraries can then be prepared using Drop-seq, 10X Genomics or ddSeq methods. In brief, ADT-labelled cells are encapsulated as single cells in droplets together with DNA-barcoded microbeads.

Within each droplet, the cell is lysed, releasing both its bound ADTs and its mRNA, and both are converted to cDNA. All oligonucleotides on a given microbead share the same bead-specific barcode, so every cDNA molecule, whether derived from an ADT or from an mRNA, is indexed with the barcode of the cell it came from.
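In Drop-seq and 10X Genomics chemistries, each bead oligonucleotide also carries a unique molecular identifier (UMI). Reads that share a cell barcode, UMI and feature are therefore PCR copies of a single captured molecule. The following minimal sketch collapses such reads into per-cell counts. It assumes reads have already been parsed into hypothetical (cell barcode, UMI, feature) tuples. Dedicated tools also correct sequencing errors in barcodes and UMIs, which this sketch ignores.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into molecule counts per (cell barcode, feature).

    Each read is a (cell_barcode, umi, feature) tuple, where the feature is an
    ADT name or a gene. Reads sharing all three values are PCR duplicates of one
    captured molecule and are counted once.
    """
    molecules = defaultdict(set)
    for cell, umi, feature in reads:
        molecules[(cell, feature)].add(umi)
    return {key: len(umis) for key, umis in molecules.items()}


# Hypothetical parsed reads.
reads = [
    ("AAAC", "GATC", "CD3"),
    ("AAAC", "GATC", "CD3"),  # PCR duplicate of the read above
    ("AAAC", "TTGA", "CD3"),
    ("AAAC", "GATC", "CD4"),
    ("TTTG", "CCAG", "CD3"),
]
print(count_umis(reads))
```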

In the next step, following the developers' guidelines, the cDNA is PCR-amplified, and ADT-derived and mRNA-derived cDNAs are separated by size (generally, ADT-derived cDNAs are < 180 bp and mRNA-derived cDNAs are > 300 bp). Each fraction is then amplified and purified independently to prepare its own sequencing library. Finally, the libraries are pooled and sequenced together, so that proteomic and transcriptomic data are obtained from a single sequencing run.

Dry lab workflow
Analysis of single-cell sequencing data presents many challenges, such as determining the best way to normalize the data. Because sequencing both proteins and transcripts in single cells adds further complications, the developers of CITE-Seq and their collaborators maintain several tools to help with data analysis.

scRNA-Seq data analysis (based on the developers' guidelines): The initial analysis steps are the same as in a standard scRNA-Seq experiment. First, reads are aligned to the reference genome of the species of interest, and cells with very few transcripts mapped to the reference are removed. The remaining counts are then normalized, yielding a normalized matrix of gene expression values.
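The following minimal sketch shows the filtering and normalization steps. It assumes a genes x cells matrix of raw UMI counts and a hypothetical QC threshold of 200 detected genes per cell. Each cell is scaled to 10,000 total counts and then transformed with log(1 + x). This is a widely used default, for example Seurat's "LogNormalize", but the thresholds in a real analysis depend on the dataset.

```python
import numpy as np


def filter_and_normalize(counts: np.ndarray, min_genes: int = 200, scale: float = 1e4):
    """Filter low-quality cells, then log-normalize a genes x cells count matrix.

    min_genes is an assumed QC threshold on the number of detected genes per cell
    (it should be at least 1 so that no kept cell has zero total counts).
    Returns the normalized matrix for kept cells and a boolean mask of kept cells.
    """
    genes_detected = (counts > 0).sum(axis=0)   # detected genes per cell
    keep = genes_detected >= min_genes
    kept = counts[:, keep].astype(float)
    library_size = kept.sum(axis=0)              # total counts per kept cell
    normalized = np.log1p(kept / library_size * scale)
    return normalized, keep


# Tiny toy matrix: 3 genes x 3 cells; only the first cell detects 2 genes.
toy = np.array([[3, 0, 1],
                [1, 0, 0],
                [0, 1, 0]])
norm, keep = filter_and_normalize(toy, min_genes=2)
```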

ADT data analysis (based on the developers' guidelines): CITE-seq-Count, a Python package from the CITE-Seq developers, can be used to obtain raw ADT counts. The Seurat package from the Satija lab can then combine the protein and RNA counts, cluster cells using both measurements, and perform differential expression analysis between cell clusters of interest. ADT quantification needs to account for differences between antibodies, such as differences in staining efficiency and nonspecific background binding. As with scRNA-Seq data, filtering may be required to reduce noise. In contrast to RNA data, however, ADT data show less dropout, because proteins are present in a cell in higher amounts than their transcripts.
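A centered log-ratio (CLR) transform is one common way to account for antibody-specific differences. The original CITE-seq analysis used it for ADT counts, and Seurat offers it as a normalization method. The following sketch is a simplified version, assuming an ADTs x cells count matrix, a pseudocount of 1, and centering of each antibody across cells. Seurat's exact formula differs in detail.

```python
import numpy as np


def clr_transform(adt: np.ndarray, pseudocount: float = 1.0) -> np.ndarray:
    """Simplified centered log-ratio transform of an ADTs x cells count matrix.

    Each antibody's log counts are centered on that antibody's mean log count
    across cells. Values therefore express enrichment relative to the antibody's
    own typical level, not its raw read depth, which differs between antibodies.
    """
    logged = np.log(adt + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)


# Toy data: one antibody varies across 3 cells, one has flat counts.
adt_counts = np.array([[0.0, 3.0, 7.0],
                       [10.0, 10.0, 10.0]])
clr = clr_transform(adt_counts)
```

With this transform, an antibody with uniformly high counts, such as one with strong nonspecific background, is centered at zero in every cell rather than appearing highly expressed.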

The analyses may identify novel cell clusters, typically found by clustering cells in a reduced-dimensional space obtained with methods such as PCA and visualized with tSNE. They may also identify genes crucial for a specific cell function and other findings specific to the question of interest. In general, ADT counts substantially add to the information obtained through single-cell transcriptomics.
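The following sketch illustrates dimensionality reduction on both measurements. It concatenates normalized RNA and ADT features for the same cells, z-scores each feature, and computes principal components with a singular value decomposition. Simple concatenation is an assumption made for illustration. It is not the multimodal integration method used by Seurat. A clustering algorithm would then be run on the resulting PC scores.

```python
import numpy as np


def joint_pca(rna: np.ndarray, adt: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project cells onto principal components of combined RNA and ADT features.

    rna, adt: normalized features x cells matrices for the same cells, in the
    same column order. Each feature is z-scored so that all features contribute
    on a comparable scale. Returns a cells x n_components matrix of PC scores.
    """
    combined = np.vstack([rna, adt]).T          # cells x (genes + antibodies)
    mean = combined.mean(axis=0)
    std = combined.std(axis=0)
    std[std == 0] = 1.0                          # constant features stay at zero
    scaled = (combined - mean) / std
    u, s, _ = np.linalg.svd(scaled, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Random toy data: 3 genes and 2 antibodies measured in 5 cells.
rng = np.random.default_rng(0)
scores = joint_pca(rng.random((3, 5)), rng.random((2, 5)), n_components=2)
```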

Adaptations of the technique
The applications of antibody-oligonucleotide conjugates have expanded beyond CITE-seq. The conjugates have been adapted for sample multiplexing as well as for CRISPR screens.

Cell Hashing: The New York Genome Center further adapted its antibody-oligonucleotide conjugates to enable sample multiplexing for scRNA-seq. This technique, called Cell Hashing, uses oligonucleotide-labelled antibodies against cell surface proteins that are ubiquitously expressed in the tissue being sampled. Cells from each sample are stained with antibodies whose oligonucleotide carries a unique, sample-specific barcode. This sample-specific tagging allows cells from different samples to be pooled and processed together on a single scRNA-seq run. Sequencing the antibody tags along with the cellular transcriptome then identifies the sample of origin of each analyzed cell. The hashing barcodes can be designed to differ from the barcodes of the ADTs used in CITE-seq, which makes it possible to combine cell hashing with CITE-seq in a single sequencing run. Cell hashing allows super-loading of the scRNA-seq platform, lowering the cost per cell. It also enables detection of multiplets, which are droplets containing cells from more than one sample. Multiplets produce artifactual signals and are a major challenge in scRNA-seq. Gaublomme et al. have further used the approach to multiplex single-nucleus RNA-seq (snRNA-seq) by performing nucleus hashing.
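The following sketch assigns each cell to its sample of origin using a single fixed cutoff on normalized hashtag counts. This is a simplification, assuming a hashtags x cells matrix and a hypothetical threshold. Seurat's HTODemux, for example, derives a separate cutoff for each hashtag from that hashtag's background distribution. The core logic is the same: one positive hashtag marks a singlet, several mark a cross-sample multiplet, and none marks a negative.

```python
import numpy as np


def classify_hashtags(hto: np.ndarray, threshold: float):
    """Call each cell's sample from a hashtags x cells matrix of normalized counts.

    Uses one fixed, assumed threshold for all hashtags. Returns, per cell, the
    index of the single positive hashtag, "doublet" if two or more hashtags are
    positive (a cross-sample multiplet), or "negative" if none is.
    """
    positive = hto > threshold
    calls = []
    for cell in range(hto.shape[1]):
        hits = np.flatnonzero(positive[:, cell])
        if len(hits) == 1:
            calls.append(int(hits[0]))
        elif len(hits) > 1:
            calls.append("doublet")
        else:
            calls.append("negative")
    return calls


# Toy data: 2 hashtags (samples) x 4 cells.
hto = np.array([[5.0, 0.1, 4.0, 0.2],
                [0.2, 6.0, 3.5, 0.1]])
calls = classify_hashtags(hto, threshold=1.0)
```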

ECCITE-seq: Expanded CRISPR-compatible Cellular Indexing of Transcriptomes and Epitopes by sequencing (ECCITE-seq) was developed to extend CITE-seq to the characterization of multiple modalities from a single cell. By adapting the basic CITE-seq protocol to a 5' tag-based scRNA-seq assay, it can detect the transcriptome, immune receptor clonotypes, surface markers, sample identity and single guide RNAs (sgRNAs) in each single cell. Because ECCITE-seq can detect sgRNA molecules and measure their effect on gene expression, it can be applied in CRISPR screens.
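The following sketch illustrates the guide-detection step by assigning each cell the sgRNA that dominates its guide counts. It assumes a guides x cells UMI count matrix and hypothetical thresholds: at least 5 UMIs for the top guide, and at least 80% of the cell's guide UMIs. Cells that fail either criterion are left unassigned, for instance because they carry more than one guide.

```python
import numpy as np


def assign_guides(guide_counts: np.ndarray, guide_names, min_umis=5, min_fraction=0.8):
    """Call the sgRNA carried by each cell from a guides x cells UMI count matrix.

    A cell gets its top guide only if that guide has at least `min_umis` UMIs
    and accounts for at least `min_fraction` of the cell's guide UMIs. Both
    thresholds are assumptions. Otherwise the cell is unassigned (None).
    """
    calls = []
    for cell in range(guide_counts.shape[1]):
        column = guide_counts[:, cell]
        total = column.sum()
        top = int(np.argmax(column))
        if total > 0 and column[top] >= min_umis and column[top] / total >= min_fraction:
            calls.append(guide_names[top])
        else:
            calls.append(None)
    return calls


# Toy data: 2 hypothetical guides x 4 cells.
guide_counts = np.array([[20, 1, 0, 6],
                         [1, 9, 0, 6]])
guides = assign_guides(guide_counts, ["sgA", "sgB"])
```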

Advantages and Limitations of CITE-seq
Advantages: CITE-seq enables simultaneous analysis of the transcriptome and the proteome of single cells. Previous efforts to couple index-sorting measurements from single-cell sorting with scRNA-seq were limited to small sample sizes. They were also not compatible with multiplexing or massively parallel high-throughput sequencing. CITE-seq has been shown to be compatible with high-throughput microfluidic platforms such as 10X Genomics and Drop-seq, and it can also be adapted to micro- and nano-well platforms. Coupling it with cell hashing enables sample multiplexing, so that CITE-seq can be applied to many samples at once. Together, these techniques reduce the overall cost of high-throughput sequencing of multiple samples. Lastly, CITE-seq can be adapted for use with small molecules, RNA interference, CRISPR, and other gene editing techniques.

Limitations: One limitation of CITE-Seq is the loss of spatial information. Because cells are dissociated into a suspension and then lysed, neither the spatial distribution of cells within a sample nor that of proteins within a cell is known. In addition, the method shares the challenges of scRNA-Seq, such as high noise and difficulty detecting lowly expressed genes. For phenotyping, optimizing the assay and antibodies is a potential problem if the proteins of interest are not included in currently available panels. Moreover, CITE-Seq currently cannot detect intracellular proteins. With the current protocol, the permeabilization step needed to reach them raises many challenges, which limits the technique to surface markers.

Alternative methods

 * REAP-seq: Peterson et al. from Merck developed a technique similar to CITE-seq called RNA Expression and Protein Sequencing assay (REAP-seq). Like CITE-seq, REAP-seq measures both transcript and protein levels in single cells; the two techniques differ in how the antibody is conjugated to the oligonucleotide. CITE-seq typically links the oligonucleotide to the antibody non-covalently, via streptavidin conjugated to the antibody and biotin conjugated to the oligonucleotide. REAP-seq instead covalently links the antibody to an aminated DNA barcode.
 * PLAYR: PLAYR (Proximity Ligation Assay for RNA) uses mass cytometry to analyse transcript and protein levels simultaneously in single cells. In this technique, proteins are labelled with isotope-conjugated antibodies and RNA transcripts with isotope-labelled probes, enabling detection of both on a mass cytometer.