Ribonucleoprotein Networks Analyzed by Mutational Profiling

Ribonucleoprotein Networks Analyzed by Mutational Profiling (RNP-MaP) is a strategy for probing RNA-protein networks and protein binding sites at nucleotide resolution. Information about RNP assembly and function can facilitate a better understanding of biological mechanisms. RNP-MaP uses NHS-diazirine (SDA), a hetero-bifunctional crosslinker, to freeze RNA-bound proteins in place. Once the RNA-protein crosslinks have formed, MaP reverse transcription copies the protein-bound RNAs into cDNA and introduces mutations at the sites of RNA-protein crosslinks. Sequencing the cDNAs reveals both protein-RNA interaction networks and protein binding sites.

Components
RNP-MaP involves three major components:


 * Ribonucleoproteins (RNPs): complexes made up of RNAs and RNA-binding proteins (RBPs)
 * NHS-diazirine (SDA): a cell-permeable crosslinking reagent. SDA contains two reactive groups: a diazirine and a succinimidyl ester. The succinimidyl ester reacts with amine groups (e.g. lysine side chains) to form amide bonds. When exposed to UV light with a wavelength of 365 nm, the diazirine forms an intermediate that is broadly reactive toward nucleotide riboses and bases. As a result, proteins are crosslinked to RNA through the SDA linker.
 * Mutational profiling (MaP): a method that uses reverse transcription with relaxed fidelity to record protein-RNA crosslink sites as mutations in the cDNA.

Workflow
The SDA reagent and long-wavelength UV light are first applied to living cells to crosslink protein residues to RNA; the protein end of the linker attaches through amide bonds formed between amine groups of lysine (or arginine) residues and succinimidyl esters. Next, cells containing crosslinked RNPs are lysed and the RNA-bound proteins are digested, leaving peptide adducts on the RNA. MaP reverse transcription then marks the protein-RNA binding sites through peptide adduct-induced mutations. Sequencing the mutation-containing cDNA reveals the mutation sites (RNP-MaP sites), and correlations between RNP-MaP sites are determined computationally using 3-nucleotide windows.

RNP-MaP site identification
RNP-MaP sites are defined as protein-bound nucleotides. Sequence reads from the SDA + UV-treated and UV-only samples are aligned and mutations are counted using the ShapeMapper2 software. The RNP-MaP (SDA) reactivity of a nucleotide is the ratio of its mutation frequency in the crosslinked (SDA + UV) sample to its mutation frequency in the un-crosslinked (UV-only) sample. From these differential mutational signatures, RNP-MaP sites are identified using universal normalization factors and thresholds for each RNA nucleotide (U, A, C, and G), derived from analysis of ribonucleoproteins of known structure.
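The reactivity ratio described above can be sketched in a few lines. This is a minimal illustration, not ShapeMapper2 code: the function names and the example counts are hypothetical, and the normalization factors mentioned in the text are not applied because their values are not given here.

```python
from typing import Optional


def mutation_rate(mutations: int, reads: int) -> float:
    """Fraction of reads carrying a mutation at one nucleotide."""
    if reads <= 0:
        raise ValueError("read depth must be positive")
    return mutations / reads


def rnp_map_reactivity(mut_xl: int, reads_xl: int,
                       mut_uv: int, reads_uv: int) -> Optional[float]:
    """Ratio of the crosslinked (SDA + UV) mutation rate to the UV-only rate.

    Returns None when the UV-only rate is zero, since the ratio is undefined.
    """
    rate_xl = mutation_rate(mut_xl, reads_xl)
    rate_uv = mutation_rate(mut_uv, reads_uv)
    if rate_uv == 0:
        return None
    return rate_xl / rate_uv


# Hypothetical counts for one nucleotide: 60 of 2000 reads mutated with
# SDA + UV, 10 of 2000 reads mutated with UV only.
print(rnp_map_reactivity(60, 2000, 10, 2000))
```

Returning `None` for a zero background is one design choice among several; a pipeline could equally add a pseudocount or skip such positions.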

A nucleotide is identified as an RNP-MaP site if it passes three filters:


 * 1) The number of mutation events in the SDA + UV-treated sample is at least 50 greater than in the UV-only sample.
 * 2) The site reactivity exceeds the nucleotide-dependent threshold ($$T_X$$), which is empirically defined as:
 ** $$T_X = \frac{BG_{X>10} - MED_{all X}}{SD_{all X}}$$, where
 *** $$X$$ is the nucleotide U, A, C, or G
 *** $$BG_{X>10}$$ is the background threshold defined by the 90% reactivity values of nucleotides in a >10 Å group
 *** $$MED_{all X}$$ is the median of all reactivities for all nucleotides
 *** $$SD_{all X}$$ is the standard deviation of reactivities for all nucleotides
 * 3) The calculated Z-factor is greater than zero, where the Z-factor is defined as:
 ** $$\text{Z-factor} = 1-\frac{2.575(\sigma_{SDA+UV}+\sigma_{UV})}{|\text{mutation rate}_{SDA+UV}-\text{mutation rate}_{UV}|}$$, where
 *** $$\text{mutation rate}_{SDA+UV}$$ is the mutation rate of a nucleotide treated with SDA and UV, and $$\text{mutation rate}_{UV}$$ is the mutation rate of a nucleotide treated with UV only
 *** $$\sigma_{nt} = \frac{\sqrt{\text{mutation rate}_{nt}}}{\sqrt{\text{reads}_{nt}}}$$, where $$nt$$ is the treatment (SDA + UV or UV only)
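The three filters can be combined as in the sketch below. It is an illustration under stated assumptions, not the published implementation: the function names are hypothetical, and the reactivity and the threshold are taken as inputs because the empirical values are not given in this article.

```python
import math


def sigma(rate: float, reads: int) -> float:
    """Standard error term from the formula above: sqrt(rate) / sqrt(reads)."""
    return math.sqrt(rate) / math.sqrt(reads)


def z_factor(rate_xl: float, reads_xl: int,
             rate_uv: float, reads_uv: int) -> float:
    """Z-factor comparing the SDA + UV and UV-only mutation rates."""
    diff = abs(rate_xl - rate_uv)
    if diff == 0:
        return -math.inf  # no separation between the samples
    spread = sigma(rate_xl, reads_xl) + sigma(rate_uv, reads_uv)
    return 1 - 2.575 * spread / diff


def is_rnp_map_site(mut_xl: int, reads_xl: int, mut_uv: int, reads_uv: int,
                    reactivity: float, threshold: float) -> bool:
    """Apply the three filters to one nucleotide.

    `reactivity` and `threshold` are assumed to be computed elsewhere
    (normalized site reactivity and the threshold for this nucleotide type).
    """
    enough_events = (mut_xl - mut_uv) >= 50          # filter 1
    above_threshold = reactivity > threshold         # filter 2
    z = z_factor(mut_xl / reads_xl, reads_xl,
                 mut_uv / reads_uv, reads_uv)        # filter 3
    return enough_events and above_threshold and z > 0


# Hypothetical nucleotide: 60 vs 10 mutation events at 2000x depth,
# with made-up reactivity and threshold values.
print(is_rnp_map_site(60, 2000, 10, 2000, reactivity=6.0, threshold=2.0))
```

Keeping each filter as a named boolean makes it easy to report which criterion a rejected nucleotide failed.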

Protein-RNA interaction network identification
Protein-RNA interaction networks are identified using RNP-MaP correlations, which can be measured because multiple crosslink sites can be detected on a single RNA molecule. RNP-MaP correlations provide a complementary measure of protein binding to RNA that is independent of RNP-MaP sites. They are identified using a G-test framework known as RingMapper.

RNP-MaP correlations require a single RNA molecule to form at least two crosslinks and arise from any of three scenarios:


 * 1) A single protein binds to two locations on one RNA
 * 2) Two interacting proteins bind to two locations on one RNA
 * 3) Two proteins are deposited on two locations on one RNA by a coordinated assembly process

Using RNP-MaP correlations, a network of protein-RNA interaction sites is found and can then be used for functional analysis.
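To illustrate the kind of statistic a G-test framework relies on, the sketch below computes a generic G-test of independence on a 2x2 table of per-read co-mutation counts for two positions (or two 3-nucleotide windows). It is not RingMapper's code; the function name, the table layout, and the example counts are assumptions made for illustration.

```python
import math


def g_statistic(n11: int, n10: int, n01: int, n00: int) -> float:
    """G-test of independence for a 2x2 table of read counts.

    n11: reads mutated at both positions
    n10: reads mutated at the first position only
    n01: reads mutated at the second position only
    n00: reads mutated at neither position
    """
    total = n11 + n10 + n01 + n00
    if total == 0:
        raise ValueError("no reads cover both positions")
    observed = ((n11, n10), (n01, n00))
    rows = (n11 + n10, n01 + n00)
    cols = (n11 + n01, n10 + n00)
    g = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g += o * math.log(o / expected)
    return 2 * g


# Hypothetical counts: co-mutation occurs far more often than the marginal
# rates would predict, so G is large.
print(g_statistic(8, 2, 2, 88))
# Counts that exactly match independence give G = 0.
print(g_statistic(1, 9, 9, 81))
```

Under independence, G approximately follows a chi-square distribution with one degree of freedom, so values above 3.841 correspond to p < 0.05; a real pipeline would likely use a stricter cutoff to account for the many position pairs tested.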

Cross-linking immunoprecipitation (CLIP)
CLIP analyzes protein interactions with RNA by combining UV cross-linking and immunoprecipitation. CLIP-based techniques can map the binding sites of an RNA-binding protein of interest on a genome-wide scale.

There are many CLIP-based methods including:


 * HITS-CLIP (High-throughput sequencing of RNA isolated by crosslinking immunoprecipitation or CLIP-seq)
 * PAR-CLIP (Photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation)
 * iCLIP (Individual nucleotide-resolution cross-linking and immunoprecipitation)
 * eCLIP (Enhanced cross-linking and immunoprecipitation followed by high-throughput sequencing)
 * sCLIP (Simplified cross-linking and immunoprecipitation)

Mass spectrometry
Quantitative mass spectrometry (MS), or quantitative proteomics, can be used to discover RNA-binding proteins (RBPs) bound to RNA. Labeling MS methods differentially tag proteins in samples and controls with stable isotope labels or chemical tags. The ratio of labeled peptides is then used to calculate enrichment scores and identify true binding partners. Label-free MS methods can also identify proteins in samples and controls. To distinguish true binding partners from nonspecific proteins, analytical tools are applied to spectral count data from non-quantitative MS to score the probability of a true RBP-RNA interaction.

Advantages
RNP-MaP can help reveal functionally important RNA-protein binding networks through binding site density and interconnectivity independent of previous knowledge of interacting proteins. Because of the unbiased nature of the analysis, RNP-MaP is able to detect conserved RNA-protein interactions between species.

RNP-MaP is also able to facilitate the characterization of functionally critical elements in large non-coding RNAs or even viral RNAs.

Limitations
As a standalone technique, RNP-MaP cannot be used to determine protein-RNA binding mechanisms or protein identities. In order to do so, RNP-MaP must be used in conjunction with other techniques such as CLIP and mass spectrometry.

RNP-MaP requires extremely high read depths for analysis. Identifying RNP-MaP sites requires 1,000x sequencing coverage, while identifying RNP-MaP correlations requires 10,000x sequencing coverage.
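The dependence on read depth can be seen from the Z-factor filter in the site identification section. As a sketch, assume both samples are sequenced to the same depth $$n$$, and write $$r_1$$ and $$r_2$$ for the SDA + UV and UV-only mutation rates (symbols introduced here for illustration only). Substituting $$\sigma = \sqrt{r}/\sqrt{n}$$ into the Z-factor gives:

$$\text{Z-factor} = 1 - \frac{2.575(\sqrt{r_1}+\sqrt{r_2})}{\sqrt{n}\,|r_1 - r_2|} > 0 \iff n > \left(\frac{2.575(\sqrt{r_1}+\sqrt{r_2})}{|r_1 - r_2|}\right)^2$$

For illustrative (not measured) rates $$r_1 = 0.03$$ and $$r_2 = 0.005$$, the right-hand side is approximately 631 reads for this filter alone. Halving the difference between the rates roughly quadruples the bound, so weakly reactive sites need considerably deeper coverage.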

There are severe limitations on the ability to characterize RNP-MaP correlations between distant (>500 nucleotides) RNP-MaP sites. This is due to limitations of MaP reverse transcription processivity (500-600 nucleotides) and sequencing instrument clustering (<1,000 nucleotides).