User:Timothyinhksar

Sandbox for MIP page

Molecular Inversion Probe (MIP) belongs to the class of Capture by Circularization molecular techniques for performing genomic partitioning, a process through which one captures and enriches specific regions of the genome. MIP probes are single-stranded DNA molecules and, like the probes used in other genomic partitioning techniques, contain sequences complementary to the genomic target, to which they hybridize in order to capture it. MIP is distinguished from other genomic partitioning strategies in that all MIP probes share a common design: two target-complementary segments separated by a linker region. With this design, when the probe hybridizes to the target, it undergoes an inversion in configuration (as suggested by the name of the technique) and circularizes. Specifically, the two target-complementary regions at the 5’ and 3’ ends of the probe become adjacent to one another while the internal linker region forms a free-hanging loop. The technology has been used extensively in the HapMap project for large-scale single-nucleotide polymorphism (SNP) genotyping, as well as for studying gene copy number alterations and characterizing specific genomic loci to identify biomarkers for cancer and other diseases. Key strengths of the MIP technology include its high specificity for its target and its ability to be scaled up for high-throughput, multiplexed analyses in which tens of thousands of genomic loci are assayed simultaneously.

Technique Procedure


Molecular Inversion Probe Structure
Molecular inversion probes (MIP) are designed with sequences that are complementary to the genomic target at the 5’ and 3’ ends. The internal sequence contains two universal PCR primer sequences that are common to all MIPs as well as a probe-release site, which is usually a restriction site. If the captured genomic target is identified using array-based hybridization approaches, the internal sequence may optionally contain a probe-specific tag sequence that uniquely identifies the given probe, as well as a tag-release site, which, like the probe-release site, is also a restriction site.
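The probe layout described above can be sketched as a small data structure. This is only an illustration: the class name `MipProbe`, the field names, the order of the internal elements and all sequences are assumptions made for the example, not a published probe design.

```python
from dataclasses import dataclass


@dataclass
class MipProbe:
    """Hypothetical container for the parts of one probe, written 5' to 3'."""
    arm_5p: str                 # target-complementary segment at the 5' end
    primer_1: str               # first universal PCR primer sequence
    release_site: str           # probe-release (restriction) site
    primer_2: str               # second universal PCR primer sequence
    arm_3p: str                 # target-complementary segment at the 3' end
    tag_release_site: str = ""  # optional, only for array-based readout
    tag: str = ""               # optional probe-specific tag

    def internal(self) -> str:
        # The order of the internal elements is an assumption for illustration.
        return (self.primer_1 + self.release_site + self.primer_2
                + self.tag_release_site + self.tag)

    def sequence(self) -> str:
        # Full single-stranded probe: 5' arm, internal linker region, 3' arm.
        return self.arm_5p + self.internal() + self.arm_3p


# Placeholder sequences, invented for the example.
probe = MipProbe(
    arm_5p="ACGTACGTAC",
    primer_1="GGGTTTCCCA",
    release_site="GAATTC",
    primer_2="TTTGGGAAAC",
    arm_3p="TGCATGCATG",
    tag_release_site="GGATCC",
    tag="CATCATCATC",
)
print(len(probe.sequence()))
```

Leaving `tag` and `tag_release_site` empty models a probe intended for sequencing-based identification, where no tag is required.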

Protocol
Probes are hybridized to single-stranded genomic DNA and become inverted such that the target-complementary ends of the probe are annealed to the genomic target. However, the probes are designed such that a gap delimited by the hybridized ends of the probe remains over the target region. The size of the gap ranges from a single nucleotide for SNP genotyping to several hundred nucleotides for locus (e.g. exon) capture.
 * Anneal probe to genomic target DNA

The gap is filled by DNA polymerase using free nucleotides and the probe is fully circularized by ligase. If SNP genotyping is performed, the probes are separated into four reactions, one for each type of nucleotide added to the reaction. If the nucleotide at the SNP position is complementary to the added nucleotide, the gap is filled, the ligation is successful and the probe becomes fully circularized. Since each probe hybridizes to exactly one SNP target in the genome, a successfully circularized probe reveals the nucleotide identity of the SNP.
 * Gap filling
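The annealing and gap-filling steps for a single-nucleotide gap can be modelled with plain strings. The sketch below assumes that both probe arms are written 5' to 3' and anneal antiparallel to the target strand, that hybridization requires a perfect match, and that circularization succeeds only when the single added nucleotide is complementary to the one template base in the gap. The function names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_gap(target, arm_5p, arm_3p):
    """Return the target bases left uncovered between the annealed arms.

    Returns None if either arm has no perfectly matching site on the target.
    """
    left_site = reverse_complement(arm_5p)   # site bound by the 5' arm
    right_site = reverse_complement(arm_3p)  # site bound by the 3' arm
    start = target.find(left_site)
    if start == -1:
        return None
    gap_start = start + len(left_site)
    gap_end = target.find(right_site, gap_start)
    if gap_end == -1:
        return None
    return target[gap_start:gap_end]


def gap_fill_reactions(target, arm_5p, arm_3p):
    """For each single-nucleotide reaction, report whether the probe circularizes."""
    gap = find_gap(target, arm_5p, arm_3p)
    single_base_gap = gap is not None and len(gap) == 1
    return {nt: single_base_gap and COMPLEMENT[gap] == nt for nt in "ACGT"}


# Invented target: two flanking sites around a one-base gap holding a G.
left_site, snp, right_site = "GACCTGAAGC", "G", "CATTGACGGT"
target = "TT" + left_site + snp + right_site + "AA"
arms = (reverse_complement(left_site), reverse_complement(right_site))
print(gap_fill_reactions(target, *arms))
```

In this toy case only the reaction supplied with C yields a circular probe, which identifies the template base as G.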

Since gap filling is not performed for non-reacted probes, they remain linear. Exonuclease treatment removes these non-reacted probes and any linear DNA.
 * Remove non-reacted probes

In some versions of the protocol, the probe-release site (commonly a restriction site) is cleaved by restriction enzyme treatment so that the probe becomes linearized. In the linearized probe, the universal PCR primer sequences lie at the 5’ and 3’ ends and the captured genomic target becomes part of the internal segment.
 * Probe release

If the probe is linearized, traditional PCR amplification is performed to enrich the captured target using the universal primers of the probe. Otherwise, rolling circle amplification is performed for the circular probe.
 * Captured target enrichment

The captured target can be identified either via array-based hybridization approaches or by sequencing of the target. If an array-based approach is used, the probe may optionally contain a probe-specific tag that uniquely identifies the probe as well as the genomic region targeted by the probe. The tags from each probe are released by cleaving the tag-release site with restriction enzymes. The array contains spots with sequences that are complementary to the tag sequences of the probes.
 * Captured target identification

For SNP genotyping, tags from the four nucleotide-specific reactions are hybridized to either four genotyping arrays or two dual-colour arrays (one channel for each reaction). Analyzing which spots on the array are bound by the tags allows the determination of the SNP identities at the genomic loci represented by the spots. For instance, if tags generated from probes that underwent gap filling with adenosine are applied to an array, the array spots that are hybridized represent SNP loci with thymidine.
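The decoding logic can be sketched as follows, assuming the array readout has already been reduced to, for each of the four reactions, the set of tags that gave a signal. The function name `call_genotypes` and the tag identifiers are hypothetical, and real data would need a signal threshold that is not modelled here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def call_genotypes(hybridized_tags):
    """Map each tag to the set of template bases inferred at its SNP locus.

    hybridized_tags: dict from the nucleotide added in a reaction to the
    set of tag identifiers whose array spots were hybridized.
    """
    calls = {}
    for added_nt, tags in hybridized_tags.items():
        for tag in tags:
            # The locus carries the complement of the nucleotide that
            # filled the gap.
            calls.setdefault(tag, set()).add(COMPLEMENT[added_nt])
    return calls


signals = {
    "A": {"tag1", "tag2"},
    "C": {"tag2"},
    "G": set(),
    "T": {"tag3"},
}
print(call_genotypes(signals))
```

A tag that appears in two reactions, such as `tag2` here, is read as a heterozygous locus carrying both inferred bases.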

Multiplex analysis

Although each probe examines one specific genomic locus, multiple probes can be combined into a single tube for a multiplexed assay that examines multiple loci simultaneously. Multiplexed MIP analysis has been reported to examine more than 55,000 loci in a single assay.

Padlock Probe
The design of molecular inversion probes (MIP) originates from padlock probes, a molecular biology technique first reported by Nilsson et al. 1994. Like MIPs, padlock probes are single-stranded DNA molecules; their structure consists of two 20-nucleotide target-complementary segments connected by a 40-nucleotide linker region. In addition, when the target-complementary regions are hybridized to the DNA target, the probe inverts and forms a circular structure. However, unlike MIPs, padlock probes are designed such that the bound complementary regions span the entire target region, leaving no gap. Thus, padlock probes are useful for detecting DNA molecules with known sequences.

Nilsson et al. 1994 demonstrated the use of padlock probes to detect numerous DNA targets, including a synthetic oligonucleotide and a circular genomic clone. Padlock probes have high specificity towards their target and can distinguish target molecules that closely resemble one another. The same study used padlock probes to differentiate between normal and mutant forms of the cystic fibrosis transmembrane conductance regulator (CFTR) gene, where the mutant carries a 3 bp deletion at the position corresponding to one of the ends of the probe. Since ligation requires the ends of the probe to be immediately adjacent to one another when hybridized to the target, the 3 bp deletion in the mutant prevented successful ligation. Padlock probes were also successfully used for in situ hybridization to detect alphoid repeats specific to chromosome 12 in a preparation of metaphase chromosomes, where traditional linear oligonucleotide probes failed to yield results. Thus, padlock probes possess sufficient specificity to detect single-copy elements in the genome.
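The adjacency requirement can be illustrated with a deliberately simple string model. It assumes the two binding sites are given as they appear on the target strand and that ligation needs both sites to match perfectly and sit directly next to each other; the sequences are invented and are not the sequences used in the study.

```python
def padlock_ligates(target, left_site, right_site):
    """Return True if both probe-binding sites occur back to back on the target.

    This is a toy model: real hybridization tolerates some mismatches away
    from the ligation junction, which is ignored here.
    """
    return (left_site + right_site) in target


left_site = "ATCATCTTTG"
right_site = "GTGTTTCCTA"

# Normal allele: the two sites are directly adjacent.
normal = "GAAA" + left_site + right_site + "TGAT"
# Mutant allele: 3 bp deleted at the junction end of the left site.
mutant = "GAAA" + left_site[:-3] + right_site + "TGAT"

print(padlock_ligates(normal, left_site, right_site))
print(padlock_ligates(mutant, left_site, right_site))
```

In this model the deletion removes the bases that one probe end would pair with at the junction, so the ends cannot be joined on the mutant template.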

Molecular Inversion Probe
To perform SNP genotyping, Hardenbol et al. 2003 modified padlock probes such that when the probe is hybridized to the genomic target, there is a gap at the SNP locus. Gap filling using a nucleotide complementary to the SNP determines the SNP identity. Thus, only one type of probe is needed per locus, regardless of the allele present. This design brings numerous benefits: using multiple padlock probes, one for each possible allele at a locus, would require careful balancing of the concentrations of these allele-specific probes to ensure that SNP counts at a given locus are properly normalized. In addition, with this design, a poorly performing probe affects all genotypes at a given locus equally.

In their procedure, more than 1000 SNP loci were assayed simultaneously in one tube containing more than 1000 multiplexed probes, each of which assayed a distinct SNP locus. The pool of probes was split into four separate reactions where, in each reaction, a distinct nucleotide (A, T, C or G) was used in gap filling. Only when the nucleotide at the SNP locus was complementary to the applied nucleotide would the gap be filled and the probe be circularized by ligation. Identification of the captured SNPs was performed on genotyping microarrays where each spot on the array contained sequences complementary to the locus-specific tags in the probes. The performance of two chips with two colour channels each was compared to that of four individual chips for determining the nucleotide identity at each SNP locus. The results were similar in terms of SNP call rate and signal-to-noise ratio.

In a later report (Hardenbol et al. 2005), the group successfully increased the level of multiplexing to simultaneously assay more than 10,000 SNP loci, using 12,000 probes. The study examined SNPs in 30 trio samples (each trio consisted of a mother, a father and their child). Knowing the genotypes of the parents, the accuracy of SNP calling in the child was assessed by examining whether the called genotypes agreed with the expected Mendelian inheritance patterns. In addition, a set of MIP-specific performance metrics was developed. The work set the framework for high-throughput SNP genotyping for the HapMap project.
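The trio-based check can be sketched as below, assuming biallelic genotypes written as two-letter strings such as "AG". The function names are hypothetical, and the check only flags calls that violate Mendelian inheritance; a consistent call is not thereby proven correct.

```python
def mendelian_consistent(mother, father, child):
    """Return True if the child could have inherited one allele from each parent."""
    first, second = child
    return ((first in mother and second in father)
            or (second in mother and first in father))


def trio_concordance(trios):
    """Fraction of (mother, father, child) genotype triples that are consistent."""
    checked = [mendelian_consistent(m, f, c) for m, f, c in trios]
    return sum(checked) / len(checked)


# Invented genotype calls at one SNP locus for three trios.
trios = [
    ("AG", "GG", "AG"),  # consistent: A from mother, G from father
    ("AA", "GG", "AG"),  # consistent
    ("AA", "AA", "AG"),  # inconsistent: neither parent carries G
]
print(trio_concordance(trios))
```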

Connector Inversion Probe
To capture longer regions of the genomic target rather than a single nucleotide, Akhras et al. 2007 modified the design of MIP by extending the gap delimited by the hybridized probe ends and named the design the Connector Inversion Probe (CIP). The gap corresponded to the genomic region of interest (e.g. exons) for capture. The gap-filling reaction was achieved with DNA polymerase using all four nucleotides (A, C, G and T). The group identified the captured region by sequencing it with a locus-specific sequencing primer that mapped to one of the target-complementary ends of the probe.

The group developed the multiplexing multiplex padlocks (MMP) barcode system. To lower reagent costs, a single assay might involve DNA samples from multiple individuals and examine multiple genomic loci in each individual. A DNA barcode that mapped uniquely to each combination of individual and genomic locus was therefore incorporated into the linker region of the probes. Thus, sequences from the captured regions would include the barcode, allowing the unambiguous determination of the individual and the genomic locus to which the captured region belonged.
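The role of the barcode can be sketched as a lookup during read processing. The example assumes a fixed-length barcode at the start of each read and exact matching; the actual barcode position, length and error handling in the MMP system are not described here, and the names and sequences below are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, barcode_table, barcode_length):
    """Assign reads to (individual, locus) pairs using a leading barcode.

    Returns the assigned reads (barcode removed) grouped by pair, plus the
    reads whose barcode was not found in the table.
    """
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        key = barcode_table.get(read[:barcode_length])
        if key is None:
            unassigned.append(read)
        else:
            assigned[key].append(read[barcode_length:])
    return dict(assigned), unassigned


# Each barcode maps to exactly one combination of individual and locus.
barcode_table = {
    "ACGT": ("individual_1", "locus_A"),
    "TGCA": ("individual_1", "locus_B"),
    "GGAA": ("individual_2", "locus_A"),
}
reads = ["ACGTTTTTCCCC", "GGAATTTTCCCG", "ACGTTTTTCCCC", "CCCCAAAATTTT"]
assigned, unassigned = demultiplex(reads, barcode_table, barcode_length=4)
print(assigned)
print(unassigned)
```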

CIP creator 1.0.1, software for designing locus-specific CIPs, was also developed.

MIP Design and Optimization
By choosing the target and varying the length of the gap delimited by the target-complementary ends of the probe, a variety of locus-specific genomic elements (e.g. SNPs, exons) can be captured.

Probe Design Optimization Strategies

To optimize the degree of multiplexing and the lengths of captured regions, a number of factors should be considered when designing probes.


 * The sequences of the probe complementary to the DNA target must be specific and map only to unique regions in the genome with reasonable sequence complexity. Regions with repeats should be treated with caution.


 * For all probes used in a single assay, the annealing temperatures of the two target complementary ends should be similar such that hybridization of the two ends can be achieved at the same temperature.


 * The GC content of the targets should be similar and the variability in target length should be restricted such that all gaps can be filled within similar elongation timeframes.


 * The DNA target cannot be too long (successful applications to date have worked with target lengths of 100 to 200 bp); otherwise steric effects may interfere with successful hybridization of the probe to the target.


 * The tags from each probe used for microarray-based captured region identification should have similar melting temperatures and have maximal orthogonal base complexities. This ensures that all tags can be hybridized to the array under similar conditions and that cross-hybridizations are minimized, respectively.
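Some of the design criteria above lend themselves to simple automated checks. The sketch below uses the Wallace rule (2 °C per A or T, 4 °C per G or C), a rough estimate suited only to short oligonucleotides, as a stand-in for a proper melting-temperature model, and uses the minimum pairwise Hamming distance between equal-length tags as a crude proxy for tag orthogonality. The function names, the 4 °C tolerance and the sequences are all assumptions.

```python
from itertools import combinations


def gc_content(seq):
    """Fraction of G and C bases in a sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in degrees Celsius by the Wallace rule."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def arms_balanced(arm_5p, arm_3p, max_diff=4.0):
    """Check that the two target-complementary ends have similar estimated Tm."""
    return abs(wallace_tm(arm_5p) - wallace_tm(arm_3p)) <= max_diff


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def min_tag_distance(tags):
    """Smallest pairwise Hamming distance within a set of equal-length tags."""
    return min(hamming(a, b) for a, b in combinations(tags, 2))


print(wallace_tm("ATGCATGCAT"), wallace_tm("ATGCATGCGC"))
print(arms_balanced("ATGCATGCAT", "ATGCATGCGC"))
print(min_tag_distance(["AAAA", "AATT", "TTTT"]))
```

A production design tool would also screen the arms against the genome for uniqueness and repeats, which is beyond this sketch.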

MIP Protocol Optimization Strategies
A number of experimental conditions can be modified for optimization, including:
 * Hybridization and gap-fill time
 * Probe, ligase and DNA polymerase concentrations
 * Captured target enrichment by either rolling circle amplification or linearizing the probes to perform multi-template PCR using the universal primers common for all probes.
 * Captured target identification via either microarray-based hybridization approaches or direct sequencing of the target

These factors are critical: in one study, optimization increased target capture efficiency from 18 percent to 91 percent (about 50,000 of 55,000 targets).

Performance Metrics
Turner et al. 2009 summarized two metrics that are commonly reported in MIP-based genomic capture experiments that identify the target by sequencing.


 * Capture Uniformity: analogous to recall – the fraction of genomic targets that are captured with confidence. Specifically, the relative abundance of sequence reads that are mapped to each genomic target.


 * Capture Specificity: analogous to precision – the fraction of sequence reads that actually map to the genomic targets of interest.

These two metrics are directly affected by the quality of the batch of probes. To improve the results for low-quality probes, greater sequencing depth can be used. The amount of sequencing needed scales nearly exponentially with decreases in uniformity and specificity.
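Both metrics can be computed from a table of read counts per target. The sketch below is one possible reading of the definitions above: specificity is the on-target fraction of all reads, and uniformity is summarized both as relative abundance per target and as the fraction of targets reaching a minimum read count. The function names and the `min_reads` threshold are assumptions, not definitions taken from Turner et al. 2009.

```python
def capture_specificity(reads_per_target, total_reads):
    """Fraction of all sequence reads that map to the intended targets."""
    return sum(reads_per_target.values()) / total_reads


def relative_abundance(reads_per_target):
    """Share of on-target reads that falls on each target."""
    on_target = sum(reads_per_target.values())
    if on_target == 0:
        return {target: 0.0 for target in reads_per_target}
    return {target: n / on_target for target, n in reads_per_target.items()}


def fraction_captured(reads_per_target, min_reads):
    """Fraction of targets covered by at least min_reads reads."""
    captured = sum(1 for n in reads_per_target.values() if n >= min_reads)
    return captured / len(reads_per_target)


# Invented counts: 1200 reads in total, 900 of which map to three targets.
counts = {"exon1": 600, "exon2": 300, "exon3": 0}
print(capture_specificity(counts, total_reads=1200))
print(relative_abundance(counts))
print(fraction_captured(counts, min_reads=10))
```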

Hardenbol et al. 2005 proposed a set of metrics that concern SNP genotyping using MIPs.
 * Signal-to-noise ratio: Ratio of true genotype counts to background counts.
 * Probe conversion rate: Number of genomic SNP loci for which probes can be designed and successfully assayed. In other words, this metric concerns the fraction of probes that produce genotyping results.
 * Call rate: For a given SNP locus, the number of DNA samples whose genotypes at this locus can be measured. In other words, the amount of supporting evidence for the genotype(s) assigned to the given SNP locus.
 * Completeness: For the set of SNPs assayed, the total fraction of genotypes that are successfully obtained.
 * Accuracy: For the set of SNPs assayed, the fraction of detected genotypes that are correct. This is commonly measured by the repeatability of the results.

An inherent trade-off exists between probe conversion rate and accuracy. Removing probes that yielded incorrect genotypes increases the accuracy but decreases the probe conversion rate. In contrast, using a lenient probe acceptance threshold increases probe conversion rate but decreases the accuracy.
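Several of these metrics reduce to simple counting over a table of genotype calls. In the sketch below a missing call is represented by `None`, call rate is expressed as a fraction of samples rather than a raw count, and accuracy is approximated by agreement between two replicate runs; these representations and the function names are assumptions made for illustration.

```python
def call_rate(calls_for_locus):
    """Fraction of samples with a genotype call at one SNP locus."""
    return sum(c is not None for c in calls_for_locus) / len(calls_for_locus)


def completeness(calls_by_locus):
    """Fraction of all locus-sample combinations with a genotype call."""
    all_calls = [c for calls in calls_by_locus.values() for c in calls]
    return sum(c is not None for c in all_calls) / len(all_calls)


def repeatability(run_a, run_b):
    """Agreement between replicate runs, counting only positions called in both.

    Returns None when no position was called in both runs.
    """
    pairs = [(a, b) for a, b in zip(run_a, run_b)
             if a is not None and b is not None]
    if not pairs:
        return None
    return sum(a == b for a, b in pairs) / len(pairs)


# Invented calls for two loci across four samples.
calls_by_locus = {
    "snp1": ["AG", "AA", None, "GG"],
    "snp2": ["CC", "CT", "CC", "CC"],
}
print(call_rate(calls_by_locus["snp1"]))
print(completeness(calls_by_locus))
print(repeatability(["AG", "AA", None, "GG"], ["AG", "AG", "AA", "GG"]))
```

Dropping a probe such as `snp1` from the table would raise completeness while lowering the number of loci assayed, which mirrors the trade-off described above.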

Other Genomic Partitioning Techniques
To reduce the costs from sequencing whole genomes, many methods that enrich specific genomic regions of interest have been proposed.

Other Capture by Circularization Methods
Molecular inversion probes belong to the Capture by Circularization class of methods, which also includes the following techniques.

Gene selector method: An initial multiplex PCR step is performed to enrich the targets of interest. The PCR products are circularized upon hybridization to target-specific probes with sequences complementary to the two primers used in the PCR step.

Capture by selective circularization method: The genomic DNA is digested into fragments with restriction enzymes. Using selector probes with flanking regions that are complementary to the target of interest, the digested DNA fragments are circularized upon hybridization to the selector probes.

Performance Comparisons between Genomic Partitioning Techniques
Each method comes with its own set of tradeoffs between uniformity, capture specificity, costs, scalability and availability.

In terms of capture specificity, Capture by Circularization methods demonstrate the best results, since all methods in this class require two ends of the same DNA molecule (e.g. the two ends of a MIP probe) to bind simultaneously to a single cognate partner molecule (e.g. the genomic target region) in the proper configuration for successful ligation.

In contrast, Capture by Circularization methods demonstrate less uniformity compared to other methods since the probe design for each distinct genomic target is unique such that performance between individual probes may vary.

Regarding scalability, Capture by Circularization and Solution-based Capture methods are most appropriate for studies that involve large aggregate genomic targets and many samples, due to their high specificity. Array-based Capture techniques are appropriate for studying large aggregate genomic targets but with fewer samples, due to the limited resolution and specificity of microarrays. Multiplex PCR methods are most appropriate for small-scale studies due to their ease of use and the availability of reagents.

The costs associated with each technique are difficult to compare given the vast choices of designs and experimental conditions. However, attaining a high multiplexing level where many loci are assayed simultaneously amortizes the costs.

Advantages of MIP
 * High specificity (on the order of 90 percent). This eliminates the need to amplify the DNA sample prior to MIP application. High specificity is achieved because i) the hybridization of the probes to their targets proceeds via an intramolecular reaction, which minimizes potential cross-reactions, and ii) exonuclease treatment removes non-reacted, linear probes.
 * Simple infrastructure (only common bench-top reagents and tools are required) and simple design make this technique broadly applicable in many laboratories
 * A very high level of multiplexity (on the order of 10^4 to 10^5) can be achieved.
 * Only a low amount of sample DNA (e.g. 0.2 ng/SNP) is needed since the MIP probes can be applied directly to genomic DNA instead of shotgun libraries.
 * The choices of the platform for captured target identification are very flexible such that cost-efficiency may be optimized. For instance, the captured targets can be directly sequenced, bypassing the need for sequencing library construction.

Limitations of MIP
 * Sensitivity and uniformity are relatively low compared to other genomic capture techniques since not all targets can be captured under the same experimental conditions in high-throughput runs that involve multiple probes. However, a recent study that used probes with longer linker regions improved uniformity.
 * The sizes of the targets that can be captured are limited since i) a large gap region imposes steric constraints on the intramolecular circularization of the probe and ii) a large gap requires longer probes to be synthesized, increasing the costs.
 * The degree of multiplexing is constrained by the multiplexing capability of the method chosen for target identification. If array-based detection methods are used, the number of targets that can be assayed is limited by the available spots on the array.
 * Since a distinct probe is needed to capture each region, it is costly to assay many regions. However, with multiplexity, the costs are amortized. For instance, at a multiplexity level of 1000, the costs become $0.01 per probe for each assay.
 * MIP reaction conditions may require optimization, which is particularly important for assaying heterozygous sites.