User:Purple.emz/sandbox

Introduction
microRNA sequencing (miRNA-seq) was developed as an application of next-generation sequencing or massively parallel high-throughput sequencing technologies to uncover novel miRNAs and their expression profiles in a given sample.

MicroRNAs (miRNAs) are a family of small ribonucleic acids, 21-25 nucleotides in length, that modulate protein expression through inhibition of translation [1-3]. The first miRNA, lin-4, was discovered in C. elegans in a genetic mutagenesis screen designed to identify molecular elements that control post-embryonic development. Additional studies on the lin-4 gene revealed that it encodes a 22 nt non-coding RNA with conserved complementary binding sites in the 3’-untranslated region of the lin-14 mRNA transcript. The location of the binding sites, together with the fact that lin-4 downregulated LIN-14 protein expression, suggested that lin-4 inhibits the translation of lin-14. Continued research into miRNA-mediated gene regulation revealed that miRNAs are involved in the regulation of many developmental and biological processes, including haematopoiesis (miR-181 in Mus musculus), lipid metabolism (miR-14 in Drosophila melanogaster) and neuronal development (lsy-6 in Caenorhabditis elegans). The recognition of miRNAs as a fundamental cellular regulatory mechanism then led to the rapid development of techniques to identify and characterize miRNAs, such as miRNA-seq.

History
miRNA sequencing in and of itself is not a novel idea; the initial methods relied on Sanger sequencing. Sequencing preparation involved creating libraries by cloning DNA reverse transcribed from endogenous small RNAs of 21–25 nt, size-selected by column and gel electrophoresis. However, this method is demanding in terms of time and resources, as each clone has to be individually amplified and prepared for sequencing. It also inadvertently favors miRNAs that are highly expressed. Next-generation sequencing eliminates the need for the sequence-specific hybridization probes required in microarray analysis as well as the laborious cloning required in the Sanger sequencing method. Additionally, next-generation sequencing platforms allow large pools of small RNAs to be sequenced in a single run. miRNA-seq can be performed using a variety of sequencing platforms. The first analysis of small RNAs using miRNA-seq methods examined approximately 400,000 small RNAs from Caenorhabditis elegans using the 454 Life Sciences sequencing platform. This study identified 18 novel miRNA genes as well as a new class of nematode small RNAs termed 21U-RNAs. Another study, comparing small RNA profiles of human cervical tumours and normal tissue, used the Illumina Genome Analyzer to identify 64 novel human miRNA genes as well as 67 differentially expressed miRNAs. The Applied Biosystems SOLiD sequencing platform has also been used to examine the prognostic value of miRNAs in detecting human breast cancer.

Data Analysis
Central to miRNA-seq data analysis is the ability to 1) obtain miRNA abundance levels from sequence reads, 2) discover novel miRNAs, 3) determine which miRNAs are differentially expressed, and 4) identify their associated mRNA gene targets.

miRNA Alignment & Abundance Quantification
miRNAs may be preferentially expressed in certain cell types, tissues, stages of development, or in particular disease states such as cancer. Since deep sequencing (miRNA-seq) generates millions of reads from a given sample, it allows miRNAs to be profiled, whether by quantifying their absolute abundance or by discovering their variants (known as isomiRs). Note that because the average sequence read is longer than the average miRNA (17-25 nt), the 3’ and 5’ ends of the miRNA should be found on the same read. There are several miRNA abundance quantification algorithms. Their general steps are as follows:
 * 1) After sequencing, the raw sequence reads are filtered based on quality. The adaptor sequences are also trimmed off the raw sequence reads.
 * 2) The resulting reads are then formatted into a FASTA file in which the copy number and sequence are recorded for each unique tag.
 * 3) Sequences that may represent E. coli contamination are identified by a BLAST search against an E. coli database and are removed from the analysis.
 * 4) Each of the remaining sequences is aligned against a miRNA sequence database (such as miRBase). To account for imperfect DICER processing, a 6 nt overhang on the 3’ end and a 3 nt overhang on the 5’ end are allowed.
 * 5) The reads that do not align to the miRNA database are then loosely aligned to miRNA precursors to detect miRNAs that might carry mutations or those that have gone through RNA editing.
 * 6) The read counts for each miRNA are then normalized to the total number of mapped miRNAs to report the abundance of each miRNA.
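
The trimming, collapsing and normalization steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the algorithm of any published tool: the adaptor sequence, the function names (trim_adaptor, collapse_reads, to_fasta, normalize_counts), the FASTA header layout and the per-million scaling factor are all hypothetical choices made for the example. Quality filtering, contamination removal and alignment are omitted.

```python
from collections import Counter

# Hypothetical 3' adaptor sequence, used only for this example.
ADAPTOR = "TGGAATTCTCGG"

def trim_adaptor(read, adaptor=ADAPTOR, min_len=15):
    """Cut the read at the first exact adaptor match.

    Returns None when the adaptor is absent or when the remaining
    insert is shorter than min_len (an assumed cutoff).
    """
    pos = read.find(adaptor)
    if pos < min_len:  # covers pos == -1 (no adaptor) and short inserts
        return None
    return read[:pos]

def collapse_reads(reads):
    """Collapse trimmed reads into unique tags with their copy numbers."""
    trimmed = (trim_adaptor(r) for r in reads)
    return Counter(t for t in trimmed if t is not None)

def to_fasta(tag_counts):
    """Emit one FASTA record per unique tag; the header carries the copy number."""
    lines = []
    for i, (seq, n) in enumerate(tag_counts.most_common(), start=1):
        lines.append(f">tag{i}_x{n}")
        lines.append(seq)
    return "\n".join(lines)

def normalize_counts(mirna_counts, scale=1_000_000):
    """Scale each miRNA's read count by the total number of mapped miRNA reads."""
    total = sum(mirna_counts.values())
    if total == 0:
        return {name: 0.0 for name in mirna_counts}
    return {name: n * scale / total for name, n in mirna_counts.items()}
```

In this sketch, collapse_reads would be run on the quality-filtered reads, and normalize_counts on the per-miRNA counts obtained after alignment to the miRNA database.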

Novel miRNA Discovery
Another advantage of miRNA-seq is that it allows the discovery of novel miRNAs that may have eluded traditional screening and profiling methods. There are several novel miRNA discovery algorithms. Their general steps are as follows:
 * 1) Obtain reads that did not align to known miRNA sequences, and map them to the genome.
 * 2) RNA Folding Method
 * 3) For the miRNA sequences where an exact match is found, obtain the genomic sequence including ~100 bp of flanking sequence on either side, and run the RNA through RNA folding software such as the Vienna package.
 * 4) Folded sequences in which the read lies on one arm of the miRNA hairpin and which have a minimum free energy of less than about -25 kcal/mol are shortlisted as putative miRNAs.
 * 5) The shortlisted sequences are trimmed down to include only the possible precursor sequence and are then refolded to ensure that the precursor was not artificially stabilized by neighbouring sequences.
 * 6) The resulting folded sequences are considered novel miRNAs if the miRNA sequence falls within one arm of the hairpin and is highly conserved between species.
 * 7) Star Strand Expression Method (miRDeep)
 * 8) Novel miRNA sequences are identified based on the characteristic expression pattern that they display due to DICER processing: higher expression of the mature miRNA over the star strand and loop sequences.
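
The filters used by the two methods above can be sketched in Python as follows. The sketch assumes the folding has already been done by external software and that its result is available as a dot-bracket structure string and a minimum free energy value, which is how RNA folding programs commonly report a fold. The function names, the -25 kcal/mol default cutoff and the simple "mature reads outnumber star and loop reads" rule are illustrative assumptions, not the criteria of any specific tool.

```python
def lies_on_one_arm(structure, start, end):
    """Check that structure[start:end] sits on a single hairpin arm.

    In dot-bracket notation the 5' arm is made of '(' and the 3' arm
    of ')'. A read on one arm contains exactly one kind of bracket.
    """
    region = structure[start:end]
    has_open = "(" in region
    has_close = ")" in region
    return has_open != has_close

def is_candidate_hairpin(structure, mfe, start, end, mfe_cutoff=-25.0):
    """Folding-method filter: stable fold (MFE below the assumed cutoff)
    and the read located on one arm of the hairpin."""
    return mfe < mfe_cutoff and lies_on_one_arm(structure, start, end)

def dicer_like_expression(mature_reads, star_reads, loop_reads):
    """Star-strand heuristic: the mature strand is more highly expressed
    than both the star strand and the loop."""
    return mature_reads > star_reads and mature_reads > loop_reads
```

For example, with the toy structure "((((((((....))))))))", a read covering positions 0 to 8 lies entirely on the 5' arm, whereas a read covering positions 6 to 14 spans the loop and both arms and would be rejected.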

Differential Expression Analysis
After the abundances of miRNAs are quantified for each sample, their expression levels can be compared between samples. One can then identify miRNAs that are preferentially expressed at particular time points, or in particular tissues or disease states. After normalizing for the number of mapped reads between samples, one can use a host of statistical tests (like those used in gene expression profiling) to determine differential expression.
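
As a rough illustration of the normalization and comparison described above, the following Python sketch scales each sample to reads per million mapped reads and reports a log2 fold change per miRNA. It is not a statistical test: it only produces the effect size that a test would then assess. The function names and the pseudocount (added to avoid division by zero for miRNAs absent from one sample) are assumptions made for the example.

```python
import math

def reads_per_million(counts):
    """Normalize raw miRNA read counts to the sample's total mapped reads."""
    total = sum(counts.values())
    return {mirna: c * 1e6 / total for mirna, c in counts.items()}

def log2_fold_change(sample_a, sample_b, pseudocount=1.0):
    """Return log2(sample_b / sample_a) per miRNA after RPM normalization.

    miRNAs missing from one sample are treated as zero reads there.
    """
    rpm_a = reads_per_million(sample_a)
    rpm_b = reads_per_million(sample_b)
    changes = {}
    for mirna in sorted(set(rpm_a) | set(rpm_b)):
        a = rpm_a.get(mirna, 0.0) + pseudocount
        b = rpm_b.get(mirna, 0.0) + pseudocount
        changes[mirna] = math.log2(b / a)
    return changes
```

A positive value indicates higher normalized expression in the second sample and a negative value indicates lower expression. With replicates available, these normalized values would feed into the statistical tests mentioned above.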

Target Prediction
Identifying a miRNA’s mRNA targets provides an understanding of the genes or networks of genes whose expression it regulates. Public databases provide predictions of miRNA targets, but to better distinguish true positive predictions from false positives, miRNA-seq data can be integrated with mRNA-seq data to look for functional miRNA:mRNA pairs. TargetScan, miRanda, and PicTar are software tools designed for target prediction. A list of prediction software is given [here]. The general steps are:
 * 1) To determine miRNA:mRNA binding pairs, complementarity between the miRNA sequence and the 3’-UTR of the mRNA sequence is identified. Typically one or more bp mismatches are allowed, since miRNA binding is not perfectly complementary.
 * 2) The degree of conservation of miRNA:mRNA binding pairs across species is determined. Typically, more highly conserved binding pairs are less likely to be false positive predictions.
 * 3) Look for evidence of miRNA targeting in mRNA-seq or protein expression data: where the miRNA expression is high, the gene and protein expression of its target gene should be low.
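
The complementarity search in step 1 can be sketched in Python as follows. This is a simplified illustration and not the scoring scheme of TargetScan, miRanda or PicTar. It assumes that only a short "seed" region near the 5’ end of the miRNA (nucleotides 2 to 8 by default) is compared against the 3’-UTR, that sequences use the RNA alphabet (A, C, G, U), and that mismatches are counted position by position without gaps or G:U wobble pairs. The function names and parameters are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence."""
    return rna.translate(COMPLEMENT)[::-1]

def find_binding_sites(mirna, utr, seed_start=1, seed_end=8, max_mismatches=0):
    """Return 0-based 3'-UTR positions complementary to the miRNA seed.

    The seed is mirna[seed_start:seed_end] (nucleotides 2-8 by default).
    A window of the UTR is reported when it differs from the seed's
    reverse complement at no more than max_mismatches positions.
    """
    site = reverse_complement(mirna[seed_start:seed_end])
    hits = []
    for i in range(len(utr) - len(site) + 1):
        window = utr[i:i + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits
```

Candidate sites found this way would still need the conservation and expression checks in steps 2 and 3 before being treated as likely targets.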

Identification of Novel miRNAs
miRNA-seq has revealed novel miRNAs that previously eluded traditional miRNA profiling methods. Examples of such findings are in embryonic stem cells, chicken embryos, acute lymphoblastic leukemia, diffuse large B-cell lymphoma and B-cells, acute myeloid leukemia, and lung cancer.

Comparison With Other Methods of miRNA Profiling
The disadvantages of miRNA-seq relative to other methods of miRNA profiling are that it is more expensive, requires a larger amount of total RNA, and is more time consuming than microarray and qPCR methods. In addition, miRNA-seq library preparation methods appear to introduce a systematic preferential representation of the miRNA complement, which prevents accurate determination of miRNA abundance. Despite these disadvantages, miRNA-seq remains the platform of choice for profiling miRNA. The approach is hybridization independent and therefore does not require a priori sequence information. As such, its advantages over previous miRNA profiling techniques include the ability to distinguish different miRNA isoforms (isomiRs) or very similar miRNAs and to identify point mutations in miRNA genes. miRNA-seq also allows novel miRNA discoveries and predictions to be validated at the nucleotide level.