User:FedeFede89/sandbox

Gene expression profiling is the determination of which genes are transcribed within a cell and to what extent; the profile therefore reflects both the cell's general functioning and its genomic responses to events. The set of all RNA molecules in the cell (called the transcriptome) can be determined by newer techniques such as high-throughput sequencing, while the expression of individual genes can be determined with older techniques such as Northern blotting and real-time polymerase chain reaction.

Gene expression profiling is particularly useful in conjunction with genomic sequencing, to determine which genes are actually active at different times: although a gene may be encoded in the genome, only analysis of the transcriptome shows whether it is active in the prevailing environment of promoters and inhibitors. Proteomic analysis is in turn particularly valuable when coupled with transcriptomic analysis, because regulation at the level of translation may prevent a transcribed gene from being actively translated into protein, or may upregulate its translation relative to other genes even when the amounts of mRNA transcript are comparable.

As mRNAs can be very short-lived and directly active (e.g. the clock machinery genes of the suprachiasmatic nucleus), gene expression profiling must be conducted under carefully considered conditions and with appropriate controls. Gene expression profiling can be qualitative or quantitative depending on the method used, so the choice of method should be guided by how much information about a particular transcriptome is required.

Applications
Gene expression profiling may be applied in many areas of research.

The gene expression changes seen in specific diseases such as cancer may be investigated to better understand their genomic basis, and in particular whether they arise from epigenetic or genomic changes, by comparing gene expression profiles with genomic sequencing data.

The cellular response to drugs and treatments may be investigated by determining which genes are upregulated or downregulated in response to, for example, a cancer therapy. This is of particular value in determining what causes resistance to certain anti-cancer chemotherapeutic agents, such as the abrogation of certain cell-cycle checkpoints, and may inform treatment strategies that overcome such resistance.

Specific signalling pathways may be investigated, based on their genomic sequence. This is particularly relevant in cases where many genes orchestrate various events, perhaps within very fine time-keeping systems, such as the complicated interactions of the genetic clock machinery in the suprachiasmatic nucleus.

Example investigation

 * 1) Determine the subject cell type
 * 2) Determine conditions
 * 3) Determine experimental conditions
 * 4) Determine at least one control condition
 * 5) Perform the experiment
 * 6) Lyse the cells
 * 7) Extract mRNA
 * 8) Analyse with method of choice
 * 9) Read gene expression level from output
 * 10) Perform statistical analyses to determine changes in expression
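
Steps 9 and 10 can be sketched in a few lines of Python. This is only a minimal illustration, assuming that expression levels have already been read out as numbers for replicate control and treated samples. The gene names, the readings and the function name `compare_expression` are hypothetical, and Welch's t statistic is just one of several tests that could be used at this stage.

```python
import math
import statistics


def compare_expression(control, treated):
    """Return (log2 fold change, Welch t statistic) for one gene.

    control, treated: expression readings from replicate samples
    (hypothetical units; each list needs at least two values).
    """
    mean_c = statistics.mean(control)
    mean_t = statistics.mean(treated)
    log2_fc = math.log2(mean_t / mean_c)

    # Welch's t statistic: difference in means divided by its standard
    # error, without assuming equal variances in the two conditions.
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    t_stat = (mean_t - mean_c) / se
    return log2_fc, t_stat


# Hypothetical readings for two genes, three replicates per condition.
readings = {
    "geneA": {"control": [100.0, 110.0, 90.0], "treated": [200.0, 220.0, 180.0]},
    "geneB": {"control": [50.0, 55.0, 45.0], "treated": [52.0, 49.0, 49.0]},
}

results = {gene: compare_expression(r["control"], r["treated"])
           for gene, r in readings.items()}
for gene, (log2_fc, t_stat) in results.items():
    print(f"{gene}: log2 fold change = {log2_fc:+.2f}, t = {t_stat:.2f}")
```

In practice the t statistic would be converted to a p-value and corrected for the number of genes tested; that step is omitted here.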

Databases
The Gene Expression Omnibus (GEO) repository is the main database of gene expression profiling experiments from a variety of fields. It holds the results and methods of numerous experiments in which gene expression profiling was performed. Stored results may come from next-generation sequencing, microarrays or older sequencing methods, and can be a useful tool for preliminary analysis or for guiding particular lines of research. All data contained in the database conform to the MIAME (Minimum Information About a Microarray Experiment) standards, which describe what information is required to properly document experiments so that they are verifiable and scientifically valid.

Techniques
Many techniques can be employed in the profiling of gene expression. The main techniques are compared in Table 1.

Table 1. Comparison of gene expression profiling techniques.

High-throughput sequencing
High-throughput sequencing refers to platforms such as Roche (454), Illumina and SOLiD, which sequence cDNA in order to gain information about the RNA content of a cell. These methods are at the forefront of gene expression profiling. They have become increasingly attractive because they can process millions of sequences in parallel, rather than 96 at a time as in the previous approach. They also show minimal bias in comparison with capillary-based methods, which require cloning into a vector, and need just a few micrograms of DNA to construct a library. The most popular high-throughput sequencing methods are Illumina, Roche (454) and SOLiD (summarised in Table 2).

Table 2. Comparison of high-throughput sequencing techniques.

Illumina
Illumina’s sequencing by synthesis (SBS) technology is one of the most successful and widely adopted next-generation sequencing platforms. TruSeq technology supports parallel sequencing through a reversible terminator-based method that detects single bases as they are incorporated into growing DNA strands. A fluorescently labelled terminator is imaged as each dNTP is added, and is then cleaved to allow incorporation of the next base. The end result is base-by-base sequencing that yields highly accurate data for a broad range of applications. SBS technology supports both single-read and paired-end libraries. It offers short-insert paired-end capability for high-resolution genome sequencing as well as long-insert paired-end reads, using the same chemistry, for efficient sequence assembly, de novo sequencing, large-scale structural variation detection, and more. The combination of short inserts and longer reads increases the ability to fully characterize any genome. A wide array of available sample preparation methods allows for diverse applications, including whole-genome and candidate region resequencing, transcriptome analysis, small RNA discovery, methylation profiling, and genome-wide protein-nucleic acid interaction analysis.

Microarrays
Microarray technology evolved from Southern blotting and was first used in 1982, in a study of the cloning and screening of sequences expressed in a mouse colon tumour. A microarray consists of a small, solid support onto which the sequences of thousands of different genes are immobilized. It is the most commonly used technique for profiling thousands of transcripts simultaneously. The main use of arrays is to identify candidate genes expressed under a certain set of conditions, for example the genes expressed during yeast sporulation.

Two types of platform are commonly used: cDNA arrays and oligonucleotide arrays. In cDNA arrays, cDNAs from a clone collection or cDNA library are spotted on a nylon membrane or glass slide. Oligonucleotide arrays use oligonucleotides that are either etched on a silicon chip or printed on glass slides.

The oligonucleotide or cDNA spotted array is hybridized to cDNAs synthesized from the mRNA or total RNA extracted from the cell or tissue of interest. cDNAs from two different samples, which can be different cell populations or treatment conditions, are labelled with fluorescent dyes such as Cy3 (green) and Cy5 (red). The Cy3- and Cy5-labelled cDNAs are mixed together and hybridized against the same array, so that the two populations compete for the same probe spots. Following hybridization and washing, the array is scanned at two different wavelengths and the spot intensity at each wavelength is determined. To interpret the results, a ratio or log ratio between the two fluorescence intensities is calculated. Alternatively, radioactive labelling can be used to increase the sensitivity of the assay, but at the cost of decreased array density.

Competitive hybridization detects relative levels of expression by comparing, on each spot, the fluorescence intensities of the probes from each treatment. A two-fold change (induction or repression) is generally thought to represent a biologically meaningful change in gene expression. However, a simple ratio of the fluorescence produced by each sample is subject to several problems (e.g. dye effects, sample amounts/intensity, background, slide-to-slide variation). It is therefore best to assess the statistical significance of a difference in signal strength. This is often done with a t-test, which tests whether the experiment-to-reference ratios differ from one another, or with an ANOVA, which compares normalized expression levels to the mean.
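
The ratio-based analysis described above can be illustrated with a short Python sketch. It assumes that background-corrected Cy5 and Cy3 intensities are available for one spot on several replicate slides; the intensity values and the function names are hypothetical. The sketch computes log2 ratios, applies the two-fold rule to their mean, and calculates a one-sample t statistic for the null hypothesis that the mean log ratio is zero.

```python
import math
import statistics


def log2_ratios(cy5, cy3):
    """Per-slide log2(Cy5 / Cy3) ratios for one spot."""
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]


def one_sample_t(values, mu=0.0):
    """t statistic for the null hypothesis that the mean equals mu."""
    n = len(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu) / standard_error


# Hypothetical intensities for one spot on four replicate slides.
cy5 = [800.0, 1000.0, 900.0, 1100.0]   # red channel, e.g. treated sample
cy3 = [400.0, 400.0, 300.0, 500.0]     # green channel, e.g. reference sample

ratios = log2_ratios(cy5, cy3)
mean_ratio = statistics.mean(ratios)

# A two-fold induction or repression corresponds to |log2 ratio| >= 1.
two_fold = abs(mean_ratio) >= 1.0
t_stat = one_sample_t(ratios)

print(f"mean log2 ratio = {mean_ratio:.2f}, two-fold = {two_fold}, t = {t_stat:.2f}")
```

The t statistic would then be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value. Working on the log scale makes induction and repression symmetric about zero.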

Microarray technology has proved very useful in the fields of genomics, bioinformatics and gene expression profiling. It has been widely used in the comparative genomics of important bacterial strains; for example, Behr et al. used whole-genome DNA microarrays to compare the genomes of M. tuberculosis and M. bovis and identified specific virulence-associated regions.

SAGE analysis
Serial analysis of gene expression (SAGE) is a method originally developed in 1995 and is used to compare gene expression between two mRNA populations. SAGE produces short sequence tags, each taken from a specific position within the cDNA from which it is derived. This is advantageous because it allows individual cDNAs to be identified among large quantities of varying transcripts. The tags are joined in pairs to form ditags, which are then ligated together to form concatemers. The concatemers are cloned and the clones are run on an automated sequencing gel, where more than 30 individual tags may be read from each lane. The expression of a gene is read directly from the abundance of its tag. The method allows serial analysis of thousands of tags, so that the genes expressed in a given tissue are identified and the gene expression profile is developed at the same time. The entire transcriptome can be analysed with great sensitivity, even when the abundance of an mRNA is low. SAGE has been of great use in oncology research, as it is able to identify markers in malignant samples.

One drawback of SAGE was that cross-comparison of tissue samples could not easily be conducted. In 2010, however, Yang et al. applied set theory to the analysis so that common and tissue-specific SAGE tag sequences could be put into ‘sets’. SAGE is very flexible in its applications and can be used to build digital gene expression databases.

Other forms of SAGE include LongSAGE and SuperSAGE. LongSAGE produces 21 bp tags from individual transcripts, which can be matched to the human genome. Its advantage over SAGE is a higher percentage of accurate tag-to-gene assignments, and it is more useful where gene discovery is the main objective, or where a large database is in use, so that standardization can be achieved along with a low error rate. However, the extra bases noticeably increase the cost of analysis, especially in large-scale gene expression projects such as the Cancer Genome Anatomy Project (CGAP), so standard SAGE remains suitable where the focus is on expressed genes and cost is a key factor. SuperSAGE goes further, producing 26 bp tags from a cDNA template, and can identify novel genes in any eukaryotic organism.

Aside from gene expression profiling, SAGE can also be used for transcript detection. Expressed sequence tag (EST) sequencing was originally the common method for identifying transcripts, but SAGE has since been established as the more powerful approach.
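
The two ideas at the heart of SAGE data analysis, tag abundance as a measure of expression and set-based comparison between tissues, can be sketched in Python as follows. This is an illustration only and not the published method of Yang et al.; the tag sequences, their length and the tissue names are hypothetical.

```python
from collections import Counter

# Hypothetical short tags read from the sequenced concatemers of two tissues.
tags_tissue_a = ["ACGTACGTAC", "TTGACCAGTA", "ACGTACGTAC", "GGCATTCAGA", "ACGTACGTAC"]
tags_tissue_b = ["TTGACCAGTA", "CCATGGATCA", "TTGACCAGTA", "ACGTACGTAC"]

# The abundance of each tag is taken as the expression level of its gene.
counts_a = Counter(tags_tissue_a)
counts_b = Counter(tags_tissue_b)

# Set operations separate tags shared by both tissues from tissue-specific ones.
common = set(counts_a) & set(counts_b)
only_a = set(counts_a) - set(counts_b)
only_b = set(counts_b) - set(counts_a)

print("Tag counts in tissue A:", dict(counts_a))
print("Common tags:", sorted(common))
print("Specific to tissue A:", sorted(only_a))
print("Specific to tissue B:", sorted(only_b))
```

A real analysis would also map each tag to its gene and normalise the counts to the total number of tags sequenced per library, which this sketch leaves out.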

Real-time reverse transcription polymerase chain reaction
Reverse transcription polymerase chain reaction is another technique, used for the detection and quantification of a known mRNA sequence in a sample. It is highly sensitive and is used in gene expression studies to test whether a specific gene is active or inactive. The enzyme reverse transcriptase converts the RNA into cDNA, which is then amplified by PCR. However, because the product grows exponentially with each cycle, end-point quantification is unreliable, and real-time polymerase chain reaction (abbreviated here as RT-PCR, and also known as quantitative PCR or qPCR) is used instead. RT-PCR is the preferred technique for quantitative analysis of gene expression: data are collected in real time as the PCR proceeds, and the technique is highly sensitive and highly reproducible. It is a reliable way to detect and measure the products generated by the PCR, and so to quantify differences in mRNA expression.

The technique became possible with the introduction of the oligonucleotide probe, a short synthesised nucleotide sequence that matches a specific region of DNA or RNA and is used as a molecular probe to detect that sequence. Owing to the activity of Taq polymerase, the probe is cleaved during PCR, after which amplification of the target-specific product can be detected. The iCycler is one of many machines used to monitor the amplification. It accommodates up to 96 samples, so many samples can be monitored simultaneously; the fluorescent probe in each well of the 96-well plate is monitored by a sensitive camera built into the machine. PCR arrays include SYBR Green-optimized primer assays for a panel of relevant pathway- or disease-focused genes. Simultaneous monitoring allows for the high amplification efficiency and specificity required for real-time results. Owing to its simplicity, the PCR array can be designed for routine use, making gene expression profiling accessible to every research lab.
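
The reasoning behind real-time quantification can be illustrated with an idealised model of exponential amplification. The sketch below assumes a constant amplification efficiency and a fixed detection threshold, neither of which is stated in the text above; the function names, the threshold and the starting copy numbers are hypothetical, and real reactions reach a plateau and run below 100% efficiency.

```python
import math


def copies_after(n0, cycles, efficiency=1.0):
    """Copies after a number of cycles, assuming constant efficiency.

    efficiency = 1.0 means perfect doubling in every cycle.
    """
    return n0 * (1.0 + efficiency) ** cycles


def threshold_cycle(n0, threshold, efficiency=1.0):
    """Fractional cycle at which the product reaches the threshold."""
    return math.log(threshold / n0) / math.log(1.0 + efficiency)


threshold = 1e10  # hypothetical detection threshold, in copies

# Two samples whose starting template differs ten-fold.
ct_high = threshold_cycle(1e4, threshold)  # more starting template
ct_low = threshold_cycle(1e3, threshold)   # ten-fold less starting template

# With perfect doubling, each cycle of delay is a two-fold difference.
delta_ct = ct_low - ct_high
fold_difference = 2.0 ** delta_ct

print(f"threshold reached at cycle {ct_high:.2f} and cycle {ct_low:.2f}")
print(f"delay of {delta_ct:.2f} cycles, i.e. a {fold_difference:.1f}-fold difference")
```

Because the cycle at which a sample crosses the threshold depends on the logarithm of its starting amount, comparing samples during the exponential phase is more informative than comparing the amount of product at the end point.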

Northern blot


The Northern blot was first described in 1977. It is used to evaluate gene expression by detecting RNA in a given sample, often under different conditions such as during embryogenesis or tumour development. To compare expression between conditions, the samples are collected and evaluated together. The first step is to extract the RNA from a homogenised sample, which allows the mRNA to be isolated. Gel electrophoresis is used to separate the RNA by size, followed by transfer to a nylon membrane. After the transfer, the RNA is immobilised and then hybridized to a labelled probe to allow its detection. In the subsequent washing phase, unbound probe is washed off the membrane, which reduces background signal and gives a clearer result. The signals from the probes are detected on X-ray film and quantified by densitometry.

A more recently developed variant, the reverse Northern blot, allows for more specific detection of RNA. Here the nucleic acid fixed to the membrane is a collection of DNA fragments, which are cDNAs of RNA transcripts. After extraction from a sample, the RNA is radioactively labelled and brought into contact with the membrane, where it hybridizes with the matching DNA fragments. This technique is useful for determining whether a particular gene is expressed in particular samples, for example during tumour growth.

For an accurate analysis of gene expression, this technique should be followed by proteomics, as the presence of RNA does not always mean that the RNA is being translated into protein.

Example protocols
Microarray

Northern blot

SAGE

RT-PCR

Illumina (High-throughput)