Disease gene identification

Disease gene identification is a process by which scientists identify the mutant genotypes responsible for an inherited genetic disorder. Mutations in these genes can include single nucleotide substitutions, single nucleotide additions/deletions, deletion of the entire gene, and other genetic abnormalities.

Significance
Knowledge of which genes (when non-functional) cause which disorders will simplify diagnosis of patients and provide insights into the functional characteristics of the mutation. The advent of modern-day high-throughput sequencing technologies combined with insights provided from the growing field of genomics is resulting in more rapid disease gene identification, thus allowing scientists to identify more complex mutations.



Generic gene identification procedure
Disease gene identification techniques often follow the same overall procedure. DNA is first collected from several patients who are believed to have the same genetic disease. Then, their DNA samples are analyzed and screened to determine probable regions where the mutation could potentially reside. These techniques are mentioned below. These probable regions are then lined-up with one another and the overlapping region should contain the mutant gene. If enough of the genome sequence is known, that region is searched for candidate genes. Coding regions of these genes are then sequenced until a mutation is discovered or another patient is discovered, in which case the analysis can be repeated, potentially narrowing down the region of interest. The differences between most disease gene identification procedures are in the second step (where DNA samples are analyzed and screened to determine regions in which the mutation could reside).

Pre-genomics techniques
Without the aid of the whole-genome sequences, pre-genomics investigations looked at select regions of the genome, often with only minimal knowledge of the gene sequences they were looking at. Genetic techniques capable of providing this sort of information include Restriction Fragment Length Polymorphism (RFLP) analysis and microsatellite analysis.

Loss of heterozygosity (LOH)
Loss of heterozygosity (LOH) is a technique that can only be used to compare two samples from the same individual. LOH analysis is often used when identifying cancer-causing oncogenes in that one sample consists of (mutant) tumor DNA and the other (control) sample consists of genomic DNA from non-cancerous cells from the same individual. RFLPs and microsatellite markers provide patterns of DNA polymorphisms, which can be interpreted as residing in a heterozygous region or a homozygous region of the genome. Provided that all individuals are affected with the same disease resulting from a manifestation of a deletion of a single copy of the same gene, all individuals will contain one region where their control sample is heterozygous but the mutant sample is homozygous - this region will contain the disease gene.

Post-genomics techniques
With the advent of modern laboratory techniques such as High-throughput sequencing and software capable of genome-wide analysis, sequence acquisition has become increasingly less expensive and time-consuming, thus providing significant benefits to science in the form of more efficient disease gene identification techniques.

Identity by descent mapping
Identity by descent (IBD) mapping generally uses single nucleotide polymorphism (SNP) arrays to survey known polymorphic sites throughout the genome of affected individuals and their parents and/or siblings, both affected and unaffected. While these SNPs probably do not cause the disease, they provide valuable insight into the makeup of the genomes in question. A region of the genome is considered identical by descent if contiguous SNPs share the same genotype. When comparing an affected individual to his/her affected sibling, all identical regions are recorded (ex. Shaded in red in above figure). Given that an affected sibling and an unaffected sibling do not have the same disease phenotype, their DNA must by definition be different (barring the presence of a genetic or environmental modifier). Thus, the IBD mapping results can be further supplemented by removing any regions that are identical in both affected individuals and unaffected siblings. This is then repeated for multiple families, thus generating a small, overlapping fragment, which theoretically contains the disease gene.

Homozygosity/autozygosity mapping
Homozygosity/Autozygosity mapping is a powerful technique, but is only valid when searching for a mutation segregating within a small, closed population. Such a small population, possibly created by the founder effect, will have a limited gene pool, and thus any inherited disease will probably be a result of two copies of the same mutation segregating on the same haplotype. Since affected individuals will probably be homozygous in the regions, looking at SNPs in a region is an adequate marker of regions of homozygosity and heterozygosity. Modern day SNP arrays are used to survey the genome and identify large regions of homozygosity. Homozygous blocks in the genomes of affected individuals can then be laid on top of each other, and the overlapping region should contain the disease gene.

This analysis is often extended by analyzing autozygosity, an extension of homozygosity, in the genomes of affected individuals. This can be accomplished by plotting a cumulative LOD score alongside the overlaid blocks of homozygosity. By taking into consideration the population allele frequencies for all SNPs via autozygosity mapping, the results of homozygosity can be confirmed. Furthermore, if two suspicious regions appear as a result of homozygosity mapping, autozygosity mapping may be able to distinguish between the two (ex. If one block of homozygosity is a result of a very non-diverse region of the genome, the LOD score will be very low).

Tools for Homozygosity Mapping
 * 1) HomSI: a homozygous stretch identifier from next-generation sequencing data  A tool that identifies homozygous regions using deep sequence data.

Genome-wide knockdown studies
Genome-wide knockdown studies are an example of the reverse genetics made possible by the acquisition of whole genome sequences, and the advent of genomics and gene-silencing technologies, mainly siRNA and deletion mapping. Genome-wide knockdown studies involve systematic knockdown or deletion of genes or segments of the genome. This is generally done in prokaryotes or in a tissue culture environment due to the massive number of knockdowns that must be performed. After the systematic knockout is completed (and possibly confirmed by mRNA expression analysis), the phenotypic results of the knockdown/knockout can be observed. Observation parameters can be selected to target a highly specific phenotype. The resulting dataset is then queried for samples which exhibit phenotypes matching the disease in question – the gene(s) knocked down/out in said samples can then be considered candidate disease genes for the individual in question.

Whole exome sequencing
Whole exome sequencing is a brute-force approach that involves using modern day sequencing technology and DNA sequence assembly tools to piece together all coding portions of the genome. The sequence is then compared to a reference genome and any differences are noted. After filtering out all known benign polymorphisms, synonymous changes, and intronic changes (that do not affect splice sites), only potentially pathogenic variants will be left. This technique can be combined with other techniques to further exclude potentially pathogenic variants should more than one be identified.