User:Melmunro/sandbox

Overview
Orphan genes are genes that have no detectable homologues in the genomes of other organisms. This means that they cannot be linked to other lineages on the basis of their sequences. Depending on the analysis, orphan genes are estimated to make up between 1% and 30% of the functional genes in a genome, with 10 to 20% the most widely accepted range. There is no correlation between the complexity of an organism and the proportion of its genome made up of orphan genes, and that proportion is also independent of genome size. To be considered an orphan gene, a gene must encode a protein that lacks homology to any predicted peptide from the genomes of similar species (such as other species in the same phylum). Scientists then apply several filters to weed out false positives, such as eliminating genes that have similar sequences in other taxonomic divisions or that could have homologues in genomes that have not yet been sequenced.

Orphan genes can be divided into several more specific groups. One of these is taxonomically restricted genes (TRGs), a group defined by how a gene's sequence is distributed compared with gene sequences in other species' genomes. The term TRG is synonymous with lineage-specific gene. Orphan genes can also be taxon-specific orphan genes (TSOGs), which are characterized by a lack of homology to genes outside a focal taxonomic group. TSOGs differ from TRGs only in the parameters set by the scientists who are analyzing a group of genomes. Another group is species-specific orphan genes (SSOGs), which are TSOGs whose focal group is a single species: they share no homology with any gene in any other species. A TSOG is moved out of this grouping as soon as distant homology is demonstrated.
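Because these groupings depend on how far a gene's homologues extend across the taxonomy, a toy illustration may help. The Python sketch below labels a gene by the narrowest taxon that contains both the focal species and every species in which a homologue was detected. The lineages, species names, and the functions `narrowest_shared_taxon` and `classify` are hypothetical; real analyses rely on curated taxonomies and far more careful homology searches.

```python
# Hypothetical lineages, ordered from the broadest taxon down to the species.
LINEAGES = {
    "species_A": ["Eukaryota", "Fungi", "Ascomycota", "genus_X", "species_A"],
    "species_B": ["Eukaryota", "Fungi", "Ascomycota", "genus_X", "species_B"],
    "species_C": ["Eukaryota", "Fungi", "Basidiomycota", "genus_Y", "species_C"],
    "species_D": ["Eukaryota", "Metazoa", "Chordata", "genus_Z", "species_D"],
}


def narrowest_shared_taxon(focal, species_with_homologues):
    """Most specific taxon containing the focal species and every species with a homologue."""
    shared = LINEAGES[focal]
    for species in species_with_homologues:
        other = LINEAGES[species]
        depth = 0
        # Walk down both lineages while they still agree.
        while depth < min(len(shared), len(other)) and shared[depth] == other[depth]:
            depth += 1
        shared = shared[:depth]
    return shared[-1] if shared else None


def classify(focal, species_with_homologues):
    """Hypothetical labelling of a focal-species gene using the groupings described above."""
    others = set(species_with_homologues) - {focal}
    if not others:
        # No homologue in any other species: species-specific orphan gene.
        return "SSOG"
    taxon = narrowest_shared_taxon(focal, others)
    return f"restricted to {taxon}" if taxon else "no shared taxon"


print(classify("species_A", []))                          # SSOG
print(classify("species_A", ["species_B"]))               # restricted to genus_X
print(classify("species_A", ["species_B", "species_C"]))  # restricted to Fungi
```

Whether a gene "restricted to Fungi" is then called a TRG or a TSOG depends, as noted above, on the focal group the analyst chooses.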

Orphan genes are a largely unexplored area of biology and a potentially rich source of discovery. As one neuropharmacologist studying orphan genes with his team put it, "The challenge is to figure out what it is doing. That's what really drives us." Orphan genes are rather unusual compared with the lineage-specific genes that are frequently studied in evolution. The commonly accepted model of evolution relies heavily on the duplication, rearrangement, and mutation of existing genes, in keeping with the idea of common descent. This fits lineage-specific genes but not orphan genes, so understanding orphan genes could help refine the working model of evolution. This helps explain why evolutionary biologists find orphan genes fascinating. However, orphan genes have received relatively little attention compared with lineage-specific genes. This could be because the only biological role of orphan genes currently known is that they help organisms adapt. This role is supported by the large variation in gene content among bacterial genomes and is well accepted in the scientific community. Even so, lineage-specific genes have generated much more interest because their causes and effects are easier to determine.

History of orphan genes
Orphan genes were first discovered during the yeast genome-sequencing project, which was completed in 1996. Orphan genes accounted for an estimated 26% of the yeast genome, but it was assumed that this figure would drop to a negligible amount once the genomes of many other species were entered into databases. Because the world is estimated to contain between 1 and 20 million animal species, few of which had been sequenced, the finding was largely ignored for some time. However, the cumulative number of orphan genes in sequenced genomes did not level off as time passed. In a 2002 analysis of Schizosaccharomyces pombe and Saccharomyces cerevisiae, researchers found that 14% and 19%, respectively, of the protein-coding genes were unique to that species. This makes them SSOGs. Unfortunately for the study of orphan genes, researchers were more interested in studying shared gene sequences than the unknown regions.

It was not until 2003 that orphan genes were studied directly. In a study of Caenorhabditis briggsae and related species, researchers compared over 2,000 genes in each species using BLAST. Their results suggested that the genes lacking detectable homologues were evolving too quickly for BLAST to pick up and were consequently sites of very rapid evolution. In 2005, Wilson examined 122 bacterial species to determine whether the large number of orphan genes found in many species was genuine. The study found that it was, and that orphan genes played a role in bacterial adaptation. The term taxonomically restricted genes was introduced into the literature to make orphan genes seem less "mysterious."

In 2009, a study explored "the dark matter of protein space" by analyzing 2,200 domains of unknown function, and found that the extreme diversification these domains cause enables organisms to evolve new functions more easily than they could without them. This was important because it recognized that orphan genes have a purpose at the level of proteins.

How to identify orphan genes
To identify orphan genes, scientists check whether a gene encodes a protein that lacks resemblance to any predicted peptide from the genomes of similar species. If it does, it can provisionally be considered an orphan gene.

The most widely used quantitative method for identifying orphan genes is the Basic Local Alignment Search Tool, more commonly known as BLAST. BLAST compares nucleotide or protein sequences with sequence databases and calculates the statistical significance of each match, that is, how likely it is that a match of that quality would arise by chance alone. In other words, it finds similar sequences across genomes. BLAST is sensitive enough to detect even remote homologues, including genes in species that diverged in the very distant past (Tautz). One downside of BLAST, though, is that its results can be incorrect. Short genes identified as orphans by BLAST are often later judged unlikely to be real because of "unsuitable codon usage", which makes their status as orphan genes questionable.
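A minimal sketch of the first BLAST-based screening step follows. It assumes the focal species' proteins have already been searched against a database that excludes the focal genome, with results saved in BLAST+ tabular format (`-outfmt 6`). The function name `orphan_candidates` and the e-value cutoff of 1e-3 are illustrative assumptions, not a threshold taken from the studies described here.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def orphan_candidates(query_ids, blast_tab, evalue_cutoff=1e-3):
    """Return query genes with no significant hit in other species (hypothetical helper).

    Assumes blast_tab comes from a search against a database that excludes the
    focal genome, so any significant hit counts as a possible homologue.
    """
    with_homologue = set()
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_FIELDS, row))
        if float(hit["evalue"]) <= evalue_cutoff:
            with_homologue.add(hit["qseqid"])
    # Keep the original query order for the remaining candidates.
    return [q for q in query_ids if q not in with_homologue]


hits = ("geneA\tsp2_g1\t45.0\t120\t60\t2\t1\t120\t5\t124\t1e-20\t90.1\n"
        "geneB\tsp3_g7\t28.0\t40\t28\t1\t10\t49\t3\t42\t0.5\t22.3\n")
print(orphan_candidates(["geneA", "geneB", "geneC"], hits))  # ['geneB', 'geneC']
```

Genes that pass this step are only candidates; the further filters mentioned above, such as checking short candidates for unsuitable codon usage, would still need to be applied.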

Where do orphan genes come from?
De novo gene evolution can create orphan genes. Extensive transcription of non-coding regions produces RNAs that can, over time, turn into orphan genes. This suggests that genomes may have evolutionary constraints that help maintain functional gene regions. About 5.5% of primate orphan genes originated de novo. However, it is difficult to measure this proportion in all genomes, so it is unclear whether this is a general trend for orphan genes. The mechanisms that create TRGs are less well understood. The two most widely accepted are de novo origin from non-coding regions and gene duplication.

Some researchers have proposed that orphan genes drive morphological specification because they allow organisms to "adapt to constantly changing ecological conditions." Orphan genes add to the variation within a population that can help it survive in its environment, which can be especially helpful if the population recently experienced a bottleneck.