Draft:Microgenes

Microgenes, whose products are referred to as microproteins or micropeptides, are a type of nested intronic gene; however, a microgene encodes only a small open reading frame (sORF) inside the intron. Typical eukaryotic genes tend to be very large, spanning thousands to hundreds of thousands of base pairs (bp). Microgenes, on the other hand, tend to encode polypeptides of 100 or fewer amino acids. Recent studies have found that microgenes play important roles in cell differentiation, signaling, and enzyme regulation. Although they are most commonly found in the intronic regions of eukaryotic species, microgenes can occur in any part of the genome. Microgenes can occur in all organisms; however, in bacteria, archaea, and viruses it is difficult to distinguish microgenes from ordinary genes because of the small size of these genomes.
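The size criterion above can be illustrated with a minimal sketch of an sORF scan. It assumes a simplified model that is not taken from the studies described here: only ATG is treated as a start codon, only the forward strand is searched, and the standard stop codons are used. The function name find_sorfs and the toy sequence are hypothetical.

```python
# Minimal sketch: scan a DNA string for small open reading frames (sORFs).
# Assumptions (not from the article): ATG is the only start codon, only the
# forward strand is scanned, and the first ATG in a frame opens the ORF.

STOP_CODONS = {"TAA", "TAG", "TGA"}
MAX_SORF_AA = 100  # size cutoff taken from the text above


def find_sorfs(dna, max_aa=MAX_SORF_AA):
    """Return a sorted list of (start, end, n_amino_acids) tuples.

    start is the 0-based index of the A in ATG, end is the index just past
    the stop codon, and n_amino_acids counts the codons before the stop.
    """
    dna = dna.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i  # open an ORF at the first ATG in this frame
            elif codon in STOP_CODONS:
                n_aa = (i - start) // 3  # includes the initiating methionine
                if n_aa <= max_aa:
                    hits.append((start, i + 3, n_aa))
                start = None  # look for the next ORF in this frame
    return sorted(hits)


toy_sequence = "CCATGAAATTTTAGGG"  # hypothetical sequence with one tiny ORF
sorfs = find_sorfs(toy_sequence)
```

Real annotation pipelines typically also weigh evidence such as ribosome profiling or sequence conservation, since a short ORF found by sequence alone may not be translated.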

Classification
Microgenes can be found throughout the genome, and each coding microgene is classified by its location:


 * Upstream open reading frames (uORFs) are microgenes that are encoded upstream of a coding DNA sequence, in the 5' untranslated region (UTR). The 40S ribosomal subunit scans along the mRNA and initiates translation of the uORF. In some cases the uORF promotes the expression of the downstream gene by cleaving certain proteins that would otherwise inhibit the expression of that gene.
 * Upstream overlapping open reading frames (uoORFs) are similar in function to uORFs, and both are encoded in a 5' UTR; however, a uoORF overlaps partially or completely with the starting sequence of the downstream gene.
 * Internal open reading frames (intORFs) are microgenes that are encoded between the start codon and stop codon of a larger gene. The reading frames of the gene and the microgene differ from each other, allowing the two to be expressed independently.
 * Downstream open reading frames (dORFs) are microgenes that are encoded in the 3' UTR rather than the 5' UTR. A dORF can either partially overlap a gene, similar to a uoORF, or lie further downstream from the gene. dORFs are the rarest class of microgene.
 * Long non-coding open reading frames (lncORFs) originate specifically from long non-coding RNAs (lncRNAs).
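
The location-based classes above can be expressed as a small decision rule. The following is a minimal sketch under stated assumptions, not an established annotation method: coordinates are 0-based, half-open positions on a single transcript; a transcript with no annotated coding sequence is treated as a lncRNA; and the function name classify_sorf and the "unclassified" label are hypothetical.

```python
# Minimal sketch: assign an sORF to one of the location classes listed above.
# Assumptions (not from the article): coordinates are 0-based, half-open
# positions on one transcript, and a transcript without an annotated main
# coding sequence (cds=None) is treated as a long non-coding RNA.

def classify_sorf(sorf_start, sorf_end, cds=None):
    """Return 'uORF', 'uoORF', 'intORF', 'dORF', 'lncORF', or 'unclassified'."""
    if cds is None:
        return "lncORF"  # no main coding sequence on this transcript
    cds_start, cds_end = cds
    if sorf_end <= cds_start:
        return "uORF"  # lies entirely within the 5' UTR
    if sorf_start < cds_start:
        return "uoORF"  # starts in the 5' UTR and runs into the main gene
    if sorf_end <= cds_end:
        # Entirely inside the main coding sequence: an intORF must use a
        # different reading frame from the larger gene.
        same_frame = (sorf_start - cds_start) % 3 == 0
        return "unclassified" if same_frame else "intORF"
    return "dORF"  # overlaps the end of the gene or lies in the 3' UTR


main_cds = (100, 400)  # hypothetical coding sequence coordinates
examples = {
    "upstream": classify_sorf(10, 40, main_cds),
    "upstream_overlapping": classify_sorf(80, 130, main_cds),
    "internal": classify_sorf(152, 212, main_cds),
    "downstream": classify_sorf(450, 510, main_cds),
    "noncoding_transcript": classify_sorf(5, 65, None),
}
```

The frame check for intORFs reflects the point made in the list: an ORF in the same frame as the larger gene would simply be part of that gene's reading frame rather than an independently expressed microgene.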

History
Microgenes were previously dismissed by scientists as artifacts of genetic sequencing, because the encoded proteins were very small compared with those of standard genes. Newer sequencing techniques have since verified this category of genes. This research has identified many microgenes across a range of species, including approximately 30,000 coding microgenes in humans, with over 100,000 additional micro-pseudogenes found in the genome. The function of many microgenes has yet to be discovered; however, microgenes have many potential functions in humans as well as in other eukaryotes.

In Drosophila melanogaster, two sORFs have been studied functionally. The first microgene encodes peptides between 11 and 32 amino acids long and is expressed during the embryonic development of D. melanogaster. Inactivating the microgene produced a lethal phenotype in the developing flies, which lost the epidermal trichomes on their legs. Reactivating the microgene also reactivated the gene responsible for trichome formation. The second microgene is expressed in muscle cells, where it helps regulate calcium ions through an ion channel, which in turn aids muscle contraction.

In recent years, researchers have discovered that some of the 30,000 coding microgenes found in the human genome serve a functional purpose. These microgenes can play major roles in the regulation of the human body, and some cause disease when removed from the genome. Other microgenes overlap larger genes; one is co-expressed with ATXN1 in the cerebellum, where it co-localizes with ATXN1 in the nucleus. Because this field of research is new and ongoing, little is known about the exact function of each microgene in the human genome, although these functions may resemble those of other documented microgenes, such as cell signaling, transport, and adhesion.

Evolution
The origin of microgenes is not known, but one possibility is a random point mutation that creates a start codon for an ORF. It is widely believed that microgenes are not a product of pre-existing genes and that novel microgenes mostly arise de novo. Other theories propose that microgenes originate from the duplication or fission of a protein-coding sequence (CDS), or derive from transposable elements.
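
The point-mutation scenario can be illustrated with a toy example. This is a minimal sketch assuming ATG as the only start codon and the forward strand only; the sequences and the helper names first_orf and point_mutate are hypothetical and do not represent any real locus.

```python
# Minimal sketch: a single point mutation creating a start codon, and with it
# a new small ORF. Sequences and helper names are hypothetical illustrations.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf(dna):
    """Return (start, end) of the first ATG...stop ORF, or None if absent."""
    dna = dna.upper()
    for start in range(len(dna) - 2):
        if dna[start:start + 3] != "ATG":
            continue
        # Read codons in frame with this ATG until a stop codon appears.
        for i in range(start + 3, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                return (start, i + 3)
    return None


def point_mutate(dna, position, base):
    """Return a copy of dna with the base at a 0-based position replaced."""
    return dna[:position] + base + dna[position + 1:]


ancestral = "GGACGAAACCCTTTTAAGG"          # contains ACG, no start codon
derived = point_mutate(ancestral, 3, "T")  # ACG -> ATG

orf_before = first_orf(ancestral)
orf_after = first_orf(derived)
```

In this toy case the ancestral sequence has a stop codon but no start codon in frame with it, so a single substitution is enough to open a short reading frame.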

In humans, recent discoveries have found a major increase in lncORF, uORF, and dORF microgenes compared with ancestral lineages. Some of these novel microgenes are shared only between humans and their most recent common ancestor; however, a major portion is unique to humans.