C1orf131

Uncharacterized protein C1orf131 is a protein that in humans is encoded by the gene C1orf131. The first member of this protein family was discovered in humans. Homologs of C1orf131 have since been identified in numerous species using bioinformatic algorithms, and as a result most proteins in this family are named Uncharacterized protein C1orf131 homolog.

Gene
In humans, C1orf131 is located on the minus strand of chromosome 1 at cytogenetic band 1q42.2, along with 193 other genes. Notably, the gene upstream of C1orf131 is GNPAT, and the gene downstream of C1orf131 is TRIM67. When this gene is transcribed in humans, it most often produces an mRNA 1458 base pairs long, composed of seven exons. There are at least nine other alternative splice forms in humans that produce proteins. The splice forms range in size from 129 base pairs (2 exons) to 1458 base pairs (7 exons).

Protein
In the C1orf131 protein family, the proteins are between 93 and 450 amino acids long; however, the majority are between 160 and 295 amino acids long. They have a molecular weight between 10.6 and 49.0 kDa, with the majority between 18.6 and 32.7 kDa, and an isoelectric point between 9.6 and 11.2. Over 30 orthologs from mammals, birds and lizards have been identified as having a poly(A) RNA binding site. All orthologs in this protein family have a domain of unknown function, DUF4602. The human protein has been shown to be both phosphorylated and acetylated. These proteins are rich in lysine, in charged amino acids (DEHKR), and in basic amino acids (HKR). The secondary structure of these proteins consists primarily of alpha helices and coils, with a small percentage of beta strands. C1orf131 has been shown to interact with ubiquitin, by affinity capture followed by mass spectrometry, and with APP (amyloid beta (A4) precursor protein), by reconstituted complex.
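The composition statements above (lysine-rich, high in charged DEHKR and basic HKR residues) can be checked for any sequence with a few lines of code. The following is a minimal sketch, not a method taken from the original analysis: the function name `composition_summary`, the toy sequence, and the rule-of-thumb average residue mass of about 110 Da are all assumptions made for illustration.

```python
from collections import Counter

# Residue groups as named in the text above.
CHARGED = set("DEHKR")
BASIC = set("HKR")

# Assumption: rough rule of thumb of ~110 Da per amino acid residue.
AVG_RESIDUE_MASS_DA = 110.0


def composition_summary(sequence: str) -> dict:
    """Return lysine, charged, and basic residue fractions for a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    return {
        "length": n,
        "lysine_fraction": counts["K"] / n,
        "charged_fraction": sum(counts[aa] for aa in CHARGED) / n,
        "basic_fraction": sum(counts[aa] for aa in BASIC) / n,
        # Length-based estimate only; not an exact molecular weight.
        "approx_mass_kda": n * AVG_RESIDUE_MASS_DA / 1000.0,
    }


# Hypothetical toy sequence, NOT the real C1orf131 sequence.
toy_sequence = "MKKRDEHKAAKL"
summary = composition_summary(toy_sequence)
print(summary)
```

Under the same rule of thumb, proteins of 93 and 450 residues come out at roughly 10.2 and 49.5 kDa, which is in line with the 10.6 to 49.0 kDa range reported above. For exact molecular weights or isoelectric points, a dedicated tool would be needed rather than this length-based estimate.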



DUF4602
DUF4602 (PF15375) is generally 120 or more amino acids long. Typically only one gene in a species contains this domain; however, the domain has been identified in two different proteins in several species. In Trichuris suis, DUF4602 is found in both hypothetical protein M5114_09117 and tRNA pseudouridine synthase D, and in Echinococcus granulosus, DUF4602 has been found in hypothetical protein EGR 05135 and expressed conserved protein. DUF4602 has been found primarily in eukaryotes; however, it has also been identified in the virus DRHN1, Bacillus sp. UNC41MFS5, Enterococcus faecalis, and Enterococcus faecalis 13-SD-W-01. In the larger C1orf131 orthologs (250+ residues), the domain is typically located in the middle of the sequence toward the C-terminus, whereas in smaller orthologs (160-250 residues) it is located near the N-terminus. The larger orthologs also contain regions of low complexity, which could indicate that these proteins are intrinsically disordered.

Evolutionary history
This gene family exists only in eukaryotes. There are no paralogs of this gene; however, there are a few pseudogenes of C1orf131. Thus far, they have been found only in orangutans, mouse lemurs, and sloths. When this gene family is compared to cytochrome C, a slow-evolving gene, and fibrinogen gamma chain, a fast-evolving gene, it is shown to evolve at a faster rate than fibrinogen.