C5orf34

C5orf34 (chromosome 5 open reading frame 34) is a protein that in humans is encoded by the C5orf34 gene (5p12).

C5orf34 is conserved in mammals, birds and reptiles, and its most distant known ortholog is in the Burmese python, Python bivittatus. The C5orf34 protein contains two domains of unknown function that are conserved in mammals: DUF 4520 and DUF 4524. The protein is also predicted to contain a polo-box domain (PBD) of polo-like kinase 4 (plk4). This region is predicted to be conserved in distant orthologs from the clade Aves.

Gene
C5orf34 is located on the negative DNA strand of the short arm of chromosome 5, at locus 12 (5p12). The gene is 28,744 base pairs long and spans from base pair 43,486,701 to base pair 43,515,445. It produces a single transcript 2,540 base pairs long, which encodes a protein of 638 amino acids.

Gene neighborhood
The gene PAIP1, a member of the polyadenylate-binding family, is found on the negative strand just downstream of C5orf34 and extends from base pair 43,526,267 to base pair 43,557,419. CCL28 is found downstream on the negative strand and extends from base pair 43,378,052 to base pair 43,413,837.

Gene expression
Multiple sources indicate that, in humans, C5orf34 is expressed non-ubiquitously, at low to moderate levels in select tissues, with the highest expression in the stomach, small intestine, testis, skeletal muscle and heart muscle. A study of the effect of a Rho kinase inhibitor on primary cell lines also showed that C5orf34 is expressed in dermal fibroblasts from normal human tissue samples.

Promoter
The promoter region for C5orf34 is predicted to lie between base pairs 43,515,079 and 43,515,773, spanning 695 base pairs.
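The base-pair lengths quoted for the gene and its promoter follow from their coordinates, but two conventions are common: subtracting the start from the end, or counting both endpoints of the closed interval. The minimal Python sketch below assumes 1-based coordinates; the function names are hypothetical and chosen only for illustration.

```python
def span_difference(start: int, end: int) -> int:
    """Length as end minus start (the first base is not counted)."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start


def span_inclusive(start: int, end: int) -> int:
    """Length of a closed interval [start, end], counting both endpoints.

    Assumes 1-based coordinates where both start and end are
    themselves part of the region.
    """
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates quoted in the text above.
gene = (43_486_701, 43_515_445)
promoter = (43_515_079, 43_515_773)

print("gene, end - start:", span_difference(*gene))
print("gene, inclusive:  ", span_inclusive(*gene))
print("promoter, inclusive:", span_inclusive(*promoter))
```

Under these assumptions, the inclusive count reproduces the 695 base pairs given for the promoter, while the stated gene length of 28,744 base pairs equals end minus start; an inclusive count of the gene's range gives 28,745.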

Protein
C5orf34 consists of 638 amino acids, has a weight of 72.7 kDa and an isoelectric point of 7.77 in humans.

Function
Although the precise function of C5orf34 in humans remains unknown, structural evidence suggests that it is involved in kinase-related cellular functions. In addition, C5orf34 is predicted to localize to the nucleus, so it may be involved in gene regulation and cell proliferation, two processes whose signal transduction pathways involve nuclear kinases.

Structure
In humans, C5orf34 contains two domains of unknown function, DUF 4520 (pfam 15016) and DUF 4524 (pfam 150125), found at residues 6–153 and 444–539, respectively. The protein is rich in serine and threonine. Its charge is evenly distributed, with no clusters of positive or negative charge within the protein.

The secondary structure of the human protein was predicted with multiple bioinformatic tools. All of them predicted a structure consisting of alpha helices, extended strands, random coils and beta turns. The Phyre2 server produced a predicted structure for the human protein that matched the polo-box domains of the serine/threonine-protein kinase plk4. This model covered 20% of the protein (130 residues) with 96.8% confidence, and the covered region included residues of the conserved polo-box domain and of the two DUF domains. The protein is predicted to be predominantly soluble, with an average hydrophobicity of -0.478.
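If the average hydrophobicity of -0.478 is a GRAVY (grand average of hydropathy) score, which is an assumption since the method is not named here, it is the mean Kyte–Doolittle hydropathy value over all residues, and a negative value indicates a protein that is hydrophilic on average. The minimal Python sketch below shows that calculation; the function name `gravy` is illustrative, and the actual C5orf34 sequence is not reproduced.

```python
# Kyte–Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue.

    Whitespace is ignored and case does not matter. Non-standard
    residue codes (e.g. X, U) raise a KeyError, because they have
    no value in the Kyte-Doolittle scale used here.
    """
    residues = "".join(sequence.split()).upper()
    if not residues:
        raise ValueError("sequence is empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

For example, `gravy("AR")` averages 1.8 and -4.5 to give -1.35; a full 638-residue sequence would be passed in the same way.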

Post-translational modifications
C5orf34 is predicted to be extensively phosphorylated, with 32 phosphoserines and 7 phosphothreonines conserved in orthologs of the human protein. This suggests that C5orf34 is a phosphoprotein and is consistent with structural predictions linking it to kinase-related functions. The protein contains only one predicted nuclear export signal residue, a leucine at position 481 (L481); however, its NES score was low (0.515). Structural analysis of the protein predicted that it is sequestered in the nucleus with 87% probability.

Interacting proteins
Databases of protein interactions (MINT, STRING, IntAct, and BioGRID) have not identified any interactions with C5orf34.

Homology and evolution
C5orf34 is highly conserved in primates and other mammals and moderately conserved in reptiles. Its most distant conserved ortholog is in Python bivittatus, the Burmese python. Below is a selected list of orthologs illustrating the homology of this gene relative to the reference sequence in Homo sapiens.

Orthologous space
Orthologs of C5orf34 have been predicted in 151 organisms. The most distant ortholog is in the Burmese python, whose lineage diverged from that of humans 296 million years ago, suggesting that C5orf34 arose no later than the common ancestor of reptiles, birds and mammals.

Paralogous space
No paralogs of C5orf34 are predicted in either humans or mice.

Conserved regions
Multiple sequence alignments showed amino acid conservation throughout the C5orf34 protein across a range of orthologs, with the most highly conserved regions at the N-terminus and C-terminus, where the two DUFs are located. DUF 4520 (pfam 15016) was conserved toward the N-terminus and DUF 4524 (pfam 150125) toward the C-terminus. The polo-box domain of plk4 was also found to be conserved in the C-terminus in multiple sequence alignments of both strict and distant orthologs.