C20orf27

UPF0687 protein C20orf27 is a protein that in humans is encoded by the C20orf27 gene. It is expressed in most human tissues. One study found that the protein helps regulate the cell cycle, apoptosis, and tumorigenesis by promoting activation of the NF-κB pathway.

Gene
UPF0687 protein C20orf27 is also known by four other names: Chromosome 20 Open Reading Frame 27, Hypothetical Protein LOC54976, C20orf27, and FLJ20550. The gene is located on the minus strand at 20p13. It consists of 7 exons and 12 introns. According to the most recent annotation, C20orf27 spans 3,753,499 bp to 3,768,388 bp on chromosome 20.
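As a quick check on the coordinates quoted above, the genomic span of the gene can be computed from its start and end positions. The sketch below assumes the coordinates are 1-based and inclusive, which is the usual genome browser convention but is not stated in the annotation quoted here; the function name is illustrative.

```python
def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based, inclusive genomic interval (assumed convention)."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates quoted in the text for C20orf27 on chromosome 20.
C20ORF27_START = 3_753_499
C20ORF27_END = 3_768_388

gene_span = span_length(C20ORF27_START, C20ORF27_END)
print(f"C20orf27 spans {gene_span:,} bp")  # prints "C20orf27 spans 14,890 bp"
```

Under a 0-based, half-open convention the same coordinates would give a span one base shorter.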

Known isoforms
The C20orf27 gene has 5 transcript isoforms: transcript variants 1, 2, 3, and 4, and the predicted transcript variant X1.

Transcript variant 1 is 1327 bases long, has 6 exons, and encodes the longest protein isoform.

Transcript variant 2 is 1252 bases long. It keeps the reading frame and the 6 exons of transcript variant 1, but it uses an alternative splice site in the coding region.

Transcript variant 3 is 1706 bases long and has 6 exons. It uses an alternative splice site in the coding region and differs in the 5' UTR, but it still maintains the reading frame seen in transcript variant 1. Despite their difference in size, variants 2 and 3 encode the same protein isoform, which is shorter than the isoform encoded by transcript variant 1.

Transcript variant 4 is 1457 bases long and has 6 exons. Compared to variant 1, it uses an alternative 5'-most exon and an alternative splice site. Because an upstream ORF is predicted to interfere with its translation, transcript variant 4 does not encode any protein.

The information on transcript variant X1 comes from GRCh38.p13 Primary Assembly. This variant has a size of 1195 bases, and the number of exons in this variant remains unknown.

Physical features
The human protein C20orf27 has three known isoforms.

Isoform 1 has 199 amino acid residues, isoform 2 has 174, and isoform X1 has 154. All three isoforms contain the same domain, DUF4517, whose function has not yet been characterized.

The predicted isoelectric point of unmodified protein C20orf27 is 6.89.

The proportion of each amino acid residue is close to its average among human proteins. Overall, the positively charged residues in human protein C20orf27 outnumber the negatively charged residues. Protein C20orf27 has no high-scoring hydrophobic regions, no highly charged regions, and no transmembrane regions.
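The comparison of positively and negatively charged residues described above can be reproduced for any protein sequence with a simple count. The sketch below is illustrative only: the peptide is a made-up toy sequence, not the C20orf27 sequence, and counting only lysine and arginine as positive (leaving out histidine, which is mostly neutral at physiological pH) is an assumption.

```python
from collections import Counter

POSITIVE = set("KR")  # lysine, arginine (histidine deliberately excluded; assumption)
NEGATIVE = set("DE")  # aspartate, glutamate


def charge_summary(sequence: str) -> dict:
    """Count positively and negatively charged residues in a one-letter sequence."""
    seq = sequence.upper()
    counts = Counter(seq)
    positive = sum(counts[aa] for aa in POSITIVE)
    negative = sum(counts[aa] for aa in NEGATIVE)
    return {
        "length": len(seq),
        "positive": positive,
        "negative": negative,
        "net": positive - negative,
    }


# Hypothetical 10-residue peptide used only to demonstrate the function.
toy_peptide = "MKRLDEKAGR"
summary = charge_summary(toy_peptide)
print(summary)  # {'length': 10, 'positive': 4, 'negative': 2, 'net': 2}
```

A positive "net" value corresponds to the situation described for C20orf27, where positively charged residues outnumber negatively charged ones.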

SAPS predicts two repetitive structures. The first is an amino acid alphabet structure with a core block length of 4, which occurs 15 times in human protein C20orf27. The second is an 11-letter reduced alphabet structure with a core block length of 8, which is predicted to occur 8 times. There are no predicted clusters of amino acid multiplets.

Post-translational modifications
The predicted molecular weight of C20orf27 is 21.6 kDa. The banding pattern of a Western blot of protein C20orf27 with its polyclonal antibody shows an experimental molecular weight of about 22 kDa. This close agreement suggests that there are relatively few post-translational modifications on protein C20orf27.
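The agreement between the predicted and observed molecular weights can be checked with simple arithmetic. The sketch below computes the relative difference between the two values quoted above and, as a rough sanity check, estimates the mass of the 199-residue isoform 1 using the common rule of thumb of about 110 Da per amino acid residue. The rule of thumb and the function names are assumptions added for illustration; this is not how the 21.6 kDa prediction was derived.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass (assumption)


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


def percent_difference(observed: float, predicted: float) -> float:
    """Relative difference of the observed value from the predicted value, in percent."""
    return (observed - predicted) / predicted * 100.0


estimate_kda = approx_mass_kda(199)        # isoform 1 length from the text
gap_percent = percent_difference(22.0, 21.6)  # Western blot vs. prediction

print(f"rule-of-thumb estimate: {estimate_kda:.2f} kDa")
print(f"observed vs. predicted: {gap_percent:.1f}% heavier")
```

A difference of a few percent is within the resolution of a typical Western blot, which is why the text reads the result as evidence against extensive modification.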

There is no predicted signal peptide or cleavage site. There are many predicted phosphorylation sites along the sequence of protein C20orf27, including four sites for protein kinase A (PKA), two sites for protein kinase C (PKC), three sites for casein kinase 2 (CKII), one site for ribosomal S6 kinase (RSK), one site for cGMP-dependent protein kinase, also called protein kinase G (PKG), and one site for ataxia-telangiectasia mutated (ATM) serine/threonine protein kinase.

Protein C20orf27 is predicted to have other post-translational modification sites, including five palmitoylation sites, one C-mannosylation site, and two sumoylation sites.

Structure
Three stretches of beta sheet, spanning amino acids 62 to 67, 76 to 87, and 92 to 100, are predicted with the highest confidence by CFSSP and Phyre2. A model predicted by I-TASSER shows that the tertiary structure of human protein C20orf27 is composed largely of beta sheets, which is consistent with the predictions made by CFSSP and Phyre2.

Subcellular localization
This protein is expected to be found in the cytosol and the nucleus, but not in nucleoli. Additional computational analysis predicts that the cytosol is its most likely location.

Expression
Protein C20orf27 is expressed ubiquitously across human tissues. Microarray-based tissue expression data suggest that the caudate nucleus has the highest expression of C20orf27.

Other than the caudate nucleus, C20orf27 expression ranks in the top 25% among 100 proteins in the following samples: pons, fetal brain, BM-CD105+ endothelial cells, BM-CD34+ cells, bone marrow, adipocytes, uterus corpus, 721 B lymphoblasts, PB-CD56+ NK cells, BM-CD33+ myeloid cells, colorectal adenocarcinoma, chronic myelogenous leukemia (K-562), lymphoblastic leukemia (MOLT-4), and promyelocytic leukemia (HL-60).

In situ hybridization data have shown that the expression of C20orf27 in airway epithelial cells (AECs) can be correlated with chronic lung diseases. After AECs are treated with IL-13, a cytokine expressed by CD4 T helper cells, they begin to secrete excess mucus, and excess mucus secretion in the airway is a hallmark of chronic lung diseases.

Gene level expression
There are three promoter regions in gene C20orf27.

Five transcription factors have been found to bind the promoter region of C20orf27: MITF, JUN, ZNF282, FOXA1, and TCF7L2.

Additional transcription factor binding sites are predicted using Genomatix. The transcription factor binding matrix families with the highest matrix similarity are EGR/nerve growth factor induced protein C and related factors, GC-box factors SP1/GC, Krueppel-like transcription factors, Myc-associated zinc fingers, vertebrate homologues of the enhancer of split complex, E-box binding factors, E2F-myc activator/cell cycle regulators, and the BED subclass of zinc-finger proteins.

Transcript level regulation
The miRNAs predicted to bind evolutionarily conserved sites in the 3' end of C20orf27 mRNA are hsa-miR-7856-5p, hsa-miR-671-5p, hsa-miR-4768, hsa-miR-6791-3p, hsa-miR-6829-3p, hsa-miR-548d-3p, hsa-miR-548-3p, hsa-miR-548z, and hsa-miR-548h-3p. Three stem loops are conserved across different predicted structure models. They lie at the 5' end of C20orf27 mRNA and span bases 1 to 27, 56 to 74, and 116 to 130.

The mRNA of C20orf27 has about 23 predicted binding sites for RNA-binding proteins whose sequences are also evolutionarily conserved. The RNA-binding proteins predicted to bind are BRUNOL5, BRUNOL6, PCBP2, TARDBP, MBNL1, CUG-BP, PCBP3, PTBP1, RBM5, SRSF1, HNRNPH2, FMR1, HNRNPF, LIN28A, CPEB4, HNRNPC, HNRNPCL1, HNRNPM, HuR, RALY, PABPC1, PABPC4, SART3, and SRSF10.

Interacting proteins
Interactors of protein C20orf27 found in yeast two-hybrid (Y2H) screens are replicase polyprotein 1ab from coronavirus and the human proteins RAIYL, PHKB, and FERMT2. Replicase polyprotein 1ab transcribes and replicates viral RNAs, and it contains the proteinases responsible for cleaving the polyprotein. The functions of RAIYL, PHKB, and FERMT2 remain unknown.

Other interactors discovered by pull-down assays include PPP1CA, PPP1CC, PPP1CB, PPP1R7, PSME3, RBFOX2, and DMWD. PPP1CA, PPP1CB, PPP1CC, and PPP1R7 have similar functions: they are involved in the regulation of a variety of cellular processes, such as cell division, glycogen metabolism, muscle contractility, protein synthesis, and HIV-1 viral transcription. PSME3 facilitates the MDM2-p53/TP53 interaction, which promotes ubiquitination- and MDM2-dependent proteasomal degradation of p53/TP53. This limits p53/TP53 accumulation and inhibits apoptosis after DNA damage, and PSME3 might also play a role in cell cycle regulation. RBFOX2 regulates alternative splicing events by binding to 5'-UGCAUGU-3' elements. The function of DMWD is unknown.

The above evidence suggests protein C20orf27 plays a role in cell cycle regulation, cell proliferation and differentiation, and cell survival.
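The RBFOX2 binding element mentioned above, 5'-UGCAUGU-3', is a fixed sequence, so candidate sites in an RNA can be located with a plain substring search. The sketch below is a minimal illustration: the function name and the toy RNA are hypothetical, the toy sequence is not taken from any real transcript, and a match only marks a candidate site, not evidence of binding.

```python
def find_motif(rna: str, motif: str = "UGCAUGU") -> list:
    """Return 1-based start positions of every motif match, overlaps included."""
    # Accept DNA-style input by converting T to U (convenience assumption).
    rna = rna.upper().replace("T", "U")
    positions = []
    start = rna.find(motif)
    while start != -1:
        positions.append(start + 1)  # convert 0-based index to 1-based position
        start = rna.find(motif, start + 1)  # step by one base to catch overlaps
    return positions


# Hypothetical toy RNA containing three copies of the element, two of them overlapping.
toy_rna = "AAUGCAUGUCCGUGCAUGUGCAUGUA"
hits = find_motif(toy_rna)
print(hits)  # [3, 13, 19]
```

Advancing the search by one base after each hit, rather than by the motif length, is what allows the overlapping sites at positions 13 and 19 to both be reported.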

Clinical significance
Human protein C20orf27 and its variants have not been discovered to be associated with any diseases or disorders.

Paralogs
There are no known paralogs.

Orthologs
There are more than 281 known orthologs for this gene, ranging from primates to invertebrates. The most closely related orthologs are from primates and other mammals, with sequence similarity ranging from 75% to 100%. The moderately related orthologs are from fishes and birds, with sequence similarity ranging from 55% to 75%. The most distantly related orthologs are from invertebrates and Trichoplax, with sequence similarity ranging from 40% to 55%. The conserved amino acids are shown in bold in the conceptual translation diagram.
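The three similarity bands described above can be written as a small classification function. This is a sketch for illustration only: the band labels and the function name are not from the source, the example values are hypothetical, and because the quoted ranges share their boundary values (75% and 55%), assigning a boundary value to the more closely related band is an assumption.

```python
def relatedness_band(similarity_percent: float) -> str:
    """Map a sequence similarity percentage to the bands quoted in the text."""
    if not 0 <= similarity_percent <= 100:
        raise ValueError("similarity must be between 0 and 100")
    if similarity_percent >= 75:
        return "closely related"      # primates and other mammals in the text
    if similarity_percent >= 55:
        return "moderately related"   # fishes and birds in the text
    if similarity_percent >= 40:
        return "distantly related"    # invertebrates and Trichoplax in the text
    return "below the reported range"


# Hypothetical similarity values, not measured data.
for value in (98.0, 75.0, 60.5, 42.0, 30.0):
    print(f"{value:5.1f}% -> {relatedness_band(value)}")
```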