User:MorganW34/sandbox

Overview
C1orf159, also referred to as chromosome 1 open reading frame 159, is a gene found in humans. The protein it encodes is called uncharacterized protein C1orf159 isoform 2 precursor and is 198 amino acids long. The gene is expressed in human tissues including the testis, skin, spleen, stomach, prostate, endometrium, fat, and several others. Besides humans, it is also present in house mice, cattle, and domesticated chickens. The function of C1orf159 is not yet known, but it may be related to alternative RNA splicing, may help secrete specific molecules, or may act as a transmembrane protein.

Genes
GeneCards lists FLJ20584 as an alias of the C1orf159 gene. According to NCBI, its locus is listed as NM_017891 and the molecule is linear. The gene has nine exons in total and spans 1832 base pairs. In the GRCh37.p13 assembly, it is located on the minus strand. In its gene neighborhood, three unknown genes lie downstream of C1orf159, while the lnc-RNF223-4 and ENSG00000285812 genes lie nearby upstream.

Transcripts
NCBI documents three transcript variants for C1orf159. Transcript variant 1 is 2432 base pairs long and contains 12 exons. Transcript variant 3 is 2324 base pairs long with 11 exons. Transcript variants 1 and 2 have 100% identity, while variant 3 has 99% identity. Transcript variant 1 has 100% query cover, transcript variant 2 has 75%, and transcript variant 3 has 95%. Each transcript variant corresponds to its own C1orf159 protein isoform precursor.

The molecular weight of the protein is reported as 20.8 kDa. Its amino acid composition is rich in alanine (13.1%), glycine (10.1%), and proline (10.6%). The protein does not contain any positively or negatively charged amino acid segments.

MotifFinder identified two domain motifs in C1orf159. The first, DUF4501, is a family of proteins found in eukaryotes that is hypothesized to consist of single-pass membrane proteins, although its specific function is still unknown. The family has many conserved cysteine residues. The second domain, pfam00268, is associated with the small chain of ribonucleotide reductase.

The predicted secondary structure of C1orf159 contains helices and strands along with other types of structure. Helices are predicted mainly at amino acids 5-18 and 112-119. Strands are spread through the sequence in smaller stretches, such as 61-65 and 130-133. The predicted tertiary structure, obtained from Phyre2, consists of coils and helices: the helices lie between the coils, and one set of coils is significantly longer than the other. The 4D structure can be analyzed to understand the topology within the membrane.
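Composition percentages like the ones quoted for alanine, glycine, and proline can be reproduced by counting residues in a one-letter protein sequence. The following is a minimal sketch only: the function name and the toy sequence are hypothetical, and the toy sequence is not the real C1orf159 sequence.

```python
from collections import Counter


def amino_acid_composition(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        return {}
    counts = Counter(seq)
    # most_common() orders residues from most to least frequent.
    return {aa: 100.0 * n / len(seq) for aa, n in counts.most_common()}


# Hypothetical toy sequence for illustration, NOT the real C1orf159 protein.
toy_sequence = "MAAGPPAGAL"
composition = amino_acid_composition(toy_sequence)

for residue, percent in composition.items():
    print(f"{residue}: {percent:.1f}%")
```

Applied to the actual 198-residue sequence, the same count would give the percentages reported by the composition analysis.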

Gene Level Regulation
The promoter is annotated as beginning at the first guanine base, which is where the end of the promoter sequence is. Of the hundreds of predicted transcription factor binding sites, 20 were chosen for their high matrix similarity. Two important ones are the human and murine ETS1 factors, with the sequence agaaaccCGGAgggcgccggg, and the CCAAT binding factors, with the sequence tgagCCCAtgagctc. Both sites are near the beginning of the DNA sequence, on the positive strand, with a matrix similarity of 83% or more.

In one data set for normal human tissue, C1orf159 has its highest expression in the spinal cord, whose samples rank higher than the two smaller pancreas samples; one pancreas sample shows low expression and the other a moderate level. In the RNA-seq data sets on NCBI, C1orf159 is expressed most highly in the testis and skin for normal human tissue. In the RNA sequencing of RNA from 20 human tissues, the gene is most prevalent in the whole brain, cerebellum, lung, prostate, stomach, and thymus. According to the Illumina bodyMap2 transcriptome figure, the gene is again expressed most in the testis. In the figure showing circular RNA induction during fetal development, C1orf159 is most abundant in the heart, kidney, lung, and stomach at 10 weeks, with decreasing expression in later weeks. C1orf159 also had an abundance of 48 in human cell-line data. In the human brain map, C1orf159 is expressed in the brain, but at lower levels throughout. The protein is predicted to localize to the membrane as a transmembrane protein.
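To illustrate how a binding site is placed on a strand, the sketch below searches a sequence for the two sites quoted above on both the positive and negative strands. It is a simplification under stated assumptions: it uses exact string matching rather than the weight-matrix similarity scoring that produced the 83% figures, the function names are hypothetical, and the toy promoter is made up for the example and is not the real C1orf159 promoter.

```python
def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(dna.upper()))


def find_site(promoter: str, site: str) -> list[tuple[int, str]]:
    """Return (1-based position, strand) for every exact match of `site`."""
    promoter = promoter.upper()
    site = site.upper()
    hits = []
    # A "-" hit means the site's reverse complement appears in the given strand.
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = promoter.find(pattern)
        while start != -1:
            hits.append((start + 1, strand))
            start = promoter.find(pattern, start + 1)
    return hits


# Site sequences as quoted in the text.
ETS1_SITE = "agaaaccCGGAgggcgccggg"
CCAAT_SITE = "tgagCCCAtgagctc"

# Hypothetical toy promoter, NOT the real C1orf159 promoter sequence.
toy_promoter = "GTT" + ETS1_SITE + "ACGT" + CCAAT_SITE + "TT"

print(find_site(toy_promoter, ETS1_SITE))
print(find_site(toy_promoter, CCAAT_SITE))
```

Both sites are reported on the "+" strand here because the toy promoter was built with them in the forward orientation.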

Transcript Level Regulation
According to NCBI, variant 2 uses an alternative in-frame splice junction compared to the first variant. The transcript start region has the bases TGT and is at position 9. The microRNA is located at base pair 1152 and again at base pair 1548.

Protein Level Regulation
A signal peptide is predicted at positions 1 to 18, with a possible cleavage site between residues 18 and 19. PSORT II predicts one transmembrane region at amino acids 112-128, with the N-terminal side on the inside. However, the ALOM score is negative, which was taken as a sign that the prediction may be off. There is also a predicted cleavage site for a mitochondrial presequence, with an R-2 motif at position 14. The ER membrane retention signal is predicted to be in the N-terminus at ALRH. According to DeepLoc, the most likely locations for the protein are the cell membrane (0.5471) and the endoplasmic reticulum (0.226). The likelihood of C1orf159 being a membrane protein rather than a soluble one is 0.966.

Predicted post-translational modifications (PTMs) in C1orf159 include a SUMO interaction site and phosphorylation sites. The SUMO interaction site spans positions 34-38 and is unnamed. Many phosphorylation PTMs are predicted for this protein, so three were chosen to illustrate the serine and threonine phosphorylation sites and to cover very different kinds of sites. The first, an AGC/Akt/AKT3 phosphorylation site, is at position 184.[3] The second, called RGC, is at position 45.[4] The third, a CAMK/CAMKL/PASK site, is at position 97.[5]
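The residue coordinates above are 1-based and inclusive, which is easy to get wrong when slicing a sequence in code. The sketch below shows how the predicted signal peptide (1-18), the mature chain after cleavage between residues 18 and 19, and the predicted transmembrane region (112-128) map onto a 198-residue sequence. The helper name is hypothetical and the placeholder sequence is not the real C1orf159 protein; only the coordinates come from the text.

```python
def slice_1based(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using 1-based, inclusive coordinates."""
    return sequence[start - 1:end]


# Coordinates quoted in the text.
SIGNAL_PEPTIDE = (1, 18)   # cleavage predicted between residues 18 and 19
TM_REGION = (112, 128)     # predicted transmembrane region
PROTEIN_LENGTH = 198

# Hypothetical placeholder sequence, NOT the real C1orf159 protein.
alphabet = "ACDEFGHIKLMNPQRSTVWY"
toy_protein = "".join(alphabet[i % len(alphabet)] for i in range(PROTEIN_LENGTH))

signal_peptide = slice_1based(toy_protein, *SIGNAL_PEPTIDE)
mature_chain = toy_protein[SIGNAL_PEPTIDE[1]:]  # everything after residue 18
tm_segment = slice_1based(toy_protein, *TM_REGION)

print(len(signal_peptide), len(mature_chain), len(tm_segment))
```

With these coordinates the signal peptide covers 18 residues, the mature chain 180, and the transmembrane segment 17.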

Homology/Evolution
The only related sequences found for C1orf159 were homologs in other species (orthologs). These sequences fall in the taxonomic groups of fish, mammals, reptiles, birds, and amphibians. The mammals that have the C1orf159 homolog include primates, rodents, bats, cattle, and shrews. The protein appears to be mostly conserved across the species that have the gene.

Function/Biochemistry
While the function of this protein is not well known, it may be a transmembrane protein, or it may be associated with secretion or RNA splicing. RNA splicing can produce gene products that contribute to cancerous cells. How RNA splicing contributes to prostate cancer is still largely unknown, but because C1orf159 is expressed in the prostate, the protein may have an effect on cancerous cells, specifically in the prostate, through RNA splicing.[1]

Interacting Proteins
There are several proteins that interact with C1orf159.

Clinical significance
C1orf159 may be clinically significant because of its possible association with prostate cancer. Additionally, there are some SNPs in the gene that are important in determining any connections to disease.