User:Arahmatullah/sandbox

c7orf26 (Chromosome 7, Open Reading Frame 26) is a gene in humans that encodes a protein known as c7orf26 (uncharacterized protein c7orf26). Based on the properties of c7orf26 and its conservation over a long period of time, the protein is predicted to localize to the cytoplasm and to play a role in regulating transcription.

Background
Chromosome 7 is one of the 23 pairs of chromosomes in humans. It spans about 159 million base pairs and represents about 5-5.5% of the total DNA in cells. Changes to the structure of chromosome 7 can result in a number of genetic abnormalities, including Williams syndrome, which causes structural and cosmetic changes to the human body and ultimately results in a shorter lifespan. Hundreds of open reading frames (ORFs) are known on chromosome 7; however, little is known about the 26th reading frame, which is of considerable interest.

Currently, two isoforms are known in Homo sapiens; they are referred to as isoforms 1 and 2.

Location
c7orf26 (accession: NM_024067 / NP_076972; alias: MGC-2178) is located on the short arm of chromosome 7 (7p22.1), starting at position 6590021 and ending at position 6608726. The c7orf26 gene spans 2178 base pairs and is oriented on the + strand. The coding region encodes a protein 449 amino acids long. The gene gives rise to 6 transcripts containing a total of 24 exons on the forward strand and has 5952 unique single nucleotide polymorphisms (SNPs).

Gene Neighborhood
The genes ZDHHC4, ZNF853 and ZNF316 neighbor c7orf26 on chromosome 7. ZDHHC4 encodes a zinc-finger protein involved in cytochrome-c oxidase activity and protein-cysteine S-palmitoyltransferase activity, and it has regions that overlap with c7orf26. GRID2IP lies more than 2000 bp upstream of c7orf26 and is heavily involved in synaptogenesis and synaptic plasticity.

Expression
c7orf26 is highly expressed in lymphatic, reproductive, and nervous tissue. These tissues include the brain (frontal and occipital cortex), thymus, salivary glands, endometrium, cervix, and prostate. It is expressed at intermediate levels in the lungs.

Paralogs
No paralogs of c7orf26 have been found in the human genome. However, six unique isoforms have been identified: c7orf26 isoforms X1, X2, X3 and X4, and isoform 2 (for which two sub-isoforms have been identified).

Orthologs
The table below lists a variety of orthologs of human c7orf26, including closely, intermediately and distantly related orthologs, in descending order of the date of divergence. c7orf26 is highly conserved throughout all orthologs, as demonstrated by a 65% identity in the least similar ortholog. c7orf26 has evolved slowly and evenly over time.
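As an illustration of how a percent identity figure such as the 65% quoted for the least similar ortholog might be computed, the following is a minimal sketch. It assumes the two sequences have already been aligned and that identity is counted only over columns where neither sequence has a gap; alignment tools differ in the denominator they use. The function name and the toy sequences are hypothetical and are not real c7orf26 data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumption; other conventions exist)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from c7orf26 or any ortholog.
human = "MLEELLKR-LQ"
other = "MLDELLRRALQ"
identity = percent_identity(human, other)
print(f"{identity:.1f}% identity")
```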

General Properties
The molecular weight of c7orf26 is 50 kilodaltons (kDa). The isoelectric point is 7.61. The protein sequence is unusually rich in leucine, at 15.8% of its composition, which may indicate a leucine zipper. Further analysis with PSORT indicates that a leucine-zipper region begins at amino acid 318 and extends to position 340 (22 amino acids long). There are no extremes with regard to acidity and alkalinity. c7orf26 has a positive charge cluster from amino acid 245 to 275 and does not have any negative or mixed charge clusters.
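The two observations above (leucine content and a candidate leucine zipper) can be sketched in code. The example below is illustrative only: it uses the PROSITE-style leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L, which spans 22 residues, and is not claimed to be the exact procedure PSORT applies. The helper names and the toy sequence are hypothetical; the real c7orf26 sequence is not reproduced here.

```python
import re
from collections import Counter

# PROSITE-style leucine zipper pattern L-x(6)-L-x(6)-L-x(6)-L (22 residues).
# The lookahead lets overlapping candidates be reported as well.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def percent_composition(seq: str) -> dict:
    """Percentage of each residue type in the sequence."""
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def find_leucine_zippers(seq: str) -> list:
    """Return (start, end) positions, 1-based and inclusive, of pattern matches."""
    return [(m.start() + 1, m.start() + 22) for m in LEUCINE_ZIPPER.finditer(seq)]


# Hypothetical toy sequence built to contain exactly one match.
toy = "MK" + "LAAAAAA" * 3 + "L" + "GS"
leucine_percent = percent_composition(toy)["L"]
zippers = find_leucine_zippers(toy)
print(f"Leucine: {leucine_percent:.1f}%")
print("Candidate zipper regions:", zippers)
```

A pattern match of this kind only flags a candidate region; on its own it does not establish that the region forms a functional leucine zipper.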

Composition
There is an even distribution of amino acids comprising c7orf26. The percent composition of each amino acid is fairly consistent throughout the orthologs of the protein. The most distant ortholog displays the most variance in amino acid composition. There is a higher percent composition of tyrosine, histidine and leucine and a lower composition of valine and alanine.

Post-Translational Modifications
c7orf26 is predicted to be highly phosphorylated after translation. There are 66 predicted phosphorylation sites according to the NetPhos predictor of phosphorylation sites. There are 4 unique sumoylation sites according to the SUMOplot and SUMOsp programs. Sumoylation is involved in a number of cellular processes, including nuclear-cytosolic transport, transcriptional regulation and protein stability.

According to the DAS-TMFilter server, c7orf26 has zero predicted transmembrane sites or transmembrane protein coding regions, which suggests that c7orf26 is not a transmembrane protein.

Secondary Structure
Using the GOR (Garnier-Osguthorpe-Robson) method, c7orf26 is predicted to have a secondary structure composed of alpha helices, random coil regions and extended strands. Random coil regions are the most common, comprising 53.23% of the protein, while alpha helices comprise 34.30% and extended strands 12.47%.
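To put these percentages in terms of residues, the short sketch below converts them to approximate residue counts, assuming the 449 amino acid protein length given in the Location section. The variable names are illustrative, and the rounding to whole residues is an assumption about how the percentages were derived.

```python
# Protein length from the Location section (449 amino acids).
PROTEIN_LENGTH = 449

# Predicted secondary structure percentages reported for the GOR method.
GOR_PERCENT = {
    "random coil": 53.23,
    "alpha helix": 34.30,
    "extended strand": 12.47,
}

# Convert each percentage to the nearest whole number of residues.
residue_counts = {
    state: round(PROTEIN_LENGTH * pct / 100)
    for state, pct in GOR_PERCENT.items()
}
total_percent = sum(GOR_PERCENT.values())

for state, count in residue_counts.items():
    print(f"{state}: about {count} residues")
```

Under these assumptions the three states account for roughly 239, 154 and 56 residues, which together cover the full 449 amino acid length.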

Subcellular localization
According to PSORT, c7orf26 is predicted to be localized in the cytoplasm with 70.6% confidence.

Interacting Proteins
c7orf26 interacts with 11 different proteins, according to the Mentha interactome browser. In particular, c7orf26 interacts with the 'INTS' family (Integrator Complex Subunits 1-7). The Integrator complex associates with the C-terminal domain of the RNA polymerase II large subunit. It is involved in the transcription of small nuclear RNAs (snRNAs) and the processing of their transcripts. INTS also mediates recruitment of cytoplasmic dynein to the nuclear envelope.

Outside of the INTS gene family, c7orf26 interacts with AK5, HDGF, and ASUN.

Clinical Significance
According to Guirato et al. (2018), there is some evidence that regions on chromosome 7 may be directly linked to a nuclear estrogen receptor (ESR2) that modulates cancer cell proliferation and tumor growth. Fu et al. (2014) further indicate that regions along chromosome 7, located between open reading frames 20 and 30, correlate directly with cellular functions of hepatoma-derived growth factor (HDGF), which is involved in tumorigenesis.

Suggested Reading

 * 1) Subaran, Ryan L.; Odgerel, Zagaa; Swaminathan, Rajeswari; Glatt, Charles E.; Weissman, Myrna M. (2016-04). "Novel variants in ZNF34 and other brain-expressed transcription factors are shared among early-onset MDD relatives". American Journal of Medical Genetics. Part B, Neuropsychiatric Genetics: The Official Publication of the International Society of Psychiatric Genetics. 171B (3): 333–341. doi:10.1002/ajmg.b.32408. ISSN 1552-485X. PMC 5832964.
 * 2) Stelzl, Ulrich; Worm, Uwe; Lalowski, Maciej; Haenig, Christian; Brembeck, Felix H.; Goehler, Heike; Stroedicke, Martin; Zenkner, Martina; Schoenherr, Anke (2005-09-23). "A human protein-protein interaction network: a resource for annotating the proteome". Cell. 122 (6): 957–968. doi:10.1016/j.cell.2005.08.029. ISSN 0092-8674. PMID 16169070.