User:Wanederner

Figure 1. NCBI VAST depiction of TMEM14C with transmembrane regions.

TMEM14C, or transmembrane protein 14C, is a gene encoding a transmembrane protein in Homo sapiens. Orthologs are found in many other animal species, from vertebrates to invertebrates, as well as in various plant species.[1] In humans, TMEM14C is a 1,187 base pair gene that encodes a 112 amino acid transmembrane protein. The gene is located on the short (p) arm of chromosome 6, in region 2, band 4, sub-band 2 (6p24.2), between PAK1IP1 and TMEM14B (a paralog of TMEM14C). It has 6 exons and is highly expressed in fat, the adrenal gland, and 25 other tissue types in Homo sapiens, with the highest expression in fat and the lowest measured expression in the pancreas.[1] It has 5 aliases: C6orf53, BA421M1.6, HSPC194, MSTP073, and NET26. One pseudogene, TMEM14EP, has been identified; it is located at 3q25.2 and could be involved in the overexpression of MTBP, which is associated with poor survival in lung adenocarcinoma. TMEM14C is conserved across a broad range of orthologs, from metazoans (e.g. Alligator mississippiensis) to bacteria (Chlamydia trachomatis).[1] A Google Patents search found no active patents for TMEM14C, but a search for its alias C6orf53 returned four active patents with a possible link to colorectal cancer, based on the mass values of associated markers.[6]
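The band notation 6p24.2 packs the chromosome, arm, region, band and sub-band into a single string. The sketch below splits simple band names of this form into those parts. It is a hypothetical helper written for illustration (`parse_band` and `CytoBand` are not from any library), and it deliberately handles only the simple form used here, not full ISCN notation with ranges or sub-sub-bands.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simple band names such as "6p24.2": chromosome, arm (p/q), one region digit,
# one band digit, and an optional single sub-band digit after the decimal point.
BAND_PATTERN = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d))?$")


@dataclass(frozen=True)
class CytoBand:
    chromosome: str
    arm: str  # "p" = short arm, "q" = long arm
    region: int
    band: int
    sub_band: Optional[int]


def parse_band(location: str) -> CytoBand:
    """Split a simple cytogenetic location into its components."""
    match = BAND_PATTERN.match(location)
    if match is None:
        raise ValueError(f"unsupported band notation: {location!r}")
    chromosome, arm, region, band, sub_band = match.groups()
    return CytoBand(
        chromosome=chromosome,
        arm=arm,
        region=int(region),
        band=int(band),
        sub_band=int(sub_band) if sub_band is not None else None,
    )


# TMEM14C and its pseudogene TMEM14EP.
for loc in ("6p24.2", "3q25.2"):
    b = parse_band(loc)
    arm_name = "short" if b.arm == "p" else "long"
    print(f"{loc}: chromosome {b.chromosome}, {arm_name} arm, "
          f"region {b.region}, band {b.band}, sub-band {b.sub_band}")
```

For 6p24.2 this reads as chromosome 6, short arm, region 2, band 4, sub-band 2, matching the description above.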

Transcripts

In addition to transcript variant 1, TMEM14C has one known alternative transcript, designated TMEM14C transcript variant 2. Both variants encode the same protein and have 6 exons, but variant 2 has 171 fewer base pairs in the 5′ untranslated region (UTR) of exon 1 than variant 1.[1]

Proteins

Molecular weight data suggest that TMEM14C weighs about 23 kDa. However, a SAPS protein composition analysis gave a molecular weight of 11.6 kDa. The two values differ by a factor of about 2; because the 23 kDa estimate relies on the only antibody found for TMEM14C, the 11.6 kDa value determined by SAPS is more likely to be correct. No isoelectric point data were found. TMEM14C has a significantly high relative content of G, M, and AGP, and a significantly low relative content of E, D, and T. I-TASSER predicts a long, filamentous structure containing loose and flexible alpha helices. This prediction is supported by the SAPS amino acid composition data, which show that TMEM14C is extremely rich in glycine. Glycine, the simplest amino acid, has only a hydrogen atom as its side chain and is conformationally flexible, so a high glycine content would be expected to produce loose, flexible helices. In contrast, the structure figure depicts a bulbous, bulky tertiary structure, attributed to the protein's high transmembrane-region content, which would prevent a very loose or lengthy structure. Colors distinguish the chemical properties of the amino acids in the structure and carry no further significance.
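The SAPS molecular weight and composition figures can be cross-checked from the sequence alone. The sketch below is a minimal illustration, not the SAPS algorithm itself: it sums standard average residue masses over the 112-residue conceptual translation from Figure 8 and reports the share of the residues discussed above. The helper names are hypothetical, and the calculation assumes an unmodified polypeptide (no post-translational modifications).

```python
from collections import Counter

# Average residue masses in daltons (free amino acid minus one water),
# the standard values used by tools such as ExPASy ProtParam.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once for the free N- and C-termini

# 112-residue sequence transcribed from the conceptual translation (Figure 8).
TMEM14C = (
    "MQDTGSVVPLHW"
    "FGFGYAALVASGGIIGYVKA"
    "GSVPSLAAGLLFGSLAGLGA"
    "YQLSQDPRNVWVFLATSGTL"
    "AGIMGMRFYHSGKFMPAGLI"
    "AGASLLMVAKVGVSMFNRPH"
)


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified polypeptide, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def composition(seq: str) -> dict:
    """Percentage of each standard amino acid in the sequence."""
    counts = Counter(seq)
    return {aa: 100 * counts[aa] / len(seq) for aa in RESIDUE_MASS}


comp = composition(TMEM14C)
print(f"{len(TMEM14C)} aa, {molecular_weight(TMEM14C) / 1000:.1f} kDa")
for aa in "GMEDT":
    print(f"{aa}: {comp[aa]:.1f}%")
```

For this sequence the calculation gives about 11.6 kDa, in line with the SAPS value rather than 23 kDa. Glycine is the most frequent residue (18 of 112, about 16%) and glutamate is absent.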

Gene Level Regulation

Promoter

Homo sapiens TMEM14C Promoter

CTTACTCCCCATCCCCCTTTAAGATGCTTCTCAAGCAGCCTCACTCCCTA 50
AAACAAACGGATATTTGGCTTTGAGCCACATTATTCTGCAGCCCCCGTAT 100
TTTCTTCCAGGCAGGCCCTTCCTGGTGTAGACAAGATCGGGCTTGAGTGA 150
CACTCCTCCCTGATGGCCTGCATCTGGTTTATACCCTGTAACTTGTTCCT 200
CTATTAATGGGGTCCTTCAAAATCCAGCCTCAGATTCCCTGGTCCCGCAG 250
CGGTGGCCTCACCTCTGGCGTGGCCGAGCTCACGTGGTCCGGCTTGTGCA 300
AGTCCCAGGTCCAACTCCGGGTCTCCTGCTTTTGGCCACTCAGGATTGGA 350
CCTGGGACTGATACTGGTCGGCCCTGCAGGCGCTGCGGACAGGGGAAGCA 400
CAGAGATTCCCCGCCGCGTTCCCTGGACTCAGGAGCTCGCCGCGATGCCC 450
CGCCCCACTCTCCACCCGCTGA 472

Figure 2. TMEM14C promoter (472 bp) annotated with predicted transcription factor binding site families (ZF07, KLFS, FKHD, NF1F, STAT, CDXF, VTBP, HBOX, AP4R, AHRR, CEBP, HESF, GLIF, CLOX, GATA, ETSF, INRE, NACA, ZF02, MZF1, and SP1F) and the start of transcription. Data gathered from Genomatix.

Expression Pattern

Figure 3. RNA-seq was performed on tissue samples from 95 human individuals representing 27 different tissues to determine the tissue specificity of all protein-coding genes.[1]

Figure 4. Transcription profiling by high-throughput sequencing of RNA from 16 human tissues, individually and as a mixture.[1]

Figure 5. 35 human fetal samples from 6 tissues (3 to 7 replicates per tissue), collected between 10 and 20 weeks of gestation, were sequenced using Illumina TruSeq Stranded Total RNA.[1]

TMEM14C appears to be highly expressed in many tissues and does not appear to be developmentally specific, as it is strongly expressed in both fetal and adult tissue samples.

Figure 6. Microarray Data of TMEM14C Tissue Expression.

This expression chart of three samples for each major human tissue shows TMEM14C at a very high percentile rank in every tissue, from about 80% to roughly 95%, indicating that it is abundantly expressed across all tissue types. Although absolute counts are low in some tissues compared with others, its abundance relative to other genes in the same tissue is very high.

Transcript Level Regulation

mRNA localization

k = 9/23
33.3 %: endoplasmic reticulum
22.2 %: Golgi
22.2 %: cytoplasmic
11.1 %: nuclear
11.1 %: mitochondrial

Figure 12. PSORTII mRNA localization analysis sorted by location.

PSORT II predicts that TMEM14C is most likely localized to the endoplasmic reticulum (ER). For a transmembrane protein such as TMEM14C, this would mean that the protein is anchored in the ER membrane, with its cytoplasmic portions facing the cytosol and its non-cytoplasmic portions in the ER lumen.
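PSORT II uses a k-nearest-neighbour classifier, so, assuming k = 9 neighbours, the percentages above can be read as the fraction of the nearest reference proteins assigned to each compartment (3/9 ≈ 33.3%, 2/9 ≈ 22.2%, 1/9 ≈ 11.1%). The sketch below reproduces only that arithmetic. The neighbour labels are hypothetical, chosen to match the reported percentages, and `knn_vote_shares` is an illustrative helper, not part of PSORT II.

```python
from collections import Counter


def knn_vote_shares(neighbour_labels):
    """Percentage of the k nearest neighbours that carry each localization label."""
    k = len(neighbour_labels)
    counts = Counter(neighbour_labels)
    # most_common() orders labels by vote count, highest first.
    return {label: round(100 * n / k, 1) for label, n in counts.most_common()}


# Hypothetical labels for k = 9 neighbours, chosen to reproduce the PSORT II output.
neighbours = (
    ["endoplasmic reticulum"] * 3
    + ["Golgi"] * 2
    + ["cytoplasmic"] * 2
    + ["nuclear", "mitochondrial"]
)
shares = knn_vote_shares(neighbours)
for label, pct in shares.items():
    print(f"{pct:4.1f} %: {label}")
```

The compartment with the most neighbour votes (here the ER) becomes the top prediction, which is why a single extra neighbour shifts a share by about 11 percentage points when k = 9.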

miRNA Targeting

A TargetScan search found no miRNA predicted to target TMEM14C.

Stem Loop Structure

Figure 7. Stem loop structures of TMEM14C.

Protein Level Regulation

Human TMEM14C annotated conceptual translation (Variant 1, NM_001165258.1)

gccctgcaggcgctgcggacaggggaagcacagagattccccgccgcgttccctggactc 60
aggagctcgccgcgatgccccgccccactctccacccgctgaatgcagggcgcatgctgc 120
tacttggcggctcaagccccgcccgcaccgtccccattctctgaccgcccctctcccggt 180
acactgcgcaggcacaacagagccgctcccctctcctcgccccgccaccgggacggagag 240
cgcccgccgctgcatttccggcgacacctcgcagtcattcctgcggcttgcgcgcccttg 300 (conserved)
tagacagccggggccttcgtgagaccgcttgttttctgcaggtgcaggcctggggtagtc 360 (exon 1/2; upstream stop)
tcctgtctggacagagaagagaaaaatgcaggacactggctcagtagtgcctttgcattg 420 (exon 2/3)
M Q D T G S V V P L H W 12 (cytoplasmic; YinOYang site; start)
gtttggctttggctacgcagcactggttgcttctggtgggatcattggctatgtaaaagc 480 (signal peptide cleavage)
F G F G Y A A L V A S G G I I G Y V K A 32 (transmembrane region; non-cytoplasmic; sumoylation)
aggcagcgtgccgtccctggctgcagggctgctctttggcagtctagccggcctgggtgc 540 (exon 3/4)
G S V P S L A A G L L F G S L A G L G A 52 (transmembrane region; nuclear export)
ttaccagctgtctcaggatccaaggaacgtttgggttttcctagctacatctggtacctt 600 (exon 4/5)
Y Q L S Q D P R N V W V F L A T S G T L 72 (cytoplasmic; YinOYang site; phosphorylation)
ggctggcattatgggaatgaggttctaccactctggaaaattcatgcctgcaggtttaat 660
A G I M G M R F Y H S G K F M P A G L I 92 (transmembrane region; non-cytoplasmic; phosphorylation)
tgcaggtgccagtttgctgatggtcgccaaagttggagttagtatgttcaacagacccca 720 (exon 5/6)
A G A S L L M V A K V G V S M F N R P H 112 (transmembrane region; sumoylation)
tagcagaagtcatgttccagcttagactgatgaagaattaaaaatctgcatcttccact 780 (stop *)
attttcaatatattaagagaaataagtgcagcatttttgcatctgacattttacctaaaa 840
aaaaagacaccaaacttggcagagaggtggaaaatcagtcatgattacaaacctacagag 900
gtggcgagtatgtaacacaagagcttaataagaccctcatagagcttgattcttgtatat 960
tgatgttgtcttttctttctgtatctgtaggtaaatctcaagggtaaaatgttaggtgtc 1020
agctttcagggctctgaaaccccattccctgctctgaggaacagtgtgaaaaaaagtctt 1080
ttaggagatttacaatatctgttcttttgctcatcttagaccacagactgactttgaaat 1140
tatgttaagtgaaatatcaatgaaaataaagtttactataaataataaaaaaaaaaaaaa 1200 (polyA)
a 1201

Figure 8. Conceptual translation of the TMEM14C mRNA, annotated with regulatory features of the protein.
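The "upstream stop" and "start" annotations in Figure 8 can be checked directly: an in-frame stop codon upstream of the start codon means the open reading frame cannot extend further upstream, which supports the annotated ATG as its first in-frame start. The sketch below locates the start codon by searching for the codons of the first three residues (M-Q-D) and then walks upstream one codon at a time. It uses only the first 420 nt of Figure 8, and the helper names are hypothetical.

```python
# First 420 nt of the variant 1 transcript, copied from Figure 8.
MRNA_5PRIME = (
    "gccctgcaggcgctgcggacaggggaagcacagagattccccgccgcgttccctggactc"
    "aggagctcgccgcgatgccccgccccactctccacccgctgaatgcagggcgcatgctgc"
    "tacttggcggctcaagccccgcccgcaccgtccccattctctgaccgcccctctcccggt"
    "acactgcgcaggcacaacagagccgctcccctctcctcgccccgccaccgggacggagag"
    "cgcccgccgctgcatttccggcgacacctcgcagtcattcctgcggcttgcgcgcccttg"
    "tagacagccggggccttcgtgagaccgcttgttttctgcaggtgcaggcctggggtagtc"
    "tcctgtctggacagagaagagaaaaatgcaggacactggctcagtagtgcctttgcattg"
)
STOP_CODONS = {"taa", "tag", "tga"}


def find_start(seq: str, first_codons: str) -> int:
    """1-based position of the start codon, found via the codons of the first residues."""
    index = seq.find(first_codons)
    if index == -1:
        raise ValueError("start codon context not found")
    return index + 1


def upstream_in_frame_stop(seq: str, start: int):
    """1-based position of the nearest stop codon upstream of `start` in the same frame, or None."""
    i = start - 1 - 3  # 0-based index of the codon just before the start codon
    while i >= 0:
        if seq[i:i + 3] in STOP_CODONS:
            return i + 1
        i -= 3
    return None


# "atgcaggac" encodes M-Q-D, the first three residues of the conceptual translation.
start = find_start(MRNA_5PRIME, "atgcaggac")
stop = upstream_in_frame_stop(MRNA_5PRIME, start)
print(f"start codon at {start}; upstream in-frame stop "
      f"{MRNA_5PRIME[stop - 1:stop + 2]} at {stop}")
```

On this sequence the search places the start codon at nucleotide 386 and the nearest in-frame upstream stop (tag) at 356, inside the 301-360 segment annotated "upstream stop".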

Post-Translational Modifications

DAS-TMfilter (http://mendel.imp.ac.at/sat/DAS/DAS.html): 4 transmembrane regions found.
GPS (http://gps.biocuckoo.cn/online.php): 487 predicted kinase-specific phosphorylation sites.
HelicalWheel (https://grigoryanlab.org/drawcoil/drawcoil.pl): helical wheel representation.
HMMTOP (http://www.enzim.hu/hmmtop/): N-terminus is outside of the cell.
NetGlycate (http://www.cbs.dtu.dk/services/NetGlycate/): predicted glycation of the epsilon amino groups of lysines 85 and 102.
NetNES (http://www.cbs.dtu.dk/services/NetNES/): predicted leucine-rich nuclear export signal at L47.
NetPhos (http://www.cbs.dtu.dk/services/NetPhos-3.1/): conserved phosphorylation sites at T71 and S83.
Phobius (http://phobius.sbc.su.se/): 4 transmembrane regions and their locations.
Predotar (https://urgi.versailles.inra.fr/predotar/): putative N-terminal ER targeting sequence predicted.
PrePS (http://mendel.imp.ac.at/sat/PrePS/index.html): 3 predicted protein prenylation sites.
ProP (http://www.cbs.dtu.dk/services/ProP/): predicted signal peptide cleavage site between A18 and A19.
PSORTII (https://psort.hgc.jp/cgi-bin/runpsort.pl): subcellular localization appears to be primarily in the ER.
SecretomeP (http://www.cbs.dtu.dk/services/SecretomeP/): one signal peptide predicted.
SOSUI (http://harrier.nagahama-i-bio.ac.jp/sosui/): 4 transmembrane helices found.
SUMOplot (https://www.abgent.com/sumoplot): predicted sumoylation sites at K31 and K102.
TargetP (http://www.cbs.dtu.dk/services/TargetP/): subcellular location predicted in the secretory pathway.
TatP (http://www.cbs.dtu.dk/services/TatP/): 3 predicted twin-arginine signal peptides.
TMHMM (http://www.cbs.dtu.dk/services/TMHMM-2.0/): locations of transmembrane helices predicted.
YinOYang (http://www.cbs.dtu.dk/services/YinOYang/): T4 and S56 are Yin-Yang sites.

Table 1. Post-translational modification and feature predictions; the findings are depicted in Figure 8.
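Because Table 1 gathers positions from many independent tools, a quick consistency check is to confirm that each reported residue matches the conceptual translation in Figure 8. The sketch below does this for the predictions that give both a residue and a position. The site list is transcribed from Table 1, and `check_sites` is an illustrative helper, not part of any of the tools listed.

```python
# 112-residue sequence transcribed from the conceptual translation (Figure 8).
TMEM14C = (
    "MQDTGSVVPLHW"
    "FGFGYAALVASGGIIGYVKA"
    "GSVPSLAAGLLFGSLAGLGA"
    "YQLSQDPRNVWVFLATSGTL"
    "AGIMGMRFYHSGKFMPAGLI"
    "AGASLLMVAKVGVSMFNRPH"
)

# (tool, expected residue, 1-based position) as reported in Table 1.
PREDICTED_SITES = [
    ("YinOYang", "T", 4),
    ("ProP cleavage, N-terminal side", "A", 18),
    ("ProP cleavage, C-terminal side", "A", 19),
    ("SUMOplot", "K", 31),
    ("NetNES", "L", 47),
    ("YinOYang", "S", 56),
    ("NetPhos", "T", 71),
    ("NetPhos", "S", 83),
    ("NetGlycate", "K", 85),
    ("NetGlycate / SUMOplot", "K", 102),
]


def check_sites(seq, sites):
    """Return (tool, expected, position, found) for every site whose residue does not match."""
    mismatches = []
    for tool, residue, pos in sites:
        found = seq[pos - 1]  # convert 1-based position to 0-based index
        if found != residue:
            mismatches.append((tool, residue, pos, found))
    return mismatches


print(check_sites(TMEM14C, PREDICTED_SITES))
```

For these entries the list of mismatches is empty, so every reported residue agrees with the translated sequence.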

Homology/Evolution

Genus species | Common name | Taxonomic group | Divergence date (estimated, MYA) | Accession | Sequence length (aa) | Identity | Similarity
Oryctolagus cuniculus | European Rabbit | Mammals | 90 | XP_002714217.1 | 114 | 92% | 97%
Felis catus | Domestic Cat | Mammals | 96 | XP_003985861.1 | 114 | 89% | 96%
Gallus gallus | Chicken | Bird | 312 | XP_015131459.1 | 107 | 74% | 82%
Balearica regulorum gibbericeps | Grey Crowned Crane | Bird | 312 | XP_010300009.1 | 107 | 75% | 83%
Ophiophagus hannah | King Cobra | Reptile | 312 | ETE69889.1 | 107 | 78% | 89%
Anolis carolinensis | Green Anole | Reptile | 312 | XP_003223611.1 | 107 | 76% | 86%
Xenopus tropicalis | Western Clawed Frog | Amphibian | 352 | NP_001072424.1 | 107 | 76% | 87%
Erpetoichthys calabaricus | Reedfish | Bony Fish | 435 | XP_028646691.1 | 107 | 76% | 87%
Denticeps clupeoides | Denticle Herring | Bony Fish | 435 | XP_028814336.1 | 107 | 76% | 84%
Callorhinchus milii | Australian Ghost Shark | Cartilaginous Fish | 473 | NP_001279149.1 | 107 | 68% | 85%
Nephila clavipes | Golden Silk Spider | Arthropods | 797 | PRD35861.1 | 124 | 51% | 64%
Orussus abietinus | Parasitic Wood Wasp | Arthropods | 797 | XP_012283827.1 | 109 | 58% | 70%
Pseudomyrmex gracilis | Elongate Twig Ant | Arthropods | 797 | XP_020285229.1 | 110 | 60% | 74%
Exaiptasia pallida | Sea Anemone | Cnidaria | 824 | XP_020914767.1 | 114 | 54% | 67%
Rhizopus azygosporus | N/A | Fungi | 1105 | RCI00732.1 | 100 | 57% | 71%
Choanephora cucurbitarum | N/A | Fungi | 1105 | OBZ90553.1 | 104 | 52% | 68%
Acanthamoeba castellanii str. Neff | N/A | Protist | 1480 | XP_004336177.1 | 149 | 51% | 62%
Theobroma cacao | Cacao Tree | Plant | 1496 | EOY30104.1 | 234 | 47% | 57%
Chlamydia trachomatis | Chlamydia | Bacteria | 4290 | CQB89653.1 | 110 | 48% | 61%

Table 2. TMEM14C orthologs organized by relative divergence and clade type.

Phylogenetic Tree

Figure 9. Unrooted phylogenetic tree of 20 orthologs. Species names can be found in Table 2.

Divergence

Figure 10. TMEM14C ortholog molecular clock plot. Corrected percent divergence is plotted against date of divergence in millions of years ago (MYA), where the corrected divergence $m$ is obtained from the uncorrected percent divergence $n$ by $m/100 = -\ln(1 - n/100)$. TMEM14C appears to have diverged over time at about the same rate as cytochrome c (0.0324 vs. 0.0314), which suggests that it is a slowly evolving gene.[1]
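The correction above can be applied directly to the identities in Table 2. The sketch below assumes that the uncorrected divergence $n$ is 100 minus percent identity and fits a least-squares line through the origin (zero divergence at time zero). The original plot may have used a different subset of orthologs or a different fitting method, so the slope it prints need not equal 0.0324; the function names are illustrative.

```python
import math

# (estimated divergence date in MYA, % identity to human TMEM14C), from Table 2.
ORTHOLOGS = {
    "Oryctolagus cuniculus": (90, 92),
    "Felis catus": (96, 89),
    "Gallus gallus": (312, 74),
    "Balearica regulorum gibbericeps": (312, 75),
    "Ophiophagus hannah": (312, 78),
    "Anolis carolinensis": (312, 76),
    "Xenopus tropicalis": (352, 76),
    "Erpetoichthys calabaricus": (435, 76),
    "Denticeps clupeoides": (435, 76),
    "Callorhinchus milii": (473, 68),
    "Nephila clavipes": (797, 51),
    "Orussus abietinus": (797, 58),
    "Pseudomyrmex gracilis": (797, 60),
    "Exaiptasia pallida": (824, 54),
    "Rhizopus azygosporus": (1105, 57),
    "Choanephora cucurbitarum": (1105, 52),
    "Acanthamoeba castellanii": (1480, 51),
    "Theobroma cacao": (1496, 47),
    "Chlamydia trachomatis": (4290, 48),
}


def corrected_divergence(identity_pct: float) -> float:
    """Corrected % divergence m, from m/100 = -ln(1 - n/100) with n = 100 - identity."""
    n = 100 - identity_pct
    return -100 * math.log(1 - n / 100)


def slope_through_origin(points):
    """Least-squares slope b of y = b * x, with no intercept."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


points = [(mya, corrected_divergence(ident)) for mya, ident in ORTHOLOGS.values()]
print(f"corrected % divergence per million years: {slope_through_origin(points):.4f}")
```

The correction always exceeds the raw divergence (for the rabbit, 8% uncorrected becomes about 8.34%), and the gap widens for distant orthologs, where multiple substitutions at the same site are more likely.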

MSA (Distant)

Figure 11. Multiple sequence alignment of distant TMEM14C orthologs. Colors represent conserved amino acid chemistry; full species information can be found in Table 2. Conserved amino acids appear to lie mostly in the transmembrane regions of TMEM14C.[1][14]

Function

One PubMed article mentions TMEM14C by name in its title: “TMEM14C is required for erythroid mitochondrial heme metabolism”. Using gene expression profiling, the researchers reported that this inner mitochondrial membrane protein is essential for erythropoiesis (red blood cell production) and heme synthesis in vivo. Further support for a role in heme biosynthesis comes from an article that identified TMEM14C as a mitochondrial protein whose transcript is consistently coexpressed with the core machinery of heme biosynthesis. Another article describes NET26 (an alias of TMEM14C) as a type II membrane protein with possible functions in both the nuclear envelope and the endoplasmic reticulum. However, it did not appear in microsomal membranes, possibly because intact nuclei sediment readily during centrifugation, leaving microsomal membrane fractions devoid of nuclear envelope. Additionally, TMEM14C has been shown to interact with a PDZ domain of LNX1 in mouse embryos, with the interaction observed through beta-galactosidase activity. LNX1 is a RING finger domain-containing protein with four PDZ domains, which are protein interaction domains found in molecules associated with signal transduction. This activity may suggest a role in the hydrolysis of glycosidic bonds to produce monosaccharides.

Interacting Proteins

Figure 12. STRING visualization of closely associated functional protein partners.

Table 3. Associated protein partners found in Figure 12.[18]

Clinical Significance

Mutations (SNPs)

Table 6. SNPs of TMEM14C.