SMIM23

SMIM23, or Small Integral Membrane Protein 23, is a protein that in humans is encoded by the SMIM23 gene, also known as C5orf50. The longer mRNA isoform is 519 nucleotides long and translates to a protein of 172 amino acids. Recent research has identified this gene, along with a few others, as potentially playing a role in how facial morphology arises in humans.

Gene
SMIM23 is a protein-coding gene. Basic information about its aliases and chromosome location is given in the table. The schematic of the chromosome helps to visualize the location of the gene.

mRNA
The gene has two splice isoforms (isoforms X1 and X2). The 519-nucleotide transcript has three exon/exon boundaries, indicating four exons (nucleotides 1-105, 106-157, 158-225, and 226-519).
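The exon coordinates above can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the coordinates are 1-based and inclusive, and that the 519 nucleotides are entirely coding sequence ending in a stop codon, which is what the stated 172-residue protein implies. The function names are hypothetical.

```python
# Exon coordinates for the 519-nucleotide isoform, as listed in the text.
# Assumption: coordinates are 1-based and inclusive.
exons = [(1, 105), (106, 157), (158, 225), (226, 519)]


def exon_lengths(exon_list):
    """Return the length in nucleotides of each exon."""
    return [end - start + 1 for start, end in exon_list]


def is_contiguous(exon_list):
    """True if each exon starts right after the previous one ends."""
    return all(nxt[0] == cur[1] + 1 for cur, nxt in zip(exon_list, exon_list[1:]))


lengths = exon_lengths(exons)
total = sum(lengths)
boundaries = len(exons) - 1

# Assumption: the whole transcript is coding and the last codon is a stop.
codons = total // 3
residues = codons - 1

print("exon lengths:", lengths)
print("total nucleotides:", total)
print("exon/exon boundaries:", boundaries)
print("amino acids (excluding stop):", residues)
```

Under these assumptions the four exons add up to the full 519 nucleotides, and 173 codons minus one stop codon give the 172 amino acids mentioned in the introduction.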

Physical features
SMIM23 notably has a transmembrane domain.

The predicted isoelectric point of the unmodified, unprocessed protein in mice is 5.779, while the transmembrane region alone in humans has an isoelectric point of 5.928.

The protein appears to be rich in leucine and glutamic acid, though not to an unusually high degree. It is also low in all other amino acids except alanine, serine, and glutamine.

The region underlined in the conceptual translation was predicted to be an Involucrin repeat.

Post-translational modifications
The transmembrane region has a predicted molecular weight of 1674.2 Da, while the whole protein is 20008.51 Da. This is very close to the molecular weight predicted by UniProt, 20.025 kDa. Antibody data were examined for banding patterns and weight changes that may have occurred after translation. The C5orf50 Polyclonal Antibody from ThermoFisher Scientific shows a Western blot band at 40 kDa, roughly twice the predicted weight. This suggests a significant amount of post-translational modification through the addition of large components.
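The gap between the predicted and observed weights can be made concrete with a short calculation. This is a rough sketch, not a substitute for a proper sequence-based calculation: the 110 Da average residue mass is a common rule of thumb and is an assumption here, and the function name is hypothetical.

```python
# Values taken from the text above.
PREDICTED_KDA = 20.025   # UniProt predicted molecular weight
OBSERVED_KDA = 40.0      # Western blot band reported for the antibody
PROTEIN_LENGTH = 172     # amino acids in the longer isoform

# Assumption: rule-of-thumb average mass of one amino acid residue.
AVERAGE_RESIDUE_MASS_DA = 110.0


def fold_difference(observed_kda, predicted_kda):
    """Ratio of observed to predicted molecular weight."""
    if predicted_kda <= 0:
        raise ValueError("predicted weight must be positive")
    return observed_kda / predicted_kda


fold = fold_difference(OBSERVED_KDA, PREDICTED_KDA)
extra_kda = OBSERVED_KDA - PREDICTED_KDA
rough_kda = PROTEIN_LENGTH * AVERAGE_RESIDUE_MASS_DA / 1000.0

print(f"observed / predicted: {fold:.2f}x")
print(f"unexplained mass: {extra_kda:.3f} kDa")
print(f"rule-of-thumb estimate from length: {rough_kda:.2f} kDa")
```

The length-based estimate lands near the predicted weight, while the observed band is about twice as heavy, which is the discrepancy the text attributes to post-translational modification.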

There are many predicted phosphorylation sites along the sequence, including two protein kinase C phosphorylation sites, a cAMP- and cGMP-dependent protein kinase phosphorylation site, and a tyrosine kinase phosphorylation site. There is also a confidently predicted potential C-terminal GPI modification site.

Secondary structure
Two stretches of alpha helix, from amino acids 33 to 49 and 89 to 136, are predicted by several secondary-structure prediction programs. Of the programs investigated, the most informative was PELE on Biology Workbench. A predicted 3D protein structure likewise shows a series of helices, consistent with the other programs.

Subcellular localization
This human integral membrane protein is predicted to localize to the endoplasmic reticulum. The same analysis of protein localization in other species returned conflicting results, with many programs predicting the protein to be present in the cytosol. This raises the possibility that the protein is misnamed, i.e. that it may not be an integral membrane protein. Drawing that conclusion will require further information.

Expression
There is not enough consensus as to where in the body SMIM23 is expressed. Databases indicate expression mainly in the testes, but this may be due to a lack of data.

Regulation of Expression
The promoter region of SMIM23 is approximately 1192 nucleotides long and contains binding sites for various predicted transcription factors.

At the level of RNA secondary structure, a stem-loop is predicted in the 5' UTR, with a few areas of conservation across species.

Function and clinical significance
Recent research suggests that face shape in individuals may be influenced by a set of genes that includes SMIM23. Although the paper refers to the gene by an alias (C5orf50), it identifies five genes that likely influence facial shape, specifically in people of European descent. These findings are supported by replication of the phenotype for each gene and by statistical analysis. Consistent with findings elsewhere, the article notes that SMIM23 likely codes for an unknown transmembrane protein. Other studies have suggested that a set of genes including SMIM23 may influence human height. Furthermore, a great deal of research is being done on chromosome 5 in general to understand the roles of certain genes on it, including SMIM23. This could one day provide insight into this gene's specific roles.

Interacting proteins
The following proteins are predicted to interact with SMIM23.

Cilia And Flagella Associated Protein 43, also known as CFAP43 or WDR96, is the most confidently predicted functional partner and contains a tryptophan-aspartic acid (WD) repeat domain.

SFR1 (SWI5-dependent recombination repair 1) is a component of the SWI5-SFR1 complex, which is required for double-strand break repair via homologous recombination.

COL17A1 is a collagen, specifically type XVII, alpha 1. This may play a role in overall protein structure.

PRDM16 binds to DNA and acts as a transcriptional regulator. It functions in the differentiation between white and brown adipose tissue. It can also be a repressor of transforming growth factor-beta signaling.

Homology and evolution
There are no known paralogs.

More than 100 orthologs are known, ranging from primates to small ground-dwelling animals. Based on these and on sequence similarity, the ortholog space can be described in three tiers. The closest relatives of humans carrying the SMIM23 gene are primates; two monkey species were selected, which diverged around 29.4 million years ago and have sequence similarities in the high 70s (percent). Moderately related orthologs come from a wide variety of animals, including horses, sea mammals, and bats, all with similarities of 62-69%. Lastly, some distantly related orthologs were included, such as the Tasmanian devil and various scavenger animals, with similarities of 40-61%.

Some portions of the protein remain highly conserved (see conceptual translation above). The most notable conserved motif consists of tryptophan 124, leucine 125, and aspartic acid 126. In addition, a BLAST search returned a protein family of unknown function. Two small conserved sequences (LEQ and DLE) are part of the DUF4635 motif. Although these are not completely conserved in the alignments done with SMIM23, they were labeled in the conceptual translation.

Orthologs
The protein was not found in bacteria, archaea, protists, plants, fungi, invertebrates, reptiles, or birds. All orthologs found were in mammals. An unrooted phylogenetic tree of SMIM23 was created with a few close, moderately related, and distant orthologs (listed in the table). In the tree, the larger the distance (length of line), the longer the time since the last common ancestor. Sequence identity refers to exactly matching amino acids, while similarity also counts amino acids with similar properties.
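The difference between identity and similarity can be illustrated with a small calculation on a pairwise alignment. This is a minimal sketch under stated assumptions: the two aligned fragments are invented toy sequences, not SMIM23 data, and the similarity groups are a simplified, hypothetical grouping by side-chain chemistry rather than the scoring matrix used by any particular alignment program.

```python
# Assumption: a simplified grouping of amino acids by side-chain properties.
SIMILAR_GROUPS = [
    set("AVLIM"),  # aliphatic / hydrophobic
    set("FWY"),    # aromatic
    set("ST"),     # small hydroxyl
    set("NQ"),     # amide
    set("DE"),     # acidic
    set("KRH"),    # basic
]


def are_similar(x, y):
    """True if two residues are identical or share a property group."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b):
    """Percent identity and percent similarity over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(x == y for x, y in pairs)
    similar = sum(are_similar(x, y) for x, y in pairs)
    return 100.0 * identical / len(pairs), 100.0 * similar / len(pairs)


# Toy aligned fragments (hypothetical, for illustration only).
identity, similarity = identity_and_similarity("WLDLEQ", "WIDLDQ")
print(f"identity: {identity:.1f}%  similarity: {similarity:.1f}%")
```

In the toy pair, four of six positions match exactly, while the remaining two (L/I and E/D) are conservative substitutions, so similarity is higher than identity. Real alignment tools may define similarity differently and may treat gapped columns differently.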

Suggested Reading

 * Liu F, van der Lijn F, Schurmann C, Zhu G, Chakravarty MM, Hysi PG, et al. (2012) A Genome-Wide Association Study Identifies Five Loci Influencing Facial Morphology in Europeans. PLoS Genet 8(9): e1002932. https://doi.org/10.1371/journal.pgen.1002932
 * Lowe JK, Maller JB, Pe'er I, Neale BM, Salit J, Kenny EE, et al. (2009) Genome-Wide Association Studies in an Isolated Founder Population from the Pacific Island of Kosrae. PLoS Genet 5(2): e1000365. https://doi.org/10.1371/journal.pgen.1000365
 * Greliche N, Germain M, Lambert J-C, et al. A genome-wide search for common SNP x SNP interactions on the risk of venous thrombosis. BMC Medical Genetics. 2013;14:36.
 * Schmutz J et al. (2004). The DNA sequence and comparative analysis of human chromosome 5. Nature, 431(7006), 268-74. https://dx.doi.org/10.1038/nature02919
 * Lango Allen H, Estrada K, Lettre G, et al. Hundreds of variants clustered in genomic loci and biological pathways affect human height. Nature. 2010;467(7317):832-838.
 * Rose JE, Behm FM, Drgon T, Johnson C, Uhl GR. Personalized Smoking Cessation: Interactions between Nicotine Dose, Dependence and Quit-Success Genotype Score. Molecular Medicine. 2010;16(7-8):247-253.