SNED1

SNED1 (Sushi, Nidogen, and EGF-like Domains 1) is an extracellular matrix (ECM) protein expressed at low levels in a wide range of tissues. The gene encoding SNED1 is located on human chromosome 2 at locus 2q37.3. The corresponding mRNA, isolated from the spleen, is 6834 bp in length, and the corresponding protein is 1413 amino acids long. The mouse ortholog of SNED1 was cloned in 2004 from the embryonic kidney by Leimeister et al. SNED1 contains domains characteristic of ECM proteins, including an amino-terminal NIDO domain, several calcium-binding EGF-like domains (EGF_CA), a Sushi domain, also known as a complement control protein (CCP) domain, and three fibronectin type III (FN3) domains in the carboxy-terminal region.

Locus
SNED1 is located on the plus strand of chromosome 2 at locus 2q37.3. The RefSeq accession number is NM_001080437.3. The genomic DNA sequence of SNED1 spans 98,159 bp, and the longest spliced mRNA predicted by AceView is 7048 bp long and contains 31 exons. AceView predicts 9 splice variants of SNED1, each of which returned protein structure matches when analyzed with Phyre2; these matches are discussed under "Tertiary and quaternary structure".

Common aliases
SNED1 is an acronym for Sushi, Nidogen, and EGF-like Domains 1. Obsolete aliases for SNED1 include Snep, SST3, and IRE-BP1.

Homologs and phylogeny
SNED1 is highly conserved across vertebrates, including fish, reptiles, amphibians, birds, and mammals. It is unclear whether SNED1 is conserved in invertebrates, although the protein domains found in SNED1 are also found in invertebrate proteins. The abundance of cysteine residues, mostly located within EGF-like domains where they form disulfide bonds, appears to be very highly conserved, suggesting that cysteine richness is an important feature of this protein.

Paralogs
SNED1 has several paralogs within the human genome, each of which covers only a small portion of the entire peptide sequence. Genes encoding proteins that share domains (EGF-like, Sushi) with SNED1 include the neurogenic locus notch homolog (NOTCH) proteins, the jagged proteins, the eyes shut homolog protein, the crumbs homolog proteins, the delta and notch-like epidermal growth factor-related receptors, the sushi von Willebrand factor A protein (SVEP1), and the slit homolog 3 protein.

Primary sequence
The protein knowledge database UniProt reports that the full-length SNED1 protein is 1413 amino acids long (UniProt Q8TER0).

The full sequence obtained by an NCBI BLAST search can be accessed with the reference ID NP_001073906.1. A notable feature of this protein is that it is extraordinarily cysteine-rich, with 107 cysteines in total, giving a reported overall cysteine composition of 13.2%.
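The composition of a single residue type can be recomputed directly from any sequence string. The following is a minimal sketch in Python; the function names and the toy peptide are illustrative assumptions and are not taken from the SNED1 record. For reference, 107 residues out of 1413 corresponds to roughly 7.6%, so a reported percentage depends on which sequence and length are taken as the basis.

```python
def residue_percentage(sequence: str, residue: str = "C") -> float:
    """Return the percentage of positions in `sequence` occupied by `residue`."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return 100.0 * sequence.count(residue.upper()) / len(sequence)


def percentage_from_counts(count: int, length: int) -> float:
    """Return count / length expressed as a percentage."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * count / length


# Toy peptide (hypothetical, NOT the SNED1 sequence): 3 cysteines in 12 residues.
toy_peptide = "MCGAPCKLTCEN"
toy_percentage = residue_percentage(toy_peptide)

# Counts quoted in the text: 107 cysteines in a 1413-residue protein.
counts_percentage = percentage_from_counts(107, 1413)

print(f"toy peptide: {toy_percentage:.1f}% cysteine")
print(f"107 of 1413: {counts_percentage:.1f}% cysteine")
```

To apply this to the real protein, the sequence downloaded from the UniProt or RefSeq record would be passed to residue_percentage in place of the toy peptide.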

Domains and motifs
SNED1 is a secreted protein of the extracellular matrix. It contains a signal peptide (amino acids 1–24) that directs the protein to the secretory pathway.
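Because residue coordinates in protein annotations are 1-based and inclusive, removing the signal peptide in code requires a small index conversion. The sketch below assumes the boundary given above (the signal peptide ends at residue 24); the function names and the placeholder sequence are hypothetical.

```python
def remove_signal_peptide(sequence: str, last_signal_residue: int = 24) -> str:
    """Return the processed sequence, dropping residues 1..last_signal_residue.

    Annotation coordinates are 1-based and inclusive, so residue 25 is at
    Python index 24.
    """
    if last_signal_residue < 0 or last_signal_residue > len(sequence):
        raise ValueError("signal peptide boundary lies outside the sequence")
    return sequence[last_signal_residue:]


def processed_length(full_length: int, last_signal_residue: int = 24) -> int:
    """Return the number of residues left after signal peptide removal."""
    return full_length - last_signal_residue


# Placeholder sequence (hypothetical): 24 'A' residues followed by "QK".
placeholder = "A" * 24 + "QK"
processed = remove_signal_peptide(placeholder)

# For a 1413-residue precursor, the processed form spans residues 25-1413.
sned1_processed_length = processed_length(1413)
print(processed, sned1_processed_length)
```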

Precise prediction of domain boundaries can be obtained using the InterPro domain database or SMART.

SNED1 contains several notable domains. The first, shown in pink in the annotated sequence, is the NIDO domain, which is also found in the nidogen-1 protein, also known as entactin. Other than SNED1, only four human proteins contain this domain: the basement membrane proteins nidogen-1 and nidogen-2, alpha-tectorin, and mucin-4, which has been demonstrated to play a role in promoting pancreatic cancer metastasis.

The second group of regions of interest, underlined in the annotated sequence, consists of the calcium-binding EGF-like domains (EGF_CA). The sequence contains many of these domains, which are present in a large number of membrane-bound and extracellular proteins. The EGF_CA domains may suggest a "sticky" nature for this protein, as ECM proteins often require calcium cations to form homo- and heterodimeric complexes with other ECM proteins. The Sushi domain, or complement control protein (CCP) motif, is annotated in green in the figure and has been identified in many proteins involved in the complement system. This domain is also known as a short consensus repeat (SCR); the name Sushi is the origin of the first part of the protein's name. The fibronectin type III (FN3) domains are annotated in blue, and their presence suggests that the protein may be involved in cell adhesion. SNED1 also contains an RGD and an LDV sequence, motifs that are important for the binding of ECM proteins to integrins, which are cell membrane proteins that mediate cell-ECM interactions.
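Short linear motifs such as RGD and LDV can be located in a sequence with a simple pattern search. The following Python sketch reports 1-based start positions; the function name and the toy peptide are illustrative assumptions and do not reflect the actual positions of these motifs in SNED1.

```python
import re
from typing import Dict, Iterable, List


def find_motifs(sequence: str, motifs: Iterable[str] = ("RGD", "LDV")) -> Dict[str, List[int]]:
    """Return a mapping of motif -> list of 1-based start positions."""
    sequence = sequence.upper()
    hits: Dict[str, List[int]] = {}
    for motif in motifs:
        # A lookahead pattern also reports overlapping occurrences.
        pattern = re.compile(f"(?={re.escape(motif.upper())})")
        hits[motif] = [match.start() + 1 for match in pattern.finditer(sequence)]
    return hits


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy_peptide = "AARGDKKLDVGGRGD"
motif_hits = find_motifs(toy_peptide)
print(motif_hits)
```

The presence of a motif in the primary sequence does not by itself show that it is exposed or functional in the folded protein.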

Post-translational modifications
Thirteen N-glycosylation sites are predicted in the sequence of SNED1, and the presence of N-linked glycans has been determined experimentally. SNED1 also has several predicted attachment sites for O-linked glycans and glycosaminoglycans, but these have not yet been validated experimentally.
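Candidate N-glycosylation sites are commonly identified by scanning for the consensus sequon N-X-S/T, where X is any residue except proline. The sketch below implements only this sequon scan; it is not the prediction method used for the sites reported above, which the text does not specify, and the toy peptide is hypothetical.

```python
import re
from typing import List

# Asparagine, then any residue except proline, then serine or threonine.
# The lookahead allows overlapping sequons to be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sequons(sequence: str) -> List[int]:
    """Return 1-based positions of asparagines within an N-X-S/T sequon."""
    return [match.start() + 1 for match in SEQUON.finditer(sequence.upper())]


# Toy peptide (hypothetical, NOT the SNED1 sequence).
toy_peptide = "MNGTANPSKNNTS"
sequon_positions = find_n_glycosylation_sequons(toy_peptide)
print(sequon_positions)
```

A sequon is necessary but not sufficient for glycosylation, so positions found this way remain candidates until confirmed experimentally.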

Only a few kinase-dependent phosphorylation sites were predicted with a score above 0.8 by the NetPhosK program in the ExPASy bioinformatics suite of proteomics tools. These sites are annotated with yellow highlight in the conceptual translation. All of them are predicted to be phosphorylated by either protein kinase A (PKA) or protein kinase C (PKC). Experimental evidence exists for phosphorylation at 12 residues: 5 serine, 5 threonine, and 2 tyrosine residues.
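The score cutoff described above amounts to a simple filter over a table of predicted sites. The following Python sketch illustrates that step under stated assumptions: the record layout is hypothetical rather than the NetPhosK output format, and the example records are placeholders, not actual SNED1 predictions.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class PredictedSite(NamedTuple):
    position: int   # 1-based residue position
    residue: str    # S, T, or Y
    kinase: str     # kinase predicted to target the site
    score: float    # predictor score between 0 and 1


def high_confidence_sites(sites: Iterable[PredictedSite], threshold: float = 0.8) -> List[PredictedSite]:
    """Keep only sites whose score is strictly above the threshold."""
    return [site for site in sites if site.score > threshold]


# Placeholder records (hypothetical values for illustration only).
example_sites = [
    PredictedSite(10, "S", "PKA", 0.91),
    PredictedSite(27, "T", "CKII", 0.62),
    PredictedSite(55, "S", "PKC", 0.85),
    PredictedSite(80, "Y", "SRC", 0.80),  # equal to the cutoff, so excluded
]

selected = high_confidence_sites(example_sites)
kinase_counts = Counter(site.kinase for site in selected)
print(selected)
print(kinase_counts)
```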

Secondary structure
The amino acid sequence of the longest variant is very cysteine-rich, presumably resulting in extensive disulfide bond formation. The beta sheets are annotated as purple text in the conceptual translation and the alpha helices as red text.

The percentage of intrinsic disorder of processed human SNED1 (residues 25–1413) predicted by IUPred2A is 15.3%. A large proportion of random coil (73%) was predicted in SNED1, together with 26% β-strand and 1% helix, the latter corresponding to a sequence found in the amino-terminal region of SNED1.

Tertiary and quaternary structure
The program Phyre2 was used to construct structure predictions for the conserved domain regions NIDO, CCP, and FN3, as well as for each of the splice variants. Some of the results are consistent with the proposed function of an extracellular "sticky" protein possibly involved in cell-cell adhesion or in clotting. The protein matches found by Phyre2 comprise an array of proteins with functions in clotting, hydrolysis, plasminogen activation, hormone and growth factor signaling, protein binding, and cell adhesion, as well as ECM proteins.

Splice variants a, b, and e have >99% structural similarity to the protein neurexin 1-alpha (NRXN1). Neurexins are cell adhesion molecules that often contain EGF-like domains and promote the formation of intercellular junctions. NRXN1 is also proposed to play a role in angiogenesis. Alpha-neurexins interact with neurexophilins and possibly function in the synaptic junctions of the vertebrate nervous system. Alpha-neurexins often utilize alternate promoters and splice sites, resulting in many different transcripts from one gene, which may parallel this gene's abundance of alternative transcripts.

Splice variant d has a 100% structural match to low-density lipoprotein receptor-related protein 4 (LRP4). This protein is involved in SOST-mediated inhibition of bone formation and in inhibition of Wnt signaling, and it plays an important role in the formation of neuromuscular junctions. Splice variants f and g have >99% similarity to fibrillin-1, an ECM protein that is a structural component of calcium-binding microfibrils. Splice variant i and the conserved CCP domain are >99% structurally similar to tissue plasminogen activator (PLAT). PLAT is secreted by vascular endothelial cells and acts as a serine protease that converts plasminogen to plasmin. Plasmin is a fibrinolytic enzyme that aids in the breakdown of blood clots, and recombinant tissue plasminogen activator is used clinically for that purpose.

The conserved NIDO domain was >99% similar to coagulation factor IX (F9). F9 is a secreted coagulation factor involved in the clotting cascade that requires activation by other coagulation factors within the cascade. The 3 consecutive conserved FN3 domains together are 100% similar, with 100% coverage, to anosmin-1. Anosmin-1 is an ECM glycoprotein required for normal development of the brain, spinal cord, and kidney.

Interacting proteins
Computational prediction using several databases, focusing on secreted proteins and membrane proteins, yielded 114 unique interactions predicted by at least one algorithm, including SNED1 self-interaction. More than half of the predicted protein partners of SNED1 were annotated as membrane proteins in UniProtKB. Forty-seven extracellular proteins were identified as predicted SNED1 binding partners, including 30 core matrisome proteins, 10 matrisome-associated proteins, and 7 secreted proteins. Among the 30 core matrisome proteins are 6 collagens: COL6A3, found in basement membranes and other ECMs; COL7A1; and the Fibril-Associated Collagens with Interrupted Triple helices (FACITs), all containing a thrombospondin domain (COL12A1, COL14A1, COL16A1, and COL20A1). They also include a number of ECM glycoproteins: 4 tenascins (TNC, TNN, TNR, and TNXB), fibronectin (FN1), the latent TGFβ-binding protein 2 (LTBP2), and the basement membrane glycoproteins nidogen-1 and nidogen-2.

Independently, the STRING database of known and predicted protein interactions was used to identify candidate interacting proteins. The candidates were somatostatin (SST), somatostatin receptor 2 (SSTR2) and a variety of other somatostatin receptors, spermine synthase (SMS), and TMEM132C. All of the somatostatin-related proteins are involved in the inhibition of hormone release. Very little is known about TMEM132C, and all publications related to the protein are large-scale genomic screens. The protein expression profile of TMEM132C is very similar to that of SNED1, with protein abundance found in blood plasma, platelets, and liver. All of the interacting proteins described here are expressed in these three sites.

Expression
SNED1 is ubiquitously expressed at low to intermediate levels in adult tissues, so RNA expression profiles do not make clear which cells secrete SNED1 in tissues. Experimental data obtained in mice have shown that the Sned1 promoter is broadly active during embryogenesis, particularly in the limb buds, tail, sclerotome, vertebrae and ribs, lung, kidney, adrenal gland, cerebellum, choroid plexus, and head mesenchyme. The protein expression profiles of SNED1 reported by MOPED (Multi-Omics Profiling Expression Database) and PaxDb (Protein Abundance Across Organisms database) indicate that the protein is found in blood serum, blood plasma, blood T-lymphocytes, platelets, HEK-293 kidney cells, and liver, and at low levels in the brain.

Transcript variants
The program AceView was used to predict transcript variants, shown in Figure 6. There are 9 spliced forms and 3 unspliced forms. Three of the transcript variants (b, c, and e) contain green regions that represent upstream open reading frames (uORFs), which suggests that these transcripts contain regulatory elements. All of the spliced transcript variants, a through i, were analyzed with the Phyre2 server to predict protein structure (see "Tertiary and quaternary structure"). The existence of the splice variants has not yet been validated experimentally.

Promoter
The promoter was predicted and analyzed for transcription factor binding sites using the ElDorado software in the Genomatix software suite. Alternative promoters were found downstream of the selected 845 bp promoter.

Transcription factors
The following transcription factors were found with a matrix similarity of 1.00 and the entire binding domain was matched in the ElDorado predicted promoter.

Protein functions and Clinical significance
Selected datasets in NCBI's GEO Profiles highlight clinically relevant changes in SNED1 expression in response to certain conditions. In aldosterone-producing adenoma versus control lung tissue, SNED1 expression decreased about 25-fold in the adenoma tissue. In a developmental study of the transition from oligodendrocyte precursors to mature oligodendrocytes, expression decreased almost 100-fold upon differentiation into mature oligodendrocytes. It may be of interest to explore SNED1 expression in clotting disorders or other blood-related diseases. A seminal study published in 2014 demonstrated that SNED1 promotes breast cancer metastasis.
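Fold changes such as those quoted above are often expressed on a log2 scale when expression datasets are compared. The short Python sketch below shows the conversion; the function name and the unit reference level are illustrative assumptions, and no values are taken from the underlying GEO datasets beyond the approximate fold changes stated in the text.

```python
import math


def log2_fold_change(reference_level: float, condition_level: float) -> float:
    """Return log2(condition / reference); negative values indicate a decrease."""
    if reference_level <= 0 or condition_level <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(condition_level / reference_level)


# A 25-fold decrease: the condition is 1/25 of the reference.
adenoma_log2fc = log2_fold_change(1.0, 1.0 / 25.0)

# A 100-fold decrease: the condition is 1/100 of the reference.
oligodendrocyte_log2fc = log2_fold_change(1.0, 1.0 / 100.0)

print(f"25-fold decrease:  log2FC = {adenoma_log2fc:.2f}")
print(f"100-fold decrease: log2FC = {oligodendrocyte_log2fc:.2f}")
```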

The recent generation of a Sned1 knockout mouse model is also shedding light on the multiple roles of SNED1 in development and physiology. The global Sned1 knockout leads to early post-natal lethality and severe craniofacial and skeletal anomalies, indicating that Sned1 is an essential gene.