SLC46A3

Solute carrier family 46 member 3 (SLC46A3) is a protein that in humans is encoded by the SLC46A3 gene. Also referred to as FKSG16, the protein belongs to the major facilitator superfamily (MFS) and SLC46A family. Most commonly found in the plasma membrane and endoplasmic reticulum (ER), SLC46A3 is a multi-pass membrane protein with 11 α-helical transmembrane domains. It is mainly involved in the transport of small molecules across the membrane through the substrate translocation pores featured in the MFS domain. The protein is associated with breast and prostate cancer, hepatocellular carcinoma (HCC), papilloma, glioma, obesity, and SARS-CoV. Based on the differential expression of SLC46A3 in antibody-drug conjugate (ADC)-resistant cells and certain cancer cells, current research is focused on the potential of SLC46A3 as a prognostic biomarker and therapeutic target for cancer. While protein abundance is relatively low in humans, high expression has been detected particularly in the liver, small intestine, and kidney.

Gene
The SLC46A3 gene, also known by its aliases solute carrier family 46 member 3 and FKSG16, is located at 13q12.3 on the reverse strand in humans. The gene spans 18,950 bases from 28,700,064 to 28,719,013 (GRCh38/hg38), flanked by POMP upstream and CYP51A1P2 downstream. SLC46A3 contains 6 exons and 5 introns. There are two paralogs for this gene, SLC46A1 and SLC46A2, and orthologs as distant as fungi. So far, more than 4580 single nucleotide polymorphisms (SNPs) for this gene have been identified. SLC46A3 is expressed at relatively low levels, about 0.5 times that of the average gene. Gene expression is particularly high in the liver, small intestine, and kidney.
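
The quoted span can be checked from the coordinates. The short sketch below assumes the coordinates are 1-based and inclusive, which is the reading that reproduces the stated length of 18,950 bases; the variable names are illustrative only.

```python
# Coordinates quoted in the text (GRCh38/hg38, chromosome 13).
start = 28_700_064
end = 28_719_013

# Assumption: 1-based, inclusive interval, so both endpoints are counted.
span = end - start + 1

print(f"SLC46A3 spans {span:,} bases")
```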

Transcript Variants
SLC46A3 has multiple transcript variants produced by different promoter regions and alternative splicing. A total of 4 transcript variants are found in the RefSeq database. Variant 1 is most abundant. * Lengths shown do not include introns.

Isoforms
3 isoforms have been reported for SLC46A3. Isoform a is MANE select and most abundant. All isoforms contain the MFS and MFS_1 domains as well as the 11 transmembrane regions. * Lengths shown are for the precursor proteins.

Properties
SLC46A3 is an integral membrane protein 461 amino acids (aa) in length with a molecular weight (MW) of 51.5 kDa. The basal isoelectric point (pI) for this protein is 5.56. The protein contains 11 transmembrane domains in addition to domains MFS and MFS_1. MFS and MFS_1 domains largely overlap and contain 42 putative substrate translocation pores that are predicted to bind substrates for transmembrane transport. The substrate translocation pores have access to both sides of the membrane in an alternating fashion through a conformational change. SLC46A3 is deficient in charged and polar amino acids while containing an excess of nonpolar amino acids, particularly phenylalanine (Phe). The resulting hydrophobicity is mostly concentrated in the transmembrane regions for interactions with the fatty acid chains in the lipid bilayer. The transmembrane domains also have a shortage of proline (Pro), a helix breaker. The protein sequence contains mixed, positive, and negative charge clusters, one of each, which are high in glutamic acid (Glu). The clusters are located outside the transmembrane regions, and thus are solvent-exposed. Two 0 runs spanning several transmembrane domains, as well as a +/* run between two transmembrane domains, are also present. The protein contains a C-(X)2-C motif (CLLC), which is mostly present in metal-binding proteins and oxidoreductases. A sorting-signal sequence motif, YXXphi, is also found at Tyr246 - Phe249 (YMLF) and Tyr446 - Leu449 (YELL). This Y-based sorting signal directs the trafficking of integral membrane proteins within the endosomal and secretory pathways by interacting with the mu subunits of the adaptor protein (AP) complex. The signal-transducing adaptor protein 1 (STAP1) Src homology 2 (SH2) domain binding motif at Tyr446 - Ile450 (YELLI) is a phosphotyrosine (pTyr) pocket that serves as a docking site for the SH2 domain, which is central to tyrosine kinase signaling.
Multiple periodicities typical of an α-helix (periods of 3.6 residues in the hydrophobicity) encompass transmembrane domains. Three tandem repeats with core block lengths of 4 aa (GNYT, VSTF, STFI) are observed throughout the sequence.
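
Short linear motifs such as C-(X)2-C and YXXphi can be located with simple pattern matching. The sketch below is illustrative only: the sequence `toy` is a made-up fragment, not the SLC46A3 sequence, so the reported positions do not correspond to Tyr246 or Tyr446. It also assumes that "phi" means one of the bulky hydrophobic residues F, I, L, M, or V; the function name `find_motif` is hypothetical.

```python
import re

# X is any residue, written "." in a regular expression.
C_X2_C = r"C..C"          # C-(X)2-C, e.g. CLLC
YXXPHI = r"Y..[FILMV]"    # YXXphi, assuming phi is F, I, L, M, or V

def find_motif(sequence, pattern):
    """Return (1-based start, matched residues) for every hit.

    A lookahead is used so that overlapping matches are also reported.
    """
    hits = []
    for match in re.finditer(f"(?=({pattern}))", sequence):
        hits.append((match.start() + 1, match.group(1)))
    return hits

# Hypothetical toy fragment containing the motifs named in the text.
toy = "MKACLLCGSYMLFTQRYELLIS"

cxxc_hits = find_motif(toy, C_X2_C)
yxxphi_hits = find_motif(toy, YXXPHI)

print(cxxc_hits)    # [(4, 'CLLC')]
print(yxxphi_hits)  # [(10, 'YMLF'), (17, 'YELL')]
```

Run against the real precursor sequence, the same scan would be expected to report the YXXphi hits at the positions given above, but that has not been checked here.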

Secondary Structure
Based on results by Ali2D, the secondary structure of SLC46A3 is rich in α-helices with random coils in between. More precisely, the protein is predicted to be composed of 62.9% α-helix, 33.8% random coil, and 3.3% extended strand. The regions of α-helices span the majority of the transmembrane domains. The signal peptide is also predicted to form an α-helix, most likely in the h-region. The amphipathic α-helices possess a particular orientation with charged/polar and nonpolar residues on opposite sides of the helix mainly due to the hydrophobic effect. Membrane topology of SLC46A3 shows the 11 α-helical transmembrane domains embedded in the membrane with the N-terminus oriented toward the extracellular region (or lumen of the ER) and the C-terminus extended to the cytoplasmic region.

Tertiary Structure
A model of the tertiary structure of SLC46A3 was constructed by I-TASSER based on a homologous crystal structure of the human organic anion transporter MFSD10 (Tetran) with a TM-score of 0.853. The structure contains a cluster of 17 α-helices that spans the membrane and random coils that connect those α-helices. Multiple ligand binding sites are also predicted to reside in the structure, including those for (2S)-2,3-dihydroxypropyl(7Z)-pentadec-7-enoate (78M), cholesterol hemisuccinate (Y01), and octyl glucose neopentyl glycol (37X).

Promoter
SLC46A3 carries 4 promoter regions that lead to different transcript variants as identified by ElDorado at Genomatix. Promoter A supports transcript variant 1 (GXT_2836199). * The coordinates are for GRCh38.

Transcription Factors
Transcription factors (TFs) bind to the promoter region of SLC46A3 and modulate the transcription of the gene. The table below shows a curated list of predicted TFs. MYC proto-oncogene (c-Myc), the strongest hit at Genomatix with a matrix similarity of 0.994, dimerizes with myc-associated factor X (MAX) to affect gene expression in a way that increases cell proliferation and cell metabolism. Its expression is highly amplified in the majority of human cancers, including Burkitt's lymphoma. The heterodimer can repress gene expression by binding to myc-interacting zinc finger protein 1 (MIZ1), which also binds to the promoter of SLC46A3. CCAAT-displacement protein (CDP) and nuclear transcription factor Y (NF-Y) have multiple binding sites within the promoter sequence (3 sites for CDP and 2 sites for NF-Y). CDP, also known as Cux1, is a transcriptional repressor. NF-Y is a heterotrimeric complex of three different subunits (NF-YA, NF-YB, NF-YC) that regulates gene expression, both positively and negatively, by binding to the CCAAT box.

Expression Pattern
RNAseq data show that SLC46A3 is most highly expressed in the liver, small intestine, and kidney, with relatively low expression in the brain, skeletal muscle, salivary gland, placenta, and stomach. In fetuses of 10 – 20 weeks, the adrenal gland and intestine show high expression while the heart, kidney, lung, and stomach show low expression. Microarray data from NCBI GEO present high expression in pancreatic islets, pituitary gland, lymph nodes, peripheral blood, and liver with percentile ranks of 75 or above. Conversely, the tissues with the lowest SLC46A3 expression include bronchial epithelial cells, caudate nucleus, superior cervical ganglion, smooth muscle, and colorectal adenocarcinoma, all with percentile ranks below 15. Immunohistochemistry supports expression of the gene in the liver and kidney, as well as in skin tissues, while immunoblotting (western blotting) provides evidence for protein abundance in the liver and tonsils, as well as in papilloma and glioma cells. In situ hybridization data show ubiquitous expression of the gene in mouse embryos at stage E14.5 and in the adult mouse brain at postnatal day 56 (P56). In the spinal column of the juvenile mouse (P4), SLC46A3 is relatively highly expressed in the articular facet, neural arch, and anterior and posterior tubercles. The dorsal horn shows considerable expression in the cervical spine of the adult mouse (P56).

RNA-binding Proteins
RNA-binding proteins (RBPs) that bind to the 5' or 3' UTR regulate mRNA expression through their roles in RNA processing and modification, nuclear export, localization, and translation. A list of some of the most highly predicted RBPs in conserved regions of the 5' and 3' UTRs is shown below.

miRNA
Several miRNAs have binding sites in the conserved regions of the 3' UTR of SLC46A3. The following miRNAs can negatively regulate the expression of the mRNA via RNA silencing. Silencing mechanisms include mRNA cleavage and translation repression based on the level of complementarity between the miRNA and mRNA target sequences.

Secondary Structure
The secondary structure of RNA holds both structural and functional significance. Among various secondary structure motifs, the stem-loop structure (hairpin loop) is often conserved across species due to its role in RNA folding, protecting structural stability, and providing recognition sites for RBPs. Seven stem-loop structures have been identified in the 5' UTR of SLC46A3 and a total of 10 in the 3' UTR. The majority of the binding sites of RBPs and miRNAs given above are located at a stem-loop structure, which is also true for the poly(A) signal at the 3' end.

Subcellular Localization
The k-Nearest Neighbor (k-NN) prediction by PSORTII predicts SLC46A3 to be mainly located at the plasma membrane (78.3%) and ER (17.4%), but also possibly at the mitochondrion (4.3%). Immunofluorescent staining of SLC46A3 shows positivity in the plasma membrane, cytoplasm, and actin filaments, although positivity in the latter two is most likely due to the process of the protein being transported by myosin from the ER to the plasma membrane; myosin transports cargo-containing membrane vesicles along actin filaments.
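
A k-NN localization score is simply the share of nearest neighbors that carry each label. The sketch below assumes 23 neighbors split 18/4/1; that split is inferred from the quoted percentages and is not stated in the source. The function name `knn_vote_percentages` is hypothetical.

```python
from collections import Counter

def knn_vote_percentages(neighbor_labels):
    """Convert the labels of the k nearest neighbors into percentages."""
    counts = Counter(neighbor_labels)
    k = len(neighbor_labels)
    return {label: round(100 * n / k, 1) for label, n in counts.items()}

# Assumed vote split (inferred, not reported): 18 + 4 + 1 = 23 neighbors.
neighbors = (
    ["plasma membrane"] * 18
    + ["endoplasmic reticulum"] * 4
    + ["mitochondrion"] * 1
)

percentages = knn_vote_percentages(neighbors)
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

Under that assumption the three shares come out at 78.3%, 17.4%, and 4.3%, matching the values reported above.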

Post-Translational Modification
The SLC46A3 protein contains a signal peptide that facilitates co-translational translocation and is cleaved between Thr20 and Gly21. The resulting mature protein, 441 amino acids in length, is subject to further post-translational modifications (PTMs). The sequence has 3 N-glycosylation sites (Asn38, Asn46, Asn53), which are all located in the non-cytoplasmic region flanked by the signal peptide and the first transmembrane domain. Rigidity of the N-terminal region close to the membrane is increased by O-GalNAc at Thr25. O-GlcNAc at sites Ser227, Thr231, Ser445, and Ser459 is involved in the regulation of signaling pathways. In fact, Ser445 and Ser459 are also subject to phosphorylation, where both sites are associated with casein kinase II (CKII), suggesting a crosstalking network that regulates protein activity. Other highly conserved phosphorylation sites include Thr166, Ser233, Ser253, and Ser454, which are most likely targeted by kinases protein kinase C (PKC), CKII, PKC, and CKI/II, respectively. Conserved glycation sites at epsilon amino groups of lysines are predicted at Lys101, Lys239, and Lys374 with possible disrupting effects on molecular conformation and function of the protein. S-palmitoylation, which helps the protein bind more tightly to the membrane by contributing to protein hydrophobicity and membrane association, is predicted at Cys261 and Cys438. S-palmitoylation can also modulate protein-protein interactions of SLC46A3 by changing the affinity of the protein for lipid rafts.
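
The mature length follows directly from the cleavage site. The sketch below assumes that the residue numbers in this section use precursor numbering; the helper `to_mature_numbering` is a hypothetical name used only for illustration.

```python
PRECURSOR_LENGTH = 461      # aa, from the Properties section
SIGNAL_PEPTIDE_LAST = 20    # cleavage between Thr20 and Gly21

# Removing residues 1-20 leaves the mature protein.
mature_length = PRECURSOR_LENGTH - SIGNAL_PEPTIDE_LAST

def to_mature_numbering(precursor_position):
    """Map a precursor residue number onto the mature protein."""
    if precursor_position <= SIGNAL_PEPTIDE_LAST:
        raise ValueError("residue lies within the cleaved signal peptide")
    return precursor_position - SIGNAL_PEPTIDE_LAST

# The three N-glycosylation sites, re-indexed on the mature protein.
n_glyc_mature = {f"Asn{p}": to_mature_numbering(p) for p in (38, 46, 53)}

print(mature_length)    # 441
print(n_glyc_mature)    # {'Asn38': 18, 'Asn46': 26, 'Asn53': 33}
```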

Paralogs
SLC46A1: Also known as the proton-coupled folate transporter, SLC46A1 transports folate and antifolate substrates across cell membranes in a pH-dependent manner.

SLC46A2: Aliases include thymic stromal cotransporter homolog, TSCOT, and Ly110. SLC46A2 is involved in symporter activity and is a transporter of the immune second messenger 2'3'-cGAMP.

Orthologs
SLC46A3 is a highly conserved protein with orthologs as distant as fungi. Closely related orthologs have been found in mammals with sequence similarities above 75% while moderately related orthologs come from species of birds, reptiles, amphibians, and fish with sequence similarities of 50-70%. More distantly related orthologs have sequence similarities below 50% and are invertebrates, placozoa, and fungi. The MFS, MFS_1, and transmembrane domains mostly remain conserved throughout species. A selected list of orthologs obtained through NCBI BLAST is shown in the table below.

Evolutionary History
The SLC46A3 gene first appeared in fungi approximately 1105 million years ago (MYA). It evolves at a relatively moderate speed. A 1% change in the protein sequence requires about 6.2 million years. The SLC46A3 gene evolves about 4 times faster than cytochrome c and 2.5 times slower than fibrinogen alpha chain.
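
The relative rates can be restated as the time each protein needs for a 1% sequence change. The sketch below assumes that "N times faster" means an N-fold higher rate of sequence change; the values derived for cytochrome c and fibrinogen alpha chain follow from the ratios in the text and are not independent measurements.

```python
SLC46A3_MY_PER_PERCENT = 6.2   # million years per 1% change (from the text)
FASTER_THAN_CYTOCHROME_C = 4.0   # SLC46A3 rate / cytochrome c rate
SLOWER_THAN_FIBRINOGEN = 2.5     # fibrinogen alpha rate / SLC46A3 rate

# A protein that evolves N times slower needs N times longer per 1% change.
cytochrome_c_my_per_percent = SLC46A3_MY_PER_PERCENT * FASTER_THAN_CYTOCHROME_C

# A protein that evolves N times faster needs 1/N of the time.
fibrinogen_my_per_percent = SLC46A3_MY_PER_PERCENT / SLOWER_THAN_FIBRINOGEN

print(f"cytochrome c:     ~{cytochrome_c_my_per_percent:.1f} MY per 1% change")
print(f"SLC46A3:          ~{SLC46A3_MY_PER_PERCENT:.1f} MY per 1% change")
print(f"fibrinogen alpha: ~{fibrinogen_my_per_percent:.2f} MY per 1% change")
```

Under this reading, cytochrome c would need roughly 24.8 million years per 1% change and fibrinogen alpha chain roughly 2.48 million years.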

Function
As an MFS protein, SLC46A3 is a membrane transporter, mainly involved in the movement of substrates across the lipid bilayer. The protein works via secondary active transport, where the energy for transport is provided by an electrochemical gradient.

A proposed function of SLC46A3 of rising importance is the direct transport of maytansine-based catabolites from the lysosome to the cytoplasm by binding the macrolide structure of maytansine. Among the different types of antibody-drug conjugates (ADCs), maytansine-based noncleavable linker ADC catabolites, such as lysine-MCC-DM1, are particularly responsive to SLC46A3 activity. The protein functions independently of the cell surface target or cell line, and thus most likely recognizes maytansine or a moiety within the maytansine scaffold. Through transmembrane transport activity, the protein regulates catabolite concentration in the lysosome. In addition, loss of SLC46A3 expression has been identified as a mechanism for resistance to ADCs with noncleavable maytansinoid and pyrrolobenzodiazepine warheads. Although subcellular localization predictions have failed to identify the lysosome as a final destination of the protein, the YXXphi motif identified in the protein sequence has been shown to direct lysosomal sorting.

SLC46A3 may be involved in plasma membrane electron transport (PMET), a plasma membrane analog of the mitochondrial electron transport chain (ETC) that oxidizes intracellular NADH and contributes to aerobic energy production by supporting glycolytic ATP production. The 3' UTR region of SLC46A3 includes a binding site for ENOX1, a protein highly involved in PMET. The C-(X)2-C motif in the protein sequence also suggests possible oxidoreductase activity.

Interacting Proteins
SLC46A3 has been found to generally interact with proteins involved in membrane transport, immune response, catalytic activity, or oxidation of substrates. Some of the most definite and clinically important interactions include the following proteins.


 * CD79A: An interaction with CD79A was identified in a yeast two-hybrid (Y2H) screen with a confidence score of 0.632 by the human binary protein interactome (HuRI). Also known as B-cell antigen receptor complex-associated protein alpha chain, CD79A, together with CD79B, forms the B-cell antigen receptor (BCR) by associating non-covalently with surface immunoglobulin (Ig). The BCR responds to antigens and initiates signal transduction cascades.
 * LGALS3: High-throughput affinity purification-mass spectrometry (AP-MS) identified an interaction between SLC46A3 and LGALS3 with an interaction score of 0.761, classifying the pair as high-confidence interacting proteins (HCIPs) by CompPASS-Plus. Also known as galectin-3 (Gal3), LGALS3 participates in various cellular functions including apoptosis, innate immunity, cell adhesion, and T-cell regulation. The protein is involved in antimicrobial activity against bacteria and fungi and has been identified as a negative regulator of mast cell degranulation. LGALS3 is highly upregulated in glioblastoma tissue and in the brains of Alzheimer's disease patients.
 * NSP2: A high-throughput Y2H screen of the SARS-CoV ORFeome and host proteins isolated a single-hit interaction between NSP2 and SLC46A3 with a LUMIER z-score of -0.5. Short for non-structural protein 2, NSP2 is one of the many non-structural proteins encoded in the orf1ab polyprotein. NSP2 alters the host cell environment rather than contributing directly to viral replication. The protein interacts with prohibitin 1 (PHB1) and PHB2.

Variants
SNPs are a very common type of genetic variation and are silent most of the time. However, certain SNPs in the conserved or functionally important regions of the gene may have adverse effects on gene expression and function. Some of the SNPs with potentially damaging effects identified in the coding sequence of SLC46A3 are shown in the table below. * The coordinates/positions are for GRCh38.p7.

Cancer/Tumor
The clinical significance of SLC46A3 centers on the protein's activity as a transporter of maytansine-based ADC catabolites. shRNA screens employing two libraries identified SLC46A3 as the only hit as a mediator of noncleavable maytansine-based ADC-dependent cytotoxicity, with q-values of 1.18×10^−9 and 9.01×10^−3. Studies show either lost or significantly reduced SLC46A3 expression (2.79-fold decrease by microarray with p-value 5.80×10^−8) in T-DM1 (DM1 payload attached to antibody trastuzumab)-resistant breast cancer cells (KPL-4 TR). In addition, siRNA knockdown in human breast tumor cell line BT-474M1 also results in resistance to T-DM1. Such association between loss of SLC46A3 expression and resistance to ADCs also applies to pyrrolobenzodiazepine warheads, signifying the important role of SLC46A3 in cancer treatment.

CDP, one of SLC46A3's transcription factors, works as a tumor suppressor where CDP deficiency activates phosphoinositide 3-kinase (PI3K) signaling that leads to tumor growth. The loss of heterozygosity and mutations of CDP are also associated with a variety of cancers.

Prostate Cancer
Microarray analysis of SLC46A3 in two different prostate cancer cell lines, LNCaP (androgen-dependent) and DU145 (androgen-independent), shows SLC46A3 expression in DU145 to be about 5 times as high as in LNCaP for percentile ranks and 1.5 times as high for transformed counts, demonstrating an association between SLC46A3 and accelerated cell growth of prostate cancer cells. SLC46A3 possibly contributes to the androgen-independent manner of cancer development.

Hepatocellular Carcinoma (HCC)
SLC46A3 was found to be down-regulated in 83.2% of human HCC tissues based on western blot scores and qRT-PCR results on mRNA expression (p < 0.0001). Overexpression of the gene also reduced resistance to sorafenib treatment and improved overall survival rate (p = 0.00085).

Papilloma & Glioma
Western blot analysis supports substantially strong expression of SLC46A3 in papilloma and glioma cells when compared to expression in the liver, one of the organs where the gene is most highly expressed.

Obesity
A genome-wide association study on obesity identified 10 variants in the flanking 5′ UTR region of SLC46A3 that were highly associated with diet fat (% energy) (p = 1.36×10^−6 to 9.57×10^−6). In diet-induced obese (DIO) mice, SLC46A3 shows decreased gene expression following c-Jun N-terminal kinase 1 (JNK1) depletion, suggesting possible roles in insulin resistance as well as glucose/triglyceride homeostasis.

SARS-CoV & SARS-CoV-2
Understanding the interaction between SLC46A3 and NSP2, in addition to the functions of each protein, is critical to gaining insight into the pathogenesis of coronaviruses, namely SARS-CoV and SARS-CoV-2. The NSP2 protein domain resides in a region of the coronavirus replicase that is not particularly conserved across coronaviruses, so differences in its protein sequence lead to significant changes in protein structure and, in turn, to functional variability.