Protein FAM46B

Protein FAM46B, also known as family with sequence similarity 46 member B, is a protein that in humans is encoded by the FAM46B gene. FAM46B contains one protein domain of unknown function, DUF1693. Yeast two-hybrid screening has identified three proteins that physically interact with FAM46B: ataxin-1 (encoded by ATXN1), PEPP2 (encoded by RHOXF2), and DAZAP2.

Overview
FAM46B is the most commonly used name for the gene encoding FAM46B; the aliases MGC16491 and RP11-344H11 have also been used for the same gene. FAM46B is a 7,283 base pair gene located on the antisense strand of the short arm of chromosome 1, at locus 1p36.11. Because it is on the antisense strand, FAM46B is transcribed in the direction opposite to the standard numbering of nucleotides along the chromosome: it starts at base 27,339,333 and ends at base 27,331,522.
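
Because FAM46B lies on the antisense (reverse) strand, its transcript, written in DNA letters, matches the reverse complement of the reference sequence over those coordinates. The short Python sketch below illustrates that relationship; the input is a hypothetical toy sequence, not real chromosome 1 sequence, and the function name is illustrative only.

```python
# Complement table for unambiguous DNA bases (N is left as N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def minus_strand_sequence(plus_strand: str) -> str:
    """Return the reverse-strand sequence read 5' to 3', given the
    reference (plus-strand) sequence of the same interval read 5' to 3'."""
    # Complement each base, then reverse so the result reads 5' to 3'.
    return plus_strand.upper().translate(COMPLEMENT)[::-1]


# Hypothetical toy interval; not real FAM46B sequence.
reference = "ATGCCGTTAG"
print(minus_strand_sequence(reference))  # CTAACGGCAT
```

Reading a reverse-strand gene 5' to 3' therefore runs from the higher coordinate (here 27,339,333) toward the lower one (27,331,522).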

The Genomatix El Dorado program predicts the promoter region to be 1,028 bases long, spanning bases 27,339,962 to 27,338,935.
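
As a quick arithmetic check: genomic coordinates of this kind are usually 1-based and inclusive, so an interval's length is the absolute difference of its endpoints plus one. Assuming that convention (the source does not state it explicitly), the promoter span works out to 1,028 bases, matching the predicted length. A minimal Python sketch with a hypothetical helper name:

```python
def interval_length(start: int, end: int) -> int:
    """Length of a closed (1-based, inclusive) genomic interval.

    On the reverse strand the 'start' coordinate is larger than the 'end'
    coordinate, so the absolute difference is used."""
    return abs(start - end) + 1


# Promoter coordinates quoted in the text (reverse strand, so start > end).
promoter_len = interval_length(27_339_962, 27_338_935)
print(promoter_len)  # 1028
```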

Exon structure and splice variants
The FAM46B gene contains two exons, both of which contribute to the coding sequence of the FAM46B protein. Only one main protein isoform is known, suggesting that FAM46B mRNA does not undergo alternative splicing.

Paralogs
FAM46B has three paralogs in Homo sapiens: FAM46A, FAM46C, and FAM46D. Multiple sequence alignments of the four members of the FAM46 family show high levels of conservation, particularly toward the C-terminus. Amino acids conserved in all four paralogs likely represent residues that make up the core of the FAM46 family.
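
Identifying residues conserved in all four paralogs amounts to scanning the columns of a multiple sequence alignment for positions where every sequence carries the same amino acid. A minimal Python sketch, using a hypothetical toy alignment (the sequences are invented for illustration and are not real FAM46 sequences); gap characters ('-') are treated as non-conserved:

```python
def fully_conserved_columns(alignment: dict[str, str]) -> list[tuple[int, str]]:
    """Return (1-based column, residue) for columns identical in every sequence."""
    seqs = list(alignment.values())
    if not seqs or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least one sequence, all of equal aligned length")
    conserved = []
    for col, residues in enumerate(zip(*seqs), start=1):
        unique = set(residues)
        # Conserved only if a single residue fills the column and it is not a gap.
        if len(unique) == 1 and "-" not in unique:
            conserved.append((col, residues[0]))
    return conserved


# Hypothetical toy alignment (not real FAM46 sequences).
toy_alignment = {
    "FAM46A": "MKLV-DE",
    "FAM46B": "MRLVQDE",
    "FAM46C": "MKLI-DE",
    "FAM46D": "MALVQDK",
}
print(fully_conserved_columns(toy_alignment))  # [(1, 'M'), (3, 'L'), (6, 'D')]
```

Real analyses usually also score partially conserved or chemically similar columns; strict identity is the simplest criterion and matches the "conserved in all four paralogs" wording.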

Orthologs
FAM46B appears to have been present in the common ancestor of animals and is found only in eukaryotes. Although strict orthologs of FAM46B are found only in a relatively small range of animals, such as insects and vertebrates, orthologs of FAM46 paralogs have been identified in a broader range of species. Within vertebrates, FAM46B is highly conserved in fish, amphibians, and mammals. Common model organisms in which FAM46B has been identified include Danio rerio, Xenopus tropicalis, and Mus musculus. A strict ortholog of FAM46B is not found in reptiles or birds; however, both the FAM46A and FAM46C paralogs are found in Anolis carolinensis, and the FAM46C paralog is found in birds such as Gallus gallus.

Distant homologs
Distant homologs of FAM46B are present in Drosophila and nematodes such as Caenorhabditis elegans. There are no orthologs of FAM46B in plants, protists, or fungi.

Phylogeny
The phylogenetic tree of FAM46B mirrors the accepted species phylogeny. As expected, the mammals are grouped together, with the primates clustered most tightly. The more distant homologs, such as those from Drosophila and Caenorhabditis, are on the left, reflecting greater divergence between the gene sequences.

Protein
The function of FAM46B has not yet been determined. The information below is based on bioinformatic analyses and predictions.

Properties/characteristics
The human form of FAM46B contains 425 amino acid residues and has a predicted isoelectric point of 8.093 and a molecular mass of 46,888 daltons. FAM46B is predicted to be a soluble protein located in the cytosol.
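
Molecular mass and isoelectric point are both computed directly from the amino acid sequence. The sketch below is an illustration rather than the method of any particular tool: it sums approximate average residue masses plus one water molecule, and finds the pI by bisection on the net charge given by the Henderson-Hasselbalch equation. The pKa values are an illustrative simplified set (an assumption); published pKa sets differ, which is one reason different programs report slightly different pI values for the same protein.

```python
# Approximate average residue masses (Da): free amino acid minus one H2O.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water added back for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Approximate average molecular mass of an unmodified polypeptide (Da)."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


# Illustrative, simplified pKa values (assumption; tools use different sets).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH from the Henderson-Hasselbalch equation."""
    def protonated(pka: float) -> float:  # fraction of a basic group at +1
        return 1.0 / (1.0 + 10 ** (ph - pka))

    def deprotonated(pka: float) -> float:  # fraction of an acidic group at -1
        return 1.0 / (1.0 + 10 ** (pka - ph))

    positive = protonated(PKA_POSITIVE["Nterm"])
    negative = deprotonated(PKA_NEGATIVE["Cterm"])
    for aa in seq.upper():
        if aa in PKA_POSITIVE:
            positive += protonated(PKA_POSITIVE[aa])
        elif aa in PKA_NEGATIVE:
            negative += deprotonated(PKA_NEGATIVE[aa])
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which the net charge is zero, found by bisection on [0, 14].

    Net charge decreases monotonically as pH rises, so bisection converges."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive: the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(average_mass("G"), 2))  # 75.07
print(round(isoelectric_point("MKDEHR"), 2))  # hypothetical peptide
```

Applied to the full FAM46B sequence, the result depends on the pKa set chosen, so this sketch should not be expected to reproduce 8.093 exactly.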

Domains and motifs
FAM46B contains only one identified domain: Domain of Unknown Function 1693 (DUF1693). DUF1693 has been classified as part of the nucleotidyltransferase superfamily, and the family includes four nematode prion-like proteins, but its exact function remains unknown. A Statistical Analysis of Protein Sequences (SAPS) analysis does not predict any unusual protein characteristics based on amino acid composition, internal repeats, charge clusters, or periodicities.

Post-translational modifications
FAM46B is not predicted to contain a signal peptide cleavage site, a glycosylphosphatidylinositol (GPI) anchor, or transmembrane regions. The absence of a signal peptide supports the prediction that FAM46B is located in the cytosol.

Tools available through ExPASy were used to predict phosphorylation sites, O-linked glycosylation sites, and N-linked glycosylation sites. Although two sites in FAM46B are predicted as potential N-linked glycosylation sites, FAM46B lacks a signal peptide and is therefore not expected to enter the lumen of the endoplasmic reticulum, where N-linked glycosylation occurs. Five sites were identified as possible O-linked glycosylation sites. These are marked in the Conceptual Translation section below.
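
Dedicated prediction tools use their own models; a much simpler substitute, shown here only to illustrate the idea, is to scan for the N-X-S/T sequon (asparagine, then any residue except proline, then serine or threonine) that N-linked glycosylation requires. As noted above, a sequon alone does not mean the site is modified: the protein must also reach the lumen of the endoplasmic reticulum. The peptide in the example is hypothetical.

```python
import re

# N-X-S/T sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues that begin an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Hypothetical peptide: N2 (N-G-T) and N10 (N-A-T) qualify; N6 (N-P-S) does not.
print(find_sequons("MNGTANPSQNAT"))  # [2, 10]
```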

The most common post-translational modification predicted in FAM46B is phosphorylation. NetPhos 2.0 predicts 23 phosphorylation sites: 14 on serine, 6 on threonine, and 3 on tyrosine residues. The predicted sites tend to cluster together within the protein sequence. A comparison of predicted phosphorylation sites in human, mouse, and zebrafish shows that all three species have approximately the same number of sites and the same distribution across serine, threonine, and tyrosine residues.
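
Summarizing such predictions by residue type is a simple tally. The sketch below assumes a hypothetical list of (position, residue, score) tuples and the commonly used 0.5 score cutoff; it does not reproduce NetPhos itself. The fraction helper, applied to the counts quoted above, gives the serine/threonine/tyrosine split that would be compared across species.

```python
from collections import Counter


def tally_sites(predictions, threshold=0.5):
    """Count predicted sites per residue type whose score exceeds threshold.

    `predictions` is an iterable of (position, residue, score) tuples."""
    return Counter(res for _pos, res, score in predictions if score > threshold)


def residue_fractions(counts):
    """Fraction of predicted sites falling on S, T and Y."""
    total = sum(counts.get(r, 0) for r in "STY")
    if total == 0:
        return {r: 0.0 for r in "STY"}
    return {r: counts.get(r, 0) / total for r in "STY"}


# Hypothetical predictor output: (position, residue, score).
example = [(12, "S", 0.91), (15, "S", 0.42), (40, "T", 0.77),
           (88, "Y", 0.63), (120, "S", 0.55)]
print(tally_sites(example))  # Counter({'S': 2, 'T': 1, 'Y': 1})

# Counts quoted in the text for human FAM46B: 14 S, 6 T, 3 Y (23 sites).
human = Counter({"S": 14, "T": 6, "Y": 3})
print({r: round(f, 2) for r, f in residue_fractions(human).items()})
# {'S': 0.61, 'T': 0.26, 'Y': 0.13}
```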

Secondary structure
The exact structure of FAM46B has not been characterized. Predictive programs available through Biology Workbench, such as GOR4, PELE, and CHOFAS, were used to predict its secondary structure, and their results were compared with those obtained using Phyre2. Because these programs rely on different algorithms, each gives slightly different output. Consensus between the programs suggests that FAM46B consists mainly of alpha helices and random coils, with only a few short segments predicted to form beta sheets. Annotated results of the PELE and Phyre2 secondary structure predictions are outlined in the figure below.
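
One simple way to form a consensus from several predictors is a per-residue majority vote over their three-state outputs (H for helix, E for strand, C for coil). This is an illustrative assumption about how a consensus could be computed, not necessarily how it was done here; ties fall back to coil, and the prediction strings are hypothetical.

```python
from collections import Counter


def consensus_structure(predictions: list[str]) -> str:
    """Per-residue majority vote over equal-length H/E/C prediction strings.

    A tie between the top-ranked states is resolved as coil ('C')."""
    if not predictions or len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    consensus = []
    for states in zip(*predictions):
        ranked = Counter(states).most_common()
        top_state, top_count = ranked[0]
        tie = len(ranked) > 1 and ranked[1][1] == top_count
        consensus.append("C" if tie else top_state)
    return "".join(consensus)


# Hypothetical outputs from three predictors for an 8-residue stretch.
preds = ["HHHHCCEE", "HHHCCCEC", "CHHHCCEE"]
print(consensus_structure(preds))  # HHHHCCEE
```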

Expression
Expression can be assessed in a variety of ways. Expressed sequence tag (EST) data and GEO profiles both show how many transcripts of a gene are present in a given tissue type, relative to the total number of transcripts. Microarrays are also useful for quantifying gene expression. Protein-level methods such as immunohistochemistry give a more direct measure of protein expression than mRNA- or cDNA-based methods, because the antibody probes bind to the protein itself.

Available microarray data indicate that FAM46B is highly expressed in the tongue, at levels about ten times the mean gene expression for that tissue. Outside of the tongue, FAM46B appears to be expressed at similar levels across most tissues. Beyond expression in healthy tissues, EST data also break down expression by health state, and suggest that FAM46B expression is elevated in skin cancer and gliomas.

Transcription factors that bind to regulatory sequences
The Genomatix El Dorado program was used to predict transcription factors that are likely to bind to the promoter region of FAM46B. Predicted sites include numerous E2F and zinc finger transcription factor sites, as well as several sites for E-box binding factors and TWIST homologs. The binding sites are not evenly distributed within the promoter region. The largest cluster of binding sites lies around base 177 of the promoter, about 600 base pairs upstream of the FAM46B transcription start site. The image below shows selected transcription factor binding sites for the top twenty matches identified by El Dorado that are on the antisense strand.
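
Dedicated tools score binding sites with statistical models; as a much cruder stand-in, a consensus sequence can be scanned directly. The sketch below finds the canonical E-box (CANNTG) and reports the densest window of sites, loosely mirroring the clustering described above. Because the CANNTG pattern is its own reverse complement, one scan covers both strands. The promoter string is a hypothetical toy sequence, and the helper names are illustrative.

```python
import re
from typing import Optional

# Canonical E-box consensus CANNTG (N = any base); the lookahead allows overlaps.
EBOX = re.compile(r"(?=CA[ACGT]{2}TG)")


def find_eboxes(seq: str) -> list[int]:
    """1-based start positions of CANNTG matches, in ascending order."""
    return [m.start() + 1 for m in EBOX.finditer(seq.upper())]


def densest_window(positions: list[int], width: int) -> tuple[Optional[int], int]:
    """(start, count) of the width-bp window, starting at a site, holding the most sites.

    Assumes `positions` is sorted in ascending order."""
    best_start, best_count = None, 0
    for i, start in enumerate(positions):
        count = sum(1 for p in positions[i:] if p < start + width)
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


# Hypothetical toy promoter fragment (not FAM46B sequence).
promoter = "CACGTG" + "CAGCTG" + "A" * 10 + "CATATG"
sites = find_eboxes(promoter)
print(sites)  # [1, 7, 23]
print(densest_window(sites, 10))  # (1, 2)
```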

Confirmed protein-protein interactions and possible clinical significance
Yeast two-hybrid screening indicates that FAM46B physically interacts with the ataxin-1 protein, which is encoded by ATXN1. The exact function of ATXN1 is not known, but it is thought to be involved in regulating aspects of protein production, particularly transcription. Since FAM46B physically interacts with ATXN1, FAM46B may also play a role in regulating protein production and transcription.

A second protein shown to physically interact with FAM46B is DAZAP2, a proline-rich protein expressed in the brain. Together with the ATXN1 interaction described above, this suggests that FAM46B may interact with proteins expressed in the brain. A third protein identified by yeast two-hybrid screening as a physical interactant of FAM46B is PEPP2, a paired-like homeobox protein. If this interaction is biologically significant, it may play a role in development and morphogenesis.

However, the FAM46B interactome is not yet well understood, and different programs do not report the same interaction partners. For example, STRING identified ATXN1 as a strong interaction partner of FAM46B, but did not identify PEPP2 or DAZAP2. The prediction network from STRING is shown in the adjacent image.
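
Differences between sources can be made explicit with simple set operations: partners reported by every source, and partners unique to each one. In the sketch below, the STRING set contains only the partner named in the text (the real STRING network may list other predicted partners), so the inputs are a partial, illustrative reconstruction and the function name is hypothetical.

```python
def compare_sources(sources: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Return (partners reported by every source, partners unique to each source)."""
    shared = set.intersection(*sources.values())
    unique = {}
    for name, partners in sources.items():
        # Union of all partners reported by the other sources.
        others = set().union(*(p for n, p in sources.items() if n != name))
        unique[name] = partners - others
    return shared, unique


reported = {
    "yeast_two_hybrid": {"ATXN1", "PEPP2", "DAZAP2"},
    "STRING": {"ATXN1"},  # partial: only the partner named in the text
}
shared, unique = compare_sources(reported)
print(shared)  # {'ATXN1'}
print(sorted(unique["yeast_two_hybrid"]))  # ['DAZAP2', 'PEPP2']
```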