User:LarsonGCD/sandbox

Family with sequence similarity 149, member A is a protein that in humans is encoded by the FAM149A gene (also known as MSTP119, MST119 and DKFZP564J102). It is well conserved in primates, dog, cow, mouse, rat, and chicken. It has one paralog, FAM149B.

Overview
FAM149A is found in normal cardiac tissue of Homo sapiens and was submitted to the Molecular Medicine Center for Cardiovascular Disease in 1999. This suggests that it may play a role in normal heart regulation. However, according to NCBI, no variation report or information on clinical significance has been found for this gene. According to the Basic Local Alignment Search Tool (BLAST), FAM149A is similar to cDNA FLJ32604 (98% query cover), which is found in stomach tissue and has no known function. FAM149A is also similar to cDNA FLJ58677 (86% query cover), which is found in fetal kidney tissue and likewise has no known function.

Information acquired from:

http://www.ncbi.nlm.nih.gov/

Gene
The FAM149A transcript consists of 2721 base pairs and encodes a protein of 482 amino acids. The gene is located at 4q35.1, on the positive strand of chromosome 4. Nearby genes on the same chromosome include TLR3, CYP4V2, FLJ38576, ORAOV1P1, and SORBS2.



Paralogs & Orthologs
FAM149A possesses one major paralog, FAM149B. Little is currently known about FAM149B beyond its membership in the FAM149 family of genes.

Orthologs of FAM149A include BRTD and its four isoforms, ECCHC11, and ALMS1. These genes are all found in humans and share conserved regions with FAM149A.

Conserved Domain
FAM149A has a conserved Domain of Unknown Function (DUF), DUF3719. Little is known about DUF3719. It is found only in eukaryotic organisms and is 70 amino acids long. It contains a conserved HLR sequence motif. Below is an image showing DUF3719 on FAM149A.



The following image, from the Sanger Institute, shows the species in which this domain family exists. The purple color indicates that DUF3719 is found only in eukaryotic organisms. Other colors, such as green, would indicate that DUF3719 exists in bacteria. When this diagram is used interactively on the website, it states that 23 species in Eukaryota have the domain.

Phylogeny


FAM149A diverged from amphibians around 400 million years ago, from birds around 300 million years ago, and from mammals other than primates around 94 million years ago. Divergence from primates last occurred around 5 million years ago.

Primary Sequence
As previously stated, the FAM149A protein is made up of 482 amino acids. The amino acids translated from the FAM149A transcript are shown below, along with the matching base pairs. The protein-coding region lies between bp 534 and bp 1982.
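As a quick consistency check on the figures above, the span from bp 534 to bp 1982 can be converted into a codon count. The sketch below assumes that both positions are 1-based and inclusive and that the span ends with a stop codon. Neither assumption is stated in the source, and the function name is purely illustrative.

```python
def codons_in_span(start_bp: int, end_bp: int) -> int:
    """Return the number of whole codons in a 1-based, inclusive span."""
    length = end_bp - start_bp + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} bp is not a whole number of codons")
    return length // 3


cds_codons = codons_in_span(534, 1982)  # 1449 bp in total
amino_acids = cds_codons - 1            # assumption: the last codon is a stop codon
print(cds_codons, amino_acids)
```

Under these assumptions the span holds 483 codons, which matches a 482-residue protein followed by one stop codon.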



Post-Translational Modifications
Several programs were used to predict post-translational modifications in FAM149A. The tests and results for each are listed below.

NetPhos: This program predicts phosphorylation sites within a protein, occurring on serines, threonines, and tyrosines. Each predicted site receives a score that indicates its quality. A good score is close to 1.0, while a low score is close to zero. Results: 20 serine, 16 threonine, and 2 tyrosine phosphorylation sites were predicted. All of these predicted sites had scores above 0.514, with most between 0.8 and 0.9. Image generated:

Sulfinator: This program predicts tyrosine sulfation sites, which are modified as proteins pass through the secretory pathway. There were no results for FAM149A, so no tyrosine sulfation sites are predicted.

NetAcet: Predicts N-terminal acetylation sites.

Here are the results:



According to NetAcet, there are no N-terminal acetylation sites for FAM149A.

SUMOplot/SUMOsp: Used to predict potential sumoylation sites. These may explain larger molecular weights than expected on SDS gels due to attachment of SUMO proteins.

The results can be seen below:



Secondary Structure
Secondary structure refers to the local three-dimensional structure of the FAM149A protein. The structures analyzed include the α-helix, β-strand, β-turn, and random coil. Results were obtained using GOR4 and PELE from Biology WorkBench. GOR4 is a simplified prediction method, and PELE compares the structures predicted by several other programs.

Promoter
Here is the promoter for the FAM149A gene provided by ElDorado and the sequence extracted from the information.

The following is a FASTA formatted version of the FAM149A promoter.



Conservation of Gene Structure Across Species
Using the NCBI website, an additional 1000 base pairs were added to the selected region on chromosome 4 containing FAM149A. Once the start and end positions were established, they were transferred to the ECR Browser to create an alignment across other species.

According to the results, there are 14 exons within FAM149A, which are conserved in the monkey, dog, mouse, and opossum. The chicken, frog, and fish show little to no conservation. Within the first 1000 base pairs upstream of the transcription start site, there appears to be no notable conservation across species. Only the dog contains what is considered an Evolutionary Conserved Region (ECR).
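The padding step described above can be sketched as a small helper that extends a gene region by 1000 base pairs on its upstream side. The function name is hypothetical, and the coordinates in the example are placeholders, not the real FAM149A positions, which the source does not give. The sketch also assumes that the extra sequence was added upstream of the gene, which on the positive strand means lower coordinates.

```python
from typing import Tuple


def pad_upstream(start: int, end: int, strand: str, pad: int = 1000) -> Tuple[int, int]:
    """Extend a 1-based genomic interval by `pad` bp on its upstream side."""
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        # Upstream of a plus-strand gene lies at lower coordinates.
        return max(1, start - pad), end
    if strand == "-":
        # Upstream of a minus-strand gene lies at higher coordinates.
        return start, end + pad
    raise ValueError("strand must be '+' or '-'")


# Placeholder coordinates for illustration only (not the real gene positions).
padded = pad_upstream(5_000_000, 5_050_000, "+")
print(padded)
```

The padded start and end values are what would then be entered into a genome browser to view the gene together with its upstream region.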

Expression
Based on the graphs on the right, the highest levels of expression occur in the trigeminal ganglion, superior cervical ganglion, atrioventricular node (heart), and kidney. However, at least a small amount seems to be expressed in almost all tissues in the human body. Using the same microarrays provided by BioGPS, expression of FAM149A was found to vary during the shedding of the endometrium in menstruation. This opens a new avenue for possible exploration of the function of the gene.

A search was performed on the Allen Brain Atlas using FAM149A. According to the levels of expression provided by the Atlas, FAM149A is not expressed at notable levels within the mouse brain. However, on visual inspection of the figure, FAM149A could be found in the ventral posterior complex of the thalamus. This can be seen as the dark vertical line in the center of the sagittal brain slice in the image below. As a comparison, the expression of the protein actin is used to demonstrate what a mouse brain looks like with high levels of expression.

EST Profile
The data in the figure below indicate that FAM149A is highly expressed in the brain, nerves, pancreas, adrenal gland, and kidney. Interestingly, there is no expression in the heart. According to the second table, disease states associated with FAM149A expression include adrenal tumors, pancreatic tumors, colorectal tumors, and ovarian tumors.

Transcription Variants
FAM149A has two transcript variants, transcript variant 1 (TV1) and transcript variant 2 (TV2). Both code for the same FAM149A protein. Differences include additional base pairs in the 5' untranslated region as well as the 3' untranslated region. One of the two differences within the translated region is a G instead of an A at bp 1590 in TV1 and bp 1337 in TV2. The other difference is a C instead of an A at bp 2214 in TV1 and bp 1961 in TV2.
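The positions quoted above can be checked for internal consistency. If the two variants differ only by extra sequence upstream of these sites, each difference should sit at the same offset between the two transcripts. The sketch below is a simple arithmetic check on the reported positions; the dictionary keys are illustrative labels, not official variant names.

```python
# Reported positions of each difference as (variant 1 bp, variant 2 bp).
differences = {
    "G_instead_of_A": (1590, 1337),
    "C_instead_of_A": (2214, 1961),
}

# Offset between the two transcripts at each reported difference.
offsets = {name: tv1 - tv2 for name, (tv1, tv2) in differences.items()}
print(offsets)

# A single shared offset means the two coordinate pairs agree with each other.
consistent = len(set(offsets.values())) == 1
print(consistent)
```

Both differences are offset by 253 bp, which fits with variant 1 carrying 253 additional bases upstream of these positions, although the source does not state that figure directly.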

Composition
As stated above, the FAM149A protein is made up of 482 amino acids. The most common amino acid is serine, which makes up 9.8% of the protein. The least common amino acids are tryptophan and cysteine, which each make up only 1.2% of the protein. The only recurring combination of amino acids in the protein is SLAS, which occurs at amino acids 234-237 and again at 324-327. In addition, the isoelectric point of FAM149A is 9.891999.
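Composition figures of this kind can be reproduced from a protein sequence with a few lines of code. The following is a minimal sketch, not the tool used for the analysis above: the function names are hypothetical, and the short sequence is a made-up toy example rather than the FAM149A sequence.

```python
from collections import Counter
from typing import Dict, List


def composition(seq: str) -> Dict[str, float]:
    """Return the percentage of each amino acid in the sequence."""
    counts = Counter(seq)
    return {aa: 100 * n / len(seq) for aa, n in counts.items()}


def repeated_kmers(seq: str, k: int = 4) -> Dict[str, List[int]]:
    """Return every k-residue segment that occurs more than once,
    mapped to its 1-based start positions."""
    positions: Dict[str, List[int]] = {}
    for i in range(len(seq) - k + 1):
        positions.setdefault(seq[i:i + k], []).append(i + 1)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}


toy = "MSLASGKSLASW"  # hypothetical toy sequence, NOT the real protein
print(round(composition(toy)["S"], 1))  # percentage of serine in the toy sequence
print(repeated_kmers(toy))              # repeated 4-residue segments
```

Run on the real 482-residue sequence, the same two functions would give the per-residue percentages and any repeated four-residue segments such as SLAS.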

Transcription factor binding sites
The following is an analysis of the promoter region for FAM149A. It shows a number of transcription factor binding sites that may contribute strongly to regulating expression of the gene. The image below shows the locations of the binding sites. The binding sites were analyzed to find any possible unique functions.



There were many results, but those with the highest similarity and highest abundance were chosen, as they are the most likely to be present on the actual gene. Matrix families of interest include the Huntington’s disease gene regulatory region, nerve growth factor, nuclear respiratory factor, pleomorphic adenoma gene, zinc finger transcription factors, and an E2F-myc activator/cell cycle regulator. Many of them involve zinc finger proteins, which suggests that these may be important for FAM149A.

Protein Interactions
FAM149A has potential interactions with ZNF385D, C10orf10, PNMAL1, CPN2, C10orf72, VPS13D, and RBMS3. In the binding site analysis, many of the sites involved zinc finger proteins, and according to the results from STRING, the second strongest associating protein is zinc finger protein 385D. However, we cannot conclude that these are the only interacting proteins, as there seems to be little to no research on FAM149A interactions. The Molecular Interaction Database (MINT) was used as an additional source for protein interactions, but FAM149A was not in the database. The top five functional partners listed by STRING are also not in the MINT database. Another interaction database, I2D Protein-Protein Interaction, showed a possible interaction with the protein PRKAG1, although the interaction was weak.

Below is the list of proteins that potentially interact with FAM149A.

Disease Association
While not conclusively linked, FAM149A has been identified as one of 15 candidate genes that may contribute to the development of cancer and dysplastic lesions. The same paper also noted downregulation of the gene in oral cancer, providing a possible route of study.