User:Amyqdinh/sandbox

FAM89A
The FAM89A (Family with Sequence Similarity 89 Member A) protein (accession number NP_940954.1) is encoded by the human FAM89A gene. Its alias, C1orf153, denotes its location on the human genome: chromosome 1, open reading frame 153. Expression of FAM89A is highest in the placenta and adipose tissue. The function of FAM89A has yet to be determined, but its expression has been linked to atherosclerosis and glioma, it has been studied as a marker for diagnosing bacterial infections, and it is notable for its response to interleukin exposure.

Gene
FAM89A’s most common alias, C1orf153, denotes its location on the human genome: chromosome 1, open reading frame 153. The gene also has the less commonly used aliases MCG15887 and RP11-423F24.2 LOC37061.2. FAM89A lies on the minus strand of chromosome 1 at map position 1q42.2, starting at 231,018,958 bp and ending at 231,040,254 bp; the gene is therefore 21,297 base pairs long. FAM89A has two exons separated by one large intron, and the primary transcript is 1,503 base pairs long. FAM89A does not have any transcript variants.
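The gene length quoted above follows from inclusive coordinate arithmetic. The short Python sketch below reproduces it from the start and end positions given in this section; the function name `span_length` is illustrative and does not come from any genome browser or annotation tool.

```python
def span_length(start_bp: int, end_bp: int) -> int:
    """Length of a genomic interval given 1-based, inclusive coordinates."""
    if end_bp < start_bp:
        raise ValueError("end coordinate must not precede start coordinate")
    # Both endpoints belong to the interval, hence the +1.
    return end_bp - start_bp + 1


# Coordinates quoted in the text for FAM89A on chromosome 1.
fam89a_length = span_length(231_018_958, 231_040_254)
print(fam89a_length)  # 21297
```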

Neighboring Genes
TRIM67 (Tripartite Motif Containing 67) is located downstream of FAM89A on chromosome 1, while ARV1 (Acyl-CoA acyltransferase-related enzyme 2 required for viability) is located upstream of FAM89A. Both are on the plus strand of chromosome 1.

General Properties
The FAM89A protein (NP_940954.1) is 184 amino acids long. Its predicted molecular mass is 18.6 kDa and its predicted isoelectric point (pI) is 5.64. Two short sequences, GARAA and ASGG, each occur twice within the protein sequence. The composition of the protein is notable for an abundance of four amino acids: leucine (14.1%), glycine (12.0%), alanine (11.4%), and serine (11.4%). Composition analysis also gives a value of -3 for basic amino acids minus acidic amino acids (KR-ED), meaning there are three more acidic residues than basic ones, which is consistent with the slightly acidic isoelectric point.
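The composition percentages and the KR-ED value are simple residue counts. The sketch below shows one way such figures could be computed from a one-letter protein sequence. The function names are illustrative, and the sequence `toy` is a made-up peptide used only to exercise the code; it is not the FAM89A sequence.

```python
from collections import Counter


def composition_percent(seq: str) -> dict[str, float]:
    """Percentage of each residue (one-letter code) in a protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def kr_minus_ed(seq: str) -> int:
    """Basic residues (K, R) minus acidic residues (E, D)."""
    counts = Counter(seq.upper())
    return (counts["K"] + counts["R"]) - (counts["E"] + counts["D"])


# Hypothetical toy peptide, NOT the FAM89A sequence.
toy = "MSGLLAEDKRDEGA"
print(round(composition_percent(toy)["L"], 1))  # 14.3
print(kr_minus_ed(toy))  # -2
```

A negative KR-ED value, as in the toy peptide here, indicates a net excess of acidic residues.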

Conserved Domains & Motifs
Five periodic repeats of a leucine residue at every seventh position, from amino acid 81 to amino acid 115, are characteristic of a leucine zipper structural motif. The MotifFinder tool identified the region spanning positions 84-122 as a leucine-rich adapter protein (LURAP) domain, PF14854. The LURAP superfamily of proteins activates the canonical NF-kappa-B pathway, promotes proinflammatory cytokine production, and promotes the antigen-presenting and priming functions of dendritic cells.
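The heptad spacing that defines a leucine zipper can be checked mechanically. The sketch below tests whether leucine occupies every seventh position for a given number of repeats. The function names are illustrative, and the sequence `toy` is a synthetic string built for the example, not the FAM89A sequence.

```python
def heptad_leucine_positions(seq: str, start: int, repeats: int = 5) -> list[int]:
    """Return the 1-based positions start, start+7, ... that hold leucine."""
    positions = [start + 7 * i for i in range(repeats)]
    return [p for p in positions if p <= len(seq) and seq[p - 1] == "L"]


def has_leucine_zipper_pattern(seq: str, start: int, repeats: int = 5) -> bool:
    """True if every one of the `repeats` heptad positions is a leucine."""
    return len(heptad_leucine_positions(seq, start, repeats)) == repeats


# Synthetic sequence with leucine at every seventh position (hypothetical).
toy = "LAAAAAA" * 5
print(heptad_leucine_positions(toy, start=1))    # [1, 8, 15, 22, 29]
print(has_leucine_zipper_pattern(toy, start=2))  # False
```

A periodicity check of this kind only flags candidate regions; it does not confirm that the region actually forms a coiled coil.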

Secondary Structure
The FAM89A protein is predicted to consist of approximately 40% alpha helices, 11% extended strands, and 49% random coils. No transmembrane helices or N-terminal signal peptide are predicted for the protein. The LURAP domain is predicted to form an alpha helix.

Tertiary Structure
The tertiary structure of the FAM89A protein is not yet well understood because it has not been determined by X-ray crystallography. I-TASSER software predicts a dimer of alpha-helical monomers, which is consistent with the leucine zipper motif.

Interacting Proteins
The STRING interaction network reports an experimentally observed interaction between FAM89A and UBXN2B (UBX domain-containing protein 2B). UBXN2B is an adapter protein that is required for Golgi and endoplasmic reticulum biogenesis. It is involved in Golgi and endoplasmic reticulum maintenance during interphase of the cell cycle and in their reassembly at the end of mitosis.

Microarray hybridization data suggest that FAM89A expression responds to interleukin 13 (IL-13). When airway epithelial cells were exposed to IL-13 in vitro, FAM89A expression decreased. FAM89A expression also decreased when CD8+ T lymphocytes were exposed to interleukin 10.

FAM89A gene expression was found to decrease in response to BRAF inhibition with vemurafenib in a melanoma (skin cancer that arises in melanocytes) cell line.

Promoter & Transcription Factor Binding Sites
The Genomatix ElDorado genome annotation tool identifies a FAM89A promoter 1,104 base pairs long. Binding sites for various transcription factors were found within it, including TFIIB (RNA polymerase II transcription factor IIB), MZF1 (myeloid zinc finger 1 factors), and SP1 (GC-Box factors SP1/GC).

Tissue Expression
The human tissues with the highest levels of FAM89A expression are the placenta and adipose tissue. Moderate levels of expression are found in the adrenal gland, lungs, and breasts. Microarray hybridization patterns further support high FAM89A expression in the placenta and additionally show moderate expression in the lungs, skin, spleen, spinal cord, pancreas, and retina.

Protein Localization
Immunofluorescent staining of the human cell line RH-30 from the Human Protein Atlas (HPA) shows localization of FAM89A to the nucleoplasm, Golgi apparatus, and vesicles of the cell. Reinhardt’s method for cytoplasmic/nuclear discrimination in the PSORT II search results predicts nuclear localization with a reliability score of 89. The predicted localization of FAM89A is highest for the nucleus (52.2%), followed by the mitochondria (34.8%), the cytoskeleton (8.7%), and the cytoplasm (4.3%). The PredictProtein tool also supports subcellular localization in the nucleus.

Phosphorylation
According to NetPhos, FAM89A has 13 possible serine phosphorylation sites in its protein sequence. These are predicted at positions 30, 32, 37, 58, 65, 106, 117, 129, 148, 150, 168, 173, and 175, with the kinases PKA, CDC2, CKI, CKII, PKC, P38MAPK, SRC, EGFR, and DNAPK. The GPS program also identifies possible phosphorylation sites together with their cognate protein kinases; for the FAM89A protein it produced 1175 hits and therefore appears to over-predict phosphorylation sites. SIB MyHits Motif Scan results include casein kinase II phosphorylation predictions for 6 amino acid sites, but these are given as vague amino acid ranges flagged with a question mark (?), signifying questionable or weak matches.

GalNAc O-glycosylation & O-linked β-N-acetylglucosamine
NetOGlyc analysis searched for mammalian mucin-type GalNAc O-glycosylation sites and predicted five positive results, at amino acids 2, 39, 154, 168, and 173. Mucins are a group of heavily O-glycosylated proteins that line the gastrointestinal and respiratory tracts; they lubricate these tracts and protect them from infection by preventing bacteria from binding. O-GalNAc modifications may also compete with phosphorylation for control of a protein’s activation site.

The YinOYang tool predicted five possible O-beta-GlcNAc attachment sites in the FAM89A protein: at serines 2 and 172 (+++ confidence; +0.45 potential) and at positions 129, 168, and 173 (++ confidence; +0.6 potential).
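Because these modifications may compete for the same residues, it can be useful to see which predicted positions are shared between tools. The sketch below intersects the position lists given in this article for NetPhos, NetOGlyc, and YinOYang. The variable and function names are illustrative, and an overlap between predictions does not by itself show that competition occurs in the cell.

```python
# Predicted modification sites quoted in this article (amino acid positions).
phospho = {30, 32, 37, 58, 65, 106, 117, 129, 148, 150, 168, 173, 175}  # NetPhos
o_galnac = {2, 39, 154, 168, 173}   # NetOGlyc
o_glcnac = {2, 129, 168, 172, 173}  # YinOYang


def shared_sites(a: set[int], b: set[int]) -> list[int]:
    """Positions predicted by both tools, in ascending order."""
    return sorted(a & b)


print(shared_sites(phospho, o_galnac))  # [168, 173]
print(shared_sites(phospho, o_glcnac))  # [129, 168, 173]
```

On these lists, positions 168 and 173 are predicted for all three modification types.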

Glycation
Glycation of the epsilon amino groups of lysine residues was analyzed in the FAM89A protein, and monosaccharide attachment was predicted at three sites: lysine 57, lysine 82, and lysine 95. These residues are conserved in distant orthologs. Glycation has been linked to atherosclerosis through the production of advanced glycation end products (AGEs), which are engulfed by macrophages and taken into the arterial wall.

SUMOylation
The SUMOplot analysis program predicts a SUMO (Small Ubiquitin-like Modifier) site at position 83. The residue is conserved in distant orthologs.

Paralogs
FAM89A is known to have two paralogs: FAM89B and TRANK1. FAM89B is located on human chromosome 11 at map position 11q13.1 and has the common aliases Leucine Repeat Adaptor Protein 25 (LRAP25) and Mammary Tumor Virus Receptor Homolog 1 (MTVR1). TRANK1 (Tetratricopeptide Repeat and Ankyrin Repeat Containing 1) also goes by the alias LBA1 and is located on human chromosome 3 at map position 3p22.2. FAM89B is more closely related to FAM89A, with 92.31% similarity, while TRANK1 is only distantly related, with 3.00% similarity. Paralogs of FAM89A were likely to split around.

Orthologs
FAM89A orthologs can be found in mammals, amphibians, reptiles, birds, fish, and various insects. Among vertebrates, FAM89A is conserved as far back as the cartilaginous fish, which diverged from the lineage leading to Homo sapiens 465 million years ago.

Evolutionary Divergence
On a graph of date of divergence versus M (amino acid changes per 100 residues), FAM89A’s line of best fit falls closer to that of Fibrinogen Alpha Chain, a rapidly evolving gene, than to that of Cytochrome C, a slowly evolving gene. The slope of FAM89A's line is almost identical to that of Fibrinogen Alpha Chain. These results suggest that FAM89A is accumulating amino acid changes at a rate similar to Fibrinogen Alpha Chain rather than Cytochrome C, and is therefore evolving rapidly.
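A comparison of this kind comes down to comparing slopes. The sketch below shows one way it could be set up, under two assumptions that are not stated in this article: that M is the commonly used Poisson-corrected value m = -100 ln(1 - n/100), where n is the observed percent difference between two sequences, and that each line is fitted through the origin. The function names are illustrative, and the dates and identities in the example are hypothetical values, not measurements for FAM89A.

```python
import math


def corrected_divergence(percent_identity: float) -> float:
    """Corrected changes per 100 residues, m = -100 * ln(1 - n/100).

    n is the observed percent difference (100 - identity). Using this
    correction for M is an assumption, not a detail given in the article.
    """
    n = 100.0 - percent_identity
    if not 0.0 <= n < 100.0:
        raise ValueError("percent identity must be in the range (0, 100]")
    return -100.0 * math.log(1.0 - n / 100.0)


def slope_through_origin(xs: list[float], ys: list[float]) -> float:
    """Least-squares slope of y = k * x (fit constrained to the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical divergence dates (million years ago) and percent identities.
dates_mya = [90.0, 310.0, 465.0]
identities = [90.0, 70.0, 55.0]
m_values = [corrected_divergence(i) for i in identities]
rate = slope_through_origin(dates_mya, m_values)
print(f"{rate:.3f} corrected changes per 100 residues per million years")
```

A steeper slope corresponds to a faster rate of amino acid change, so a gene whose slope is close to that of a fast-evolving reference is itself evolving quickly.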

Pathology & Disease Association
Studies of FAM89A are limited because the function of the FAM89A protein is unknown, but current evidence suggests that FAM89A may be linked to atherosclerosis, to methylation sites associated with gliomas, and to the diagnosis of bacterial infections. Filling this gap in knowledge could clarify how changes in gene expression contribute to disease and could support further work in several areas of science and health.

In 2014, a study was published on the possible link between smoking-related atherosclerosis and gene variants specific to the Hispanic population. FAM89A was identified as a gene near an SNP (single nucleotide polymorphism) that showed an interaction with smoking on carotid plaque area in a discovery sample. Of the 11 SNPs identified in connection with atherosclerosis, one, rs6700792, is located within the FAM89A gene. The authors conclude that more studies are needed to clarify the role of the protein, since there is no information regarding the function of the FAM89A gene in humans.

A 2019 study examined genes with methylation sites related to the development of gliomas. The researchers found that abnormal expression of FAM89A correlated with glioma gene expression profiles.

Another study, published in 2019, examined FAM89A and the gene IFI44L as a pair of markers for differentiating viral from bacterial infections in febrile children. The researchers found that IFI44L expression was elevated in febrile children with viral infections, while FAM89A expression was elevated in febrile children with bacterial infections.