User:RW9939/sandbox

=CFAP97D2=
Cilia- and flagella-associated protein 97 domain-containing 2 (CFAP97D2), previously known as KIAA1430, is a protein encoded by the CFAP97D2 gene. It is predicted to localize to, and function in, the nucleus.

Gene
The CFAP97D2 gene is 50,003 bp long. It spans positions 114173082 to 114223032 on human chromosome 13 and contains 8 exons. It belongs to the cilia- and flagella-associated protein 97 (CFAP97) gene family, which has three members: CFAP97, CFAP97D1, and CFAP97D2.
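As a quick sanity check on the coordinates, the span of a genomic interval can be computed directly. The sketch below assumes the two positions are 1-based and inclusive (the text does not say which convention is used); the function name is illustrative only. Under that assumption the listed coordinates give 49,951 bp, which is close to but not identical with the stated gene length; the text does not specify the assembly or annotation release behind either figure.

```python
# Coordinates quoted in the text for CFAP97D2 on human chromosome 13.
START = 114_173_082
END = 114_223_032


def span_length(start: int, end: int) -> int:
    """Length of a closed interval, assuming 1-based inclusive coordinates."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


length_bp = span_length(START, END)
print(f"{length_bp:,} bp")  # 49,951 bp
```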

Transcripts
There are three known transcript variants: transcript variant X1, transcript variant 1, and transcript variant 2.

Transcript expression
CFAP97D2 transcripts are present at low abundance and are expressed in a wide array of organs and tissues rich in connective tissue, such as the brain, testes, ovary, fallopian tube, white blood cells, and bone marrow.

Among immune cells, CFAP97D2 expression is specifically enriched in naive CD8 T cells, which suggests that the protein may have a role in the immune system.

Proteins
There are three known CFAP97D2 protein isoforms: isoform X1, isoform 1, and isoform 2.

The longest isoform, X1, consists of 166 amino acids, with a predicted molecular weight of 19 kDa and an isoelectric point of 10.4. CFAP97D2 has a relatively high content of lysine, phenylalanine, and leucine; the abundance of lysine, a basic residue, contributes to a relatively high net positive charge.
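The relation between chain length and molecular weight can be illustrated with a back-of-the-envelope estimate. The sketch below assumes a rule-of-thumb average residue mass of about 110 Da (an assumption, not a value from the source) and uses a short made-up peptide, not the CFAP97D2 sequence, to show how residue composition could be tallied.

```python
from collections import Counter

# Assumed rule-of-thumb average mass per residue; not taken from the source.
AVG_RESIDUE_MASS_DA = 110.0


def estimate_mass_kda(n_residues: int, avg_da: float = AVG_RESIDUE_MASS_DA) -> float:
    """Rough protein mass in kDa from the number of residues."""
    return n_residues * avg_da / 1000.0


def composition(seq: str) -> dict[str, float]:
    """Fraction of each residue in a one-letter amino acid sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.most_common()}


# 166 residues is the length given for isoform X1.
print(round(estimate_mass_kda(166), 1))  # 18.3

# Hypothetical toy peptide, used only to demonstrate the tally.
toy_peptide = "MKKLFLKKAL"
print(composition(toy_peptide))
```

The estimate of roughly 18.3 kDa is in line with the reported value of about 19 kDa; an exact figure would require the real sequence and per-residue masses.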

Domains and motifs
CFAP97D2 has a domain named “KIAA1430” that spans residues 27 to 111. This domain is highly conserved across vertebrates, invertebrates, and fungi, and is thought to be specifically associated with motile cilia.

PSORT II analysis predicts a conserved mitochondrial processing peptidase cleavage site at residue 16 of CFAP97D2 and a conserved nuclear localization signal at residue 27. The latter suggests that CFAP97D2 functions in the nucleus. Additionally, human CFAP97D2 is predicted to have a unique leucine zipper pattern, a DNA-binding motif, at residue 37, which can help distinguish nuclear proteins.
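A leucine zipper pattern of this kind can be searched for with a simple regular expression. The sketch below assumes the PROSITE-style definition L-x(6)-L-x(6)-L-x(6)-L (four leucines spaced seven residues apart); the source does not state which pattern definition produced the prediction at residue 37. The sequence used is a made-up toy string, not the CFAP97D2 sequence.

```python
import re

# Assumed pattern: L-x(6)-L-x(6)-L-x(6)-L, written as a lookahead so that
# overlapping occurrences are also reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(seq: str) -> list[int]:
    """Return 1-based start positions of leucine zipper pattern matches."""
    return [m.start() + 1 for m in LEUCINE_ZIPPER.finditer(seq.upper())]


# Hypothetical toy sequence with one zipper starting at position 4.
toy = "AAA" + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "A" * 6 + "L" + "AA"
print(find_leucine_zippers(toy))  # [4]
```

A pattern match alone is weak evidence, since the spacing can occur by chance; it is best read alongside the coiled-coil and localization predictions.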

Structure
The secondary structure of CFAP97D2 is highly conserved among species and consists mostly of alpha-helices, with beta-sheets and connecting coils. The KIAA1430 domain consists of two helices connected by coils. A conserved coiled coil spans residues 48 to 109 of human CFAP97D2.

No disulfide bonds or transmembrane domains have been found in CFAP97D2.

Subcellular localization
CFAP97D2 is predicted with high likelihood to localize to the mitochondria and the nucleus, and to be soluble in the cytoplasm.

Gene level regulation
Genomatix identifies two promoters for the CFAP97D2 gene. Promoter A (GXP_6735018) drives transcript X1, while promoter B (GXP_7530125) drives transcripts 1 and 2. Several transcription factors with binding sites in promoter A are predicted to regulate CFAP97D2 expression (based on the Genomatix promoter annotation).

Post-translational modifications
Sixteen residues of the CFAP97D2 protein are predicted phosphorylation sites, and eleven internal lysines are predicted acetylation sites. Both types of sites are highly conserved in the KIAA1430 domain across species. There are also three SUMOylation consensus sites that overlap with acetylation sites. Each of these post-translational modifications is expected to affect the protein. Phosphorylation would reduce the isoelectric point of CFAP97D2 to 9.77 if all conserved sites were modified. Acetylation of internal lysines may influence intermolecular interactions and the eventual degradation of CFAP97D2. SUMOylation sites are residues to which SUMO (small ubiquitin-like modifier) proteins can attach; SUMOylation may affect the nuclear-cytosolic transport of CFAP97D2 and transcriptional regulation.
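SUMOylation consensus sites are commonly described by the motif ψ-K-x-(D/E), where ψ is a large hydrophobic residue and K is the modified lysine. The sketch below scans a sequence for that motif. The choice of V, I, L, M, and F for ψ is an assumption made for illustration, the source does not say which tool or motif definition was used, and the sequence is a made-up toy string rather than CFAP97D2.

```python
import re

# Assumed consensus: psi-K-x-[DE], with psi taken here as V, I, L, M or F.
# The lookahead allows overlapping motifs to be reported.
SUMO_CONSENSUS = re.compile(r"(?=([VILMF]K.[DE]))")


def find_sumo_lysines(seq: str) -> list[int]:
    """Return 1-based positions of lysines inside a psi-K-x-[DE] motif."""
    # m.start() is the 0-based index of psi; the lysine is the next residue,
    # so its 1-based position is m.start() + 2.
    return [m.start() + 2 for m in SUMO_CONSENSUS.finditer(seq.upper())]


# Hypothetical toy sequence containing the motifs VKAE and LKPD.
toy = "MAVKAEGGLKPDKK"
print(find_sumo_lysines(toy))  # [4, 10]
```

Because the motif centers on a lysine, any hit is by construction also a candidate acetylation site, which is one way to see why the two kinds of sites can overlap.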

Homology and evolution
CFAP97D2 is a homolog of the uncharacterized protein C17orf105 and has orthologs across invertebrates, vertebrates, and fungi. So far, no paralog has been detected. (*Only a partial sequence is shown on NCBI; BLAT was used to obtain the full sequence.) CFAP97D2 is inferred to have first appeared in a chytrid fungus (Chytriomyces confervae, order Chytridiales) around 1105 MYA. It most likely split from the LOC106699411 gene of the little brown bat (Myotis lucifugus) around 100 MYA.

The most distantly related species found so far with a CFAP97D2 ortholog is the fungus Batrachochytrium salamandrivorans (Bsal), in which no isoforms are known. CFAP97D2 evolves more slowly than the fibrinogen alpha gene and slightly faster than the cytochrome c gene. This suggests that the CFAP97D2 gene may have a relatively low mutation rate in fungi and other species.
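Comparisons of evolutionary rate like the one above are often made by converting percent identity between orthologs into a corrected distance and relating it to divergence time. The sketch below uses the Poisson correction, d = -ln(1 - p), where p is the fraction of differing residues. This is an assumed method for illustration (the source does not state how the rates were determined), and the function names and the identity and time values are hypothetical.

```python
import math


def poisson_corrected_distance(percent_identity: float) -> float:
    """Poisson-corrected substitutions per site from percent identity."""
    p = 1.0 - percent_identity / 100.0  # observed fraction of differing sites
    if not 0.0 <= p < 1.0:
        raise ValueError("percent_identity must be in (0, 100]")
    return -math.log(1.0 - p)


def rate_per_million_years(percent_identity: float, divergence_mya: float) -> float:
    """Substitutions per site per million years along one lineage."""
    if divergence_mya <= 0:
        raise ValueError("divergence_mya must be positive")
    # Two lineages have each evolved for divergence_mya since the split.
    return poisson_corrected_distance(percent_identity) / (2.0 * divergence_mya)


# Hypothetical ortholog pairs, both split 400 MYA: the gene that retains
# the higher identity has the lower estimated rate.
slow = rate_per_million_years(90.0, 400.0)
fast = rate_per_million_years(60.0, 400.0)
print(slow < fast)  # True
```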

Interacting proteins
Only one interacting protein is known: E3 ubiquitin-protein ligase TRIP12 isoform X1. A gene fusion between TRIP12 and CFAP97D2 joins the two genes, allowing them to be transcribed and translated as a single unit. The activity of E3 ubiquitin-protein ligases is strictly regulated by post-translational modifications, including phosphorylation and SUMOylation, which may also affect the corresponding modification sites on the CFAP97D2 protein.

Pathological significance
CFAP97D2 expression has been reported to be affected in Parkinson’s disease: both the protein and RNA levels of CFAP97D2 and of tubulin genes decline, suggesting that CFAP97D2 may be closely related to tubulin in human tissues.

Another member of the CFAP97 family, CFAP97D1, contributes to both the structure of mammalian sperm flagella and sperm motility, and thereby affects fertilization. Because CFAP97D1 and CFAP97D2 may interact with each other, CFAP97D2 is tentatively expected to be related to sperm motility in mammals.

One variant reported in the literature lies in the 5’ untranslated region and is a variant in the CDC16 gene (cell division cycle 16 homolog, S. cerevisiae). CDC16 is a component of the anaphase-promoting complex/cyclosome (APC/C), a cell cycle-regulated E3 ubiquitin ligase that controls progression through mitosis and the G1 phase of the cell cycle. The CDC16 variant was found to have a negative, DNA methylation-dependent effect on the gene’s expression level.

Single nucleotide polymorphisms (SNPs) found in the coding region mostly result in missense changes. So far, none has been linked to a significant effect on human phenotypes or conditions.