FAM166C

Family with Sequence Similarity 166, member C (FAM166C), is a protein encoded by the FAM166C gene. The protein (aliases C2orf70, LOC339778) is localized in the nucleus and has a calculated molecular weight of 23.29 kDa. It contains DUF2475, a domain of unknown function spanning amino acids 19–85. FAM166C is expressed at low levels, most prominently in the testis, stomach, and thyroid.

Gene
The FAM166C gene, also known as C2orf70, is located on the plus strand at locus 2p23.3. It has 9 exons; however, because of overlap, only 4 are distinguishable in the human genome. FAM166C spans positions 26,562,565 to 26,581,166 on chromosome 2, for a total length of 18.6 kb.
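The quoted length can be checked from the two coordinates. The sketch below assumes the positions are 1-based and inclusive (the usual convention for genome browser coordinates, though the text does not state it); the function name is illustrative only.

```python
# Coordinates quoted in the text; assumed to be 1-based and inclusive.
GENE_START = 26_562_565
GENE_END = 26_581_166


def span_length(start: int, end: int) -> int:
    """Return the length in bp of a 1-based, inclusive interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


length_bp = span_length(GENE_START, GENE_END)
length_kb = round(length_bp / 1000, 1)  # rounded to one decimal, as in the text
print(f"{length_bp} bp = {length_kb} kb")
```

Under a 0-based, half-open convention the result would differ by one base pair, which does not change the rounded figure.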

Gene neighborhood
The gene neighborhood of FAM166C consists of DRC1, LOC112840921, OTOF, CIB4 and LOC122756675. LOC112840921 and LOC122756675 are both predicted transcriptional regulatory regions. DRC1 (dynein regulatory complex subunit 1) encodes a central component of the nexin-dynein complex, a regulator of ciliary dynein. Mutations in this gene can lead to ciliary dyskinesia. OTOF encodes the protein otoferlin, which has been suggested to be involved in vesicle membrane fusion. Mutations in OTOF can lead to neurosensory nonsyndromic recessive deafness, DFNB9. CIB4 (Homo sapiens calcium and integrin binding family member 4) encodes the CIB4 protein, which regulates integrin alphaIIb subunit activation.

Transcripts
FAM166C has 2 different transcript variants. The most abundant variant is FAM166C transcript variant 1, which is 718 nucleotides in length.

Protein
The FAM166C protein is 201 amino acids in length, with a predicted molecular weight of 23 kDa and an isoelectric point of 10. It has higher than normal levels of tyrosine and proline and lower than normal levels of isoleucine.
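Statements about amino acid composition and approximate mass can be reproduced from a protein sequence with a few lines of code. The sketch below is illustrative only: `TOY_SEQUENCE` is a made-up peptide, not the FAM166C sequence, and the figure of about 110 Da per residue is a common rule of thumb rather than the method behind the values quoted above.

```python
from collections import Counter

# Rule-of-thumb average mass of one amino acid residue, in daltons (assumption).
AVERAGE_RESIDUE_MASS_DA = 110.0


def residue_fractions(seq: str) -> dict[str, float]:
    """Return the fraction of each residue in a one-letter protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def approx_mass_kda(n_residues: int) -> float:
    """Rough mass estimate in kDa from the residue count alone."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000


TOY_SEQUENCE = "MPYYPGTPYK"  # hypothetical peptide, NOT the FAM166C sequence
fractions = residue_fractions(TOY_SEQUENCE)
print(fractions["Y"], fractions["P"])
print(approx_mass_kda(201))  # rough estimate for a 201-residue protein
```

Comparing such fractions with the average composition of a reference proteome is one way to judge whether a residue is over- or under-represented. The rough mass for 201 residues lands near, but not exactly on, the predicted value, as expected for a composition-independent estimate.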

Domains and structure
The FAM166C protein has one domain of unknown function, DUF2475, spanning amino acids 19–85. The secondary structure of FAM166C isoform 1 appears to be primarily alpha helical, with only short segments predicted to form beta sheets. Tertiary structure predictions show 5 distinct alpha helices with high confidence.

Isoforms
FAM166C has 2 different splice isoforms. The most abundant is FAM166C protein isoform 1, which is 201 amino acids in length.

Promoter
FAM166C has 3 possible promoters that produce complete protein isoforms; however, isoform 1 is encoded only by transcripts from GXP_1493451. Isoform 2 is also encoded by transcripts from GXP_1493451.

Transcription factor binding sites
GXP_1493451 contains over 250 predicted transcription factor binding sites. The most conserved sites, and those most likely to be bound, include sites for a forkhead box protein factor (V$FOXP2.01), a collagen krox domain factor (V$CKROX.01) and an E2F transcription factor (V$E2F3.01).

Expression pattern
FAM166C is expressed at low levels overall compared to other proteins; among the tissues in which it is expressed, it is most prominent in the testes, stomach and thyroid. Within the cell, FAM166C is localized to the nucleus and contains 2 nuclear localization signals. Antibody staining strongly indicates localization at the nuclear membrane specifically.

Transcript level regulation
The 5' UTR of FAM166C transcript variant 1 is 29 bp in length. Analysis of predicted 3D structures identifies one hairpin structure; however, the 5' UTR differs heavily among orthologs, indicating that it is unlikely to be an important region for transcriptional regulation.

The 3' UTR is 89 bp in length and contains one polyadenylation signal at 699 bp. It is conserved among human transcript variants, but only small segments are well conserved among orthologs. It contains 2 predicted miRNA binding sites in areas of moderate conservation, at 631 bp (hsa-miR-3184-3p) and at 641 bp (hsa-miR-4539, hsa-miR-12113). 3D predictions identify two stem-loop structures.

Protein level regulation
FAM166C is predicted to have 7 phosphorylation sites, 2 acetylation sites and one O-GlcNAc site, which are well conserved among orthologs.

In the conceptual translation of FAM166C transcript variant 1 / protein isoform 1, phosphorylation sites are highlighted in green, N-linked acetylation sites in indigo, internal acetylation sites in pink, O-β-GlcNAc sites in yellow, nuclear localization signals in light blue and the poly A signal in red. The start and stop of transcription are marked with green and red text respectively. DUF2475 is marked with brackets, and amino acids conserved among all known orthologs are bolded.

Paralogs
The human FAM166C gene has two paralogs, FAM166A and FAM166B, located at 9q34.3 and 9p13.3 respectively. The functions of both proteins are not currently well understood.

Orthologs
FAM166C has orthologs in species as distant as insects. Mammalian orthologs are moderately similar to human FAM166C, with percent identity greater than 70%. Orthologs in reptiles, birds and amphibians range from 40% to 65% identity. In fish and invertebrates, identity ranges from 20% to 40%. No orthologs were found in fungi, bacteria or plants.

Evolution
The most distant FAM166C orthologs are found in insects, which diverged from humans approximately 797 million years ago. Orthologs of FAM166A and FAM166B also occur in insects. FAM166C evolves at a moderate rate: a 1% change in amino acid sequence requires around 10 million years. Based on sequence similarity of orthologs, its rate of evolution falls between those of cytochrome c and fibrinogen alpha.
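A rate of this kind can be estimated by dividing a divergence time by the percentage of sequence that has changed. The sketch below assumes the simplest approach, using uncorrected percent difference (100 minus percent identity); the text does not state how its figure was derived, and the input pairing (797 million years with 20% identity, the low end of the invertebrate range quoted above) is chosen purely for illustration.

```python
def million_years_per_percent(divergence_mya: float, percent_identity: float) -> float:
    """Estimate how many million years correspond to a 1% sequence change.

    Uses the uncorrected percent difference (100 - identity), which ignores
    multiple substitutions at the same site (a simplifying assumption).
    """
    percent_difference = 100.0 - percent_identity
    if percent_difference <= 0:
        raise ValueError("identical sequences give no rate estimate")
    return divergence_mya / percent_difference


# Illustrative inputs only: insect divergence date and the lowest identity
# quoted for invertebrate orthologs in the text.
rate = million_years_per_percent(797, 20)
print(f"about {rate:.1f} million years per 1% change")
```

Applying a correction for multiple substitutions would increase the estimated amount of change for distant orthologs and therefore lower the number of years per 1% change.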

Colorectal cancer
Several studies have evaluated FAM166C as a potential target for colorectal cancer treatment. In one study, researchers evaluated FAM166C for drug treatment viability in KRAS G12A-mutant colorectal cancer. FAM166C was one of 11 genes that showed a significant twofold change in expression between colorectal cancer patients with a KRAS G12 mutation (KRAS is a proto-oncogene) and patients with wild-type KRAS. Another study identified FAM166C as one of four potential targets for CVB-D, an inducer of autophagic cell death in colorectal cancer cells, based on its over-expression in colon adenocarcinoma.

Mutations (SNPs of interest)
In a genome-wide association study (GWAS), a FAM166C SNP was found to correlate with high levels of bacterial colonization, a trait that may be associated with periodontitis.

In a whole-exome sequencing study that used the human reference genome for comparison, a novel FAM166C SNP was the only variant identified with a PolyPhen score of 0.954, indicating that it is likely deleterious. It may be involved in one patient's bilateral cleft lip and palate.