WD Repeat and Coiled Coil Containing Protein

WD Repeat and Coiled-Coil Containing Protein (WDCP) is a protein that in humans is encoded by the WDCP gene. The function of the protein is not completely understood, but WDCP has been identified in a fusion protein with anaplastic lymphoma kinase (ALK) found in colorectal cancer. WDCP has also been identified in the MRN complex, which processes double-strand breaks in DNA.

Gene
In humans, WDCP is located on the minus strand of chromosome 2 at locus 2p23.3. The total gene is 20,235 bp long, spanning positions 24,029,340 to 24,047,575, and lies between the MFSD2B and FKBP1B genes. The gene contains 4 exons, the details of which can be seen in the table below.

Table 1. Exons of WDCP and their various lengths.

Common aliases of the gene include chromosome 2 open reading frame 44 (C2orf44), MMAP, and PP384.

mRNA
WDCP isoform 1 is encoded by the mRNA "WD repeat and coiled-coil containing, transcript variant 1". The total RNA transcript is 18,045 nucleotides long and is transcribed from nucleotides 24,029,347 to 24,047,391 of the WDCP gene. The coding DNA sequence is 3,848 nucleotides long. The 5' UTR contains 7,897 nucleotides, and the 3' UTR contains 1,597 nucleotides.
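The transcript length quoted above can be reproduced from its start and end coordinates. The sketch below assumes the coordinates are 1-based and inclusive (a common convention for genome annotations, but an assumption here); the function name `inclusive_span` is illustrative only.

```python
def inclusive_span(start: int, end: int) -> int:
    """Return the length of a 1-based, inclusive coordinate interval."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Coordinates of transcript variant 1 as quoted in the text.
transcript_length = inclusive_span(24_029_347, 24_047_391)
print(transcript_length)  # 18045
```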

There are two other known transcript variants of WDCP: WDCP transcript variant 2 and WDCP transcript variant X1. Information about the two transcripts can be seen below.

Table 2. Transcript Variants of WDCP with their alternative splicing pattern in comparison to WDCP transcript variant 1.

Primary sequence
WDCP protein isoform 1 is 721 amino acids in length. Its molecular weight is 79 kDa and the theoretical isoelectric point is 6.2. The protein sequence for WDCP Protein Isoform 1 is shown below.
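As a rough plausibility check on the molecular weight, a protein's mass can be approximated from its length. The sketch below assumes an average residue mass of about 110 Da, which is a general rule of thumb rather than a value taken from this article; an exact figure would require summing the masses of the individual residues.

```python
AVERAGE_RESIDUE_MASS_DA = 110  # rule-of-thumb average, an assumption


def estimate_mass_kda(n_residues: int) -> float:
    """Approximate protein mass in kDa from the number of residues."""
    return round(n_residues * AVERAGE_RESIDUE_MASS_DA / 1000, 2)


# WDCP isoform 1 is 721 amino acids long.
print(estimate_mass_kda(721))  # 79.31
```

The estimate is close to the reported 79 kDa.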

  1 MELGKGKLLR TGLNALHQAV HPIHGLAWTD GNQVVLTDLR LHSGEVKFGD SKVIGQFECV
 61 CGLSWAPPVA DDTPVLLAVQ HEKHVTVWQL CPSPMESSKW LTSQTCEIRG SLPILPQGCV
121 WHPKCAILTV LTAQDVSIFP NVHSDDSQVK ADINTQGRIH CACWTQDGLR LVVAVGSSLH
181 SYIWDSAQKT LHRCSSCLVF DVDSHVCSIT ATVDSQVAIA TELPLDKICG LNASETFNIP
241 PNSKDMTPYA LPVIGEVRSM DKEATDSETN SEVSVSSSYL EPLDLTHIHF NQHKSEGNSL
301 ICLRKKDYLT GTGQDSSHLV LVTFKKAVTM TRKVTIPGIL VPDLIAFNLK AHVVAVASNT
361 CNIILIYSVI PSSVPNIQQI RLENTERPKG ICFLTDQLLL ILVGKQKLTD TTFLPSSKSD
421 QYAISLIVRE IMLEEEPSIT SGESQTTYST FSAPLNKANR KKLIESLSPD FCHQNKGLLL
481 TVNTSSQNGR PGRTLIKEIQ SPLSSICDGS IALDAEPVTQ PASLPRHSST PDHTSTLEPP
541 RLPQRKNLQS EKETYQLSKE VEILSRNLVE MQRCLSELTN RLHNGKKSSS VYPLSQDLPY
601 VHIIYQKPYY LGPVVEKRAV LLCDGKLRLS TVQQTFGLSL IEMLHDSHWI LLSADSEGFI
661 PLTFTATQEI IIRDGSLSRS DVFRDSFSHS PGAVSSLKVF TGLAAPSLDT TGCCNHVDGM
721 A

Figure 1. Protein sequence of WDCP protein isoform 1.
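To work with the sequence in Figure 1 programmatically, the position numbers and spacing first have to be stripped. The following is a minimal sketch using only the first 70 residues of the figure as input; the helper name `parse_numbered_sequence` is hypothetical.

```python
import re


def parse_numbered_sequence(block: str) -> str:
    """Remove position numbers and whitespace from a numbered sequence block."""
    return re.sub(r"[^A-Za-z]", "", block).upper()


# Excerpt: the first 70 residues of WDCP isoform 1 as shown in Figure 1.
excerpt = """
 1 MELGKGKLLR TGLNALHQAV HPIHGLAWTD GNQVVLTDLR LHSGEVKFGD SKVIGQFECV
61 CGLSWAPPVA
"""
sequence = parse_numbered_sequence(excerpt)
print(len(sequence))  # 70
print(sequence[:10])  # MELGKGKLLR
```

Applied to the full figure, the same function should return a string whose length matches the 721 residues reported for isoform 1.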

Compositional analysis of WDCP Isoform 1 shows no extremely high or low levels of particular amino acids. The protein contains no positive, negative, or mixed charged clusters.

There are two isoforms of WDCP, as seen in the table below.

Table 3. Table of WDCP protein Isoforms and Protein Information.

Secondary structure
The secondary structure of WDCP Protein Isoform 1 consists of 47 random coils (429 residues, 59.5%), 19 alpha-helices (160 residues, 22.19%), and 31 extended strands (132 residues, 18.31%).
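The percentages above follow directly from the residue counts, which sum to the full protein length. A short sketch of the arithmetic, with dictionary keys chosen here for illustration:

```python
# Residue counts per predicted secondary-structure class, from the text.
counts = {"random coil": 429, "alpha-helix": 160, "extended strand": 132}

total = sum(counts.values())
percentages = {name: round(100 * n / total, 2) for name, n in counts.items()}

print(total)  # 721
print(percentages)
```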

Tertiary and quaternary structure
There are two predicted disulfide bonds in WDCP, one between cysteine residues 574 and 623, and the other between cysteine residues 713 and 714.

Domains and motifs
WDCP protein domains include two tryptophan-aspartic acid (WD) repeat sites, multiple phosphorylation sites, and a domain that interacts with the hemopoietic cell kinase (HCK).

Tissue expression
Across various tissue types, WDCP shows increased mRNA expression in white blood cells (3.0 RPKM), thymus (3.6 RPKM), lymph nodes, bone marrow, and testes. WDCP exhibits increased protein expression in endocrine tissues, as well as the kidney and urinary bladder. Across multiple tissues in the GTEx database, WDCP expression is highest in Epstein-Barr virus-transformed lymphocytes and lowest in the pancreas. NCBI GEO records show that overall WDCP expression is in the 65th to 70th percentile relative to the Universal Human Reference RNA.

In fetal tissue, WDCP mRNA expression is highest in the lung at 17 weeks (3.75 RPKM), the heart at 10 weeks (3.5 RPKM), and the intestine at 11 weeks (3.0 RPKM). By 17 weeks, WDCP expression in the intestine drops from 3.0 RPKM to 0.75 RPKM. The fetal kidney at 20 weeks exhibits the lowest WDCP expression, at 0.5 RPKM.

Epigenetic
WDCP does not have any CpG islands associated with its promoter. WDCP has relatively low levels of H3K27ac, but higher levels of H3K4me1 and H3K4me3 across various cell types, including HeLa, HUVEC, and leukemia cell lines.

Transcriptional
The GeneHancer promoter for WDCP is listed as GH02J024045. The transcription factor binding sites associated with this promoter and confirmed with a ChIP signal include HNF4A, CEBPB, EGR1, FOS, ETS1, and E2F6. The binding sites for FOS, EGR1, and ETS1 are located in a DNase hypersensitive site.

Post-transcriptional
There are two transcript variants of WDCP detailed in the table in the mRNA section.

Translational and mRNA stability
The predicted mRNA secondary structure of the WDCP transcript shows a high number of stem-loop structures in the UTRs. The region of the 5' UTR closest to the start codon contains about 22 predicted loops. Stem-loops in the 5' UTR near the start codon could indicate lower levels of expression. There are 108 predicted loops in the 3' UTR. There are no known miRNA targets in the 3' UTR.

Post-translational modifications
WDCP Isoform 1 contains the following post-translational modifications:

Glycation is the addition of a sugar molecule to an amino acid and is associated with pathologies including renal failure and diabetes. Glycation is predicted to occur at lysine residues: 5, 7, 83, 189, 244, 262, 294, 325, 389, 405, 407, 461, 552, and 617.
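Predicted modification sites can be sanity-checked against the primary sequence by confirming that each position holds the expected residue. The sketch below uses only the first ten residues from Figure 1 and the first two predicted glycation sites; the helper name `residues_at` is hypothetical.

```python
def residues_at(sequence: str, positions: list[int]) -> dict[int, str]:
    """Map 1-based positions to the residue found there in the sequence."""
    return {pos: sequence[pos - 1] for pos in positions}


# First ten residues of WDCP isoform 1 (Figure 1).
n_terminus = "MELGKGKLLR"

# Positions 5 and 7 are the first two predicted glycation sites.
found = residues_at(n_terminus, [5, 7])
print(found)  # {5: 'K', 7: 'K'}
```

Both positions hold lysine (K), as expected for glycation sites. The same check can be run on the full sequence for every site listed.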

Acetylation is the addition of an acetyl group, in this case at the starting methionine residue. This is usually associated with metabolism-related pathways. WDCP has one confirmed acetylation site at the starting methionine residue.

Phosphorylation is the addition of a phosphate group to amino acids. It is mainly associated with cellular signaling pathways and can instigate tumor development. Serine, threonine, and tyrosine phosphorylation sites were identified at 27 residues at a NetPhos threshold of 0.9. Phosphorylation was detected at:
 * Serine residues 43, 97, 147, 181, 267, 274, 278, 416, 417, 441, 504, 505, 528, 529, 535, 550, 590, 630, 676, 678, 686, 688, and 690.
 * Threonine residues 73 and 385.
 * Tyrosine residues 422 and 555.
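The total of 27 sites can be confirmed by tallying the three lists above. A minimal sketch, with the variable names chosen for illustration:

```python
# Predicted phosphorylation sites by residue type, as listed in the text.
phospho_sites = {
    "serine": [43, 97, 147, 181, 267, 274, 278, 416, 417, 441, 504, 505,
               528, 529, 535, 550, 590, 630, 676, 678, 686, 688, 690],
    "threonine": [73, 385],
    "tyrosine": [422, 555],
}

per_type = {residue: len(sites) for residue, sites in phospho_sites.items()}
total_sites = sum(per_type.values())

print(per_type)     # {'serine': 23, 'threonine': 2, 'tyrosine': 2}
print(total_sites)  # 27
```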

Possible kinases that interact with WDCP include casein kinase 1, casein kinase 2, cAMP- and cGMP-dependent protein kinases, p38 MAPK, DNA-PK, protein kinase A, and protein kinase C.

SUMOylation is the addition of a small ubiquitin-like modifier to lysine residues in proteins. SUMOylation sites in WDCP include lysine residues 47, 152, 298, 310, and 709, with lysine residues 47 and 152 having the highest probability of SUMOylation. SUMOylation can affect protein-protein interactions and protein ubiquitination.

Palmitoylation is the addition of a fatty acid chain to cysteine residues. There is one confirmed site of palmitoylation at cysteine residue 714.

GalNAc O-glycosylation is the addition of a sugar molecule to a serine or threonine residue, which possibly increases structural stability. Some of these residues overlap with phosphorylation sites, indicating that these residues may switch between phosphorylation and O-glycosylation. These sites were detected at:
 * Serine residues 137, 243, 267, 271, 274, 295, 485, 528, 529, 688, and 695. Sites 267, 274, 528, 529, and 688 overlap with a phosphorylation site.
 * Threonine residues 412, 440, 447, 484, and 534.
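The overlap between O-glycosylation and phosphorylation sites is a set intersection of the two residue lists. The sketch below repeats the site lists from the text so that it stands alone; the variable names are illustrative.

```python
# Predicted sites from the text, keyed by residue type.
phospho = {
    "serine": {43, 97, 147, 181, 267, 274, 278, 416, 417, 441, 504, 505,
               528, 529, 535, 550, 590, 630, 676, 678, 686, 688, 690},
    "threonine": {73, 385},
}
o_glyc = {
    "serine": {137, 243, 267, 271, 274, 295, 485, 528, 529, 688, 695},
    "threonine": {412, 440, 447, 484, 534},
}

# Residues predicted to carry either modification.
overlap = {res: sorted(phospho[res] & o_glyc[res]) for res in o_glyc}
print(overlap)  # {'serine': [267, 274, 528, 529, 688], 'threonine': []}
```

Only serine residues are shared between the two predictions; none of the threonine sites overlap.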

N-glycosylation is the addition of a sugar molecule to an asparagine residue. Asparagine residue 483 is the only detected N-glycosylation site in WDCP.

No amidation sites, C-linked mannosylation sites, GPI modification sites, non-classical protein secretion signals, transmembrane helices or regions, arginine and lysine cleavage sites, lipoprotein sites, sulfated tyrosines, or twin-arginine signal peptides were predicted.

Subcellular localization
WDCP Isoform 1 has no transmembrane domains, actin-binding motifs, ER retention motifs, or Golgi transport signals. The protein is most likely located in the nucleus, with a reliability score of 47.8%, and has a 30.4% chance of being located in the cytoplasm. Close orthologs of WDCP Isoform 1 show similar results, with the protein most likely located in the nucleus. In addition, there are two predicted nuclear localization sequences in WDCP, starting at residues 401 and 581.

Immunostaining of WDCP has shown localization in the nucleoli of osteosarcoma cells, as well as the cytoplasm of kidney cells.

Function
The function of WDCP is currently not well understood, but because of its increased expression in the bone marrow and thymus, the protein could be related to immune function and development. Its location in the nucleus, relation to the MRN complex, abundance of phosphorylation sites, and associations with various cancers could indicate a role in cell growth regulation or a proto-oncogenic function.

Interacting proteins
WDCP is known to interact with HCK: a proline-rich region of WDCP binds to the Src homology 3 domain of HCK. As mentioned before, WDCP has been found in a fusion with ALK. This fusion changes the structure of ALK, which results in constitutive signaling.

Studies in human embryonic kidney cells have confirmed interactions between WDCP and RuvB-like proteins 1 and 2, which belong to a family of AAA proteins associated with ATPase activity, as well as with C1q and tumor necrosis factor related protein 2 and with DYNLT1.

Based on the transcription factor binding sites listed in the transcriptional regulation section, WDCP could interact with the following transcription factors:
 * HNF4A: Elevations in expression levels are associated with colorectal cancer.
 * CEBPB: Involved in regulating immune responses.
 * EGR1: Tumor suppressor gene involved with neuronal plasticity and memory formation.
 * FOS: Proto-oncogene associated with osteosarcomas.
 * ETS1: Aids in pathogen and tumor defense.
 * E2F6: Associated with the oncogenic Bmi1 polycomb complex.

Clinical significance
Studies have linked WDCP to various cancers, including colorectal cancer, leukemia, and osteosarcomas. WDCP levels are higher in colorectal cancer metastases than in the primary tumor. GEO records show elevated levels of WDCP in leukemia cell lines, which are regulated by imatinib, a drug used to treat chronic myelogenous leukemia. This pattern is also seen in HeLa cell lines treated with Casiopeínas, small molecules with an active Cu2+ center that allows the molecule to bind to tumors and induce apoptosis.

Homology
There are no paralogs of WDCP, but orthologs of this gene were found in primates, rodents, reptiles, birds, fish, amphibians, echinoderms, and possibly fungi. There are no orthologs in prokaryotes or plants. There were no organisms with proteins containing homologous domains.

The graph to the right compares the rate of evolution of WDCP with that of the fibrinogen alpha-chain (NCBI: NP_068657) and cytochrome c (NCBI: NP_061820). WDCP evolves faster than cytochrome c, but slower than the fibrinogen alpha-chain.

While some sequences in WDCP are conserved (as can be seen in the conceptual translation), there are very few known conserved domains among the various orthologs. One conserved glycation site, lysine 389, was detected through a multiple sequence alignment. The table below lists the orthologs, the estimated date of divergence between each organism and humans, and the % identity between WDCP Isoform 1 and the orthologous protein sequence.

Table 4. Table of organisms with a WDCP orthologous protein.