User:Bostanict/Sandbox

Cysteine Motif and Prediction Database (CMPD) is a database of cysteine flanking motifs. It contains the cysteine flanking motifs and cysteine pairing motifs extracted from the Protein Data Bank (PDB) and UniProt.

Computational approaches to predicting the disulfide bonding state and its connectivity pattern are based on various descriptors. One such descriptor is derived from the sequence’s amino acid composition and the residues flanking each cysteine. These immediate neighbours have been shown to influence the cysteine’s redox potential and its steric accessibility. Since their proposal as a descriptor in 1990, these sequence motifs have been fed into various prediction methods, such as machine learning approaches (e.g. statistical methods, neural networks (NNs) and support vector machines (SVMs)), and have been the basis of various prediction tools such as DiaNNA, DISULFIND, DCON and CysView.
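As an illustration of the flanking-residue descriptor described above, the following is a minimal Python sketch that collects a fixed-width window around every cysteine in a sequence. The function name `flanking_motifs`, the window size `flank` and the padding character `-` are assumptions made for this example; they are not taken from CMPD or from any of the tools named above.

```python
def flanking_motifs(sequence, flank=4, pad="-"):
    """Return (position, motif) pairs for every cysteine in `sequence`.

    `flank` is the number of residues kept on each side of the cysteine
    (a hypothetical default; published methods use various window sizes).
    Windows that run past either terminus are filled with `pad`.
    Positions are 1-based.
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    motifs = []
    for i, residue in enumerate(seq):
        if residue == "C":
            # Residue i of `seq` sits at index i + flank in `padded`,
            # so the window centred on it is padded[i : i + 2*flank + 1].
            motifs.append((i + 1, padded[i:i + 2 * flank + 1]))
    return motifs


# Example with a short made-up peptide and two flanking residues per side.
print(flanking_motifs("ACDEFCGH", flank=2))
# [(2, '-ACDE'), (6, 'EFCGH')]
```

Padding the termini keeps every motif the same length, which is convenient when the motifs are later used as fixed-size input to a classifier.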

However, there is currently no database that stores these disulfide motifs. Motivated by this absence and by the usefulness of the motifs in predicting cysteine bonding state and connectivity, we have developed the Cysteine Motif and Prediction Database (CMPD) to store cysteine flanking motifs. The motif miner in CMPD extracts flanking motifs and stores them so that their bonding and connectivity propensities can be studied. Examining these sequences allows researchers to study residue composition propensity and its role in determining the bonding state. The expansion of RCSB and the growing number of PDB files have increased the number of available motifs well beyond what was used in prior research. We extracted 878,000 cysteine motifs, from which users can now query more than 77,000 unique cysteine motifs and cysteine pairing motifs generated from PDB and UniProt files. CMPD can be queried by PDB ID, UniProt ID, sequence and motif. The datasets are downloadable and can be parsed through a web service and API. In addition, we have included a prediction tool based on the cysteine motifs. We plan to present CMPD as a publicly available tool that complements existing prediction tools and composition analyses that use similar motif schemes.
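To make the idea of unique motifs and composition propensity concrete, here is a small Python sketch that counts distinct motifs and tallies which residues occur at each offset from the central cysteine. It is an illustrative example only, not CMPD's implementation; the function names and the toy motifs are hypothetical, and all motifs are assumed to share the same odd length with the cysteine in the middle.

```python
from collections import Counter


def unique_motif_counts(motifs):
    """Count how many times each distinct motif string occurs."""
    return Counter(motifs)


def positional_composition(motifs):
    """Tally residues at each offset relative to the central cysteine.

    Assumes every motif has the same odd length with the cysteine at
    the centre (an assumption of this sketch). Returns a dict mapping
    offset (e.g. -1, +1) to a Counter of residues at that offset.
    """
    if not motifs:
        return {}
    width = len(motifs[0])
    if width % 2 == 0:
        raise ValueError("motif length must be odd")
    center = width // 2
    counts = {off: Counter() for off in range(-center, center + 1) if off != 0}
    for motif in motifs:
        if len(motif) != width:
            raise ValueError("all motifs must have the same length")
        for idx, residue in enumerate(motif):
            offset = idx - center
            if offset != 0:  # skip the central cysteine itself
                counts[offset][residue] += 1
    return counts


# Toy data: three motifs of width 3, two of them identical.
toy = ["ACD", "GCD", "ACD"]
print(len(unique_motif_counts(toy)))       # 2
print(positional_composition(toy)[-1])     # Counter({'A': 2, 'G': 1})
```

Comparing such tallies between bonded and free cysteines is one simple way to look at composition propensity.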