User:Tiffykt/Assignment 2

Assignment 2 Full name: Tiffany Tong Organism: N/A Assigned: Monday September 27, 2010 Due date: Monday October 18, 2010

= Retrieve =
Objectives:
 * Use the organism Saccharomyces cerevisiae as a model to study the concept of function and sequence synteny in fully sequenced fungal genomes.
 * Find evidence for the presence of Mbp1 in other fully sequenced fungal genomes.

Procedure:
 * 1) Go to the NCBI site and search Mbp1 AND "saccharomyces cerevisiae"[organism].
 * 2) From the results given, click on Protein. On the following page, on the right hand side click on the RefSeq record link.
 * 3) Retrieve the FASTA record, which can be found in the results section.
 * 1) Go to the UniProt ID-Mapping site and search for the Mbp1 protein record using the RefSeq ID (NP_010227.1). The corresponding UniProtKB accession number is D6VRU0. Then go to the PIR ID-mapping service site and look up the UniProtKB accession number using the same RefSeq ID for Mbp1. The same UniProtKB accession number is obtained (D6VRU0).
 * 2) Go to the UniProt site and search the SequenceClusters (UniRef) database using the UniProtKB Accession number D6VRU0. Under the Results section, click on 100% to display clusters with 100% identity. There should be two members in this cluster (D6VRU0 and P39678).

Results: FASTA sequence of Mbp1: >gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c] MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASPPPAPKHHHASKVDRKKAIRSASTSAIMET KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA
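The identifiers in the FASTA header above can be pulled out programmatically. The following is a minimal sketch, assuming the pipe-delimited `gi|...|ref|...|` header layout shown in this record; the function name `parse_ncbi_header` is my own and not part of any NCBI tool.

```python
def parse_ncbi_header(header: str) -> dict:
    """Split a header of the form '>gi|<gi>|ref|<accession>| <description>'."""
    fields = header.lstrip(">").split("|")
    # Assumed field order: 'gi', GI number, 'ref', RefSeq accession, description
    return {
        "gi": fields[1],
        "refseq": fields[3],
        "description": fields[4].strip(),
    }


header = ">gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]"
record = parse_ncbi_header(header)
print(record["refseq"])  # the RefSeq ID used for the ID-mapping step
```

The RefSeq accession extracted this way is the value entered into the ID-mapping services in the procedure.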

Note down the Swiss-Prot ID and UniProtKB Accession number of Mbp1.
 * The UniProtKB/TrEMBL accession number and entry name for Mbp1 are D6VRU0 and D6VRU0_YEAST, respectively.
 * The UniProtKB/Swiss-Prot accession number and entry name for Mbp1 are P39678 and MBP1_YEAST, respectively.

Comparison between the Swiss-Prot and TrEMBL records for Mbp1.

Conclusions:
 * The UniProt ID-Mapping tool can be used to find the UniProtKB accession number of a protein from its NCBI RefSeq identifier.
 * It is important to check the type of a UniProtKB record to see whether it is reviewed (Swiss-Prot) or unreviewed (TrEMBL).
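The reviewed/unreviewed distinction can also be read from a UniProtKB FASTA header, where the prefix `sp` marks Swiss-Prot and `tr` marks TrEMBL. Below is a minimal sketch; the two header strings are assembled from the accession numbers and entry names noted above and are assumptions about what the downloaded headers look like, and `review_status` is a hypothetical helper name.

```python
def review_status(header: str) -> str:
    """Classify a UniProtKB FASTA header by its database prefix."""
    prefix = header.lstrip(">").split("|")[0]
    statuses = {
        "sp": "reviewed (Swiss-Prot)",
        "tr": "unreviewed (TrEMBL)",
    }
    return statuses.get(prefix, "unknown")


# Headers built from the identifiers recorded in this notebook (assumed layout)
for h in (">sp|P39678|MBP1_YEAST", ">tr|D6VRU0|D6VRU0_YEAST"):
    print(h, "->", review_status(h))
```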

=Analyse=

Saccharomyces cerevisiae Mbp1 - domain annotations
Objectives:
 * 1) Annotate the domains of the S. cerevisiae Mbp1 protein.

Procedure:
 * 1) Go to the SMART site and search for the S. cerevisiae Mbp1 protein using its Swiss-Prot ID (P39678). Check off the following options: PFAM domains, internal repeats, and intrinsic protein disorder.

Results:
 * Domain features of the S. cerevisiae Mbp1 protein are highlighted:
 * >gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
 * MSNQIYSARYSGVD VYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF
 * GKYQGTWVPLNIAKQLAEKFSVYDQLKPLFDFTQTDGSASP PPAPKHHHASK VDRKKAIRSASTSAIMET
 * KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
 * PSIRSTMGPQSPTLGILEEERHDSR QQQPQQ NNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVP QQ
 * QSSLIQTQQTESMATSVSSSPSLPTSP GDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
 * NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDP ELHTAFHWACSMGNLPIAEALYEAGTS
 * IRS TNS QGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDS QSQTV IHHIVKRKSTTPSAVYYLDVVL
 * SKIKDFSPQYRIELLLNTQDK NGDTALHIASKNGDVVFFNTLVKMGALTTI SNK EGLTANEIMNQQYEQM
 * MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
 * MA SIYNDLHEQHDNEIKSLQKTLKS ISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNT K
 * KLRKRLIRYKRLIKQKL EYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
 * LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA


 * Domain features: Pfam:KilA-N, low complexity, low complexity, ANK, ANK, ANK, coiled-coil, and low complexity.
 * Overlap features: low complexity, Pfam: Ank, Pfam: Ank, and Pfam: Ank.

Conclusions:
 * SMART can be used to find the domain features of a given protein using a Swiss-Prot ID.
 * Proteins can have features that overlap other features or that fall below threshold values.

APSES (KilA-N) domains
Objectives:
 * 1) Explore the functions of the Conserved Domain Database.

Procedure:
 * 1) Search the Conserved Domains database for KilA-N. By clicking on the pfam04383 result, 10 aligned sequences will appear.
 * 2) Go to the RefSeq record for the S. cerevisiae Mbp1 protein. Under the All links from this record menu, select the link named CDD Search Results.
 * 3) Click on the <tt>[+]</tt> to expand the record for <tt>KilA-N</tt> and reveal the PFAM sequence alignment with Mbp1.

Results:

Check whether the NCBI and the SMART definition of the KilA-N (APSES) domain in Mbp1 coincide.
 * The NCBI and SMART definitions of the KilA-N (APSES) domain in Mbp1 coincide in that the sequences covered by the two definitions align. The only difference is that the domain boundaries are not the same.

Highlight in suitable colors the domain boundaries as defined by the NCBI and by SMART in your FASTA sequence.
 * <tt>>gi|6320147|ref|NP_010227.1| Mbp1p [Saccharomyces cerevisiae S288c]
 * MSNQIYSARYSGVD VYEF<font style="font-weight:bold;text-decoration:underline;color:#FC8B91">IHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF</font>
 * <font style="font-weight:bold;text-decoration:underline;color:#FC8B91">GKYQGTWVPLNIAKQLAEKFSVY</font> DQLKPLFDFTQTDGSASP PPAPKHHHASKVDRKKAIRSASTSAIMET
 * KRNNKKAEENQFQSSKILGNPTAAPRKRGRPVGSTRGSRRKLGVNLQRSQSDMGFPRPAIPNSSISTTQL
 * PSIRSTMGPQSPTLGILEEERHDSRQQQPQQNNSAQFKEIDLEDGLSSDVEPSQQLQQVFNQNTGFVPQQ
 * QSSLIQTQQTESMATSVSSSPSLPTSPGDFADSNPFEERFPGGGTSPIISMIPRYPVTSRPQTSDINDKV
 * NKYLSKLVDYFISNEMKSNKSLPQVLLHPPPHSAPYIDAPIDPELHTAFHWACSMGNLPIAEALYEAGTS
 * IRSTNSQGQTPLMRSSLFHNSYTRRTFPRIFQLLHETVFDIDSQSQTVIHHIVKRKSTTPSAVYYLDVVL
 * SKIKDFSPQYRIELLLNTQDKNGDTALHIASKNGDVVFFNTLVKMGALTTISNKEGLTANEIMNQQYEQM
 * MIQNGTNQHVNSSNTDLNIHVNTNNIETKNDVNSMVIMSPVSPSDYITYPSQIATNISRNIPNVVNSMKQ
 * MASIYNDLHEQHDNEIKSLQKTLKSISKTKIQVSLKTLEVLKESSKDENGEAQTNDDFEILSRLQEQNTK
 * KLRKRLIRYKRLIKQKLEYRQTVLLNKLIEDETQATTNNTVEKDNNTLERLELAQELTMLQLQRKNKLSS
 * LVKKFEDNAKIHKYRRIIREGTEMNIEEVDSSLDVILQTLIANNNKNKGAEQIITISNANSHA</tt>


 * Domain boundaries as defined by SMART (yellow highlight) and <font style="font-weight:bold;text-decoration:underline;color:#FC8B91">NCBI</font>. The domain as defined by NCBI falls within the boundaries set by SMART, so its <font style="font-weight:bold">bold</font>, <font style="text-decoration:underline">underlined</font>, <font style="color:#FC8B91">pink</font> font overlaps the yellow highlight colour of SMART.

Conclusions:
 * The CDD can be used to display conserved domain families, which can be compared to conserved domains as defined by SMART.
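The comparison of the two domain definitions comes down to an interval check: one definition falls within the other when its start is not earlier and its end is not later. Here is a minimal sketch of that check. The `Domain` type and `contains` function are my own names, and the coordinates are placeholders chosen only to illustrate the logic, not the boundaries actually reported by SMART or the CDD.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    source: str
    start: int  # first residue of the domain, 1-based
    end: int    # last residue of the domain, inclusive


def contains(outer: Domain, inner: Domain) -> bool:
    """True if `inner` lies entirely within the boundaries of `outer`."""
    return outer.start <= inner.start and inner.end <= outer.end


# Placeholder coordinates (hypothetical, for illustration only)
smart = Domain("SMART", 10, 120)
ncbi = Domain("NCBI CDD", 20, 95)

print(contains(smart, ncbi))  # NCBI definition inside SMART definition?
print(contains(ncbi, smart))  # and the reverse
```

Substituting the boundaries read from the SMART and CDD result pages gives the comparison described in the Results.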

APSES domain structure
Objectives:
 * 1) Use the PDB to compare structure records and select the most appropriate coordinate file.

Procedure:
 * 1) Go to the PDB site, click on Advanced Search, and select UniProtKB Accession Number(s). Enter the UniProtKB accession number for Mbp1 (P39678) and click Result Count. There should be 3 structure results; click on the link to see them. The procedure used to identify the most appropriate file for studying the APSES domain is described in the Results section.

Results: Comparison of 3 potential files to be used to study the APSES domain. Identify and download the most appropriate coordinate file to study the structure, function and conservation of APSES domains from the PDB.
 * I selected 1MB1 as the most appropriate coordinate file based on its properties. Its resolution (2.10Å) and RMSD bond length (0.016Å) are closer to the "good structure" values for resolution (2.00Å) and RMSD bond length (0.02Å) than those of 1BM8. This file also covers the APSES DNA-binding domain, whereas 1BM8 covers the N-terminal region of the protein, which accounts for its shorter length (99 amino acids). Moreover, 1MB1 is missing fewer atoms than the 1L3G structure.

Conclusions:
 * The choice of the most appropriate coordinate file was based on criteria such as a resolution close to 2.00Å, an RMSD bond length close to 0.020Å, and a minimal RMSD bond angle.
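The "closest to the good-structure values" criterion can be written as a simple deviation calculation. This is a minimal sketch, assuming the two reference values quoted above. Only the 1MB1 figures recorded in this notebook are used, and `deviation` is a hypothetical helper name.

```python
# Reference "good structure" values quoted in the Results
REFERENCE = {"resolution": 2.00, "rmsd_bond": 0.020}


def deviation(values: dict) -> dict:
    """Absolute distance of each quality value from its reference, in Å."""
    return {key: round(abs(values[key] - REFERENCE[key]), 3) for key in REFERENCE}


# Values for 1MB1 as recorded in the Results
mb1 = {"resolution": 2.10, "rmsd_bond": 0.016}
print(deviation(mb1))
```

Running the same function on the values of the other candidate files and comparing the deviations reproduces the selection logic; smaller deviations indicate a file closer to the reference.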

DNA binding site
Objectives:
 * 1) Use VMD to create protein representations.

Procedure:
 * 1) Create a VMD image using the Mbp1 protein that displays the DNA recognition domain (residues 50-74).
 * 2) Create another representation that emphasizes the protein's secondary structure, using the Structure colouring method and a drawing method such as Tube or Cartoon.
 * 3) Generate a third image that displays the structure's (i) backbone, (ii) side chains that might contact DNA, and (iii) transparent surface.
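Before selecting residues 50-74 in VMD, it can help to see which amino acids that range covers. The sketch below slices the range out of the first 80 residues of the Mbp1 FASTA sequence from the Retrieve section and lists the basic residues (Lys, Arg, His) in it. It assumes that the structure file numbers residues the same way as the full-length sequence; the function names are my own.

```python
# First 80 residues of Mbp1, copied from the FASTA record (NP_010227.1)
MBP1_1_80 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPL"
)


def segment(seq: str, start: int, end: int) -> str:
    """Return residues start..end (1-based, inclusive)."""
    return seq[start - 1:end]


def basic_residues(seq: str, start: int, end: int) -> list:
    """List basic residues (K, R, H) in the range as labels such as 'R50'."""
    return [
        f"{aa}{pos}"
        for pos, aa in enumerate(segment(seq, start, end), start=start)
        if aa in "KRH"
    ]


print(segment(MBP1_1_80, 50, 74))
print(basic_residues(MBP1_1_80, 50, 74))
```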

Results: VMD image displaying the Mbp1 DNA recognition domain:

VMD image displaying the Mbp1 secondary structure:

VMD image displaying the Mbp1 backbone, side chains, and transparent protein surface:

''DNA binding interfaces are expected to comprise a number of positively charged amino acids that might form salt-bridges with the phosphate backbone. Report whether this is the case here and which residues might be included.''

''Do the DNA binding residues form a contiguous surface that is compatible with a binding interface? Justify your conclusions.''

Conclusions:
 * Each of the following residue pairs forms a salt bridge: Asp31 and Arg28, Glu57 and Lys89, and Glu64 and Arg28. Arg28 and Lys89 are positively charged amino acids, so one amino acid in each pair is positively charged. Glu57 and Glu64 are negatively charged amino acids in the proposed DNA recognition domain that are found to form salt bridges.
 * Yes, the DNA binding residues form a contiguous surface that is compatible with a binding interface. In the DNA binding domain, the negatively charged residues Glu57 and Glu64 form salt bridges with the positively charged residues Lys89 and Arg28, respectively, in the rest of the protein; Arg28 also forms a salt bridge with the negatively charged residue Asp31.
 * The residues Arg50, Arg52, Lys56, Lys60, His63, Lys65, and Lys72 form a positively charged area in the APSES domain, close to the N-terminus of the protein. This group of residues faces away from the protein.
 * The uncharged residue Gln67 is presumed to be involved in binding to the minor groove of the DNA. The positively charged residue His63 in the proposed recognition helix is assumed to be significant for pH-dependent binding affinity. These residues may help position the Mbp1 protein to facilitate base-specific DNA interactions.
 * VMD is a useful tool for visualizing a protein and analyzing its structural features.
 * In VMD, the Salt Bridges tool can be used to identify the salt bridges in a given protein.
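Residue labels such as Arg28 are easy to mistype, so it is worth checking them against the sequence. The following is a minimal sketch that compares a three-letter label with the amino acid found at that position in the first 90 residues of the Mbp1 FASTA record. It assumes the structure uses the same numbering as the full-length sequence, and `matches_sequence` is a hypothetical helper name.

```python
# First 90 residues of Mbp1, copied from the FASTA record (NP_010227.1)
MBP1_1_90 = (
    "MSNQIYSARYSGVDVYEFIHSTGSIMKRKKDDWVNATHILKAANFAKAKRTRILEKEVLKETHEKVQGGF"
    "GKYQGTWVPLNIAKQLAEKF"
)

# Only the residue types discussed in this section
THREE_TO_ONE = {"Arg": "R", "Lys": "K", "His": "H", "Asp": "D", "Glu": "E", "Gln": "Q"}


def matches_sequence(label: str, seq: str) -> bool:
    """Check that a label such as 'Arg28' agrees with the sequence."""
    name, pos = label[:3], int(label[3:])
    return 1 <= pos <= len(seq) and seq[pos - 1] == THREE_TO_ONE[name]


# Residues named in the salt-bridge pairs above
for label in ("Asp31", "Arg28", "Glu57", "Lys89", "Glu64"):
    print(label, matches_sequence(label, MBP1_1_90))
```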