User:Dgary0202/sandbox

Proteogenomics is a field of biological research that combines proteomics, genomics, and transcriptomics to aid in the discovery and identification of peptides. It identifies new peptides by comparing MS/MS spectra against a protein database derived from genomic and transcriptomic information. The term therefore often refers to studies that use proteomic information, usually obtained by mass spectrometry, to improve gene annotations. Genomics deals with the complete genetic material of an organism, while transcriptomics studies RNA transcripts, typically through RNA sequencing. Proteomics uses liquid chromatography and tandem mass spectrometry to identify proteins and study their functions, with the aim of discovering all the proteins expressed within an organism, known as its proteome. The limitation of proteomics is that it assumes current gene models are correct and that the correct protein sequences can be found in a reference protein sequence database. This is not always the case: some peptides cannot be located in the database, and novel protein sequences can arise through mutations. These issues can be addressed by using proteomic, genomic, and transcriptomic data together. The combined use of proteomics and genomics led to proteogenomics, a term first coined in 2004.

Methodology
The main idea behind the proteogenomic approach is to identify peptides by comparing MS/MS spectra against protein databases that contain predicted protein sequences. These databases can be generated from genomic and transcriptomic data in a variety of ways, some of which are described below:

Six-frame translation
A six-frame translation of a genome can be used to generate a database of predicted protein sequences: the DNA is translated in all three reading frames on each of its two strands. The limitation of this method is that the resulting databases are very large because of the number of sequences generated, many of which do not exist in nature.
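As an illustrative sketch only (not taken from any particular proteogenomics tool), six-frame translation can be written in a few lines of Python. The function names are hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base is rendered as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: standard code).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate DNA codon by codon; '*' marks a stop codon, 'X' an unknown codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Translate both strands in all three reading frames.

    Returns a dict keyed by frame label: '+1', '+2', '+3', '-1', '-2', '-3'.
    """
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

For the short input ATGGCCTAA, frame +1 gives MA* (methionine, alanine, stop), while the other five frames give WP, GL, LGH, *A and RP. Every stretch of DNA contributes six candidate sequences in this way, most of which need not correspond to a real protein, which is why such databases grow so large.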

Ab initio gene prediction
In this method, a protein database is generated by gene-prediction algorithms that identify protein-coding regions. As with six-frame translation, the resulting database can be very large.

Expressed sequence tag data
Six-frame translation can also be applied to expressed sequence tag (EST) data to generate protein databases. EST data provide transcription information that can aid in the creation of the database. The database can be very large and has the disadvantage of containing multiple copies of a given sequence; however, this problem can be circumvented by using computational strategies to compress the generated protein sequences.
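The text does not specify which compression strategy is meant. As a minimal sketch, assuming that "compression" means removing exact duplicates and sequences wholly contained in a longer entry, the redundancy in an EST-derived database could be reduced as follows. The function name is hypothetical.

```python
def compress_database(sequences):
    """Drop duplicate sequences and sequences that are substrings of a longer entry.

    Assumption: redundancy is defined by exact (sub)string identity only;
    near-identical sequences are treated as distinct.
    """
    # Deduplicate, then sort longest first (ties broken alphabetically)
    # so that any containing sequence is considered before its fragments.
    unique = sorted({s.upper() for s in sequences if s}, key=lambda s: (-len(s), s))
    kept = []
    for seq in unique:
        if not any(seq in longer for longer in kept):
            kept.append(seq)
    return kept


if __name__ == "__main__":
    est_proteins = ["MKTAYIAK", "KTAYI", "MKTAYIAK", "GGSLV"]
    print(compress_database(est_proteins))  # ['MKTAYIAK', 'GGSLV']
```

The pairwise substring check is quadratic in the number of sequences, so this illustrates the idea rather than offering a practical method for databases of realistic size.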

Other methods
Protein databases can also be created from RNA sequencing data, annotated RNA transcripts, and variant protein sequences. Other, more specialized protein databases can be built to suit the identification of a particular peptide of interest.

Another method for identifying proteins through proteogenomics is comparative proteogenomics. Comparative proteogenomics analyzes proteomic data from multiple related species concurrently and exploits the homology between their proteins to improve annotations with higher statistical confidence.

Applications
Proteogenomics can be applied in different ways. One application is improving the gene annotations of various organisms. Gene annotation involves discovering genes and their functions. Proteogenomics has become especially useful for discovering and improving gene annotations in prokaryotic organisms. For example, the genomic annotations of various microorganisms have been studied through the proteogenomic approach, including Escherichia coli, Mycobacterium, and multiple species of Shewanella bacteria.

Besides improving gene annotations, proteogenomic studies can also provide valuable information about the presence of programmed frameshifts, N-terminal methionine excision, signal peptides, proteolysis, and other post-translational modifications.

Proteogenomics also has potential applications in medicine and has been especially applicable to oncology research. Cancer arises through genetic and epigenetic alterations such as methylation, translocations, and somatic mutations. Research has shown that both genomic and proteomic information are needed to understand the molecular variations that lead to cancer. Proteogenomics has aided in this through the identification of protein sequences that may have functional roles in cancer. For example, a study of colon cancer resulted in the discovery of potential targets for cancer treatment. Proteogenomics has also led to personalized cancer-targeting immunotherapies, in which antibody epitopes for cancer antigens are predicted using proteogenomics to create medicines that act on the patient's specific tumor. In addition to treatment, proteogenomics may provide insight into cancer diagnosis. In studies of colon and rectal cancer, proteogenomics was used to identify somatic mutations, and identifying such mutations in patients could be used to diagnose cancer. Beyond direct applications in cancer treatment and diagnosis, a proteogenomic approach can be used to study proteins that confer resistance to chemotherapy.

Challenges
Proteogenomics may offer methods of peptide identification that avoid the incomplete or inaccurate protein databases faced by proteomics; however, the proteogenomic approach has challenges of its own. One of the biggest is the sheer size of the protein databases generated. Statistically, the larger the protein database, the more likely it is that MS/MS data will be matched incorrectly to a database sequence, which can hinder the identification of new peptides. False positives are therefore an issue in proteogenomic approaches. They can occur when extremely large protein databases produce mismatches that lead to incorrect identifications. Another possibility is that an MS/MS spectrum is matched to a protein sequence corresponding to a similar peptide instead of the actual peptide. A peptide may also map to multiple gene sites, producing data that can be interpreted in different ways. Despite these challenges, there are ways to reduce many of the errors that occur. For example, when dealing with a very large protein database, one could compare the identified novel peptide sequences to all of the sequences within the database and then compare the post-translational modifications of the two sequences to determine whether or not they represent the same peptide.
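Two of the checks described above, whether a supposedly novel peptide already occurs in a known protein and whether it maps to more than one site, can be sketched as a simple classification step. This is a hedged illustration rather than an established pipeline: the function name, the category labels, and the example sequences are all hypothetical, matching is by exact substring only, and post-translational modifications are not modelled.

```python
def classify_peptide(peptide, reference_proteins, translated_frames):
    """Classify an identified peptide by where its sequence occurs.

    reference_proteins: iterable of known protein sequences.
    translated_frames: dict mapping a locus/frame label to a translated sequence.
    Returns 'known', 'novel-unique', 'novel-ambiguous', or 'unmatched'.
    """
    # A peptide already present in a known protein is not a novel finding.
    if any(peptide in protein for protein in reference_proteins):
        return "known"
    # Otherwise, count the translated loci that contain the peptide.
    hits = [label for label, seq in translated_frames.items() if peptide in seq]
    if not hits:
        return "unmatched"
    return "novel-unique" if len(hits) == 1 else "novel-ambiguous"


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    reference = ["MKTAYIAKQR"]
    frames = {
        "locusA:+1": "AAGLSPEPTIDEK",
        "locusB:-3": "QQLSPEPTIDEKW",
        "locusC:+2": "TTNQVELPEPK",
    }
    for pep in ("TAYIAK", "LSPEPTIDEK", "NQVELPEPK"):
        print(pep, classify_peptide(pep, reference, frames))
```

With the made-up data shown, TAYIAK is reported as known, LSPEPTIDEK as novel-ambiguous because it occurs at two loci, and NQVELPEPK as novel-unique.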

Polymer Chemistry Article Selections
1. Superabsorbent polymer
2. Thermosetting polymer
3. Polymer brush
4. Interpenetrating polymer network