User:Tanecious/sandbox

Mascot is a software search engine that uses mass spectrometry data to identify proteins from primary sequence databases.[1][2]

Many research facilities use Mascot for database searching, either on a private in-house server or on a public server.

Mascot uses a probabilistic scoring algorithm for protein identification. The algorithm is adapted from the MOWSE algorithm, developed at the Imperial Cancer Research Fund, and licensed from Cancer Research Technology.

Mascot is freely available to use on the Matrix Science website (ref). A license is required for in-house use, where more features can be incorporated.

History

MOWSE was one of the first algorithms developed for protein identification using peptide mass fingerprinting.[3] It was originally developed in 1993 as a collaboration between Darryl Pappin of the Imperial Cancer Research Fund (ICRF) and Alan Bleasby of the Science and Engineering Research Council (SERC). It stood apart from other protein identification algorithms in that it produced a probability-based score for identification. It was also the first to take into account the non-uniform distribution of peptide sizes, caused by the enzymatic digestion of a protein that is needed for mass spectrometry analysis. With the help of David Perkins, MOWSE was re-coded and updated in 1997 to incorporate tandem mass spectrometry (MS/MS) fragment ion searches and searches that combine peptide mass data with amino acid sequence information. To reach a wider audience, an external bioinformatics company named Matrix Science was employed to develop and distribute MOWSE. MOWSE was renamed Mascot and was coded for parallel execution on a variety of multiprocessor platforms. Mascot has been a free and unrestricted service on the Matrix Science website since 1999 [Matrix Science ref]. Matrix Science continues to improve Mascot's functionality.

Applications

Mascot identifies proteins by interpreting mass spectrometry data. The prevailing experimental method is a bottom-up approach, where the protein sample is typically digested with trypsin to form smaller peptides. These peptides usually fall within the limited mass range that a mass spectrometer can measure. The mass spectrometer measures the molecular weights of the peptides in the sample. Mascot compares these molecular weights against a database of known peptides. To build that set of peptides, the program cleaves every protein in the specified search database in silico according to the rules of the chosen cleavage enzyme. The more peptides Mascot identifies from one protein, the better the score for that protein.
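The in silico digestion and mass comparison described above can be sketched in a few lines of Python. This is only an illustration, not Mascot's implementation: the function names (`digest_trypsin`, `peptide_mass`, `count_matches`) and the mass tolerance are hypothetical, and the cleavage rule is the commonly used simplification that trypsin cuts after lysine (K) or arginine (R) unless the next residue is proline (P). Counting matches is a stand-in for the idea that more matched peptides support a protein; it is not the probabilistic score.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def digest_trypsin(sequence, missed_cleavages=0):
    """Cleave after K or R, except before P (simplified trypsin rule)."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        # Allow up to `missed_cleavages` uncut sites inside a peptide.
        last = min(start + 2 + missed_cleavages, len(bounds))
        for end in range(start + 1, last):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, peptides, tolerance=0.01):
    """Count observed masses that fall within `tolerance` Da of any peptide."""
    theoretical = [peptide_mass(p) for p in peptides]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )
```

For example, `digest_trypsin("MKWVTFISLLR")` yields the peptides `MK` and `WVTFISLLR`, and passing `missed_cleavages=1` additionally yields the uncut `MKWVTFISLLR`.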

Features


 * Peptide Mass Fingerprint search: Identifies proteins from an uploaded peak list using peptide mass fingerprinting.


 * Sequence query: Combines peptide mass data with amino acid sequence and composition information, usually obtained from tandem mass spectrometry (MS/MS) data. Based on the peptide sequence tag approach.


 * MS/MS Ion Search: Identifies fragment ions from uninterpreted tandem mass spectrometry data of one or more peptides.

(parameters)


 * Modifications can be specified as fixed or variable. Fixed modifications are applied universally to every residue of the specified type or to the terminus: the mass of the modification is added to each of the respective residues. Variable modifications may or may not be present, so the program tries to match every combination of modified and unmodified residues. This can increase the number of comparisons dramatically and lead to lower scores and longer search times.


 * By setting a taxonomy, a search can be restricted to certain species or groups of species. This will reduce search time and ensure that only relevant protein hits are included.
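The growth in comparisons caused by variable modifications can be made concrete with a small sketch. It is an illustration under stated assumptions, not Mascot's code: the function name `variable_mod_masses` is hypothetical, and methionine oxidation (a mass shift of about 15.99491 Da) is used only as a familiar example of a variable modification. Each candidate site is either modified or not, so a peptide with n candidate sites has 2^n site patterns to consider.

```python
from itertools import product

OXIDATION = 15.99491  # monoisotopic mass shift of methionine oxidation, Da


def variable_mod_masses(base_mass, sequence, residue="M", delta=OXIDATION):
    """Return one candidate mass per modified/unmodified site pattern.

    A peptide with n candidate residues yields 2**n patterns, which is
    why variable modifications enlarge the search space so quickly.
    """
    n_sites = sequence.count(residue)
    masses = []
    for pattern in product((0, 1), repeat=n_sites):
        # `pattern` marks each candidate site as unmodified (0) or modified (1).
        masses.append(base_mass + delta * sum(pattern))
    return masses
```

A peptide with three methionines therefore produces eight patterns for this single modification. Several patterns share the same total mass, but they differ in where the modification sits, which matters when fragment ions are matched.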

Scoring

Mascot's fundamental approach to identifying peptides is to calculate the probability that an observed match between experimental data and a peptide sequence in a reference database has occurred by chance. The match with the lowest probability of occurring by chance is returned as the most significant match. However, the significance of a match depends on the size of the database being queried. Mascot employs the widely used significance level of 0.05, meaning that in a single test the probability of observing an event at random is less than or equal to 1 in 20. In this light, a probability of 10^-5 would sound very promising. If the database being searched contains 10^6 sequences, however, about 10 matches of this magnitude (10^6 * 10^-5) would be expected by chance alone, because the algorithm carried out 10^6 individual comparisons. For a database of that size, applying a Bonferroni correction to account for multiple comparisons lowers the significance threshold to 0.05 / 10^6 = 5*10^-8.
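The arithmetic behind the database-size correction can be checked with a short sketch. The function names are hypothetical and the code is not taken from Mascot. The last function assumes the convention, documented by Matrix Science, that a score is reported as -10 * log10(P), where P is the probability that the match is a chance event; treat it as an illustration of that convention rather than a full account of how scores are computed.

```python
import math


def expected_chance_matches(p, n_comparisons):
    """Expected number of matches with probability p arising by chance."""
    return p * n_comparisons


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def probability_to_score(p):
    """Convert a chance probability to a score, assuming score = -10*log10(P)."""
    return -10 * math.log10(p)


db_size = 10**6
chance_hits = expected_chance_matches(1e-5, db_size)   # about 10 chance matches
threshold = bonferroni_threshold(0.05, db_size)        # 5e-8 per comparison
score_cutoff = probability_to_score(threshold)         # roughly 73
```

Under that assumed convention, a match in a database of 10^6 sequences would need a score of roughly 73 to be significant at the 0.05 level.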