Similarity Matrix of Proteins

Similarity Matrix of Proteins (SIMAP) is a database of protein similarities created using volunteer computing. It is freely accessible for scientific purposes. SIMAP uses the FASTA algorithm to precalculate protein similarity, while another application uses hidden Markov models to search for protein domains. SIMAP is a joint project of the Technical University of Munich, the Helmholtz Zentrum München, and the University of Vienna.

Project
The project usually received new work units at the beginning of each month. More recently (as of 2010), adding environmental sequences to the database has required longer periods of activity, for example several months of continuous work. These updates typically occurred twice a year.

In the fourth quarter of 2010, the project relocated to the University of Vienna because of failing electrical infrastructure at the Technical University of Munich. As part of the move, a project-specific URL was created, which required existing volunteers and users to detach from the project and reattach to it.

On May 30, 2014, the project administrators announced that, after ten years, SIMAP would leave BOINC by the end of 2014. SIMAP research would nevertheless continue on local hardware consisting of "ordinary multi-core CPUs (some hundreds), crunching a SSE-optimized version of the Smith-Waterman algorithm."

Computing platform
SIMAP used the Berkeley Open Infrastructure for Network Computing (BOINC) distributed computing platform.

Application performance notes
Work unit CPU times varied widely, from 15 minutes to 3 hours. Work units ranged in size from 1.5 to 2.2 MB, averaging around 2 MB. SIMAP provided client software optimized for SSE-enabled processors and for x86-64 processors. Non-SSE applications were available for older processors but had to be installed manually. Supported operating systems were Linux, Windows, Mac OS, Android, and other Unix platforms. Because the database had at times already been completed, with all publicly known protein sequences and metagenomes precalculated by the project, the available work then consisted only of newly published protein sequences and metagenomes that still needed to be precomputed for SIMAP.