Talk:BLOSUM

BLOSUM62: more or less than 62% identity?
"The Henikoffs took a big database of trusted alignments (their BLOCKS database), and (in effect) only counted pairwise sequence alignments related by less than some threshold percentage identity. A threshold of 62% identity or less resulted in the target frequencies for the BLOSUM62 matrix. An 80% threshold gave the more highly conserved target frequencies of the BLOSUM80 matrix, and a 45% threshold gave the more divergent BLOSUM45 matrix."

Source: Sean R. Eddy, Where did the BLOSUM62 alignment score matrix come from?, Nature Biotechnology 22, 1035-1036 (2004), doi:10.1038/nbt0804-1035

http://www.nature.com/nbt/journal/v22/n8/full/nbt0804-1035.html

"In order to avoid over-weighting closely-related sequences, the Henikoffs replaced groups of proteins that have sequence identities higher than a threshold by either a single representative or a weighted average. The threshold of 62% produces the commonly used BLOSUM62 substitution matrix."

Source: Arthur M. Lesk, Introduction to Bioinformatics, Oxford University Press, 2002, p. 175

Winterschlaefer 15:52, 14 February 2007 (UTC)


 * As far as I know, a BLOSUM62 matrix is good for alignments that have 62% or MORE identity. XApple 00:32, 25 February 2007 (UTC)

I agree with Winterschlaefer. For BLOSUM62, the Henikoffs clustered sequences that are at least 62% identical and weighted each cluster as one single sequence, so closely related sequences contribute less to the matrix. As the paper puts it:

"To reduce multiple contributions to amino acid pair frequencies from the most closely related members of a family, sequences are clustered within blocks and each cluster is weighted as a single sequence in counting pairs. This is done by specifying a clustering percentage in which sequence segments that are identical for at least that percentage of amino acids are grouped together."

Also, as I can see in the history of this article, the following statement used to be part of the references section: "BLOSUM62 is for sequences of 62% OR GREATER sequence identity, not less than 62% (Voet, D., Voet,J., 2005)", and this may well be what Voet & Voet claim. However, that is different from the following statement, which is now referenced to Voet & Voet: "BLOSUM62 is the matrix calculated by using the observed substitutions between proteins which have 62% or more". What I'm saying is that this reference does not support this claim. The BLOSUM62 matrix is actually calculated (primarily) from sequences that have 62% or less sequence identity. Still, IMHO, BLOSUM62 is designed for sequences with similarities around 62%, not more. If I wanted to compare sequences with a similarity of 80%, I'd choose BLOSUM80.

Source: Henikoff & Henikoff, Amino acid substitution matrices from protein blocks, PNAS 89, pp. 10915-10919. 134.34.4.5 21:09, 28 May 2007 (UTC)


 * It is definitely the case that BLOSUM62 is based only on sequences that have 62% or more identity, while BLOSUM80 is based on sequences with 80% or more identity. Which one you use is up to your personal taste, but as far as I know you would use a BLOSUM that is around your sequence identity, and on that point I agree with the speaker above. The error was fixed here. Greetings--hroest 03:39, 4 June 2008 (UTC)

NO! Look at the original paper. If you read the Henikoff & Henikoff paper, it is clear that the earlier comments are correct: BLOSUM62 means that sequences with > 62% identity were averaged, i.e. BLOSUM62 mostly represents changes between sequences with less than 62% identity. Hroest's assertion above is without a reference and is incorrect. I'm going to revert the article.

Source: Henikoff & Henikoff, Amino acid substitution matrices from protein blocks, PNAS 89, pp. 10915-10919

http://www.pnas.org/content/89/22/10915.full.pdf

Jnmaloof (talk) 23:46, 9 November 2017 (UTC)
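
For anyone following this thread, here is a minimal Python sketch of the clustering step quoted from the paper above, which may make the "more or less than 62%" question easier to see. It is only an illustration under stated assumptions, not the Henikoffs' actual program: the function names and the toy block are made up, the clustering is assumed to be single-linkage, and each between-cluster pair is assumed to be weighted by 1 / (size of cluster A × size of cluster B) so that every cluster counts as one sequence.

```python
from collections import defaultdict
from itertools import combinations


def percent_identity(a, b):
    """Percentage of aligned positions at which two equal-length segments match."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def cluster_segments(segments, threshold):
    """Assumed single-linkage clustering: segments that are at least
    `threshold` percent identical end up in the same cluster."""
    parent = list(range(len(segments)))

    def find(i):
        # Follow parent links to the cluster representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(segments)), 2):
        if percent_identity(segments[i], segments[j]) >= threshold:
            parent[find(i)] = find(j)

    return [find(i) for i in range(len(segments))]


def weighted_pair_counts(segments, threshold):
    """Count amino acid pairs per column, skipping pairs inside a cluster and
    down-weighting the rest so each cluster acts as a single sequence."""
    labels = cluster_segments(segments, threshold)
    sizes = defaultdict(int)
    for label in labels:
        sizes[label] += 1

    counts = defaultdict(float)
    for i, j in combinations(range(len(segments)), 2):
        if labels[i] == labels[j]:
            continue  # segments above the threshold contribute no pairs to each other
        weight = 1.0 / (sizes[labels[i]] * sizes[labels[j]])
        for x, y in zip(segments[i], segments[j]):
            counts[tuple(sorted((x, y)))] += weight
    return dict(counts)


# Hypothetical toy block: the first two segments are 90% identical,
# the third is 40% identical to each of them.
block = ["ACDEFGHIKL", "ACDEFGHIKM", "ACDEWWWWWW"]
counts_62 = weighted_pair_counts(block, 62)
counts_95 = weighted_pair_counts(block, 95)
```

With a threshold of 62, the two 90%-identical segments fall into one cluster, so the L/M substitution between them is never counted and only pairs between segments that are less than 62% identical reach the matrix. With a threshold of 95 nothing is clustered and the L/M pair is counted. That is the sense in which BLOSUM62 reflects substitutions between sequences of 62% identity or less.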

Illustration
This badly needs a picture of a typical BLOSUM matrix. XApple 14:52, 12 February 2007 (UTC)
 * It did get one. --hroest 05:50, 7 March 2008 (UTC)

"BLOSUM matrix" is correct
Some smart people say that one must say "BLOSUM" instead of "BLOSUM matrix" because the "M" in BLOSUM already means "matrix". That is true, but the term BLOSUM is by now a name, not just an abbreviation; it is a technical term. It is common usage in the scientific community to speak of "BLOSUM matrices". Just saying "BLOSUM" is counterintuitive and not idiomatic.

Furthermore, if we wanted to get it linguistically really right, the article itself contained mistakes. It said: "To calculate a matrix for BLOSUM, ...". This is grammatically wrong, whatever opinion one has about BLOSUM matrices. 134.76.81.25 (talk) 10:49, 22 September 2010 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on BLOSUM. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20130309015755/http://birec.org/sandbox/omamasaudtutorial to http://www.birec.org/sandbox/omamasaudtutorial

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 09:31, 14 September 2017 (UTC)

Confusing grammar
I speak and read English well, and yet the following sentences do not make sense to me:

"Scores for each position are obtained frequencies of substitutions in blocks of local alignments of protein sequences."

" By using the block, counting the pairs of amino acids in each column of the multiple alignment."

"Two major forces drive the amino-acid substitution rates away from uniformity: substitutions occur with the different frequencies, and lessen functionally tolerated than others."

Overall, pretty confusing.