Talk:Position weight matrix

confusion in PWM definition
The use of the perceptron algorithm is stated in the "background" section.

In the "creation" section, we find:

"Both PPMs and PWMs assume statistical independence between positions in the pattern"

Something is unclear in this article: the perceptron makes no prior assumption about the statistical relationship between input features.

The "creation" section therefore seems unrelated to the perceptron algorithm. I think the article confuses a weighted function (i.e. the perceptron scheme) in "background" with the PWM in "creation".

reference translation
Does anyone have an English translation of this referenced article: Aleksandrushkina NI, Egorova LA. "[Nucleotide makeup of the DNA of thermophilic bacteria of the genus Thermus.]" Mikrobiologiia. Mar 1978, 47(2):250-2?

I have believed that my thesis work, part of which was published as Stormo GD, Schneider TD, Gold L, Ehrenfeucht A. "Use of the 'Perceptron' algorithm to distinguish translational initiation sites in E. coli." Nucleic Acids Res. 1982 May 11;10(9):2997-3011, was the introduction of position weight matrices (or PWMs, PSSMs, etc.) to the field of biological sequence analysis. If this Russian paper actually used such matrices, it would predate my work, but I have not seen this reference before. Gary Stormo —Preceding unsigned comment added by 128.252.234.247 (talk) 14:50, 27 February 2008 (UTC)
 * I think this paper was referencing the sentence "the GC-content of DNA of thermophilic bacteria range from 65.3 to 70.8...", rather than the use of PWMs. I'm rewriting the article at the moment so I'll have a look for an English source for a similar statement and try to word that section to avoid confusion. Thanks, --Amkilpatrick (talk) 10:05, 16 November 2013 (UTC)

The difference between log-odds and log-likelihood needs clarification
At least, I'm not able to properly understand the difference. The intro is fine: it just defines a PSSM as a matrix of scores, without saying much about what the scores may be. The log-likelihood section, however, seems to assume that this implies log likelihoods, without giving any definitions. To me it looks like the difference between log likelihood and log odds is the use of a background model; surely that can't be right? Ketil (talk) 10:33, 6 September 2011 (UTC)


 * You're right, this needs clarifying. In fact, the whole page could do with a good tidy-up and added references, so I'll put it on my list of things to do! The whole mention of log likelihoods is confusing, especially as the basic definition of PSSMs/PWMs doesn't include logs at all. Using a background model lets you construct a log odds matrix by doing something like log(p/f), where p is the probability of a given letter at a given position and f is the relative frequency of that letter in your dataset, but this isn't really clear from the article. Examples of how these are used in practice might also be useful: basic PWMs represent sequence motifs, while log odds matrices are often used to find the most likely position for a motif within a longer sequence, given a motif model as a PWM. As I say, I'll get round to this, but if you want to make a start, feel free to go ahead --Amkilpatrick (talk) 10:28, 16 January 2012 (UTC)
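
To make the log(p/f) idea above concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the article's definition: the matrix values, the uniform background, the base-2 logarithm and the function names (`log_odds_matrix`, `score`, `best_hit`) are all made up for this example. It builds a log odds matrix from a probability matrix and then scores every window of a longer sequence to find the most likely motif position.

```python
import math

# Hypothetical 4-position motif as a position probability matrix (PPM).
# One dict per position; each column sums to 1. Values are invented.
PPM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]

# Assumed background model f: uniform letter frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(ppm, background):
    """Return log2(p / f) for every letter at every position.

    Assumes no probability in `ppm` is zero; real data would usually
    need pseudocounts first, since log(0) is undefined.
    """
    return [
        {letter: math.log2(p / background[letter]) for letter, p in column.items()}
        for column in ppm
    ]


def score(pwm, window):
    """Sum the per-position log odds for a window as long as the matrix."""
    return sum(column[letter] for column, letter in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (start, score) of the highest-scoring window in `sequence`."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score(pwm, sequence[i:i + width]))
    return best, score(pwm, sequence[best:best + width])


PWM = log_odds_matrix(PPM, BACKGROUND)
position, value = best_hit(PWM, "TTAGCTTT")
print(position, round(value, 3))
```

Because the window score is a plain sum over positions, this sketch relies on the same independence-between-positions assumption discussed in the first thread on this page.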

Recent edits, ideas for future
I've recently made a number of edits to this article to try to improve it and make things a bit clearer. I don't have enough time at the moment to finish the article as I'd imagined, but future edits should probably revise the Information Content and Using PWMs sections. The latter section almost certainly needs an explanation of how PWMs are used to scan sequences for the most likely motif occurrences. I'd also suggest a section on representing PWMs (as sequence logos); this could be short, as there's already an article on sequence logos. Finally, it would be good to have some discussion of alternative representations (some recent papers have suggested using HMMs to represent motifs, for instance). It might be a while before I can get round to this but I'm happy to advise if someone else wants to take this on... --Amkilpatrick (talk) 11:13, 6 January 2014 (UTC)

Assessment comment
Substituted at 18:33, 17 July 2016 (UTC)