Talk:Smith–Waterman algorithm

Wrong formula for matrix?
Referring back to the 1981 paper of Smith and Waterman (ftp://150.128.97.71/pub/Bioinformatica/msw-042.pdf, formula (1)), the definition formula for the cells of the matrix H is not the same as in this article. The gap scores do not only depend on the row above and the column to the left; they depend on all rows above and all columns to the left. There are examples where this makes a difference to the final score, so the definition here is not correct.

An example: Reference: AACCCGAGTGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAATGGAAAGCAATGGCAAGAATTGGAATGGAAAGGAATC Read: GAATGGAATGGAATGCAATGGAATGGAATCATCCGTAATGGAATGGAAAGCAATGGAATG

Scoring +2 for match, -6 for mismatch, -5 for gap open, -3 for gap extend, the overall score is 66 by Smith-Waterman's algorithm and 62 by the algorithm presented here. — Preceding unsigned comment added by 217.33.180.66 (talk) 13:15, 17 March 2014 (UTC)

I have now corrected this on the page. — Preceding unsigned comment added by BarrierBank (talk • contribs) 10:49, 21 March 2014 (UTC)

I updated the algorithm using the original notation. I also explained why this difference exists in the gap penalty section. Yz cs5160 (talk) 07:20, 8 January 2017 (UTC)

WPMED
I am not sure this article falls within the scope of WP:MED. Anyone who agrees, please delete the WPMED tag on this page. --Una Smith (talk) 04:29, 29 December 2007 (UTC) Biology - Yes, Medicine - well, there is very remote connection - but no. Crenshaw (talk) 02:03, 19 January 2010 (UTC)

Pseudocode of Algorithm and Remarks to Code are missing
There is the simple but efficient pseudocode of the algorithm missing. It is tailored to fit to the properties of genes in common genomes. Maybe someone can link the properties of common (for example eucaryotic cell) genes having introns and exons to the properties of Smith-Waterman or pairwise Smith-Waterman pSW --84.157.243.81 (talk) 09:32, 24 July 2008 (UTC)

Eq. -- why the 0?
Shouldn't $$H(i,j) = \max \begin{Bmatrix} 0 & \\ H(i-1,j-1) + \ w(a_i,b_j) & \text{Match/Mismatch} \\ H(i-1,j) + \ w(a_i,-) & \text{Deletion} \\ H(i,j-1) + \ w(-,b_j) & \text{Insertion} \end{Bmatrix} ,\; 1\le i\le m, 1\le j\le n $$ be $$H(i,j) = \max \begin{Bmatrix} H(i-1,j-1) + \ w(a_i,b_j) & \text{Match/Mismatch} \\ H(i-1,j) + \ w(a_i,-) & \text{Deletion} \\ H(i,j-1) + \ w(-,b_j) & \text{Insertion} \end{Bmatrix} ,\; 1\le i\le m, 1\le j\le n $$

What's that zero doing in the B-matrix? —Preceding unsigned comment added by 129.170.215.143 (talk) 23:38, 18 June 2009 (UTC)

Smith-Waterman is for local alignment, which means that if H(i, j) is 0, then it's best for the alignment to start there. I think the example is really confusing, because it doesn't illustrate this fact, so you get an optimum local alignment that is also a global alignment. 69.218.212.242 (talk) 17:20, 8 August 2009 (UTC)

See section "Comparison with the Needleman–Wunsch algorithm" for the explanation. I also redid the example completely. Yz cs5160 (talk) 07:35, 8 January 2017 (UTC)

Performance reports
I find this article similar to post with "me faster" signs. As someone who actually takes part in "competition" of speeding up Smith-Waterman for different architectures I think it's a bit NPOV. (disclaimer: i haven't add mine implementation yet to the article but I intend to do so).

Performance results should be reported not as a single value but a graph with performance plotted against query size and database used. For example Fastflow implementation achieves 35GCUPS for __very__ long queries that are barely if ever used in practice. I think it's a bit misleading.

I guess that I'll rewrite this article and put pefrormance graph for different algorithms. But it would be nice if someone reviewed the article afterwards (coz I'll add my own work on SW too) to eliminate potential NPOV.

Crenshaw (talk) —Preceding undated comment added 22:20, 30 January 2010 (UTC).


 * Agree. The "754x speedup" is meaningless - compared to what? Ketil (talk) 21:32, 11 December 2011 (UTC)

Example needs to totally be redone
As a commenter above points out, this example is not good. The example in the article currently results in a local alignment which is also a global alignment. As the whole point of Smith-Waterman is to find the best local alignment, the example should find the best alignment of the best matching substring(s) in the 2 strings. I don't have time to fix this right now, but wanted point that out. --Rajah (talk) 03:46, 23 March 2010 (UTC)


 * I'd also like a better example, or perhaps two examples, a simple one, and a longer one. I am having a hard time understanding how the algorithm works. What I have gathered so far is that it will build a matrix, and this matrix has the optimal alignment. But from this part, how to get the optimal sequence, is not completely clear to me. 2A02:8388:1641:8D00:BE5F:F4FF:FECD:7CB2 (talk) 15:44, 28 May 2016 (UTC)

The example was totally redone. An animated image was also added to demonstrate the algorithm. Please let me know if it helps. Yz cs5160 (talk) 07:39, 8 January 2017 (UTC)

Acceleration
It seems to me that "shows considerable promise for both speed of the algorithm and a more accessible programming model" is just marketing with no numbers and no substance.

en-dashes
Please: As you see, and en-dash rather than a hyphen is used in the title of this article, and of course should also be used in the body of the article. I just cleaned up a lot of that. Similarly, "Needleman-Wunsch" is wrong and "Needleman–Wunsch" is right. (Also in ranges of pages, years, etc., e.g.: pp. 305–42. See WP:MOS. Michael Hardy (talk) 17:25, 7 August 2012 (UTC)

Dead reference
The first reference to the paper in PDF seems to be dead, it looks like the university blocked it. There is another copy in here http://www.cmb.usc.edu/papers/msw_papers/msw-042.pdf but I'm not sure if putting that (or for that matter the link to the PDF that was here before) would involve copyright infringement. Can somebody enlighten me about this? Jorgeng87 (talk) 19:21, 13 November 2012 (UTC)

External links modified
Hello fellow Wikipedians,

I have just added archive links to 1 one external link on Smith–Waterman algorithm. Please take a moment to review my edit. If necessary, add after the link to keep me from modifying it. Alternatively, you can add to keep me off the page altogether. I made the following changes:
 * Added archive https://web.archive.org/20080513152726/http://www.genomequest.com/contact-bioinformatics-ht.html to http://www.genomequest.com/contact-bioinformatics-ht.html

When you have finished reviewing my changes, please set the checked parameter below to true to let others know.

Cheers.—cyberbot II  Talk to my owner :Online 23:37, 12 February 2016 (UTC)

Alignment construction explanation needs improvement
"Then, go backwards to one of positions (i − 1,j), (i, j − 1), and (i − 1, j − 1) depending on the direction of movement used to construct the matrix. " This doesn't explain which direction the movement should go. I assume that the movement is diagonal if the value in the cell comes from a match. I'm trying to figure out this algorithm for some work I'm doing so if I figure it out, I'll contribute a clearer explanation. Stochphon (talk) 05:35, 8 October 2016 (UTC)


 * I updated the explanation. The movement depends on the source of each score, i.e., if match/mismatch from $$H_{i-1,j-1}$$ gives the highest score, then go diagonal; if adding gap(s) horizontally or vertically gives the highest score, then go left or up; if none of them are positive, then this cell gets 0 and has no source, and traceback will end here if this cell is encountered. Note that:
 * Alternative paths are generated if a cell being traced back has more than one source.
 * Scores come not necessarily from cells directly adjacent to them ($$H_{i-1,j}$$ and $$H_{i,j-1}$$). Instead, the gaps may come directly from all positions on that row or column that are left ($$H_{i,j-l}$$) or up ($$H_{i-k,j}$$) to the cell being scored.
 * Yz cs5160 (talk) 08:12, 8 January 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 5 external links on Smith–Waterman algorithm. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20080705191338/http://ft.ornl.gov/~olaf/pubs/OlafRSSI2July07.pdf to http://ft.ornl.gov/~olaf/pubs/OlafRSSI2July07.pdf
 * Added archive https://web.archive.org/web/20080705191314/http://ft.ornl.gov/~olaf/pubs/CUG07Olaf17M07.pdf to http://ft.ornl.gov/~olaf/pubs/CUG07Olaf17M07.pdf
 * Added archive https://web.archive.org/web/20110720080855/http://ft.ornl.gov/~olaf/pubs/RSSIOlafDave.pdf to http://ft.ornl.gov/~olaf/pubs/RSSIOlafDave.pdf
 * Corrected formatting/usage for http://www.genomequest.com/contact-bioinformatics-ht.html
 * Added archive https://web.archive.org/web/20070811101052/http://www.clccell.com/download.html to http://www.clccell.com/download.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 09:37, 4 October 2017 (UTC)

Implementation
The implementation section needs to be redone (maybe merged with the section "Accelerated versions"). Unfortunately I'm not familiar with those implementations. I would appreciate it if someone can help rewrite those two sections. It'll help people better use the algorithm. Yz cs5160 (talk) 18:17, 23 October 2017 (UTC)

Running time: quadratic or qubic??
The intro paragraph says "cubic", which seems to match the pseudocode. The box on the right says $$O(mn)$$. The history mentions cubic again, but points out to *different* algorithms that run in quadratic time (Gotoh and Altschul et al.). Finally, under "Gap Penalty" one can see that the latter algorithms only solve an (important) special case of the problem, and the former doesn't even necessarily solve it! Aviad.rubinstein (talk) 21:28, 7 September 2018 (UTC)