Talk:Timsort

Needs algorithms and code examples
Ideally in several programming languages so that it's as accessible as possible.

This article is a stub. —Preceding unsigned comment added by Veganfanatic (talk • contribs) 20:09, 12 August 2010 (UTC)

As for the algorithm, it can be hard, as Timsort has many additional tricks, like a galloping mode when merging and two-phase merging; it also switches between a few modes adaptively. I am looking at it because I like merge sort, and was looking at Timsort as a practical and fast mergesort. I will keep the code as small as possible. —Preceding unsigned comment added by 149.156.82.207 (talk) 15:22, 7 September 2010 (UTC)
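As an illustration of the galloping mode mentioned above, here is a minimal sketch (the function name is mine, not from any Timsort source) of the exponential-then-binary search it is built on:

```python
import bisect

def gallop_right(key, arr):
    """Return how many leading elements of the sorted list arr are <= key,
    using exponential ("galloping") search followed by binary search."""
    if not arr or arr[0] > key:
        return 0
    # Exponential phase: double the probe index until we overshoot.
    ofs = 1
    while ofs < len(arr) and arr[ofs] <= key:
        ofs *= 2
    # Binary phase: the boundary is between ofs // 2 and min(ofs, len(arr)).
    lo, hi = ofs // 2, min(ofs + 1, len(arr))
    return bisect.bisect_right(arr, key, lo, hi)
```

Real Timsort only enters this mode after one run "wins" several comparisons in a row, and leaves it again when the wins stop; this sketch shows just the search step.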

Quality Problems
I was reading this page, and it just struck me as being very poor in an encyclopædic sense. Here are the two key things that struck me: Right now, it's not useful as a source for practitioners or the general public, and nor does it indicate where appropriate material exists. –Donal Fellows (talk) 14:16, 9 August 2010 (UTC)
 * This page does not actually present the algorithm, nor does any of its links; all that exists is either discussion about it or a link to a patch to Java that includes an implementation of it, and that's both huge and unclear due to the large quantity of extraneous information (and the patch does a lot of other things too). You would not want to have to implement the algorithm in some other language from that starting point (which is what I'll define as the key requirement for a “practitioner-grade” algorithm description).
 * The wrong notation is used for describing upper-bound complexities (in both space and time); for example, upper bounds should be written $$O(n\log n)$$ rather than $$\Theta(n\log n)$$ (after all, it's an upper bound).


 * These issues have been fixed. Ztothefifth (talk) 21:21, 23 December 2010 (UTC)
 * Indeed. I am very impressed with the article now, with all its descriptions and illustrations. Well done! :D Jesse V. (talk) 18:02, 28 July 2012 (UTC)
 * Still too poor in quality. At least the Merge memory section needs rewriting. The image for the stack and runs X, Y, Z is incorrect! It shows X as the last element on the stack (the dots come after Z), whereas, per Tim, Z should be the last element (the dots should come before X). From the original explanation (using letters A, B and C): "A, B and C are the lengths of the three rightmost not-yet merged slices". I.e. rightmost, so the dots are to the left of them, i.e. BEFORE A, B, C (or X, Y, Z).
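To make the intended stack layout concrete, here is a small sketch (names mine) that checks the invariants from listsort.txt on a list of run lengths, with the rightmost run on top, i.e. stack[-1] is Z:

```python
def invariants_hold(stack):
    """Check Timsort's stack invariants on a list of run lengths.
    stack[-1] is the rightmost, most recently discovered run (Z);
    the invariants are X > Y + Z and Y > Z for the top three runs."""
    ok = True
    if len(stack) >= 2:
        y, z = stack[-2], stack[-1]
        ok = ok and y > z
    if len(stack) >= 3:
        x = stack[-3]
        ok = ok and x > stack[-2] + stack[-1]
    return ok
```

So for the picture to be right, the "dots" (older, leftward runs) sit below X on the stack, and Z is the top element.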

Runtime problem?
I noticed that the worst case runtime is listed as O(n), but the average case is O(NlogN). I'm not an expert, but shouldn't the worst case be equal to or greater than the average case? --99.187.241.11 (talk) 00:54, 30 November 2010 (UTC)
 * Looks like it had been noticed by the person above me, sorry for double-posting this. --99.187.241.11 (talk) 01:20, 30 November 2010 (UTC)

Image
Timsort needs a better image. Right now, it looks exactly like insertion sort. Ztothefifth (talk) 18:26, 24 December 2010 (UTC)

sort what, now?

 * Depending on the size of the run, different optimization techniques may be used to sort the particular run. Simply put, the efficiency of a sorting technique depends heavily on the size of a run. To take advantage of the regularities in data, Timsort works on natural run lengths. A natural run length is the length of a sub-array which is already ordered, that is a natural run. Timsort boasts of high efficiency as it utilizes an optimum technique for each different type of run; thus explaining why it is termed as adaptive sorting.

In the first sentence, run must mean array the second time (a run is that which need not be sorted), but does it also mean array the first time? That is, does optimization depend most on the length of the array, the length of the runs, or the number of runs?

This paragraph wanders and needs rewriting anyway. I may get to it soon. —Tamfang (talk) 23:59, 26 October 2011 (UTC)

Edited the paragraph and tried to remove any ambiguity. - Charanya.vish (talk) 09:24, 30 October 2011 (UTC)
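Since the paragraph under discussion is about natural runs, a minimal sketch (my own code) of how a natural run's length can be measured; in a real implementation a descending run would then be reversed in place:

```python
def count_run(arr, start):
    """Return the length of the natural run beginning at arr[start].
    A run is either non-descending (a[i] <= a[i+1]) or strictly
    descending; strictness is what lets a reversal stay stable."""
    n = len(arr)
    if start >= n - 1:
        return n - start
    i = start + 1
    if arr[i] >= arr[i - 1]:                 # non-descending run
        while i < n and arr[i] >= arr[i - 1]:
            i += 1
    else:                                     # strictly descending run
        while i < n and arr[i] < arr[i - 1]:
            i += 1
    return i - start
```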

lg(n!)
In the Performance section, there's the following statement: "In case of random data, there are no partially ordered subarrays to take advantage of. In this case, timsort reaches the theoretical limit, which is lg(n!)." However, the citations state that the worst case scenario is O(n log n). Which statement is correct? Additionally, there's the statement, "In the case where the data contains many duplicates, it is found that timsort runs faster on certain platforms while slower on others." This is meaningless, surely? All code runs faster on certain platforms and slower on other platforms. Perhaps I'm misreading the statement? --Yamla (talk) 15:31, 9 November 2011 (UTC)


 * lg(n!) is in O(n log n). I suppose the article could say this. rspεεr (talk) 10:38, 16 November 2011 (UTC)

We have edited the sentence and also cited a source. What we were trying to say is that timsort gets close to the theoretical best, for the average case, which is lg(n!). We also apologize for the second sentence about the platforms. We have removed it. -- Snehasapte (talk) 04:47, 16 November 2011 (UTC)
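For completeness, Stirling's approximation shows why the information-theoretic lower bound lg(n!) and n log n agree asymptotically:

$$\lg(n!) = \sum_{k=1}^{n} \lg k = n\lg n - n\lg e + O(\log n) = \Theta(n\log n)$$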

Why use Omega for best-case performance?
Why does the table use Ω for its complexity classes, such as describing the best-case performance of Timsort as Ω(n)? That says it takes "linear time or longer" in the best case. I find this unnecessarily vague. Not only may readers be unfamiliar with Ω, it's making a weaker statement than required. The best case simply takes linear time. The entire algorithm takes linear time or longer. But this is a row in the table that says "best case", so we're talking about the best case.

So, shouldn't we be maximally specific and use Theta for all of the table entries?

I realize that Donal Fellows, above, raised the objection that probably caused the table to be written this way. I disagree with him. Talking about a lower bound does not obligate us to use Ω in a confusing and redundant way, and likewise talking about an upper bound does not obligate us to use O, when Θ is both specific and correct.

rspεεr (talk) 10:48, 16 November 2011 (UTC)
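For reference, the definitions at issue: saying the best case is Ω(n) only asserts a lower bound on it, while Θ(n) pins it down exactly:

$$f \in O(g) \iff \exists c>0,\, n_0 : f(n) \le c\,g(n) \text{ for all } n \ge n_0$$

$$f \in \Omega(g) \iff \exists c>0,\, n_0 : f(n) \ge c\,g(n) \text{ for all } n \ge n_0$$

$$\Theta(g) = O(g) \cap \Omega(g)$$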

Timsort bug
Apparently Timsort has a bug. It should be mentioned somewhere. Proving that Android’s, Java’s and Python’s sorting algorithm is broken (and showing how to fix it) 109.118.25.252 (talk) 08:10, 25 February 2015 (UTC)
 * From that page, "we discovered a bug in TimSort’s implementation". That is, the bug appears to be in the implementation, not in the algorithm itself. That said, I think it affects the section, "Merge memory", and regardless, a bug in two such prolific implementations is probably worth mentioning, even if the bug is only in the implementation rather than the algorithm. --Yamla (talk) 15:36, 25 February 2015 (UTC)
 * It is indeed a bug in the **algorithm**. You may argue that there is still a thing called "timsort", but it should be noted that it doesn't sort any arbitrary list. The fixed algorithm should be treated separately to avoid confusion. --Ysangkok (talk) 13:18, 26 February 2015 (UTC)

It should definitely be mentioned, though I think the interesting part is that it was discovered using formal verification software, i.e. on purely theoretical grounds, not because someone stumbled over it. In fact, in Python the bug cannot be demonstrated on any existing computer, because no current system has enough memory to hold the necessary input data (in Java it is possible due to a slight implementation difference). As the page says: "There are a few observations that can be drawn from this exercise beyond the immediate issue of the bug.", in particular the first observation. Whether you consider it implementation or algorithm is, IMO, in the eye of the beholder. Wolma (talk) 21:56, 25 February 2015 (UTC)

I am undoing the last update to this section, which was:
 * On February 2015 a bug in Java and CPython implementation was discovered. The implementation only inspected the invariant for the last 3 runs which is not sufficient for any arbitrary array. However at the time there was no machine in existence with enough memory to hold a large enough array for a contrived input to trigger an overflow of the pending-runs stack.

since it is too inaccurate to be of value. It is not just CPython and Java that are affected, but any language that relies on the original CPython implementation. Also, as said above, the overflow error could well be triggered in the Java implementation, but not the CPython one. The given link provides an example of how to do this in Java. It should also be mentioned that the bug has now been fixed in CPython and Java (albeit very differently -> discuss ??). Wolma (talk) 09:35, 2 March 2015 (UTC)
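For illustration, a sketch (Python, names mine, not the actual CPython or Java source) of the corrected invariant check described in the linked write-up: the fixed merge_collapse also examines a fourth run, whereas the buggy version stopped at three:

```python
def merge_collapse_ok(runs):
    """Decide whether the top of the run stack needs merging.
    runs[-1] is the run on top.  Returns True when the corrected
    invariants hold, i.e. no merge is required right now."""
    n = len(runs)
    if n >= 2 and runs[-2] <= runs[-1]:
        return False
    if n >= 3 and runs[-3] <= runs[-2] + runs[-1]:
        return False
    # The extra check added by the fix: look one run deeper.
    if n >= 4 and runs[-4] <= runs[-3] + runs[-2]:
        return False
    return True
```

The buggy version could leave an invariant violated deeper in the stack, which is what eventually allowed the pending-runs array to overflow.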

Stable if strictly descending
I removed this sentence: "This method is stable if the elements are present in strictly descending order." It is meaningless (technically, it has meaning but is trivially true). A sorting method is stable if elements that compare equal are left in the same place. But if the elements are in strictly descending order then none of them compare equal! Quietbritishjim (talk) 07:38, 25 March 2015 (UTC)
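For readers following this: stability is only observable when keys compare equal, which a strictly descending input rules out by definition. A quick demonstration with Python's built-in sort (which is Timsort and stable):

```python
# Stability: records with equal keys keep their original relative order.
records = [("b", 1), ("a", 2), ("b", 0), ("a", 1)]
by_key = sorted(records, key=lambda r: r[0])
# Within each key, the input order (2 before 1; 1 before 0) is preserved.
assert by_key == [("a", 2), ("a", 1), ("b", 1), ("b", 0)]
```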

Comparison table
I trimmed the Performance section, removing the unsourced comparison table which I considered undue:


 * No other sorting algorithm article has such a table, except Sorting algorithm, where it belongs. A different selection of algorithms could have been made to show different results.
 * The best-case time for both quicksort and introsort is actually O(n) if properly engineered, but their best case is different: it occurs when there are O(1) distinct elements in the input.
 * "the space complexity of both Timsort and merge sort can be reduced to log n at the cost of speed" would need a source. I've never seen a presentation of in-place Timsort.

There's still a problem: Alex Martelli is cited for Timsort doing "far fewer than Θ(n log n) comparisons". That's vague, and what Martelli says is even vaguer. Q VVERTYVS (hm?) 15:23, 26 November 2015 (UTC)

India Education Program course assignment
This article was the subject of an educational assignment at College of Engineering, Pune supported by Wikipedia Ambassadors through the India Education Program during the 2011 Q3 term. Further details are available on the course page.

The above message was substituted by PrimeBOT (talk) on 20:12, 1 February 2023 (UTC)

Powersort
I believe this page should mention powersort (used in Python since 3.11, better theoretical guarantees), either as a variant of timsort described on this page, or as a different algorithm with its own page. Mglisse (talk) 23:53, 1 January 2024 (UTC)

Algorithm or pseudocode
A concise statement of the algorithm and/or pseudocode would help the article a lot. Bubba73 You talkin' to me? 02:05, 4 February 2024 (UTC)

Merge direction clarification
I've been using the information in this article to implement my own Timsort (yeah, probably not the best source, but it's unironically better than anything else I've found). I saw the merge direction section and was confused. My first thought was, "I get that this can be done, but why would I want to do that?" Eventually, I found my answer while I was working on the temporary space. It's needed when the second run is used as a temporary buffer. The first run would still be needed and thus, cannot be overwritten. Merging backwards would solve this dilemma.

If my reasoning is correct, I'd like to see that updated in the article. I'd update it myself, but I think that might be original research. Botaeditor (talk) 21:15, 22 June 2024 (UTC)
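That reasoning matches the usual description: when the right-hand run is the shorter one, it is copied to a temporary buffer and the merge runs right-to-left, so unread elements of the left run are never overwritten before they are read. A sketch under that assumption (my own code, not CPython's merge_hi, and without galloping):

```python
def merge_hi(arr, lo, mid, hi):
    """Merge the sorted slices arr[lo:mid] and arr[mid:hi] in place,
    assuming the right run arr[mid:hi] is the shorter one.  Only the
    right run is buffered; filling from the right end means the left
    run's unread elements are never clobbered."""
    tmp = arr[mid:hi]            # buffer the shorter, right-hand run
    i = mid - 1                  # last unread element of the left run
    j = len(tmp) - 1             # last unread element of the buffer
    k = hi - 1                   # next slot to fill, from the right
    while i >= lo and j >= 0:
        if tmp[j] >= arr[i]:     # on ties take the right run: stable
            arr[k] = tmp[j]
            j -= 1
        else:
            arr[k] = arr[i]
            i -= 1
        k -= 1
    # Leftover buffer elements; left-run leftovers are already in place.
    if j >= 0:
        arr[lo:lo + j + 1] = tmp[:j + 1]
```

The mirror-image case (left run shorter, buffer it, merge left-to-right) is the other half of the same trade-off, which is why only min(len of the two runs) of temporary space is ever needed.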