Talk:Radix sort

Possible errors
Maybe I am missing something here, but in the C-based iterative implementation, the line  looks like a bug to me. (If all the integers in the array were negative, the loop would never execute, as m / exp would be negative.) I think we need  here, but I won't change it in case I'm wrong.

quicksort is not n (log n), its n^2, I fixed this "- quicksort utilizing a random pivot can be shown to be (Big Theta)(n*log n) as instead of one decision tree for the program there exist many. As one makes n arbitrarily large, the probability of picking a tree which perfectly matches the sortedness of the data set such that the pivot is always 'worst case' (and thus yields a n^2 run) becomes increasingly small. Since when we perform algorithmic analysis we are looking at the limit as n goes to infinity we can state that this probability goes to 0 and thus that quicksort is indeed (Big Theta)(n*(log n))." Nonsense! You must state in advance whether you like to discuss/consider/analyze average, or worst case, or what-ever behaviour!! BTW: When talking Radix-sort one should also mention its generalization, Hash-sort, which is on average asymptotically the fastest known sorting method, O(n), due to the fact that it uses knowledge about the to-be-expected distribution of the keys. With a good chosen hash-function it's run time is C*n + o(n). Enlighting example: sort N uniform and independend distributed random numbers in the halv-open intervall [0,1[ --- or even more instructiv: sort a permutation of the first N natural numbers! Last can be done optimally because of sufficient knowledge of the to-be-expected data.

This character seems to be off-base on a number of counts. Worst case is the conventional way of discussing algorithms.~JMK

std::sort is based on quicksort and heapsort, and is O(n(log(n)). Stop knocking quicksort; yes, it's really O(n*n), but std::sort fixes that problem, and for real problems even raw quicksort is theta(nlog(n)).

You are incorrect. Not all std implementations use introsort, and in fact introsort is a rather sophisticated algorithm that is still not widely known. Further, quicksort doesn't generally degrade all the way to n*n but it can often perform somewhat worse than N Log(N). I love quicksort dearly, but in many cases, or in most, a well implemented radix is strictly better time-wise, and strictly worse space-wise. ~JMK

Hashsort is O(2k(n)), which is not O(n) for large k. When you compare realistic sorting algorithms that involve radix or hash-based sorting, you must assume both large n and large k.  Bucketsort obviously is O(n) when log(n) > k, but it's impractical for k >> log(n), which a major set of problems (some would argue most problems). Hashsort is slower than std::sort for large k with collisions. Now, if you would like a comparison of hashsort vs. Spreadsort or some other practical general algorithm, I'd be happy to make one.Cuberoot31 00:07, 2 December 2006 (UTC)

This is also wrong. Radix can be implemented with histograms to make it much faster than your imagined "worst" case in practice and in theory. Radix settles snugly at around 8 for its K even with crude optimizations. This is both worst and best case, and is regarding a floating point implementation. Please pay more attention to external research. ~JMK

Radixsorting with histograms is O(n(k/C)), where C is usually around 8. This is quite fast for small k (say 32), but as k grows, the Radixsort becomes slower. std::sort has used introsort for years, and you might be surprised at how well it compares with Radixsort, when you add in the runtime overhead of memory allocation and random access that you can't avoid with a radix sort. In my testing, std::sort is roughly 3 times as fast as quicksort, depending on the compiler and OS.Cuberoot31 (talk) 21:41, 20 May 2008 (UTC)

Huh! This test only shows there is no advantage over quicksort, really, because a constant multiplier of 3 means std::sort=O(quicksort) Alex 10:52, 19 Aug 2009 (UTC) —Preceding unsigned comment added by 193.9.13.137 (talk)

- Does the sample implementation given depend on the endianness used by the computer? I'm not quite sure what exactly the >> operator does. --AxelBoldt

The value of x >> y is floor(x / 2y), provided x and y are non-negative, and y is not too big. So it doesn't depend on endianness in this case. But if x is negative, then the value is undefined, so this is a problem. Another problem with the code in the article is that it assumes that CHAR_BIT is 8. --Zundark, 2001 Dec 11

So maybe we should make them unsigned ints to take care of the sign issue? Is there a clean way to deal with the CHAR_BIT thingy?

I think we should probably have an implementation that deals with keys that are (arbitrary l ength) strings, since that seems to be the more interesting case. --AxelBoldt

I changed the int array to unsigned. The CHAR_BIT thing is more problematic, so I just added a note pointing it out. On most compilers for desktop computers CHAR_BIT is 8, so this isn't too bad a restriction. --Zundark, 2001 Dec 11

Having looked at it more closely, I see that it has more serious problems than those mentioned above, so I've moved it here until it gets fixed. --Zundark, 2001 Dec 11

Sample radix sort of an array of integers

This sample implementation is written in the C programming language.

/* * This implementation sorts one byte at a time, so 32 bit integers get sorted in 4 passes. * It uses a simplified bucket sort to sort on each byte, which requires O(n) memory. * It assumes that CHAR_BIT is 8. */

struct listnode                              /* a simple linked list data structure */ {                                            /* used for the bucket sort */ unsigned val; struct listnode * next; };

void sort(unsigned * array, int length) {  int i, j;   unsigned char key; struct listnode nodes[length];            /* an array of preallocated listnodes */ struct listnode * buckets[0x100];         /* the buckets for the bucket sort */ memset(buckets, 0, 0x100 * sizeof(struct listnode *)); /* zero the memory */ for(i = 0; i < sizeof(unsigned); i++)     /* iterate over the bytes in an unsigned int */ {     for(j = 0; j < length; j++)             /* iterate over the array */ {	 key = (char)((array[j] >> (i * 8)) & 0xFF); /* grab the byte */

nodes[j].val = array[j];            /* put the byte in the bucket */ nodes[j].next = buckets[key]; buckets[key] = &(nodes[j]); }     j = length - 1; for(key = 0xFF; key >= 0; key--)       /* loop over the buckets */ while(buckets[key] != 0) {          array[j] = buckets[key]->val;      /* pull the values out of the buckets */ buckets[key] = buckets[key]->next; /* and put them back in the array */ j--; }  } }

I have written FORTRAN examples of both the recursive forward radix sort and the nonrecursive reverse radix sort. Both examples sort pointers to fixed length character string arrays. Should I add these examples to the article, or are non-C examples considered obsolete ? StuRat 09:47, 5 August 2005 (UTC)

- Don't see why they would be! Not everyone uses C after all

Is there anyone to write pascal source of radix sort.

Style Improvement Request
In the section "Incremental Trie-based Radix sort", there is a sentence with this as its subject: "The sorted lists of other non-incremental radix sorting algorithms that sort all of the keys in a batch". I hope its obvious that this needs to be improved. I would do it myself, but I have no idea what it's trying to say. Can someone please attend to this? Thanks. Danielx 16:15, 13 August 2006 (UTC)

OK. I revised that sentence. -Ed Lee 24.12.11.31 03:04, 15 October 2006 (UTC)

Most of the article was already separated into a discussion of least significant digit radix sorts versus most significant digit radix sorts. One of the earlier contributors was probably only familiar with least significant digit radix sorts, which is why the article originally started off sounding as if least significant digit radix sorts were the only kind that existed. I've more explicitly separated the discussion of least significant digit and most significant digit radix sorts, reformatted the queue example to be easier to read, and made some other changes.

I think that the mention of, "In practice, O(n) time is only observed when the number of items is much greater than the size of the key space," is in reference to the observed execution time of an LSD radix sort processing fixed-length keys relative to the execution time of an O(n*log(n)) time sorting algorithm, because it might not be obvious from just looking at the asymptotic notation that a radix sort might actually be slower than a comparison-based sorting algorithm for a sufficiently small amount of input data. So, I added some discussion of that.

I've communicated with people who insist on defining the length of a linear quantity of data as n*log(n), meaning n keys with log(n) bits per key, so that the time complexity of a radix sort would end up looking like O(n*log(n)), and the time complexity of some comparison-based sorting algorithms would end up looking like O(n*log(n)*log(n*log(n))), but that notational convention is going to be confusing, so I did not want to go into a discussion of notational preferences.

Someone else who has relevant expertise in the area of in-place radix sorting will have to contribute on that topic, since I have not studied that topic and frankly can't see how that would work without allocating additional memory beyond the original memory required to store the input data. The computer would have to perform a lot of exchanges to do a radix sort in place, or perhaps I am misunderstanding what, "in place," means in this context.

No one is omniscient, so I encourage you to contribute what truths you know to the Wikipedia.

-Ed Lee 24.12.11.31 03:41, 15 October 2006 (UTC)

I've worked with a derivative algorithm Spreadsort for some time. Some fixes: Radixsort is O(n*k), or more precisely O(n*(k/s)*2s) for a MSD Radixsort, where s is the size of chunk processed at a time in bits. The main sorting page for that is wrong, and I'm going to fix it. By contrast, comparisons can be quicker than O(k), they are often theta(1). They are only O(k) if usually done between items with very similar keys. Another issue with the assumption that a comparison is just as slow as checking small pieces of a key multiple times is that comparisons are usually running sequentially along a single strip of memory (which is fast), versus looking at pieces of larger chunks in an array multiple times (which is slower). The biggest performance problem with MSD radixsort is the overhead of processing nearly empty bins (say you have 2 items in a bin, and they match to all but the last bit; you could be processing 256 bins each iteration k/s times instead of doing 1 comparison), and the memory overhead of creating bins. LSD's big problem is with long keys, especially strings. All the memory allocation/deallocation is really slow relative to other operations, on top of having to sort items of different length separately.

There is nothing fundamental stopping an "in-place" MSD radixsort as stability is not required; the memory overhead of such an implementation is proportional to 2s. Spreadsort can be thought of as a modified MSD radixsort that switches to comparison-based sorting where radixsort is slow and has cut the problem down.Cuberoot31 08:32, 9 November 2006 (UTC)

I have thought about it a little more, and MSD radix sort doesn't have much going for it when k > log(n). When k < log(n) it can be in-place and thus use less memory than LSD, but MSD's best performance when n < 2k requires data that splits evenly into a single item per bin. Otherwise, if you have 2 items per bin, and they only differ on the last bit, you will recurse through all 2s bins multiple times until you hit the end of the radix, where a comparison-based algorithm would do a single comparison on those 2 and be done. It is important not to forget the overhead of creating, filling, and deleting bins (or whole copies of arrays) in analyzing the relative performance of radix sorts. I have found in practice that memory operations are much slower than all but the slowest comparisons (string < being one slow exception) on a system that has been up long enough to fragment its memory, so it is important to keep this in mind when designing and analyzing radix-based sorts. This is a major reason why radix-based sorts aren't more popular. If you want to use MSD radix sort for its ability to run in parallel, why not use MSD for the first (or first few) iterations to split up the problem into smaller bins, then share those bins out across multiple systems, and have them use LSD radix sort (or Spreadsort). MSD radix sort has poor performance on small bins, which you will eventually get if you recurse long enough, so much so that I consider it an impractical algorithm for general application on its own. If I'm missing something, I'd welcome some more comment. I believe that Spreadsort solves this weakness in MSD radix sort, which is one of the reasons why I developed it. I would welcome some real performance data comparing MSD radix sort versus LSD, std::sort, or Spreadsort on a single processor where n << 2k. If I don't get a reply soon (a week), it sounds like this issue has been resolved, and I would like to take down the dispute tag.Cuberoot31 16:49, 10 November 2006 (UTC)

Cuberoot31's discussion of MSD radix sort seems limited to an array-based MSD radix sort. A trie-based radix sort is a kind of MSD radix sort, and it performs best in terms of lowest memory usage and fastest execution speed when the keys are redundant and sparsely distributed in the key space. The recursive function calls of a more traditional array-based MSD radix sort analogously trace the paths of a trie in stack space in a depth-first traversal. I wrote BRADSORT, which is a kind of trie-based radix sort, and a link to its source code is in the reference section of the Wikipedia radix sort article. You can compile BRADSORT with a C compiler and compare execution times with different quantities and distributions of input data.

Regarding Rspeer's change on 04:53, 2 December 2006, where Rspeer comments, "take out misinformation about big-O notation. You never assume constant memory," I disagree. In the analysis of the time complexity of sorting algorithms, you certainly do assume constant-bounded memory or else you lose all connection to physical reality, as memory accesses no longer happen in constant-bounded time as the size of the memory grows due to the speed of light limitation.

-Ed Lee 24.12.11.31 06:05, 7 December 2006 (UTC)

I agree, the trie-based sorting does improve the worst-case memory usage and runtime for an MSD radix sort. So what do you claim as the worst, average, and best-case performance of your trie-based MSD radix sort? How does it handle the problem of (many groups of) a few items with only slightly different key values of the form: xxxxxxxxxxxxx xxxxxxxxxxxxy xxxxxyxxxxxxx xxxxxxxxxyxxx xxxxxxxxxxyxx in an efficient manner? I don't see how any radix sort can be faster than theta(nk), and in terms of worst-case, a trie-based sort can still have significant overhead due to storing possible next steps that may not be filled. As noted in the link, this can be six times the size of the input data; I'm not sure why it couldn't be higher, but I'd have to delve more into your implementation to see it. So radix-based sorts are good when the key value is short. For long keys, where k >> log(n), comparison-based sorting has the advantage of being able to look along a key linearly through memory (which is fast) as opposed to in a fragmented fashion, piece-by-piece as radix sorts do, and comparison-based sorts can be done without significant additional memory, which can be very slow to allocate and deallocate relative to comparisons. The disadvantage of conventional comparison-based algorithms in this situation is that they still need log(n) comparisons, and if the keys don't differ until near the end, they need to look (linearly) through k bits each comparison. I'll do some runtime comparisons when I get a chance; I believe trie-based radixsorting can have excellent performance under some conditions.Cuberoot31 20:47, 7 December 2006 (UTC)

For a trie-based radix sort, the amount of time spent processing each unit length of input data has a constant upper bound, so the time complexity is O(N), where N is the total length of the input data, whether the strings are fixed in length or variable in length.

BRADSORT v1.50 deals with duplicate strings by storing one instance of a string and a count of the number of times that each string appears to indicate each string's frequency. So, if there are duplicates of a string in the input data which occupy more memory space than the amount of memory required to store a count and one instance of the string, then data compression occurs.

For unique strings, BRADSORT executes the most quickly and uses the least amount of memory space for representing strings as paths in the trie when the unique prefixes are short, such as:

axxxxxxx... bxxxxxxxx... cxxxxxxxx... dxxxxxxxxx...

It does not matter how long the individual strings are if the unique prefixes are relatively short, because the trie only needs to represent the unique prefixes of the strings as paths in the trie in order to distinguish the strings from each other. The worst-case trie space usage occurs when the entire string or almost the entire string defines a unique string prefix, such as:

xxxxxxxxxa xxxxxxxxxb xxxxxxxxxc xxxxxxxxxd

The worst-case space usage of BRADSORT and the longest sorting time occurs in this situation, when strings are identical to each other except for the very last bit or last few bits. The memory space required to store the trie structure can be greater than 6 times the size of the input data. What the exact amount of space is in the worst case depends on the C compiler implementation and how much space it allocates for a node structure. For BRADSORT v1.50, each node structure below the root node corresponds to one bit of a string. For 2 strings that are nearly identical and overlap as paths in the trie except for the very last bit, the worst-case space required to store the trie structure would be approximately:

sizeof(node)*total_number_of_bits_in_strings/number_of_strings + sizeof(node)*number_of_strings

As I mentioned in the radix sort article, trie-based radix sorts can be used for incremental sorting. If you already have a sorted list and need to insert one new string into it, then you might have to re-sort the entire list with a batch sorting algorithm, which can require more time than inserting a new string into a trie.

Other programs use different trie structures and different methods for minimizing space usage. For example, the people who wrote, "burstsort," avoid trie branching to some extent in order to increase the locality of data references for faster execution and to minimize space usage.

-Ed Lee 24.12.11.31 01:51, 8 December 2006 (UTC)

I agree that tries are useful for reusable data structures, and burstsort is another such algorithm I've been aware of for some time; I read up on it when looking at algorithms with some similarity to Spreadsort (I'm curious as to how burstsort and BRADSORT compare in performance). Many sorting algorithms have a trie equivalent, but for many problems an array is more appropriate; it depends on the application. In particular, rebuilding key values can add a large enough overhead to readout from the trie that lookups on a trie can be slower than even a binary search! (depending on the trie, data type, and whether people wish to look at the key, vs a value stored depending on the key) I'd like to clarify on sorting performance: your O(N) is equivalent to O(nk). Your s for the performance summaries I put on the sorting algorithm page sounds like it is 2, so your worst-case is still basically O(nk). What is different is that since BRADSORT (and other conventional trie approaches such as burstsort) is allocating and deallocating memory, it has a significant number of memory operations, which are generally much slower than processing operations on a per-operation basis (sometimes requiring off-processor communication). Also, O(nlog(n)) comparison-based approaches are really O(nlog(n)) comparisons and slightly less swaps, where swaps can be constant time if pointers are used, and the comparisons are O(k), but comparison operations reading linearly through memory are very fast on conventional modern computers. So it's really O(nk) slow operations for radix sorting versus O(nlog(n)) times (1 slow operation + 1 usually fast O(k) operation) for comparison-based sorting. I found this was true in developing Spreadsort, where comparisons were faster than memory allocation and swaps, which I needed to optimize aggressively. std::sort is a good basis for performance comparison because it is quite fast on modern processors.Cuberoot31 15:17, 9 December 2006 (UTC)

Earlier versions of BRADSORT did rebuild key values from paths in the trie, because the keys were entirely represented as paths in the trie, but BRADSORT v1.50 stores keys separately from the trie structure. Each individual key in BRADSORT v1.50 occupies contiguous memory space to represent the key in its entirety, so no rebuilding of a key is necessary. The trie structure in BRADSORT v1.50 acts more as a hierarchical index to the keys rather than as a complete representation of the keys. Reading out all of the keys with BRADSORT v1.50 is as fast as traversing a sorted linked list, because BRADSORT v1.50 uses a circularly threaded trie.

I don't think that we have a material difference of opinion on what notation to use for expressing the time complexity. If k has a constant upper bound in O(nk), then the expression can also be written as O(n) by the definition of O. I prefer the more concise notation. We can theorize all that we want about the performance of sorting algorithms, but the bottom line is the performance that is achieved in the real world on real-world applications.

-Ed Lee 24.12.11.31 20:34, 9 December 2006 (UTC)

I do not accept the assumption that k has a constant upper bound; strings can be of infinite length and I've seen real-world sorting problems with keys over 1kB in size (compression is usually appropriate in such cases, but that's a side issue). I think it is appropriate to compare worst-case performance for large n AND large k, otherwise the relative performance of radix-based sorting and comparison-based sorting is difficult to understand. Let's try to keep the article accurate with only the assumptions necessary for a reasonable discussion, and in terms of real-world performance, radix-based algorithms do poorly relative to comparison-based algorithms for large k.Cuberoot31 04:26, 10 December 2006 (UTC)

There is no real computer system with an infinite amount of memory capable of storing a string of infinite length. Debating about which sorting algorithm is better in a situation where there are infinitely long keys is purely academic. If you define the length of the input data to be nk, the number of keys times the length of a key, then it is unsurprising that a radix sort will take O(nk) time to complete. An optimal comparison-based sorting algorithm will take O(nk*log(n)) time to complete with the same data.

I do not accept the generalization that radix-based algorithms perform, "poorly," relative to comparison-based algorithms for large key lengths, k. For incremental sorting, a trie-based radix sort can be faster than a batch sorting algorithm. For sorting many duplicate keys, a trie-based radix sort which represents multiple instances of a key as a count and one instance of the key can use less memory due to the resulting data compression and be faster than other sorting algorithms due to a greater locality of data references and less need or no need for virtual memory which is paged from a hard drive. The same conditions under which a comparison can be done in constant time benefit a trie-based radix sort, when the unique key prefixes are short relative to the lengths of the keys. The same conditions under which a comparison of two strings takes the longest time, when the two strings are identical or nearly identical up to the last bit, also make a trie-based radix sort take the longest time to process the data. Comparison-based sorting algorithms can be faster than radix sorting algorithms when a sufficiently small number of keys are processed. It is conceivable that a hybrid radix/comparison sort may be faster than a pure radix sort when the input data is partitioned into small enough groups by the radix sorting portion that can then be processed by a comparison-based sorting algorithm. -Ed Lee 24.12.11.31 03:39, 11 December 2006 (UTC)

It is correct that comparison-based algorithms will take O(nk*log(n)) worst-case time because of the time for comparisons, but since the comparisons run linearly through memory, a processor can run through even long k quickly, enough to make up for the log(n) for even large n (billions of items), while O(nk) separate operations require random accesses, which are very slow in comparison, and even worse, most radix-based algorithms require allocating and deleting memory, which is horribly slow (multiple orders of magnitude) relative to running linearly though k bytes in order. If modern processors weren't pipelined, this might be different, but pipelining is now a standard optimization in hardware design. Tries are basically equivalent to heaps in terms of being able to incrementally sort; this is a useful capability, I agree, but does not differentiate radix-based sorting from comparison-based sorting. To attain compression, you need overhead bytes, and for some types of data, there will be an expansion. As far as I know, this is a capability only of radix-based algorithms, and when applied it makes them specialized in their utility to problems that can be compressed; the overhead of compression is prohibitive otherwise. So this is a useful capability for a subset of problems, but not all problems. Spreadsort is an example of a hybrid algorithm, as is the more specialized burstsort. These approaches are significantly faster than Quicksort alone, but when compared to std::sort on random data with large k, it becomes a much closer competition (in my testing Spreadsort was still faster than std::sort by 20-30% in my tests). I recommend comparing BRADSORT to std::sort with such a situation. The difficult part is creating a large n, large k testcase. One conceivable way to do this is to write 100MB random bytes in a file (0-255, use rand % 256), and count every 0 as an end of string. Do not allow writing two zeros in a row to prevent getting a significant number of null strings. Keep track of both runtime and memory usage. I will try to run this test myself when I next get sufficient time.Cuberoot31 20:08, 11 December 2006 (UTC)

With the complicated and varied cache strategies employed by different microprocessors, it's hard to tell in advance how efficient or inefficient random memory accesses are going to be.

Regarding deletions, BRADSORT does not perform any deletions of nodes until the algorithm is finished, although incomplete deletion functions are in the source code. I never got around to updating the deletion functions for the new threaded trie structure in BRADSORT v1.50.

For your test of BRADSORT, you might have to increase the size of the input buffer to be large enough to hold the longest input string that you want:

BRADSORT v1.50 defaults to using strings that are terminated by the '\n' character, ASCII value 10. I recommend that you put a '\n' character in s[MAXSLEN] in the main function in case the input buffer contains no newline characters. Also, change the "MAXSLEN+1" expression in the while to "MAXSLEN":

A more practical application of BRADSORT would be to sort words or count the number of times that each word occurs in some file, such as:

The tr program is available on Unix and Linux systems. This particular example does not deal with words that contain apostrophes or hyphenated words. What it should do is replace all non-alphabetic characters from a text file with the '\n', newline, line feed, character and then squeeze consecutive instances of the '\n' character into one character so that words are separated from each other on separate lines. The output then gets piped to BRADSORT with the "f" option which will display the frequency of each word next to each word in the output file. The frequency is represented as a count of the number of times that each word appears in the input.

-Ed Lee 24.12.11.31 02:58, 12 December 2006 (UTC)

Constant memory
Ed, I really don't believe it's useful to assume constant memory when talking about the asymptotic behavior of an algorithm. That would make every algorithm run in O(1) space and O(1) time, if you think it through. If you disagree, point me to an algorithms textbook that assumes constant memory.

The only thing close to that that I can think of is something you may have touched on: sometimes you do assume each memory access happens in constant time, just to simplify things, even though it's not true.  r speer  / ɹəəds ɹ  16:17, 9 December 2006 (UTC)

O notation was originally used in the study of pure math, where infinite domain intervals are commonly considered, but O is still useful in the study of the relative performance of algorithms within a finite interval. No definition of O that I have seen restricts its use to infinite intervals, just sufficiently large intervals:

"Suppose f and g are two real-valued functions defined on real numbers. We write f=O(g) when there are constants c and N such that f(n)&le;cg(n) for all n&ge;N .... It means that when n is large enough, some constant multiple of g(n) is an upper bound on the size of f(n)." (Christopher J. Van Wyk, Data Structures and C Programs, p. 35, ISBN 0-201-16116-8.)

There is no problem with using O notation in O(1) space and O(1) time. You just don't abstract the notation down to that level where all time complexities and space complexities on real computers can be expressed as O(1). Allowing a computer's memory to be infinite in size, as I previously stated, loses touch with physical reality, and then O is no longer as useful as an estimate of how much physical time an algorithm will require to complete.

-Ed Lee 24.12.11.31 19:50, 9 December 2006 (UTC)

Slashdot sending editors here
I love Slashdot. History of the Article page so far in 2007: edits on Jan 15,24,26 and Feb 17th. Then there's a Slashdot story (http://it.slashdot.org/it/07/02/25/1848200.shtml) about a supposedly "new" sort algorithm which, in reality, appears to be the good old Radix sort and we get (so far) four edits today. (Feb 25th). Thomas Dzubin Talk 22:46, 25 February 2007 (UTC)

=Example?=

The LSD example provided seems rather poor, in that it is unnecessary to sort by the 1s place to achieve correct ordering. Perhaps someone could revise the example to make 1s place ordering a necessary component?

PS I realize the 1s place ordering is necessary for the radix sort. But the example itself doesn't need this ordering, so it might make the radix sort seem a bit less useful than it really is. My two cents.168.39.166.127 19:04, 27 February 2007 (UTC)

Recursive example?

Is there any need of a recursive implementation of the algorithm? If so, I could provide my code, written in C++. --91.45.24.43 (talk) 22:16, 28 June 2012 (UTC)

=Please Compare MSD and LSD in detail= I find that most texts book do not have a detailed description of MSD. I hope someone could analyse it by the Big-O notation, and compare it with LSD. MSD is not necessarily done recursively.Some text book says we could use Insertion Sort to sort each bin. I would also like to see the time complexity analysis of this method. THANK YOU! (I am a newbie on algorithms) Visame (talk) 23:12, 18 June 2008 (UTC)


 * I see that it's being done, although I'm not a big fan of Big-O notation when I know how to spell it out in terms of maximum, minimum, and constant. When you use insertion sort to sort each bin, rather than a recursion, then it's called bucket sort.

AFAIK, there are likely to be implementations of MSD and LSD that work out to pretty much the same thing, but MSD is more easily understood in a recursive manner, so it's probably more common. I suspect that LSD implementations dominate external disk-based implementations, though. BrewJay (talk) 09:37, 25 July 2008 (UTC)

Merge from bucket sort.
Graphics from bucket sort might be useful, but the mention of insertion sort in it reminds me of a horrible sorting method I devised in high school. (Index keys by the number of keys they are superior to). I'm not sure that inferior methods like bubble sort are not instructive or useful in a pinch or a variation. BrewJay (talk) 09:20, 25 July 2008 (UTC)

Clever Programmers
People who try to be clever, and end up making the code unreadable to should leave it alone. Replacing the merge function with a call to sum was really gratuitous. Yes, it works, but only for obscure reasons. There is nothing that would make a normal reader think of sum, which normally sums a list and returns a scalar, as a way to appending a bunch of lists together.

The goal of the Wikipedia should be to explain, not for you to show off that you have some clever and completely opaque trick.
 * I agree with the change if it makes the article more clear to you. But don't think the sum definition is accidental nor some advanced stuff, it's fairly trivial in terms of the definition of the list type. You should learn more about recursion and functional programming if that simple example looks obscure to you. Diego (talk) 10:47, 25 November 2009 (UTC)


 * You condescension is unnecessary and, frankly, offensive. I understand recursion and functional programming quite well, having taught programming for nearly 30 years.My point, which you brush aside, is that there is nothing in the use of sum that would lead the average reader of Wikipedia to understand that you use it to concatenate lists. Again, clarity is the most important aspect. We strive to instruct the reader, so we sacrifice the more esoteric so that the novice may understand.
 * I didn't mean to be condescending, it's just that I failed to see how you are skilled in FP yet find basic functions on lists obscure. I would have understood had you argued about the functional vs imperative nature of the example, as I agree that skill in functional programming is less common.
 * But your opinion stating that viewing list concatenations as adittion is "gratuitous" (when it's the very theoretic definition of list concatenation) and that "sum normally sums a list and return a scalar", when addition can be and is usually applied to every data type with a successor (or 'cons') operator, is what made you look a novice to my eyes. Diego (talk) 20:35, 26 November 2009 (UTC)
 * Diego, not every reader of the page is going to see "sum" and instantly think of list concatenation, regardless of the theoretical foundations of it. Those of us who edit articles like this can of course understand what it means, but the readers don't, necessarily. In writing for an audience that does not consist solely of yourself, you have to be willing to write below your own level of expertise. This is an important component of teaching, and is emphatically not a sign that the teacher is a "novice" and needs to learn more. rspεεr (talk) 07:27, 27 November 2009 (UTC)

To rspεεr: I concur, that's why agreed to the change all along. I was trying to understand the rationale for the change, in order to clarify whether it's really a wise move changing a one-liner with a more complex three lines loop (with a '+=' assignmet-addition operator, no less). There's also the danger that you're over-thinking when trying to put yourself in the shoes of others. Will ALL readers with few expertice find "for i in x: l += i" clearer than "sum(x, [])"? What about someone with a mathematical background, instead of a programming one? It bothers me when someone assumes that putting code in imperative style makes it automatically clearer to everybody.

In particular, this is an implementation detail and the purpose of the 'merge' method is already stated in the comment (concatenate the lists back in order for the next step), so is this additional complexity really helping to better understand the whole program? Will that not miss the forest for the trees? What's wrong with a bit of idiomatic Python, and why do you think it's less clear than a spelled-out loop? Those are my concerns. Diego (talk) 11:05, 27 November 2009 (UTC)

A particular problem with sum here is that it is, precisely as Diego says, idiomatic Python. The topic of this wikipage is radix sort and any particular language used for the example should be used as generically as possible. The Python should not be too Pythonesque. People come here to learn about radix sorting, not Python, and the python code should be transparent enough that a budding programmer without experience of Python can follow the code to a reasonable degree. (On this score the += is far more helpful as it exists in many languages.) However, as the algorithm dates back to the fifties, or earlier, and as its formal efficiency is independent of any specific implementation, I strongly urge that a PSEUDOCODE version be included in the page. Not only would this conform to the high standard of most of the other sorting articles, but it would make the article more accessible to more readers. For instance, the Python code is more accessible if it appears alongside pseudocode.

Efficiency
The section "Efficiency" is hopelessly confused because it does not involve the concept of the model of computation. Without it the comparison in the last paragraph of radix sort with other sorts is apples and oranges kind of stuff (I don't want to list the problems with the seemingly convincing text of the "Efficiency" section). Can someone provide more expertise here? 71.146.68.48 (talk) 05:56, 13 May 2010 (UTC)
 * P.S. A bit more thinking reveals that the section is plain wrong in its naivety. Since I don't have references at hand to provide a correct text, I leave the problems as an exercise for those with critical thinking and as a pitfall to those who uncritically rely on wikipedia. :-) 71.146.68.48 (talk) 06:10, 13 May 2010 (UTC)

I think it's perfectly clear. It's not about a model of computation at all. Can you say more than just "apples and oranges"? Radix sort is not a comparison sort, so the question can't be a comparison between radix sort in general and comparison sort in general, because they solve different problems. But if we add that we are sorting strings, or integers (or some radixable thing), then the paragraph becomes entirely clear. Comparison sorts run (at best) in O(n log n) comparisons, but the cost of comparing a string or an integer is linear in the length of the datum. If the data have a length of k, then this is the same factor k which shows up in arguing that radix sort sorts n elements in O(nk) time. I have pretty much as much expertise as you're going to find; perhaps if you could list the problems that you think are present would help. Tb (talk) 17:54, 13 May 2010 (UTC)

I beg to disagree. In the common case of comparing keys that can be considered primitive types, comparison cost is generally constant-time; the time to compare is the same regardless of the magnitude of the keys being compared. I have edited the efficiency section to describe this, as I find it fairly logical: comparing variable-length keys does favor the quick position-at-a-time radix-sort approach, but comparing constant-length, "atomic" keys does not. In these cases, "smaller comparisons" is not a primary advantage of radix-sort. --Khondrion (talk) 19:57, 15 May 2010 (UTC)


 * If the keys are primitive types, the comparison is constant-time, and likewise, the radix sort now has a constant number of buckets, and becomes linear. Tb (talk) 17:03, 17 May 2010 (UTC)


 * The entire second paragraph has no references, and is downright wrong. I see nothing we can do with it except remove it, or do a serious rewrite. happypal (Talk | contribs) 20:57, 28 May 2010 (UTC)


 * It's not "downright wrong", it's exactly correct. Can you explain what is supposed to be wrong?  Can you explain what you think the time complexity of radix sort is if you disagree?  (I agree that a reference would be valuable.)  See the slides from a course at NJIT at  or  or .  The mistake is common, but real.  Imagine we are counting bit comparisons.  Mergesort of integers must do O(n log(n)) integer compares, and each integer compare takes O(k) bit compares if the integer is k bits wide. (For example, on now-standard hardware, you can compare 32-bit integers in one instruction, and comparing arbitrary integers requires O(k) such comparisons.)  Radix sort does not compare whole integers, but only one digit at a time.  So radix sort is O(n log(k)) bit compares, and Mergesort does O(n log(n)) integer compares, which is O(n log(n) k) bit compares.  The only way radix sort is worse (asymptotically) is if you are on hardware which can perform comparison of arbitrary-length integers at the same speed as fixed-width integers.  Tb (talk) 22:02, 28 May 2010 (UTC)


 * Some people are surprised, because radix sort looks better than mergesort/quicksort/heapsort, and they were told that the best a sorting algorithm could be is O(n log(n)). They were not told incorrectly: the best known comparison sorts are O(n log(n)).  Radix sort is not a comparison sort.  Note that comparison sorts require only the ability to swap elements and compare them.  Radix sort is more limited: it only works when the elements have the necessary algebraic structure which can be exploited.  Moreover, the practical efficiency of radix sort is often significantly worse (in shorthand speak, "the coefficients get too large"), so the use of radix sort is limited to special situations.  Tb (talk) 23:11, 28 May 2010 (UTC)


 * Is it common for the elements not to have that kind of algebraic structure? It seems like the sort key just needs to be made of bits or other digits, right? Kragen Javier Sitaker (talk) 21:23, 1 June 2010 (UTC)


 * In practice, most search keys have such algebraic structure indeed. However, it is difficult to identify a good interface for a generic radix sort function analogous to what you can do for comparison sorts.  So there is a practical engineering difficulty as well as the problem that the practical efficiency of radix sort only gets good for quite huge data sets or specialized cases. Tb (talk) 22:45, 1 June 2010 (UTC)
 * It doesn't make sense because of "Since k is normally on the order of log n". For starters, really? there is a relation ship between k and n? Second of all, order of growth is an indication of the time the algorithm takes relative to the size of its input. It is not an indication of absolute speed. It is an indication of how much longer sorting will take if you double the size of the data. And in that matter, if you double the amount of objects you want to sort, k will be utterly unchanged.
 * This entire section is trying to say who beats who (or at least compare the algorithms) based solely on the assumption that "k*n == nlogn when k=logn". Not only is the k=logn part wrong, but you can't compare too algorithms based solely on order of growth.
 * There's good in the paragraph, but I think it requires a serious rewording. Currently, the paragraph is happily comparing speed based solely on order of growth, and that's ridiculous. The analysis is interesting though, but needs to be presented differently. happypal (Talk | contribs) 19:05, 5 June 2010 (UTC)


 * Yes, if we are talking about input length, then we need a different sort of discussion. But when people say that comparison sorts are n log n sorts, they mean that the number of comparisons and swaps is O(n log n), where there are N items to be sorted, not input of length N.  We don't need at all the "k = log n" question, which I agree is badly stated.  The mistaken view is that radix sort is no better (asymptotically) than the best comparison sorts, and the mistaken view starts by saying "k = log n" and then saying that the O(n k) cost of radix sort is thus the same as merge sort.  This is all the mistaken view.  The non-mistaken view simply has radix sort as (always) better asymptotically, whether one expresses the complexity in terms of N and K, or in terms of the input length in bits.  I am happy to see a clearer presentation, naturally.  What is crucial, in my opinion, is that the basic distinction between the wrong view and the right view be maintained here.  If we count input length, and we are sorting integers, then doubling the input length isn't even well-defined: are we doubling the number of integers, or their width?  Tb (talk) 17:08, 7 June 2010 (UTC)

<-- Hum, by "input length", I meant amount of items to be sorted. That said, changing the length of the items to be sorted has a big impact on radix, but not comparison sorts (or less). I think you perfectly put your finger on what I think is wrong with the current way things are said. I could give a try at re-writing the paragraph, but I've always been notoriously bad at at this kind of stuff. Maybe I'll try it and post here, rather than immediately put it on the page. EDIT: Hum, I seem to have misread a bit of the paragraph, and is not as wrong (or at all) as I originally thought. Sorry. Still, I find it provides for confusion, especially happily mixing complexity and number of operations. happypal (Talk | contribs) 10:57, 8 June 2010 (UTC)


 * changing the length of the items to be sorted has a big impact on radix, but not comparison sorts (or less). Wrong.  Changing the length of the items to be sorted changes the cost of a comparison, and changes it by exactly the same factor as it changes the complexity of radix sort.  Tb (talk) 18:13, 8 June 2010 (UTC)

Just jumping in on this discussion. The paragraph as it stands is hopelessly oversimplified, IMHO, because it obscures the fact that digits might not necessarily be primitive types (i.e. they may be smaller). Without serious work, this paragraph should probably be culled. Oli Filth(talk&#124;contribs) 16:43, 18 June 2010 (UTC)


 * Can you give an example of what might be smaller than a bit? Tb (talk) 16:46, 21 June 2010 (UTC)

What's really confusing in this paragraph is the connection it attempts to make between the item itself and the number of items. These two concepts are independent of each other. For example, 200 shopping carts, or 200 apples, or 200 planets. The number of items and the items themselves don't have any relationship to each other. This seems to be the core concept the argument in this section stands on, and attempts to make a connection, where there should be none. Could someone elaborate how a connection between an item and the count of items could possibly be made? Duvavic1 (talk) 00:07, 21 August 2010 (UTC)

What's even worse now is that this section is plain contradictory. Someone wrote the first part, then apparently someone else came along and added that all the above text is incorrect. While this is perhaps true, it's inacceptable -- a Wikipedia article shouldn't feel like a debate, or like a discussion forum, after all. --85.3.72.22 (talk) 08:18, 22 August 2010 (UTC)

Good point!

The above argument in the Efficiency section is incorrect. The reason for the error is that the author connects items to the number of items, while these two concepts are independent of each other. For example, let's say we are using 5 digit keys. These keys are not required to be unique. With 5-digit keys, decimal for instance, we can represent numbers from 0 to 99999. We can have an array of 10 keys or 10 million keys, with all keys being the same size (5 digits each). In the case of 10 million keys, there will be lots of duplicated values, since only 99999 unique key values are possible. Having duplicate values in the input data to a sorting algorithm is allowed and is a perfectly valid (even good random number generators create duplicate values). A comparison-based sort will take O( n*lg(n)) time sort these 10 keys or 10 million keys, where n is either 10 or 10 million (i.e. the number of items). Radix sort will take O( n*5 ) to sort these 10 keys or 10 million keys. For 10 keys, lg(10) is slightly larger than 3, and thus n*lg(n) = 10*3 = 30 for comparison-based sort. For Radix Sort n*5 = 10*5 = 50, and thus Radix Sort is slower. For 10 million 5 digit keys, a comparison-based sort will take 10M*lg(10M) = 10M*23 = 230M units of time. Radix sort will take n*d = 10M * 5 = 50M units of time, and thus Radix Sort is estimated to be faster as the array size of 5-digit keys increases. Radix sort is faster than comparison-based sort in this case, because it does not use comparisons, but processes keys one digit at a time.

Imposing the restriction of uniqueness of items introduces a connection between item size and the number of items (which is not a necessary restriction for sorting). For example, to have 2 unique items 1 bit is necessary - item 0 and item 1. To have 4 unique items, two bits are necessary - items 0, 1, 2, 3. Thus, lg(n) bits are necessary for n unique items. Thus, a uniqueness constraint creates a connection between an item size and the number of items. Once again, however, uniqueness of keys is not a required constraint for sorting. As a concrete example, 32-bit keys can address up to 4 billion unique items. To sort 10 million 32-bit items a comparison-based algorithm would take n*lg(n) = 10M*23 = 230M time units. A binary-Radix sort, where digits are 1-bit (i.e. 32-bit key is made of 32 1-bit digits), takes n*k = 10M*32 = 320M time units, which is slower than comparison-based sort (depending on algorithm constants). However, 256-Radix Sort, where 8-bit digits are used (i.e. four 8-bit digits per 32-bit key), takes n*k = 10M*4 = 40M time units, which is faster than comparison-based sort. Comparison-based sorts make lg(n) passes over the n items in the array, whereas Radix Sort makes k passes over the n items in the array. Thus, when comparing these two types of sorting algorithms we are comparing k to lg(n), since in both cases there are n items in the input array. Duvavic1 (talk) 01:10, 23 August 2010 (UTC)

I have encountered this type of confusing/misleading analysis before, which I mentioned above several years ago when I wrote of people who insist on representing the length of a linear quantity of data with a non-linear expression, such as n*log(n), where n is the number of keys and log(n) is the length of each key. This leads to expressing the time complexity of processing unique data with a linear time algorithm as O(n*log(n)). One problem with this analysis is that even when the input data is artificially restricted to be unique, the number of keys to be sorted is not necessarily at the maximum number allowed by the key length, so there is not necessarily a relationship between the number of keys to be sorted and the length of each key, requiring O(n*log(N)) time to process the data, where little n is the actual number of keys and big N is the maximum possible number of unique keys allowed by the key length. If you further restrict the number of keys to be the maximum possible number of unique keys allowed by the key length, then expressing the time complexity as O(N*log(N)) obscures the fact that the length of each key is a constant for a given N, not a variable. Even if the key length were variable, the text treats each key as if it is still processed in O(1) time by a comparison-based sorting algorithm while treating each key as though it is being processed in O(log(n)) time by a radix sorting algorithm when the same artificial relationship between the key length and number of unique keys would apply to both comparison-based and radix sorting algorithms. If you restrict the length of each key to being the word length of the computer, then the constant key length should not show up in the O notation. So, I have deleted the analysis.

24.15.215.181 (talk) 08:27, 11 September 2011 (UTC)Edward Lee

In the situation where keys are densely distributed, such as when they are unique and occupy all of the keys that can be represented in a fixed key length, I cannot think of a comparison-based sorting algorithm which would outperform counting sort on a single processor system. Counting sort is a form of radix sort where the entire key can be viewed as a single digit in a higher number base for the purpose of grouping identical values together. Most significant digit radix sorts which operate at lower number bases, such as 2, are better for sorting sparsely distributed data.

24.15.215.181 (talk) 20:38, 11 September 2011 (UTC)Edward Lee

I've removed the third paragraph of the Efficiency section, because it was flat-out mathematically wrong. Comparison sort's performance won't depend on key length (for lengths greater than log2(n) as was stipulated) if the keys are random. Rescuing that section didn't seem worth it. My edits to the second paragraph to wrap things up could probably be improved -- hopefully for some value of "improve" that doesn't mean making it longer.

75.80.106.67 (talk) 21:18, 11 December 2015 (UTC) Sam

"Unstable implementations of in-place sorting" section?
What on earth is this section about? Oli Filth(talk&#124;contribs) 16:51, 15 May 2010 (UTC)


 * Well, I've deleted it now. If it can be salvaged, please do so, but only if you can make it clear!  Oli Filth(talk&#124;contribs) 07:21, 18 May 2010 (UTC)


 * I agree. Tb (talk) 17:30, 18 May 2010 (UTC)

Useless article (?)
I don't know if it still applies, but this article is claimed to be useless:


 * "Unfortunately, the Wikipedia article on radix sort is useless. The section about an in-place variant is complete rubbish."

--Mortense (talk) 15:19, 4 January 2013 (UTC)

The writer at the web site above seemed to be indicating that the writer did not understand how to implement an in-place radix sort without an explicit coding example, which the writer later found, so the Wikipedia radix sorting article was, "useless," to the writer in particular. The current discussion of the binary in-place radix sorting makes sense but can be non-trivial to implement for someone who has not seen an illustrative example. At some point in the past, someone swept all of the earlier radix sorting coding examples, including an example of an in-place radix sort, from the Wikipedia article to include in a Wikibooks project. I thought that removing the coding examples was a mistake. Perhaps we should restore the older radix sorting coding examples.

-Edward Lee 76.16.39.16 (talk) 06:15, 29 January 2013 (UTC)

Radix sort for floats
I was wondering if some of the information from here (http://codercorner.com/RadixSortRevisited.htm) could be integrated into this article about sorting floating point numbers with radix sort? — Preceding unsigned comment added by 150.135.222.234 (talk) 20:12, 15 February 2013 (UTC)

Why Omega?
Is there some reason that in some places (for example in the section "Least significant digit radix sorts") omega is used while "Big-O" is used elsewhere? Gwideman (talk) 00:43, 22 October 2013 (UTC)

It is an over-generalization to say that comparison-based sorting algorithms can do no better than Ω(n log(n)) in execution time. Ω represents a lower bound, or best-case performance, when applied to the execution time of algorithms. Some comparison-based sorting algorithms do have better than Ω(n log(n)) execution time. In the case of bubble sort encountering an already sorted list of numbers, the bubble sort algorithm makes one pass through the list, determines that no exchanges took place, and then terminates in Ω(n) time. In the case of insertion sort, when it has a list of numbers that is known to be sorted, inserting a new value at the end of the list which happens to be the next value in sorted sequence will require Ω(1) time. I have replaced the Ω with O in the, "Least significant digit radix sorts," section. -Edward Lee

Call for abstract algorithm, instead of implemetations
There's a score of different implementation: 3 for C++! none of which are very clear; especially the C version. I propose that the various implementations be deleted and instead a proper description of the algorithm be inserted, both for LSD and MSD, independent of any programming language. — Preceding unsigned comment added by 212.130.79.38 (talk) 21:05, 26 March 2015 (UTC)
 * I second this, especially as the C++14 example won't even compile with certain compilers without modifications. (Works under g++ 4.8.2, fails on msvc 12 and clang 3.4.1) I think the key here should be proper description instead of psuedocode sillyness. --70.15.250.196 (talk) 13:59, 6 April 2015 (UTC)
 * There's a reason why the example specifically stated C++14: MSVC 12 and Clang 3.4.x are not C++14 compilers. Not that it matters too much now, but there was significant syntactical difference between the two language versions that one could argue warranted a separate, more succint, example. — Preceding unsigned comment added by 203.213.88.195 (talk) 14:19, 1 July 2015 (UTC)

I made a major edit
It seems to me this article became a playground for two individuals wanting to talk in detail about their own implementations and articles. That's not really what a general encyclopedia should be focused on. I tried to pare it down to history (expanded), 2 simple examples, a Java thing that actually works, a short discussion of MSD and LSD, and left the specialist stuff at the end for those still interested. I ripped out most of the perf talk, no citations. Perf is contextual, asymptotics aside, and if we want to have a section about it, links to real implementations/articles with real benchmarks would be a starting point. I didn't know what to make of the Snow White stuff, except to say that Tries, Trees, and Traversals all have their own articles, and that section wasn't focused on on the sorting part of radix sort. Cronholm144 05:09, 31 August 2019 (UTC)

Removing technically correct material from Wikipedia just because you think it is weird or because you don't understand it is little more than egocentric vandalism where you are imposing your own ignorance on the rest of the world. The recursive preorder traversal of a trie is what accomplishes the sorting of a trie's data, similar to the way an in order traversal of a binary search tree accomplishes the sorting of a binary search tree's data. The text that remained in the Wikipedia article on radix sort for performing a preorder traversal to sort the data in a trie displayed a fundamental misunderstanding of how a trie-based sorting algorithm works by incorrectly assuming that all data is stored in the leaf nodes, by not recognizing that data can also be stored in the interior nodes of a trie. If you have no relevant expertise in implementing a trie-based radix sorting algorithm, then I recommend that you do not attempt to correct the text contributed by someone like me who does have such relevant expertise. I have corrected the erroneous text that you or someone else contributed. I do not care to spend any more time to restore the in-depth discussion of the trie based sorting that you deleted. People can look in the history of the Wikipedia radix sort article to judge for themselves the educational value or the technical merits of my past contributions. 67.167.247.90 (talk) 05:13, 25 October 2020 (UTC) Edward Lee

Java example is not working
Java example is not an LSD sort since we start with the highest bytes. Also, it's not the correct sort. As far as I understand if you want to operate without buckets using single array you have to do a recursive sort for the part of an array where all high bytes are equal. The current example breaks the layout of a higher level sorted by the previous step.

It works similar to:

[0b00, 0b01, 0b10, 0b10] -> sort by high bit -> [0b00, 0b01] + [0b10, 0b10] -> [0b00, 0b01, 0b10, 0b10] (same as original) [0b00, 0b01, 0b10, 0b10] -> sort by low bit -> [0b00, 0b10, 0b10] + [0b01] -> [0b00, 0b10, 0b10, 0b01] = [0, 2, 2, 1] (not sorted!)

If you adapt this part to the example, the array to test will be [0, 1, 256, 256]. Result [0, 256, 256, 1] - not sorted.

Live example with the online compiler: https://www.jdoodle.com/embed/v0/1ufh

Ruslo (talk) 18:09, 4 September 2019 (UTC)