Talk:Huffman coding

Pain in the ass
It's a real pain in the ass how every mathematics-related article on Wikipedia assumes you have at least a master's degree. I understood Huffman coding as explained in my first undergrad book on data structures and algorithms, and reading this completely wiped out any intuition that I gained previously. — Preceding unsigned comment added by 24.85.228.245 (talk) 05:52, 4 July 2012 (UTC)


 * I'm in the same boat. I know Huffman coding is simple, and I used to be able to do it, but this article was nowhere near the refresher I had hoped it would be. 192.203.137.242 (talk) 19:25, 7 December 2012 (UTC)


 * If you can spell out the problems with the material, someone more well-versed in the subject may be able to make improvements to the article.
 * Sebastian Garth (talk) 19:44, 8 December 2012 (UTC)

basic example image is wrong?
Hi! I think the example in the basic section, the one in the caption of the image, is wrong. It should be

and not:

Both the image and the table are wrong. I think one can't mix the way the 0/1 is set. PedroCarvalho (talk) 06:23, 28 November 2007 (UTC)


 * Well, you can mix the way the 0/1 is set; the output code will be valid anyway, you just have to remember it when you decode! Still, it's better not to mix the way the 0/1 is set, so I have fixed both the image and the caption. Alessio Damato 13:59, 4 December 2007 (UTC)


 * It still mixes the way the 0/1 is set. In the first split it gives 0 to the least common (a1: 0.4) and 1 to the most common ({a2,a3,a4}: 0.6), but in the two other splits it gives 0 to the most common and 1 to the least common. -Thomas Nygreen (talk) 09:59, 9 May 2008 (UTC)


 * Huffman code generators can select 0/1 in any manner they choose (including randomly) and still generate a valid Huffman code. Nevertheless the diagram ought to be consistent with the pseudocode. Dcoetzee 16:30, 9 May 2008 (UTC)


 * Exactly as the text below it says, the Huffman coding should be done from left to right, taking the lowest probabilities each time. This is NOT what the picture shows, so it is clearly wrong. Yomach (talk) 06:27, 26 June 2010 (UTC)


 * Isn't the frequency for the 'f' character also wrong? Shouldn't it be 2, not 3? Moulsonp (talk) 10:54, 11 May 2012 (UTC)

older
The algorithm in "Basic technique" is ambiguous. It's not clear how the two queues are used. It seems that the second queue is never read from, only added to.

[BREAK INTO DISCUSSION: 30 July 2007 - Simply does not compute - I have been trying to make sense of this 'pseudo code' for a couple of hours now. I am no slouch, but I just can't get the algorithm from this description. Somebody - please post something more intelligible, then remove this comment. - TropicalCoder (Google me) EXIT DISCUSSION]

~ I believe I may have better solved the example "Sample-1". I may be wrong, but I found the algorithm ill-explained in my notes, so I visited Wikipedia to better understand it. I tried to solve it in the most logical way to me: building a complete binary tree of appropriate depth and using the percentage costs as guides to choose which boundaries to take.

What I got was this: there is 1 character with weight 30, 1 with weight 20, 3 with weight 15, 1 with weight 10, and 3 with weight 5, for a total sum of 120. Dividing everything by 120 and using the percentages to decide how much of the tree to take away, you get that 30/120 = 25%, and you go on like that. Using this technique I got:

Total cost is exactly the same, but the coding varies in size between 2 and 4 bits, and in #Basic_technique it says that "It is generally beneficial to minimize the variance of codeword length." I believe that makes my solution better, so if anyone with more than a mere day's knowledge on the matter agrees, then by all means make the correction :) --nunocordeiro 02:02, 11 July 2006 (UTC)
 * You got the right answer for the wrong reasons. You used Shannon-Fano coding, which in this case happens to yield a result identical (in terms of each symbol's weight) to the bottom-merge Huffman code, the Huffman code with the lowest variance and lowest maximum length (among optimal codes). In general, these are not the same, and Shannon-Fano is suboptimal. The example should be replaced by one that either yields only one Huffman code (again, in terms of symbol weights, so {0,1} is the same code as {1,0}) or explains bottom-merge Huffman coding. I'll put this on my to-do list, but if someone wants to fix this, go ahead. Calbaer 02:10, 27 July 2006 (UTC)
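(For anyone who wants to check the total-cost claim mechanically: every execution of Huffman's algorithm on a given weight multiset reaches the same minimal total cost, whatever the tie-breaking. Here is a minimal Python sketch using the nine weights from the example above; the function name is mine, not from the article.)

```python
import heapq

def huffman_code_lengths(weights):
    """Return optimal codeword lengths for a list of symbol weights,
    using the standard heap-based Huffman merging."""
    # Each heap entry: (weight, unique id, list of leaf indices under this node).
    # The unique id breaks ties so the lists are never compared.
    heap = [(w, i, [i]) for i, w in enumerate(weights)]
    heapq.heapify(heap)
    lengths = [0] * len(weights)
    uid = len(weights)
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1  # each merge adds one bit to every leaf below it
        heapq.heappush(heap, (w1 + w2, uid, leaves1 + leaves2))
        uid += 1
    return lengths

# The "Sample-1" weights discussed above: one 30, one 20, three 15s, one 10, three 5s.
weights = [30, 20, 15, 15, 15, 10, 5, 5, 5]
lengths = huffman_code_lengths(weights)
total_cost = sum(w * l for w, l in zip(weights, lengths))
print(total_cost)  # prints 355: the same minimal cost for any optimal code
```

Whichever tie-breaking rule you use (bottom-merge or otherwise), the total cost comes out the same; only the variance of the codeword lengths differs.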

Removed "*Animation of the Huffman Algorithm: Algorithm Simulation by Simon Mescal" since the link is dead. If it comes back up, move it back in.

What is meant by "cost" in this article?


 * Yeah, I was wondering too. Look what I found; now I see. The cost is the cost to send each letter in the target alphabet. For example, if we send letters as in Morse with "." & "_", we will spend more on "_". Gnomz007 15:07, 16 Nov 2004 (UTC)

Or to qualify a little more: the letter S in Morse is three dots in rapid succession, i.e., three time periods. The letter O is three dashes, and each dash takes up at least two time periods, so at least six time periods. Therefore, sending O is at least twice as expensive (in time!) as sending an S. You would think that a character is always 8 bits long and so this has no meaning. However, most US English text only needs about 7 bits per character. If you find that no 8-bit characters are being used in the file -- or if you find a way to mark them as such -- you can compress the file down to 7/8 of its original size right there. Consider some of the Unicode encodings, where most characters take 8 bits, but if you want an extended character, you can use extra bytes if you flag them. --67.177.235.1 18:55, 4 October 2005 (UTC)
 * Oh no! This is wrong, wrong, wrong (in the context of the article). For most applications, costs are probabilities, pure and simple. But they need not be, because they are just multiplicative factors (or, as the Basic technique section has it, weights) for the expression to be minimized. Adding to the confusion, the Huffman coding with unequal letter costs subsection uses cost in the way you two and the linked paper are discussing, as output costs! It seems as though an information theorist wrote most of the article but a computer scientist wrote the Problem definition section. Anyway, I think most people who read the entire article would come away very confused. Looks like it's time for a rewrite (or at least a find/replace of the word 'cost').... Calbaer 02:22, 27 July 2006 (UTC)

Back-story on the Discovery of Huffman Coding
Dr. David A. Huffman was the Chairperson of the Information and Computer Science Department at the University of California, Santa Cruz (UCSC) in the early 1970s. He taught an undergraduate course, "The Algebraic Foundations of Cybernetics", in 1971-73. During one lecture he taught "Minimum Redundancy Coding". A student noticed that, in some textbooks, it was called "Huffman Coding", and asked him if that was the same principle.

Dr. Huffman replied by relating an interesting story.

As a student at MIT, David Huffman had been working on an unsolved, challenging problem in Information Theory. Instead of studying for the final, he had decided to devote all his free time to work on this problem. The night before the final, with no solution yet, in a moment of despair, he threw his large stack of notes into the trash. He described the next moment as a flash of insight in which the solution occurred to him.

He went on to explain that he was still working on many difficult, interesting problems that, when solved, could be presented in textbooks for centuries. He did not want such a "simple idea" as minimum redundancy coding to bear his name. Students wondered and discussed among themselves: was this humility, or hubris?

not one-to-one on finite messages
Huffman coding is not one-to-one for finite messages. Suppose the message consists entirely of the same character (e.g. a file full of 'aaaaaaaaaaaaaaaaaaaa...'); then that character would be assigned the empty bitstring, and the whole message would encode to the empty string. But such messages of different lengths would all encode to the empty string, so the information about the length is lost. If you think about decoding such a message, you see the same problem: you have an empty encoded message, but you also have a symbol whose code is the empty bitstring, so you don't know how many characters to write out.

This problem does not occur for infinite messages, of course, since for any given character there is only one infinite message consisting solely of that character.

Perhaps this should be mentioned in the article? --Spoon! (talk) 04:18, 12 January 2008 (UTC)


 * This is not entirely correct. The weights in the Huffman algorithm represent probabilities, not observed frequencies: so you can only assign the empty bitstring to a character if you know that the character will occur with certainty. Hence this is a degenerate case, where the only information being transmitted is the length of the string. In practice, if all you have is finite sample data, then you need to use something akin to Laplace's rule of succession to assign probabilities. —Preceding unsigned comment added by Classicalecon (talk • contribs) 11:50, 11 February 2008 (UTC)
 * Another thing to keep in mind is that the empty string is not part of a valid Huffman code. A valid Huffman code has no codeword of length less than one. If the most likely outcome is more than 40% likely, its codeword will only have one bit, but it cannot have less, even if it is 99.9999% likely. If it is 100% likely, or even just much more likely than 50%, then a practical compressor wouldn't use a Huffman code per symbol, but perhaps instead a run-length encoding on the entire sequence. If there are relatively few possible sequences, that run-length code could be a Huffman code. Calbaer (talk) 22:27, 11 February 2008 (UTC)
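(To make the degenerate case concrete: when the alphabet has a single symbol that occurs with certainty, the only information left to transmit is the run length. A toy Python sketch, with made-up function names, purely for illustration:)

```python
def encode_run(message, symbol='a'):
    # Degenerate case: the alphabet has one symbol occurring with
    # certainty, so the message carries no information beyond its length.
    assert message == symbol * len(message)
    return len(message)

def decode_run(n, symbol='a'):
    # Reconstruct the message from its length alone.
    return symbol * n

print(decode_run(encode_run('aaaa')))  # prints 'aaaa'
```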

Length-limited Huffman coding
The section on length-limited Huffman coding needs to be clarified, since finding the optimal length-limited code given the initial weight distribution is trivial. In what sense is this an unsolved problem? —Preceding unsigned comment added by Classicalecon (talk • contribs) 10:56, 11 February 2008 (UTC)
 * How is it trivial? If the Huffman code has a length longer than the length limit, we have to throw away this code and find the optimal length-limited code using a different method, e.g., the Package-Merge algorithm. It might be trivial to find a length-limited code in general (a fixed-length code certainly limits length), but to find an optimal length-limited code is another matter. No method is known to solve this problem in linear or linearithmic time. Calbaer (talk) 22:27, 11 February 2008 (UTC)
 * The article section currently claims that it isn't linearithmic, but I don't see how that's the case. The package-merge algorithm finishes in O(nL) time. But if L is greater than log n in base phi, code length is not a limiting factor. (See Witten, Moffat, and Bell, Managing Gigabytes, ISBN 9780442018634, pp. 345-350, for a proof.) Otherwise, L ≤ log n, and O(nL) ≤ O(n log n). So why isn't that linearithmic? --Damian Yerrick (talk | stalk) 21:21, 24 April 2012 (UTC)
 * Because Huffman coding is used where the weights/probabilities/frequencies are not all identical. If the probability of the jth item were 2^(-j), then the optimal (length-unlimited) code would have two codewords of length n-1. If you wanted to limit lengths to n/2, then the Package-Merge method would take O(n^2). Calbaer (talk) 20:33, 6 June 2014 (UTC)

Basic technique?
In the article the basic technique is described like so:

1. Start with as many leaves as there are symbols.
2. Enqueue all leaf nodes into the first queue (by probability in increasing order, so that the least likely item is at the head of the queue).
3. While there is more than one node in the queues:
   1. Dequeue the two nodes with the lowest weight.
   2. Create a new internal node, with the two just-removed nodes as children (either node can be either child) and the sum of their weights as the new weight.
   3. Enqueue the new node into the rear of the second queue.
4. The remaining node is the root node; the tree has now been generated.

However, it never says what the second queue is used for! From what I understand, shouldn't this node be inserted back into the first queue and the queue re-sorted? I'm having real problems working out how to generate the Huffman tree (since it differs from a normal binary tree, as the highest-weight symbol should get only a single bit). —Preceding unsigned comment added by 88.107.145.7 (talk) 19:30, 6 May 2008 (UTC)
 * This description is correct but confusing. The step "Dequeue the two nodes with the lowest weight" is implicitly examining the fronts of both queues. A conceptually simpler implementation uses priority queues, but is not linear time. I will attempt to clarify the exposition. Dcoetzee 19:35, 6 May 2008 (UTC)
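For readers hitting the same wall: here is an illustrative Python sketch of the two-queue method (my own code, not from the article). The key fact is that merged nodes are produced in nondecreasing weight order, so the second queue stays sorted on its own, and "the two nodes with the lowest weight" are always found at the fronts of the two queues:

```python
from collections import deque

def two_queue_huffman(weights):
    """Build a Huffman tree in linear time from pre-sorted weights."""
    # Queue 1 holds the leaves in increasing weight order;
    # queue 2 receives the merged (internal) nodes.
    q1 = deque(sorted((w, ('leaf', i)) for i, w in enumerate(weights)))
    q2 = deque()

    def pop_lowest():
        # Compare the fronts of BOTH queues and take the lighter node.
        if not q2 or (q1 and q1[0][0] <= q2[0][0]):
            return q1.popleft()
        return q2.popleft()

    while len(q1) + len(q2) > 1:
        w1, n1 = pop_lowest()
        w2, n2 = pop_lowest()
        # The merged node goes to the REAR of queue 2; its weight is never
        # smaller than any earlier merged node's, so queue 2 stays sorted.
        q2.append((w1 + w2, ((w1, n1), (w2, n2))))

    return (q1 or q2)[0]  # the single remaining node is the root

def weighted_path_length(node, depth=0):
    # Total cost = sum over leaves of weight * depth.
    w, shape = node
    if shape[0] == 'leaf':
        return w * depth
    left, right = shape
    return weighted_path_length(left, depth + 1) + weighted_path_length(right, depth + 1)

root = two_queue_huffman([30, 20, 15, 15, 15, 10, 5, 5, 5])
print(root[0], weighted_path_length(root))  # prints: 120 355
```

Note that nothing is ever inserted back into the first queue, which is exactly why no re-sorting is needed and the whole construction is linear once the leaves are sorted.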

Alternate Pseudocode
I think a more programming-focused version of the pseudocode should be added to the article, as the current explanation is a bit difficult for programmers to implement and somewhat ambiguous. Archer1742 (talk) 10:29, 26 March 2009 (UTC)

Removal of links
Please explain your reason for the mass removal of links from this article. Please do not simply refer to WP:LINKFARM. Explain why each link you removed was bad. Biophys (talk) 14:56, 9 April 2009 (UTC)

How does one decode?
I have a solid background in comp theory, but I can't see from this article, short of working out the math from scratch, how one takes the encoding and reconstructs a string like the French phrase in the example. An article linked from prefix code (itself also woefully inexplicable) explained to me the idea of using the tree plus the encoded string, and so now I understand. But why isn't there an algorithm, pseudocode, and/or illustration of decoding?

Also, some proof of optimality would be nice. SamuelRiv (talk) 00:01, 11 February 2011 (UTC)


 * Yes, I see what you mean. Well, I can put together at least a brief overview of the topic, I think. It'll probably need some additional expansion, of course... Sebastian Garth (talk) 17:24, 11 February 2011 (UTC)


 * I just read the section and it is very useful. The thing that was bothering me is that even though I know how the prefix tree works now, the article seems to assume not just familiarity, but working comfort with the prefix tree. As such, the beginning sections completely skip over a basic, diagrammed rundown of prefix tree compression, such that the formulation of the problem makes sense in context, the importance of representing characters with probabilities has actual meaning in terms of prefix code, and the later animation (which also should be reduced in size, btw) can be explainable quickly after the introduction.
 * Really, what was so confusing about the first section is that it represents all characters by probabilities. Again, I have a lot of background, but my immediate question was how the probabilities are useful in any sense of encoding and decoding. Of course, they are (generally) not - they are useful strictly in optimizing the algorithm. Ideally, the prefix code article would be readable as well - by the way, this article introducing prefix coding is what I used to figure this out - but a summary could easily be done on Wikipedia so that one only needs a little comp sci to get through. SamuelRiv (talk) 19:04, 12 February 2011 (UTC)

This article strikes me as a little too math-y and abstract for most Wikipedia readers.

I think that it would be good to have a section on basic practical implementation issues so that people can see how Huffman coding can be very fast to encode and decode, even for large alphabets---the encoder constructs a flat encoding table from the Huffman tree, mapping input symbols to their encoded lengths and actual codes. The decoder uses an inverse table to map codes back to the input symbols. Once the frequencies are sorted and codes are assigned, the actual encoding or decoding can be very, very fast. I think explaining that would make the abstract stuff more concrete for many readers, especially programmers who aren't fluent in formal theory.

Some useful detail:

For small alphabets with a reasonable longest code length, the decoder can just use a dense table (1-dimensional array) with an entry for every possible bit pattern whose length is the length of the longest code.

Suppose, for example, that your maximum code length is 9, but you have mostly shorter codes like binary 101. To decode the next code, you can just take the next 9 bits (not knowing whether you need them all) and index into a 2^9 = 512-element decoding table. For a short code like 101 there are duplicate entries for all possible 9-bit patterns that start with that short pattern (101000001, 101000010, ..., 101111111). No matter which of those bit patterns you use to index into the array, you get back the same code length (3) and encoded bit pattern (101), in fast constant time.

For large alphabets or other situations where you may have some longer codes, you can get most of the same benefit by using a small table like that and a conditional branch---take the next 9 bits, index into the table as above, and check the code length. If it's 9 or less, proceed as above. If it's not, that's because you're actually looking at the first 9 bits of a longer-than-9-bit code, so you branch to a somewhat slower routine for the longer codes.

This is generally extremely fast, because short codes occur much more frequently than longer codes---that is why they're shorter. In the common case it's just a handful of instructions, hitting a small table that fits in small, super-high-speed first-level cache memory. (And with only one conditional branch, which is usually not taken and won't cause a pipeline stall on modern processors with branch predictors that notice things like that.) The routine to handle long codes can be pretty fast too, with time bounded by the code length, so that the total time to decode a file is proportional to the entropy (encoded file size). 64.134.159.110 (talk) 17:45, 1 May 2016 (UTC)
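To make the table-driven scheme concrete, here is a toy Python sketch (a made-up 4-symbol code with maximum length 3 instead of 9; everything else follows the description above):

```python
# Hypothetical toy prefix code: symbol -> codeword bits as a string.
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
MAX_LEN = 3  # longest codeword; the description above uses 9

# Dense decoding table: one entry per possible MAX_LEN-bit window.
table = [None] * (1 << MAX_LEN)
for sym, bits in code.items():
    pad = MAX_LEN - len(bits)
    base = int(bits, 2) << pad
    for filler in range(1 << pad):               # duplicate entries for every
        table[base + filler] = (sym, len(bits))  # possible continuation of a short code

def decode(bitstring):
    out, pos = [], 0
    # Pad so a full MAX_LEN-bit window can always be read at the end.
    padded = bitstring + '0' * MAX_LEN
    while pos < len(bitstring):
        window = int(padded[pos:pos + MAX_LEN], 2)
        sym, length = table[window]              # constant-time lookup
        out.append(sym)
        pos += length                            # consume only the bits actually used
    return ''.join(out)

print(decode('010110111'))  # prints 'abcd'
```

A real decoder would work on packed bytes rather than a string of '0'/'1' characters, but the table construction and the "read a full window, consume only the code length" trick are the same.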

Ambiguity alert
Hey there, this is not 100% clear, is it? "Huffman avoided the major flaw of the suboptimal Shannon-Fano coding by building the tree from the bottom up instead of from the top down." So was the flaw caused by "building the tree from the bottom up", or was the "bottom up" way the good method? Heh, if you have no idea about H. C. at first, you might be more confused than you were at the beginning. -andy 217.50.52.128 (talk) 15:54, 27 October 2012 (UTC)

Second paragraph is poorly worded
Nuf' said. — Preceding unsigned comment added by 99.245.3.15 (talk) 10:49, 25 August 2013 (UTC)

Asymmetric numeral systems (ANS) - combining speed of Huffman with accuracy of arithmetic?
There is a lot going on around this alternative to arithmetic coding, which has complexity similar to Huffman coding but handles fractional bits of information, i.e., has accuracy like arithmetic coding.

Some implementations:
https://github.com/Cyan4973/FiniteStateEntropy
https://github.com/rygorous/ryg_rans

Blogs:
http://fastcompression.blogspot.fr/
http://fgiesen.wordpress.com/2014/02/02/rans-notes/
http://cbloomrants.blogspot.com/

Maybe it should be mentioned somewhere in the article? — Preceding unsigned comment added by 128.211.178.20 (talk) 21:58, 4 February 2014 (UTC)

I have added a very brief subsection, but ANS really requires a separate article. — Preceding unsigned comment added by 98.222.202.103 (talk) 15:47, 28 February 2014 (UTC)
 * It had one, but that was deleted due to original research: Articles_for_deletion/Asymmetric_binary_system. The subsection here seems to have been a way to get that content back in. No idea what's happened since then, but back then one compression expert mentioned that a fatal flaw of the system was the need to decode in reverse order. That means no context-adaptivity (which the inventor of the code acknowledged). Since the primary advantage of arithmetic coding is, in practice, that adaptivity, I don't see the practical application of ANS, let alone its viability as a distinct subsection in the article of a completely different coding technique (without reliable sources, to boot). Calbaer (talk) 20:45, 6 June 2014 (UTC)
 * Not to say that ANS is useless, just that the evidence of notability has to be more than its inventor extolling its virtues and some folks implementing it in personal libraries. Calbaer (talk) 20:50, 6 June 2014 (UTC)
 * I removed it. The first comment here is from a Purdue domain, and Purdue is where the ANS inventor was located when this was posted.  He has been using Wikipedia to try to promote this for years in violation of WP:SOAP, WP:FORUM, WP:NOR, WP:RS, and WP:notability. Calbaer (talk) 00:02, 7 June 2014 (UTC)
 * Maybe look at some current benchmarks (you can test yourself) e.g. http://encode.ru/threads/1920-In-memory-open-source-benchmark-of-entropy-coders http://encode.ru/threads/1890-Benchmarking-Entropy-Coders, http://heartofcomp.altervista.org/MOC/MOCATD.htm (zhuff is LZ4 + FSE). — Preceding unsigned comment added by 89.76.32.36 (talk) 05:55, 7 June 2014 (UTC)
 * Maybe look at the Wikipedia policies and guidelines and follow them. It's been 6.5 years since Articles_for_deletion/Asymmetric_binary_system, where several people tried to explain to you why this is not an appropriate place for this content.  If you really feel it is, read the policies and guidelines, then say how it doesn't violate them.  You clearly have either not bothered to read them, or have ignored them, repeatedly polluting several articles with your blatant and annoying self-promotion.  This article, for example, had a whole section about your recent-but-arcane algorithm, while other well-established alternative algorithms such as LZW and arithmetic coding, used by most people in industrial societies every day of their lives, merited only a brief mention, and that just so folks could know there were commonly used alternatives to Huffman coding with various pros and cons.  Your algorithm isn't among those well-established commonly used ones.
 * After 6.5 years, the method hasn't been published in a peer-reviewed paper, adopted by a commercial or open-source product (I'm not counting libraries), or otherwise garnered notability; a few excited hobbyists on the web do not make something "notable," nor does any benchmark, no matter how good. Data compression is a very small community, and I know a couple of the people you've worked and been in contact with.  Your position in the community is not that of a crank or a hack.  But, on Wikipedia, that's precisely what you come off as.  Unfortunately, your material being here reminds me of the Wikipedia Seigenthaler biography incident: Both take advantage of a period of time in which no one who saw knew to fix the misplaced content.  Under WP:AGF, I'll have to take this as a sign of willful ignorance rather than dishonesty.  However, I suggest you no longer write about your own work.  If it is as notable as you think, someone else inevitably will.  And if they're familiar with Wikipedia, they'll do a better job: Writing for a general audience rather than a technical one, including reliable sources, making the notability of the work obvious.  But, even then, it won't be in this article, because your compression method is not a form of Huffman coding. Calbaer (talk) 15:29, 7 June 2014 (UTC)
 * Also, since you seem to reply to critiques but not read policies and guidelines, I should make it clear that, even if your algorithm were presented in a peer reviewed paper or used in a mobile app, that alone does not merit its inclusion here. I cite the lack of these for reasons why it isn't notable, but notability requires a higher bar than that alone, and not every notable invention merits inclusion on Wikipedia.  I have several optimal coding algorithms presented in peer-reviewed papers.  They are somewhat relevant to this topic, but mentioning them would hinder, not help, someone who comes here trying to understand Huffman coding.  And they wouldn't merit their own pages on Wikipedia, useful though I think they might be.  Though it isn't in a policy or guideline that I know of, one thing you should ask yourself before contributing is: Does this help Wikipedia and its readers, or does this just help me?  I'm not here to promote myself and my work, and you shouldn't either. Calbaer (talk) 17:11, 7 June 2014 (UTC)
 * You really don't like this guy. Anyway, don't you think that Wikipedia should somehow mention that there are now methods, like the few-month-old FSE (or maybe some of the codings you've mentioned?), which are both faster and provide better compression than Huffman - that we could save time/energy/disk space? — Preceding unsigned comment added by 178.180.14.129 (talk) 22:29, 7 June 2014 (UTC)
 * Hi, I was wondering who would call Matt Mahoney (PAQ) and Bulat Ziganshin (FreeArc) "a few excited hobbyists", or who wouldn't like that. Compared with FSE, prefix codes look really miserable now. — Preceding unsigned comment added by 89.76.32.36 (talk) 23:22, 7 June 2014 (UTC)
 * I don't know who you think you're fooling. First you contributed self-promotion through a named Wikipedia account, then, when that article was deleted, you contributed to tangentially related articles as an anon through your Purdue IP.  Then you followed up on the talk page via your U.S. home IP.  After you moved back to Poland, you posted a few hours later from your Polish home IP.  And you followed it up by saying I really don't like "that guy" from your Polish T-Mobile IP.  It's all the same guy.  You think you're posting anonymously and that people will think the other posts are not you, but the sockpuppeting is painfully obvious (and yet another violation of Wikipedia policy).  Perhaps you'll try a VPN next, but by this point I think we can assume that any anonymous follow-up or follow-up from a newly created account is just you again.
 * Then you top it off by trying to dig up and post irrelevant information about me. I guess I knew going in that you were rude and deceptive, but to be so blatant about it does nonetheless surprise me.  It especially surprises me because behavior on such a public forum holds the potential for seriously harming your professional reputation.  Your work being unnotable is not a reflection of either a lack of intellect or any deficiency as a human being.  But your behavior on Wikipedia is another matter.
 * Matt Mahoney told you that your code suffers from the fatal flaw of not being context-adaptive. Yet you cite him as a defender, evidence that your work is notable in the context of Wikipedia?  Given what he and others have said, your code seems slower than prefix coding (which doesn't require multiplies) and lacks the main advantage of arithmetic coding (context adaptivity).  It's fun for toy problems, but not as a replacement for current technologies.  Matt is interested in this because he's interested in new methods.  But just because another academic shows interest in your work does not make it notable.  As you've been repeatedly told, even if your method were the best ever, that alone would not make it notable in this context.  Wikipedia is not the forum for advancing research.  You must know this, since you've been told this repeatedly for years.  This is not an alternative to the peer review process.  It is an encyclopedia.
 * The Huffman coding article needs a lot of work; it suffers from bloat and confuses novices, and stuff like this only adds to these problems. Please stop contributing to Wikipedia until you can be constructive rather than destructive, and can stop willfully violating the policies of the website. Calbaer (talk) 00:36, 8 June 2014 (UTC)

A few compressors have recently switched to ANS - maybe it is worth mentioning somewhere: zhuff (huff->tANS), lzturbo (huff->tANS), LZA (range coding->rANS), ZSTD (only tANS) - and they look quite interesting in Matt's benchmarks ( http://mattmahoney.net/dc/text.html ). Size for enwik8, then compression and decompression time (ns/byte):

lzturbo 1.2 –39 | 26,915,461 | 582 | 2.8
LZA 0.70b –mx9 –b7 –h7 | 27,111,239 | 378 | 10
WinRAR 5.00 –ma5 –m5 | 27,835,431 | 1004 | 31
WinRAR 5.00 –ma5 –m2 | 29,758,785 | 228 | 30
lzturbo 1.2 –32 | 30,979,376 | 19 | 2.7
zhuff 0.97 –c2 | 34,907,478 | 24 | 3.5
gzip 1.3.5 –9 | 36,445,248 | 101 | 17
pkzip 2.0.4 –ex | 36,556,552 | 171 | 50
ZSTD 0.0.1 | 40,024,854 | 7.7 | 3.8
WinRAR 5.00 –ma5 –m1 | 40,565,268 | 54 | 31 — Preceding unsigned comment added by 193.0.101.229 (talk) 12:55, 30 January 2015 (UTC)

Calbaer, you say that the primary advantage of arithmetic coding is adaptivity, by which you seem to mean very fine-grained adaptation to changes in symbol frequency distributions.

My understanding is that the primary advantage of arithmetic coding is simply the ability to use fractional bits to encode symbols with non-power-of-two probabilities. For example, in PPM the biggest win of arithmetic coding is for encoding sequences of symbols with probabilities very close to 1, using significantly less than 1 bit per symbol. (As in compressing the string "United States of America"---once you've seen "United Sta" you can encode "tes" in less than a bit, and once you've seen "United States of A" you can likewise encode "merica" with less than one bit for all 6 symbols.) As Charles Bloom has pointed out, the other "probabilities" used in PPM are usually pretty bogus because they're too sparse to be statistically significant. (By the time you've seen enough occurrences of a context to do real statistics, that context is usually masked by new higher-order contexts with the same sparsity problem.) So in practice, the biggest win of arithmetic coding is often just to efficiently encode a probability that's very nearly 1.
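(For concreteness, the fractional-bit win is just -log2(p). A tiny Python check of the numbers behind the claim above, nothing more:)

```python
import math

# Ideal cost in bits of one symbol with probability p is -log2(p).
# Any prefix code, Huffman included, must spend at least 1 whole bit per symbol.
for p in (0.5, 0.9, 0.99):
    print(p, round(-math.log2(p), 4))
# A p = 0.99 symbol costs only ~0.0145 bits with arithmetic coding, which is
# where "less than a bit for all of 'tes'" comes from: the near-1 probabilities
# of highly predicted symbols multiply into a tiny total bit cost.
```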

It is common for the entropy coder to be faced with relatively stable statistics from a fixed model for a particular domain, or computed for large blocks of input, or as the output of an adaptive model that decorrelates away most of the instabilities. A good adaptive predictor may tend to make very different predictions depending on what it's seen lately, but output residuals that look a whole lot more like noise with a particular characteristic frequency distribution. That's especially true for an adaptively approximate predictor, which may predict only those bits it's likely to be able to predict, and regard the rest as incompressible noise, essentially splitting the data into the prediction, the residual (prediction error), and the too-random-to-bother-with details.

For use cases where very fine-grained adaptation isn't needed, *or* where it's handled by some decorrelating transform or model-switching ahead of the entropy coder, ANS has a lot of advantages over traditional range coding. It's faster to decode (the fastest way to decode Huffman codes is usually ANS-style), it's very easy to parallelize SIMD-style (or on GPU, but I don't know if anybody's actually done that yet), and it's easy to interleave outputs of different models with no metadata to mark the switch. (The reverse-order thing is not generally a killer, either, for a blockwise or sliding-window asymmetrical compressor. The compressor can do things in whatever order makes for the fastest decompression---see GLZA for an extreme example.)

That is why the cool kids are already doing it---including Charles Bloom, who invented PPMZ and LZP, and Yann Collet. ANS is used in most of the new fast asymmetrical compressors, both free software and commercial products (Collet's ZSTD et al., Bloom & Giesen's Oodle LZNIB, etc.)---and others have made noises about switching when they get around to it (e.g., Kennon Conrad (GLZA), Richard Geldreich (LZham)).

I am not one of those people---I've never used ANS, yet, and don't have a dog in the fight---but it's pretty clear that this is an expert consensus. Nobody knows more about how to compress data well and decompress it fast than those guys. And they think that ANS is a win, if not the greatest thing since sliced bread---essentially the only interesting development in entropy coding per se in a long time, and one of the most useful for practical lossless compression algorithms.

It is unfortunate that most of those people do not (currently) prioritize refereed scientific publications, if that's keeping ANS out of Wikipedia. That is a big problem in lossless data compression---most of what makes practical compression systems work well is hidden in archivers and proprietary code, or is in widely-used free software but with zero publications in the refereed literature. (For example, LZMA was the most scientifically interesting and *useful* general-purpose lossless compression algorithm for a decade, and millions of people use it, but to this day almost nobody really understands why it works so well *except* for the people I mention here and a very few others. There are zero academic publications explaining LZMA, and the best descriptions are on Charles Bloom's blog.)

64.134.159.110 (talk) 18:21, 29 April 2016 (UTC)

There are at least two peer-reviewed papers about ANS now: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7170048&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7170048 http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7306068&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7306068 Additionally, it is currently the only entropy coding in experimental branch of Google VP10 video codec: https://chromium.googlesource.com/webm/libvpx/+/fb9186d68d498919b290d7f444560f44a1d8d1df 149.156.69.178 (talk) 08:58, 1 May 2016 (UTC)

Calbaer, you said that "No idea what's happened since then, but back then one compression expert mentioned that a fatal flaw of the system was the need to decode in reverse order". Did you actually read the old thread you linked to? Nobody said anything about a fatal flaw---just things that it's not appropriate for (direct application to PPM or context mixers). The issue of reverse order was soon resolved in that thread. For the things that ANS is appropriate for, you're going to use frequencies for a reasonably large block, in which case the compressor can go through the block backwards, a little slower, so that the decompressor only has to go forwards, fast.
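To make the reverse-order point concrete, here is a toy rANS round-trip (my own sketch, not taken from any codec mentioned here: a hypothetical two-symbol frequency table, and no renormalization, so the state is one Python big integer rather than a bit stream). The encoder walks the block backwards precisely so the decoder can run strictly forwards:

```python
# Toy rANS coder. Symbols and frequencies are made-up examples.
freqs = {'a': 3, 'b': 1}            # p(a)=3/4, p(b)=1/4
M = sum(freqs.values())             # total frequency
cum, c = {}, 0
for s in sorted(freqs):             # cumulative frequency table
    cum[s] = c
    c += freqs[s]

def encode(msg):
    x = 1                           # initial state
    for s in reversed(msg):         # compressor goes through the block backwards...
        f = freqs[s]
        x = (x // f) * M + (x % f) + cum[s]
    return x

def decode(x, n):
    out = []
    for _ in range(n):              # ...so the decompressor only goes forwards
        slot = x % M
        s = next(t for t in freqs if cum[t] <= slot < cum[t] + freqs[t])
        out.append(s)
        x = freqs[s] * (x // M) + slot - cum[s]
    return ''.join(out)

msg = 'abaaab'
assert decode(encode(msg), len(msg)) == msg
```

The backward loop in `encode` is exactly the "compressor does the slow part so decompression stays fast" trade-off described above.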

That means that ANS is good for pretty much the same things that *Huffman* is good for. If that's ANS's "fatal flaw", it seems to be Huffman's too---adaptive Huffman is too slow. (As the present article says.) If that's true, then the article should say so up front: that Huffman is obsolete and of historical interest only, because you gotta have quick adaptation and arithmetic coding just wins. And maybe the Huffman article should be deleted. :)

But that is NOT true. Blockwise entropy coding is very useful in the most popular compression algorithms in the world: asymmetric LZ77-based compressors such as gzip (DEFLATE) and LZMA, where you entropy-code LZ77 match offsets, match lengths, and/or literals between matches, or the output of some model of those.

And that is precisely where ANS is actually replacing Huffman---in high-performance asymmetrical LZ77- and LZMA-like algorithms like ZSTD and LZNIB.

You can't have it both ways---if ANS is useless and uninteresting, so is Huffman. But if Huffman is interesting---and it is---then ANS is *more* interesting, because it has one of the two main advantages of arithmetic coding (efficiency for probabilities not near a power of two or near one), and Huffman has neither.

You seem to argue that the precision of arithmetic coding (or ANS) is relatively unimportant, compared to the fast adaptation thing.

That is exactly the reverse of every description of arithmetic coding I've ever seen. (And I've seen a bunch.) Every time anybody starts talking about arithmetic coding, the very first thing they say is that it's a solution to Huffman's inability to use fractional bits, and how cool that is. And ANS has that over Huffman, too.

ANS will rise
''>As you've been repeatedly told, even if your method were the best ever, that alone would not make it notable in this context. Wikipedia is not the forum for advancing research. You must know this, since you've been told this repeatedly for years. This is not an alternative to the peer review process. It is an encyclopedia. ''

Calbaer, this is scary. Why such a brutally restrictive policy? People should know that there is a superb algorithm - that's the main purpose of an encyclopedia; your policy delays the ... advent of the good news.

They say that a good thing never lasts

And then it has to fall

Those are the people that did not

Amount to much at all

/Madonna 'Give It 2 Me' lyrics/

My point: good things will eventually make their way in this crazy world.

Sanmayce (talk) 08:28, 20 May 2015 (UTC)
 * Do you really think that current data compression specialists - people who have built their careers on Huffman - will acknowledge its obsolescence? Generations need to change ... 62.221.52.45 (talk) 03:49, 24 May 2015 (UTC)


 * You might prefer Madonna, but I prefer Carl Sagan: "The fact that some geniuses were laughed at does not imply that all who are laughed at are geniuses. They laughed at Columbus, they laughed at Fulton, they laughed at the Wright brothers. But they also laughed at Bozo the Clown." The terms of notability are laid out at Notability.  Just because your invention isn't notable today doesn't mean it will never be.  When and if it is notable, it will be its own article, not part of this one.  (Incidentally, if you think people have built careers on Huffman coding, you really don't know much about the field.  It's a fertile subject for investigation, but has never been one for the volume of grants or industry interest one would need for anyone having "built their careers on" it.) Calbaer (talk) 04:44, 2 June 2015 (UTC)
 * So how many compressors need to switch to ANS to make it notable? (current list: http://encode.ru/threads/2078-List-of-Asymmetric-Numeral-Systems-implementations )149.156.75.199 (talk) 12:07, 2 June 2015 (UTC)
 * The real question is, how many peer-reviewed articles describing ANS need to appear to make it notable?
 * Please, please, please: spend your time getting those articles written and accepted, rather than trying to promote your ideas here. Difficult as it may be to believe, Wikipedia is not the place for original research.  That's just the way it is, and has to be. —Steve Summit (talk) 12:45, 19 June 2015 (UTC)
 * Peer-reviewed paper: "The use of asymmetric numeral systems as an accurate replacement for Huffman coding". Apple's new 3x faster compressor LZFSE (Lempel-Ziv + Finite State Entropy) clearly points to Yann Collet's tANS implementation: https://github.com/Cyan4973/FiniteStateEntropy . CRAM v3 genetic data compressor of the European Bioinformatics Institute uses rANS ( http://www.htslib.org/benchmarks/CRAM.html ) ... 149.156.75.199 (talk) 08:59, 7 August 2015 (UTC)

Another peer-reviewed paper, on synthesizing FPGA's for high-throughput ANS: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=7306068&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D7306068 107.77.218.83 (talk) 13:51, 2 May 2016 (UTC)


 * ISIT 2016 ( http://www.isit2016.org/program/technical-program/#Mo-AM-1-1 ): "On the Stationary Distribution of Asymmetric Binary Systems", Google VP10 ANS implementation: https://chromium-review.googlesource.com/#/c/318743, Google WebP ANS implementation: https://chromium-review.googlesource.com/#/c/338781/ , fast rANS: https://github.com/jkbonfield/rans_static — Preceding unsigned comment added by 149.156.69.178 (talk) 09:00, 13 June 2016 (UTC)
 * Apple has just made open source its currently default compressor (LZFSE, since iOS 9 and OS X 10.11) - its entropy coding is indeed tANS: https://github.com/lzfse/lzfse/blob/master/src/lzfse_fse.h 149.156.69.178 (talk) 08:54, 17 June 2016 (UTC)
 * First book with a chapter about ANS (by Colt McAnlis of Google: https://medium.com/@duhroach ): https://library.oreilly.com/book/0636920052036/understanding-compression/35.xhtml?ref=toc#idm139672412737600 149.156.69.178 (talk) 07:24, 19 July 2016 (UTC)


 * You could not possibly be missing the point more. This would be like adding a huge section about the Nissan Leaf to https://en.wikipedia.org/wiki/Ford_Model_T .  No matter how much or little tANS is used, no matter whether or not it's a significantly different than arithmetic coding, no matter how many or few papers there are about it, the following remains the salient fact: It isn't Huffman coding and is only relevant to Huffman coding in that - like the Leaf and the Model T - both are the same category of thing.
 * It's been many years that you've been at this and three years since I questioned "its viability as a distinct subsection in the article of a completely different coding technique (without reliable sources, to boot)." This critique has not changed in those three years.  Perhaps you now have reliable sources (though none were mentioned in the section I deleted today), but the irrelevance will never change, even if every last bit of compression in the future winds up using tANS.  Please, please, stop vandalizing pages with irrelevant, obsessive self-promotion. Calbaer (talk) 22:14, 31 May 2017 (UTC)


 * Besides the theoretical understanding of the possibility of a fractional-bit extension of the Huffman decoder, one practical aspect of the Section you have deleted is compatibility - a tANS decoder (e.g. hardware) can also decode Huffman as a special case. Another practical aspect is that large advances in the speed of Huffman decoders (~3x compared to zlibh) were made in 2015 by Yann Collet, based on his earlier FSE/tANS decoder - see the series of his posts, for example http://fastcompression.blogspot.com/2015/07/huffman-revisited-part-2-decoder.html Jarek Duda (talk) 22:53, 31 May 2017 (UTC)


 * And a Nissan Leaf can drive down the same roads as a Model T. It also uses much of the same technology (like the wheel).  And it also goes faster, just like tANS.  That doesn't mean putting a section for one on the page of the other would be proper.  It would be bizarre and confusing, just like the tANS section of this article. Calbaer (talk) 23:59, 31 May 2017 (UTC)
 * This is not about HUNDREDS of COMPETING car MANUFACTURERS you can pay for, but about TWO COMPLEMENTING TECHNOLOGIES used in a popular class of data compressors (gzip-zstd) that you can use for free. Yes, they are complementary; e.g. zstd uses both simultaneously. As explained in the Section you have deleted, while tANS is a generalization handling fractional bits for a potentially better compression ratio, it comes with some additional costs (besides a larger header and backward encoding): the newX table, which in Huffman can be quickly calculated using bit shifts, but for tANS generally has to be stored. The best comparison with another Wikipedia article is understanding arithmetic coding as another fractional-bit expansion of Huffman ... which has its own page-long section: https://en.wikipedia.org/wiki/Arithmetic_coding#Connections_with_other_compression_methods


 * As an author of dozens of Huffman articles you see tANS as a deadly enemy you can kill with ignorance, while the deleted Section showed a very smooth boundary between them - one which should be well understood and exploited. E.g. after building the Huffman tree, tANS allows tuning away the suboptimality caused by symbols far from power-of-two probability by modifying Huffman's symbol spread (like aaaabcdd -> aaabbcdd if b has turned out more probable). Jarek Duda (talk) 05:30, 1 June 2017 (UTC)
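The symbol-spread idea above can be sketched as a toy tANS coder (my own illustration: the spread "aaaabcdd" gives p(a)=1/2, p(d)=1/4, p(b)=p(c)=1/8 on a table of size L=8; the message is arbitrary):

```python
# Toy tANS built from a symbol spread, as in the aaaabcdd example above.
spread = "aaaabcdd"
L = len(spread)
counts = {s: spread.count(s) for s in set(spread)}
# occ[i]: how many copies of spread[i] appear before position i (its rank)
occ = [spread[:i].count(spread[i]) for i in range(L)]
# For each symbol, the table slots holding it, in state order.
slots = {s: [i for i, t in enumerate(spread) if t == s] for s in counts}

def encode(msg):
    bits = []
    x = L                            # state kept in [L, 2L)
    for s in reversed(msg):          # tANS also encodes backwards
        Ls = counts[s]
        while x >= 2 * Ls:           # renormalize into [Ls, 2Ls)
            bits.append(x & 1)
            x >>= 1
        x = L + slots[s][x - Ls]     # jump to that symbol's slot
    return x, bits

def decode(x, bits, n):
    out = []
    for _ in range(n):
        i = x - L
        s = spread[i]
        out.append(s)
        x = counts[s] + occ[i]       # back into [Ls, 2Ls)
        while x < L:                 # refill bits (popped in reverse)
            x = 2 * x + bits.pop()
    return ''.join(out)

msg = 'adcabaad'
x, bits = encode(msg)
assert decode(x, bits, len(msg)) == msg
```

Retuning the code then amounts to editing the `spread` string (e.g. `aaaabcdd` -> `aaabbcdd`) and rebuilding the tables, with no change to the coder itself.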


 * Your defense of relevance is rather thin. There aren't just two technologies for compression.  The vast, vast majority of compression occurs without the use of true Huffman coding (as opposed to VLCs, of which Huffman codes are a subclass) or ANS.  Treating them like two giants slugging it out for dominance is ridiculous, and, like many claims you make, makes you come across as an obsessive crank.


 * I already explained how Huffman coding's commercial fortunes have absolutely no effect on me. (Besides, ANS is hardly the first fast compression method that outpaces Huffman coding.  H.264's CABAC fits that bill nicely, for example.)  This is not about self-interest.  Heck, I've long moved on from compression in general, which is why I don't care about the details of ANS.  I just care about Wikipedia, and your edits have been done for self-promotion and without regard to relevance, fundamental Wikipedia policies I've mentioned above, or basic decency.  You post it anonymously but defend it with named accounts when it's removed, for example, then sneak it back in again when you think no one is watching.


 * This is a page where people go to learn about Huffman coding, not a page where people can promote their own compression work. (If it were the latter, I'd link to my papers before yours, because at least they're about Huffman coding.)  If you don't understand that after three years, you never will, but as long as you keep posting violating content, it will be removed.  Perhaps not immediately and perhaps not always by me, but repeatedly.  In the meantime, you are degrading a great resource and confusing people who just want to learn.  All so you can promote yourself.  That's a pathetic way to spend your time and promote your work. Calbaer (talk) 14:41, 1 June 2017 (UTC)


 * Besides the direct description, the purpose of a Wikipedia article is also to provide a glimpse of the complete picture to an interested reader, especially for such applied topics as Huffman coding - connections with other popular, widely applied methods are an essential part of it, exactly as in the AC article: https://en.wikipedia.org/wiki/Arithmetic_coding#Connections_with_other_compression_methods Especially since Huffman has the crucial limitation of not handling fractional bits, which is repaired in other methods that can be seen as its generalizations: explaining this connection would improve the completeness of this article - if you know other popular alternatives (than AC and ANS), this article could also benefit from adding them. Many Wikipedia articles list alternative methods and briefly discuss connections. But I have neither time nor patience for repelling the charges of a Huffman crusader - let us stop this and wait for someone more objective. Jarek Duda (talk) 16:38, 1 June 2017 (UTC)


 * "Many Wikipedia articles list alternative methods and briefly discuss connections." That is not what your section does.  It discusses one method at length, yours.  It is there not to help understanding of Huffman coding, but to promote something else.  I didn't write the Huffman coding section on the Arithmetic coding page, but it discusses a connection to a prior technology (which is more appropriate than the converse).  And it's much longer than anything I would've added to the article.  (If you'd like me to cut it down to size, I can do so.)  And I don't think it's 100% necessary.  But at least it's relevant.  If someone more objective comes along - and not just you from yet another anonymous IP or a friend posting for you - then it would be fine to have a brief discussion of connections to other coding methods (depending on how it was done, of course).  If agreeing to that is all it takes to get you to stop sneaking in this information, then I'll gladly agree. Calbaer (talk) 17:59, 1 June 2017 (UTC)

Well, it took fewer than 72 hours for you (User:Jarek Duda) to break your promise of leaving this article alone so that others could determine the relevance of your work to this article. Now I know what your word is worth.

And, as I mentioned three years ago, attempting to dox someone, whether successful or not, is harassment and a serious violation of Wikipedia policy. Yet you keep doing it, thinking that my identity might somehow undermine an argument where my identity has no bearing.

You claim it does because I engage in "self-promotion," which I can only assume is a reflex tu quoque argument, since it makes absolutely no sense otherwise. I've never posted material from or links to the author you claim me to be. You also like to claim that I'm somehow "protecting" a 65-year-old technology that was bettered 40 years ago, not 3 years ago. That's especially laughable given that I've added more than once that the "Huffman coding" used in almost every compression technology in common use is not always truly Huffman coding. Why would I point out that what I want to "protect" isn't used as much as most people think?

I'm also not sure why you want an edit war that would put at risk all the self-promotion you've put on Wikipedia, when I'm just concerned about this page. Would Asymmetric Numeral Systems, a purely self-promoting page that has many of the same problems as your attempts to shoehorn the topic into Huffman coding, survive an AfD? Until now, I haven't been motivated enough to try one; I figure people who come across it ultimately ignore it, written as it is like a conference talk, not an encyclopedia article. But it's certainly an article ripe for deletion. Why would you then want to antagonize someone, giving motivation to put that to the test?

I also haven't been motivated enough to inform others of your pattern of harassment and vandalism, broader knowledge of which could have professional impacts. Given that you do use your real name and freely admit to your identity, I would think you'd be more circumspect. That makes it all the stranger that the ANS contributions are always made by anonymous IPs from Indiana or Eastern Europe; why not use your Wikipedia account in the first place, when you defend all these anonymous IP edits with the named account?

Considering all that, it would be best for you to do what you said you'd do three days ago, and stop modifying this page, leaving it for those who are not trying to promote their work, but instead trying to provide a useful introduction to the titular algorithm. Calbaer (talk) 06:15, 4 June 2017 (UTC)


 * Oh man, please give me a break, I really don't have time for this - e.g. I have to defend ANS from patent trolls (like https://encode.ru/threads/2648-Published-rANS-patent-by-Storeleap ) and I cannot even afford legal assistance, as I wanted ANS to be free to prevent the situation with AC, and so I didn't get a dime from it.


 * Your motivation here is quite clear (even if you camouflage it with other actions): as the author of many Huffman articles (as anyone can easily find from your nick and interests), you would like to take the world back to the Glorious Huffman Times. Hence you vandalize the Wikipedia article about Huffman coding - the only entropy coding widely taught, especially to CS students (who therefore check the Wikipedia article) - you remove: "Specifically, Huffman coding is optimal only if the probabilities of symbols are natural powers of 1/2. This is usually not the case. As an example, a symbol of probability 0.99 carries only $$\log_2(1/0.99)\approx 0.014 $$ bits of information, but Huffman coding encodes each symbol separately and therefore the minimum length for each symbol is 1 bit. This sub-optimality is repaired in arithmetic coding and recent faster Asymmetric Numeral Systems family of entropy codings." and replace it with an innocent "is not always optimal" ... while Huffman being optimal is a non-generic situation (it requires power-of-1/2 probabilities), so "is not always suboptimal" would be more appropriate - and it should be clearly stated that this "innocent" suboptimality can be 100x and more, as in the part you have vandalized.


 * ANS needed "promotion" ~2007 to shorten the delay before its current wide use in modern compressors, leading to unimaginable world-wide energy and time savings thanks to up to 30x speedup (AC vs ANS decoding: https://sites.google.com/site/powturbo/entropy-coder ). Sadly, also due to the deletion of the Wikipedia article, there were 6 years with nearly no progress here - up to late 2013.


 * Now it has become the mainstream for people actually writing compressors, also from giants like Apple, Facebook and Google - instead of promotion, it needs an objective description of what is actually used in modern data compressors, so that the authors of new compressors can avoid past suboptimalities. However, I have gotten used to the attitude that "the authorities" in this field - people with dozens of Huffman and AC articles - just want to maintain the status quo, and you are only one of many examples. Maybe instead of trying to stop the progress, you could see an opportunity here - e.g. for exploring the very smooth Huffman-tANS boundary ... or finding an ANS alternative without its main weak point: backward decoding. Peace. Jarek Duda (talk) 05:22, 5 June 2017 (UTC)


 * If you "don't have time for this," why have you spent so much time on this? Years and years, paragraphs and paragraphs.  I shortened the intro because it was too long and technical.  In empirical use on practical data sets, I've found Huffman coding within 5% of optimality, but I don't have any reliable sources behind that, so I'm not making that claim, just asserting suboptimality.  Your "100x worse" example is artificial, especially when such applications generally use not trivial Huffman coding, but the more efficient method of Golomb coding run lengths, a technique based on ... Huffman coding.  Anyway, I keep saying that Huffman coding (as designed) isn't used as much in practice as people think - I added Huffman coding#Terminology to say as much - yet you continue to claim that I'm defending it, rather than merely trying to make the article clearer and more accurate.  Perhaps the reason you think I'm doing this out of self-interest is because that's the only motivation for editing Wikipedia you know of, as evidenced by your claim that your inability to publicize your research via Wikipedia delayed its progress.  Wikipedia is not designed to be a medium for advancing your work, but an encyclopedia.  Please treat it that way.  Otherwise your time will continue to be wasted. Calbaer (talk) 23:27, 5 June 2017 (UTC)


 * The suboptimality of prefix/Huffman codes is very data-dependent, e.g. for the letter-wise entropy of English text one loses only ~1% compression ratio due to inaccuracy ... but it can be arbitrarily large for the so-called "skewed distributions" popular in many situations, as in the example: one symbol of probability very close to 1 - corresponding to the typical behavior (containing much less than 1 bit/symbol of information) - while we still need to be able to handle rare exceptions (like an escape symbol).
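The skewed-distribution case is easy to quantify. A sketch for a hypothetical two-symbol source (a "typical" symbol with p=0.99 and an escape symbol with p=0.01):

```python
import math

p = 0.99  # made-up probability of the typical symbol
# Shannon entropy of the two-symbol source, in bits/symbol:
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # about 0.08 bits/symbol

# Any prefix code (Huffman included) must spend at least 1 bit per symbol:
huffman_rate = 1.0
ratio = huffman_rate / H    # roughly 12x worse than entropy for this source
```

Pushing p closer to 1 makes the ratio grow without bound, which is the "arbitrarily large" loss referred to above.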


 * Anyway, whether it is only a 1% ratio loss (e.g. for exabytes of internet traffic), or a much larger "surprise" for our compressor, the Wikipedia article should have a big clear warning about it and the ways to handle it - especially since there is currently usually practically no cost to repairing this suboptimality: e.g. while standard zlibh has ~300MB/s/core decoding, a recent Huffman decoder (inspired by tANS) has ~1000MB/s ... and accurate open-source rANS now reaches ~1500MB/s ( https://sites.google.com/site/powturbo/entropy-coder ). This is not about self-promotion, but an objective description of the current situation and avoiding the mistakes of the past. Jarek Duda (talk) 00:27, 6 June 2017 (UTC)


 * Please read What Wikipedia is not from start to end. Among other things, Wikipedia is not an instruction manual.  The page describes a concept, not whether someone should use it in a given application.  This is not an advertising section.  It's not a soapbox.  It's not a scientific journal.  It's not a textbook.  It's not all the things you're trying to make it by promoting yourself in articles like Maximal Entropy Random Walk, Asymmetric Numeral Systems, and Chiral life concept via non-encyclopedic explanations.  It's an encyclopedia.  It should be useful to those reading, not to those writing.  (As an aside, it is truly mind-boggling that you think of people doing things other than what you invented before you invented it as a "mistakes of the past.")
 * Please stay far, far away from editing Wikipedia until you internalize and accept these concepts. Use that extra time productively until then; you claim to need it.  After that, maybe you'll be ready to contribute something that isn't about yourself and is helpful to others.  Ten years of self-promotion is ten years too long.
 * Incidentally, I wasn't involved in either of your AfDs, Articles for deletion/Asymmetric binary system or Articles for deletion/Chiral life concept; it wasn't clear whether or not you realized this. The reason what I say agrees with what they said is that I'm discussing Wikipedia policy, which does not depend on my own self-interest or other peculiarities. Calbaer (talk) 01:09, 6 June 2017 (UTC)
 * By "mistakes of the past" I was referring to the suboptimalities discussed in this statement - which are no longer necessary given the current view of what is used in data compression - a situation that should be objectively described in a Wikipedia article. While a decade ago I was a lone PhD student who didn't see a different way to be heard (regarding what has been confirmed to be useful), a lot has changed in this long time and I have matured to be as objective as possible; also, many neutral, credible external sources have appeared for all the topics you mention. I have also created a few Wikipedia articles I am completely not connected with, e.g. Aharonov-Casher effect or Tumor Treating Fields - my main motivation is doing what I believe is right (like not patenting ANS) and objectively important.


 * Hence, if you question my objectivity, please be specific and I will explain, as here: yes, I still believe the Huffman coding article objectively requires a clear warning about its suboptimalities and the modern, widely used ways to repair them. Maximal Entropy Random Walk has after this decade ~100 citations, including in the best journals like Nature, many of them using this name in the title. Chiral life concept now has e.g. a WIRED article confirming my original 2007 concerns that it could eradicate our life ... and e.g. a 2016 Nature article showing that the run for such synthesis has already started (a mirror polymerase has been synthesized). Jarek Duda (talk) 02:14, 6 June 2017 (UTC)

Examples don't match?
Is it my imagination, or does the image in the Problem definition section (File:HuffmanCodeAlg.png) not match the Example in the Problem definition section? (Yet they use the same symbols, inviting the reader to try to correlate them.) —Steve Summit (talk) 12:41, 19 June 2015 (UTC)

Remove most External links, all See also
I removed the See also section and most External links. WP:ALSO states:


 * Editors should provide a brief annotation when a link's relevance is not immediately apparent, when the meaning of the term may not be generally known, or when the term is ambiguous.

This was not done here, except for Modified Huffman coding, which I can restore if desired. However, its article is short and not very illuminating, and that article's external link doesn't indicate how the code was created, so the link might not be useful as is. Others were to items already in the article or technologies that happened to use Huffman coding and have the syllable "Huff" in them. For example, Huffyuv seems similar to Lossless JPEG and JPEG-LS in its use of Huffman coding methods. That makes its inclusion arbitrary, which is true of most of the section; better to leave it out and have links inline where appropriate.

Of the three external links, one was broken and another was to an open source library. Only the third - to readable examples of code - seemed useful. As WP:EL states:


 * Some acceptable links include those that contain further research that is accurate and on-topic, information that could not be added to the article for reasons such as copyright or amount of detail, or other meaningful, relevant content that is not suitable for inclusion in an article for reasons unrelated to its accuracy.


 * Some external links are welcome (see § What can normally be linked), but it is not Wikipedia's purpose to include a lengthy or comprehensive list of external links related to each topic. No page should be linked from a Wikipedia article unless its inclusion is justifiable according to this guideline and common sense. The burden of providing this justification is on the person who wants to include an external link.

I kept the link I believe satisfied this, being instructional and (wiki) collaborative, i.e., not just a single person trying to get a page they made linked from Wikipedia. The section should be short with only truly useful links, to keep it from devolving to the link dump it was before this change. Calbaer (talk) 14:55, 8 June 2017 (UTC)

Optimality
This section has several inaccuracies and would benefit from links to other pure entropy-encoding algorithms.

"Although Huffman's original algorithm is optimal for a symbol-by-symbol coding (i.e., a stream of unrelated symbols) with a known input probability distribution,"

This is incorrect. Huffman is optimal among prefix code algorithms, but there are non-prefix-code coders that give better compression, so clearly Huffman is not optimal out of all of them. (What it has to its advantage is simplicity, especially in hardware implementations.) I also don't understand why Arithmetic coding and LZW are grouped together here. That is muddling up modelling with entropy encoding. Many of the most widely used LZ algorithms also utilise Huffman (eg gzip / Deflate) - these are totally orthogonal issues and it is confusing to the reader to imply otherwise. I would therefore drop LZW from this statement and add in Asymmetric Numeral Systems instead, as it fits alongside Arithmetic Coding as a direct alternative for entropy encoding, and we should be unbiased in our listing of alternatives.

"Prefix codes tend to have inefficiency on small alphabets, where probabilities often fall between these optimal points. The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^−1 = 0.5, making the upper limit of inefficiency unbounded. Combining symbols together ("blocking") can be used to make Huffman coding as efficient as theoretically possible."

Agreed on the principle of symbol combining to improve efficiency, but it cannot make Huffman "as efficient as theoretically possible" unless we want to encode the entire stream as a single combined symbol (and then how do we encode the table?). Rather, it reduces the inefficiency. Nothing quite matches the theoretical Shannon limit. Huffman isn't even as efficient as most arithmetic coders, even with blocking. Also, if we have a large alphabet with one very common symbol (eg byte 0 out of 256 byte values - all used) then symbol combining cannot solve this unless we move to an alphabet size larger than 256, which brings its own set of inefficiencies when implementing. I see therefore why RLE is added here too, but again it starts to feel like we're muddling up modelling with entropy encoding at this point.

Symbol combining and RLE are valid and well-used techniques (I've used both myself, even with other more advanced entropy encoders) so it's good to have them included here to educate readers on how to practically work around the limitations if and when they crop up. However, it could perhaps do with some practical examples. Eg the most obvious is a stream of just two symbols with no inter-symbol correlation; eg A A A B A A B A B A. p(A)=7/10, p(B)=3/10, giving around 0.88 bits per symbol, but Huffman can only ever assign 0 and 1, giving exactly 1 bit per symbol and 0% compression. We could combine them into AA AB AA BA BA symbols, which on a larger data set would offer some compression using Huffman bit lengths 1 (AA), 2 (AB) and 3 (BA, BB), offering .905 bits per symbol. Further improvements can be made by combining more symbols together into successively larger alphabets. Such techniques do not work with an already large alphabet, requiring alternative methods such as Run Length Encoding.
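The two-symbol blocking example above can be checked with a few lines of code (a sketch using the stated probabilities p(A)=0.7, p(B)=0.3; the helper `huffman_lengths` is my own, not from any library):

```python
import heapq
import itertools
import math

def huffman_lengths(probs):
    # Build a Huffman tree; return the code length assigned to each symbol.
    # The integer tick breaks ties so dicts are never compared.
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, tick, merged))
        tick += 1
    return heap[0][2]

p = {'A': 0.7, 'B': 0.3}
H = -sum(q * math.log2(q) for q in p.values())        # entropy, about 0.881 bits/symbol

single = huffman_lengths(p)
rate1 = sum(p[s] * single[s] for s in p)              # 1 bit/symbol: no compression

pairs = {a + b: p[a] * p[b] for a, b in itertools.product(p, repeat=2)}
double = huffman_lengths(pairs)
rate2 = sum(pairs[s] * double[s] for s in pairs) / 2  # about 0.905 bits/symbol
```

Blocking into pairs recovers some of the gap to entropy (0.905 vs. 1 bit per symbol against a 0.881 floor), and larger blocks narrow it further at the cost of exponentially larger tables.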

80.229.230.181 (talk) 09:32, 24 July 2017 (UTC) James
 * Regarding your last paragraph, what you want sounds very similar to Arithmetic coding, so we could copy that text or have a "see also." As far as what entropy-approaching alternatives we should list, they should be those the reader would be more likely to be familiar with.  (The idea to only have one or two is to avoid confusion by grouping together many arcane compression methods.  Googling the various compression methods shows that arithmetic coding is far more commonly known than similar alternatives, so it's best to stick with that.)  It wouldn't hurt to name only arithmetic coding (leaving out mention of LZW).
 * I don't think there's any inherent problem with mixing entropy coding techniques with modeling, though, since both are approaches to achieving/improving compression, especially with blocking and run-length coding, which turn Huffman from something that can be useless (e.g., for compressing binary symbols) to something that approaches entropy. And "approaches entropy" would be more accurate than "as efficient as theoretically possible," but the latter is more readable.  A lot of this is about readability versus completeness and strict accuracy.  We could say, "The Huffman code approaches entropy - i.e., optimal compression - using blocking.  However, the practicality of such an approach is usually limited by the size of the resulting coding trees, in terms of both storage and complexity.  There is one form of blocking, run-length encoding, without this drawback, at least on binary inputs."  That's more accurate.
 * My concern is that this section will explode into something far too big for a simple encyclopedia article, or become a place for people to plug their favorite compression technique. But there is certainly room for improvement. Calbaer (talk) 14:49, 24 July 2017 (UTC)


 * Your revised summary is better, but still inaccurate. You are mistaking pattern removal for entropy coding by implying that run-length encoding has no drawbacks and solves the issue of Huffman being unable to compress a bit stream.  Consider the example of a random selection of heads and tails with a weighted coin that gives tails 2/3 of the time and heads 1/3.  RLE just adds noise and doesn't help here.  Try it using zlib with Z_HUFFMAN_ONLY vs Z_RLE and you'll see the file grows with Z_RLE, and neither of them even gets to 1 bit per symbol (due to EOF). Accuracy *does* matter - this is an encyclopedia and accuracy should always be paramount.  If it's too complex to explain accurately then perhaps it's not appropriate for the site, but we should never slide into inaccuracy simply to make it easy to read.  Therefore my suggestion here would be to simply state that Huffman alone does not encode optimally in some cases and suggest that in *some* cases this can be worked around, but keep the specifics minimal. (James)  — Preceding unsigned comment added by 193.62.202.135 (talk) 08:47, 21 August 2017 (UTC)


 * Which part of the wording do you find inaccurate? Characterizing RLE as a form of "blocking"?  Or something else?  I don't think there's any implication that RLE has no drawbacks, since the text specifically says that it works, "For the simple case of Bernoulli processes," implying suboptimality elsewhere.


 * Also, your example of a 2/3-1/3 coin seems wrong here. Without combining symbols, compression would be 0%.  With RLE and Golomb coding (encoding run length using a unary code plus one bit per run of heads), it's about 6.7%.  (That's equivalent to n=1, p=2/3 in the equation at Golomb_coding.)  That's not quite the 8.2% you could get with optimal entropy coding, but it is compression, not just noise.  (As for whether a given library would implement this properly, that's another matter altogether.)
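 * To make the 6.7%-vs-8.2% figures reproducible, here is a Python sketch of the calculation (my own illustration; I am assuming the "unary code plus one bit" above means a Golomb code with M = 2 applied to the geometric run lengths of tails before each head):

```python
from math import log2

p_tails = 2 / 3   # weighted coin: tails 2/3 of the time, heads 1/3

# Entropy bound: the best possible compression for this source.
H = -(p_tails * log2(p_tails)
      + (1 - p_tails) * log2(1 - p_tails))          # ~0.918 bits/symbol

# Golomb code, M = 2, for the run length k of tails before each head:
# codeword = unary(k // 2) plus one remainder bit = (k // 2 + 1) + 1 bits.
# Average over the geometric run-length distribution P(k) = (1/3)(2/3)^k.
N = 10_000  # truncation point; the discarded tail is negligible
bits_per_run = sum((1/3) * p_tails**k * (k // 2 + 2) for k in range(N))
symbols_per_run = sum((1/3) * p_tails**k * (k + 1) for k in range(N))

bits_per_symbol = bits_per_run / symbols_per_run    # ~0.933
compression = 1 - bits_per_symbol                   # ~6.7%, vs ~8.2% at entropy
```

 * The sums work out to 2.8 bits per run over 3 symbols per run, i.e. about 0.933 bits per symbol: real compression, short of the entropy bound but well clear of "just noise."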


 * The idea of fixed-length blocks is largely of theoretical import, but is important for that reason. Using RLE instead of dumbly encoding input is an important demonstration of how doing the right thing before entropy coding can turn a coding method that looks "bad" (because it's being used poorly) to one that's "good" (because it's being used well).  Your example appears to illustrate a common misconception here; you're not the first person to present me with a Bernoulli example as evidence of where Huffman coding doesn't compress, when proper use of RLE shows that it does.


 * Still, I'm not sure how important it is to distinguish which part of compression is categorized as "entropy coding" and which part isn't. Like I said, I don't see how the current text - including omitting that distinction in favor of just linking to the RLE article - is inaccurate. Calbaer (talk) 14:18, 21 August 2017 (UTC)

n-ary Huffman coding
Under the n-ary Huffman coding section, it is mentioned that "The same algorithm applies as for binary (n equals 2) codes, except that the n least probable symbols are taken together, instead of just the 2 least probable." - but is that really the case?

Let's look at a simple example: trinary alphabet (n equals 3), with four symbols (call them w, x, y and z, in increasing order of probability). The algorithm as described will group w, x and y under a single node, and then (supposedly, as there are only two nodes remaining, not three, so this isn't explained explicitly, but I think this is the reasonable way to understand it) that new node and z under one more new node, which ends up as the root:

      root
     /    \
   wxy     z
  / | \
 w  x  y

So the root only has two children (which is OK, the article does mention the possibility that "additional 0-probability place holders must be added"). But surely this can't be the optimum, as better compression would be achieved by moving one of the first three symbols (optimally y) to occupy the free position directly under the root, rather than its current, deeper position:

     root
    / | \
  wx  y  z
  /\
 w  x
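
For what it's worth, the standard fix is to pad the alphabet with zero-probability dummy symbols until (number of symbols − 1) is divisible by n − 1, so that every merge combines exactly n nodes. A Python sketch (the probabilities are made up for illustration) reproduces exactly the better tree above:

```python
import heapq

def nary_huffman_lengths(probs, n):
    """Code lengths for an n-ary Huffman code, padding with zero-probability
    dummy nodes so that every merge combines exactly n nodes."""
    items = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    uid = len(items)
    # Each merge removes n - 1 nodes, so pad until (count - 1) % (n - 1) == 0.
    while (len(items) - 1) % (n - 1) != 0:
        items.append((0.0, uid, []))   # dummy placeholder, carries no symbols
        uid += 1
    heapq.heapify(items)
    depth = {s: 0 for s in probs}
    while len(items) > 1:
        merged_p, merged_syms = 0.0, []
        for _ in range(n):             # always merge exactly n nodes
            p, _, syms = heapq.heappop(items)
            merged_p += p
            merged_syms += syms
        for s in merged_syms:
            depth[s] += 1
        heapq.heappush(items, (merged_p, uid, merged_syms))
        uid += 1
    return depth

# w < x < y < z in probability, e.g.:
lengths = nary_huffman_lengths({'w': 0.1, 'x': 0.2, 'y': 0.3, 'z': 0.4}, n=3)
# y and z end up directly under the root; only w and x are at depth 2.
```

With one dummy added (4 symbols → 5), the first merge combines the dummy with w and x, and the second merge puts that node, y and z under the root, which is the optimal tree.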

This prompted me to look at Huffman's paper (http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf as referred to from the article), where I think the issue is actually addressed, by saying (left column of the fourth - and last - page, which is page 1101) "... it will be required that all these brackets, with the possible exception of one combining the least probable messages of the original ensemble, always combine a number of messages equal to D." (emphasis mine).

I think the article should be fixed to reflect this and to explain the algorithm more clearly. And I'd rather leave this task to someone more proficient than myself (but I might do it myself eventually if I notice in some time that this hasn't been addressed). 95.90.235.121 (talk) 12:00, 29 December 2020 (UTC)

¿Huffman tree?
What is called a Huffman tree here is referred to in some of the literature as a Kraft tree. --138.4.177.124 (talk) 15:20, 16 February 2021 (UTC)

Better applications section?
The applications section needs some work. The first paragraph compares Huffman codes to Arithmetic coding (see the rest of the drama in this talk page), which is not an application. The second paragraph describes some compression algorithms that use prefix codes which are specifically described as NOT Huffman codes.

So... the applications section has no applications in it.

J2kun (talk) 02:00, 4 May 2021 (UTC)
 * Go for it. Prefix codes are not necessarily Huffman codes (judging by the papers I've seen so far). AXO NOV  (talk) ⚑ 15:14, 3 December 2021 (UTC)

Finite vs infinite alphabet
The definition speaks of sets, but doesn't say finite. Does the tree construction still work out OK for infinite alphabets and trees of infinite depth? Clearly one can't work from the leaves, combining least-probable nodes, if there are infinitely many. See Golomb code, the last part of this edit, where I'm not so sure about what I said. Dicklyon (talk) 16:07, 12 July 2021 (UTC)

Simpler explanation
Current § Problem definition subsection is overly abstract. I found a paper that describes in much simpler terms how to get a Huffman code for a given set of symbols (probability values). No hard math involved; take a look: Huffman Offline.pdf AXO NOV  (talk) ⚑ 15:18, 3 December 2021 (UTC)

Wrong bit quantity/comparison in the example at the beginning
I think there's some kind of "mistake" in the description: "this code requires 135 (or 147) bits, as opposed to 288 (or 180) bits if 36 characters of 8 (or 5) bits were used.". There are only 16 symbols listed in the table given below the tree graphic, so for a FAIR comparison, only 4 bits are required to binary-code 16 different symbols. Please don't compare the most wasteful ASCII coding with the Huffman code, and please don't limit Huffman coding to text data! If it comes to the GAIN everybody is greedily looking for, it's 135 bits used by Huffman coding (without codebook and decoder) versus 36 chars * 4 binary code bits = 144 bits (NOT the 288 or 180 given in the description!). I know it looks much less impressive if you honestly say "only 144-135=9 bits gained in this example, and the overhead for codebook and decoder isn't even counted!". I ask for a fair comparison, because everybody should face the truth of how hard it is to find powerful algorithms for data compression. Don't try to make Huffman coding better than it is, and don't feed the "magic function theory". And PLEASE also show an example where beloved Huffman coding clearly FAILS! — Preceding unsigned comment added by 84.115.238.220 (talk) 05:47, 3 June 2023 (UTC)
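
Assuming the article's example string is "this is an example of a huffman tree" (36 characters, 16 distinct symbols), a short Python sketch confirms the 135-vs-144 arithmetic in the comment above (my own illustration, using the fact that the total Huffman cost equals the sum of all internal-node weights):

```python
import heapq
from collections import Counter
from math import ceil, log2

text = "this is an example of a huffman tree"  # assumed example phrase
freq = Counter(text)                           # 16 distinct symbols, 36 chars

# Total cost of an optimal (Huffman) code = sum of all merged-node weights.
heap = list(freq.values())
heapq.heapify(heap)
huffman_bits = 0
while len(heap) > 1:
    a, b = heapq.heappop(heap), heapq.heappop(heap)
    huffman_bits += a + b                      # each merge adds its weight
    heapq.heappush(heap, a + b)

# Fair fixed-length baseline: ceil(log2(16)) = 4 bits per character.
fixed_bits = len(text) * ceil(log2(len(freq)))
```

This yields 135 bits for Huffman versus 144 bits for a 4-bit fixed-length code: a 9-bit saving before any codebook overhead, as the comment argues.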