Talk:Arithmetic coding

Old stuff not in a section
The definition at the top is completely useless and unclear. Someone provided this definition years ago and now it is copied from one page to another. The definition that explains it is provided below in the text:

Arithmetic coding is converting a message to a whole number, which length is close to the length of the number of possible permutations for the provided statistical distribution of symbols and does not depend on the particular order of symbols.

First of all when doing arithmetic coding we compute the number. Even when it is fraction we output only numerator, which is whole number. Second property: its length does not depend on the position of symbols. After any rearrangement of symbols the message still can be compressed to the same limit. That answers many questions, such as when and why to apply arithmetic coding. The answer is when there is no way of using the positioning of symbols. And third important thing that entropy is very close to bit length of the number of possible permutations divided by number of symbols.

Older stuff not in a section
The sentence that reads... Instead, we represent the sequence as a rational number between 0 and 1 in base 3 Should read... Instead, we represent the sequence as a rational number between 0 and 2 in base 3 I didn't want to edit the main page, maybe one of you who is more involved can —Preceding unsigned comment added by 66.244.163.200 (talk) 18:17, 10 April 2008 (UTC)

The above statement is incorrect. Any number, in any base, starting with 0. (zero-dot) has a value between 0 (inclusive) and 1. True; the value lies between 0 and 2 as well (or -1000 and +1000 for that matter) but since the number 0.XX (zero-dot-anything) can never equal or exceed 1, stating that the value lies between 0 and 2 is wrong in this context. —Preceding unsigned comment added by 77.251.53.27 (talk) 21:13, 5 August 2009 (UTC)

Patents
Oy, oy, oy, just think what if Newton and Briggs had decided to patent logarithms!

Converted the sums and products to TeX, with the exception of one which had the unexpected expression "log_2". I think it is missing its argument. --Ejrh

JPEG2000 has an arithmetic coder... there must be an associated patent license. -- taral@taral.net

If these are covered by patent, we should have the list of patent numbers in the article so that we know when they expire. --ssd 18:26, 11 Jul 2004 (UTC)


 * Look here

Encoding and decoding: overview and example
These parts are entitled with “Encoding and decoding: overview” and “Encoding and decoding: example” but the overview part describes the encoding process only while the example part describes the decoding process only. This is very inconsistent and makes it hard for a reader to understand both processes as the encoding remains theory whilst the decoding example uses an input value coming out-of-nothing. — Preceding unsigned comment added by 77.12.61.89 (talk) 17:39, 16 April 2013 (UTC)

Entropy number wrong
I think the entropy estimate in the example is wrong. Each character seems to contain (-log2(.6) * .6) + (-log2(.2) * .2) + (-log2(.1) * .1) + (-log2(.1) * .1) = 1.57 bits of entropy. Ignoring the fact that an end of message only occurs at the end (which would reduce entropy of the entire message), a three character message like the example would have an entropy of 3*1.57=4.71. The article says 7.381.

As a check; you could actually encode any three character message of a four symbol alphabet in only 6 bits by simply assigning a 2 bit code to each of the inputs. Our example must have less entropy because the probabilities are skewed away from .25/.25/.25/.25. Mfinn68916 15:53, 27 March 2007 (UTC)

1.57b is the mean entropy of a symbol from the example entropy source. You can't just multiply that by the length of a particular message to get its entropy, as the symbols have different probabilities. The fact that a 2b fixed length code does 'better than entropy' in the "NEUTRAL NEGATIVE ENDOFDATA" case is irrelevant, what matters is the mean case, where indeed 2b/symbol does not beat the entropy of ~ 1.57b/sym. The self-information of a particular message should be calculated as the sum of -log2(Pj) over the probabilities, Pj, of the symbols in the message, for an order-0 Markov source (is there a better term?). So 7.381 is correct for the example message, but a new and clearer example would probably be a good idea... --Yjo (talk) 17:09, 26 November 2007 (UTC)

Please remove or completely rewrite the section "Sources of inefficiency"....it is wrong (I had removed it if I knew how to do it)! No 4-symbols source can exceed 2-bit of entropy! The entropy formula is wrong! — Preceding unsigned comment added by 2001:760:2C05:1005:0:0:200:45 (talk) 10:45, 14 October 2014 (UTC)

Improving the article
(note: these comments on improving the article were originally left on my talk page; I've taken the liberty of copying them here to start discussion of where the article can be improved.)

I still think my version is better for a few reasons. The main one is that the article jumps right into an example, which it then interrupts with a discussion. I think it is better to start with an analogy describing the method. Follow this with a complete example of the simple static case. Then discuss the adaptive case. And those diagrams in the history looked interesting, too. It was a shame to discard them.

And in general, the prose is too wordy, with too many asides within sentences. Go for simpler sentence structure, with fewer words. I don't have the page in front of me, but I remember the first sentence had these flaws. There was no need to define a message within the definition of arithmetic coding. A separate sentence would flow better.

And finally, I feel it would be better to do the examples in binary rather than decimal. It is just as easy, and that's really how the algorithm works in any implementation I've seen. To do it in decimal is analogous, but needlessly confusing. And speaking of analogies, your algorithm is an analogy as well. The real algorithm doesn't calculate all the interval endpoints; it just calculates the two it needs at each stage. And the decoding works more like how I described, by scaling the intervals to [0, 1) at each step.

I know my version was not finished, but it had some of the elements I describe here. Could you look at it again and see if it may be possible to work some of it into yours? Scottcraig 08:08, 13 Nov 2004 (UTC)


 * I think it would be better to begin discussion of these one at a time. I'll start with the issue of simplifications; there are two simplifications that you're complaining about here.  The first is doing the example in decimal, when the actual computer will almost certainly be doing it in binary.  You say "it is just as easy" for someone trying to grasp the basic concept of arithmetic coding to follow an example in binary as in decimal.  Can you actually tell us why you think that?  Among all the computer scientists and programmers I know, I don't know anyone who thinks in binary; I don't know anyone who finds it "just as easy" as decimal.  We're trying to get across the basic concept of arithmetic coding; even if (for the sake of argument) 90% of our audience found binary just as easy as decimal, and only 10% found it an obstacle to following the obstacle and understand the principle illustrated, there's no reason to throw any unneeded obstacle in the path of the readers.


 * The second simplification is showing the algorithm as if the calculation was performed with infinite precision at each step and then converted to digits only at the end of the calculation. I think this is the proper way to introduce the concept -- which is a hard concept for those who don't already understand it -- that this mathematical interval actually contains the whole message we encoded into it.  We would only obscure understanding of the basic concept by introducing the complication of re-normalization too fast.  It would be a good idea to discuss the difference between the simplified version that makes the concept clear and the way it's actually done in production code; I think it would be wiser to discuss those differences after the basic concept has been explained.  -- Antaeus Feldspar 17:42, 13 Nov 2004 (UTC)

Renormalisation
The description isn't quite accurate. You've left out an important case. Namely when the range is of the form [0111111111..., 1000000000...). Digits are also shifted, just not sent to the output. Scottcraig 20:16, 3 Dec 2004 (UTC)


 * Can you go into more detail about that? I'm not following the example you just gave.  -- Antaeus Feldspar 20:34, 3 Dec 2004 (UTC)

The idea is that there is no limit to how much buffering you might have to do before you can know the correct value for a certain bit. Whenever the start and end of the range match in some number of bits, those bits are known for sure, and can be sent to the output. However, there's no limit to how small the range can get (i.e. how many bits are required to describe it) without any bits matching at all. This is problematic because the encoder and decoder need unlimited buffers.

To solve this problem, encoders typically use a technique called "bit-stuffing" that wastes a bit, but puts a finite limit on the amount of buffering required. Conceptually, bit stuffing puts a zero bit in the middle of a long string of 1 bits to limit the number of bits that can be flipped by a carry. So, for instance, you might find that you have reduced the range to [.01111111, .10000000). If your encoder/decoder has only 8 bits of buffer, you're toast because you still don't know what that first bit will be.  However, in this situation, you can stuff a zero bit after the 7th bit, thereby representing this range as [011111101, 011111110). The extra digit in the 8th position has "absorbed" the carry from subsequent bits. Now you have 7 known bits, and you can send them to the output. Strictly speaking, this representation is no longer truly binary: the bit-stuffed sequence "011111110" is equivalent to the binary fraction ".10000000". --Doradus 20:37, Dec 21, 2004 (UTC)


 * I believe I understand now. What confused me was describing a "range" but marking it in interval notation instead.  The important part is that the range can get smaller but will not always naturally converge to matching initial digits; however, when such convergence fails to occur naturally, nothing stops the encoder from voluntarily limiting its range to create such matching initial digits, which it does and then proceeds to renormalize.  Is that correct? -- Antaeus Feldspar 23:31, 21 Dec 2004 (UTC)


 * I never looked at it that way. Since the range [.0111111, .1000000) is too hard to deal with for practical reasons, we take all the messages that would have been mapped to that interval and re-map them to the new interval [.01111110, 0.1111111), which is half of the original interval.  From that point of view, the encodings are still perfectly valid binary fractions; it's just that the encoding interval [0,1) has some holes in it due to the practical limitations of the encoder/decoder.  --Doradus 01:47, Dec 22, 2004 (UTC)

I will explain more thoroughly. I had assumed you had known about this already and only had to remind you, which is why I kept it short. I didn't see your question until today.

Suppose the current interval is [0111111111..., 1000000000...) to a large number of digits, with more symbols to encode. Now the left endpoint may move above 1/2, in which case both ends will start with 10000... . But the right endpoint could move below 1/2, in which case both ends start with 0111111... . But if all those digits are kept until either case happens, then all the precision is lost. So instead of this, we shift both endpoints past the 0111... and 1000... parts, keeping a count of the number of shifts. When case 1 or 2 eventually happens, we can output the correct digits. If neither has occurred and there are no more symbols, just output 10000... to the saved number of 0s. Scottcraig 07:53, 22 Dec 2004 (UTC)


 * I see. That's different from bit stuffing.  I wonder why people do bit stuffing when this seems so much simpler and better?  --Doradus 15:36, Dec 22, 2004 (UTC)


 * There are pros and cons to both. The original way achieves better compression. But it does not work so well with streaming data. It pauses, processing some symbols, then outputs a burst. I suppose this is why bit stuffing was developed. (I hadn't heard of it before. Thanks.) Bit stuffing ensures that every byte is sent as processed, at a slight cost. It's not that these situations occur that often anyway, so there wouldn't be much penalty. But it may be necessary to ensure that the pausing never occurs, I suppose. There is also a slight possibility that the stream may pause for so long that the number of shifts wraps around. This seems extremely unlikely, but bit-stuffing prevents it.


 * Actually, the whole discussion of renormalization concerns the specific implementation of the algorithm, not the algorithm itself. If we can't talk about the binary implementation, then the topic of renormalization is pretty well disconnected from anything else.


 * No one ever said we couldn't talk about the binary implementation. What was objected to was the idea that we should do all the examples in binary, on the assumption that someone who does not yet understand the basic idea of arithmetic coding and is coming to this article to learn will find it no obstacle at all to have the only demonstrations of the central concept in binary rather than decimal arithmetic. -- Antaeus Feldspar 07:04, 23 Dec 2004 (UTC)


 * The topic arises by answering the question, "Which bits can we store?"


 * Consider the current interval endpoints as a binary fraction, say xx...xx0abcdefgh... and xx...xx1ijklmnop... . They agree on the first part, and any subsequent narrowing will as well, so these bits can be sent to output. At the first difference, the left always has a 0, and the right has a 1. These can't be sent out yet, but they don't need to be stored either. So start storing from there, 16 bits each or what have you.


 * But if we stop here, then the problem under discussion arises. If the left starts with 01111, and the right starts with 10000, then storing the 1111 and 0000 robs some precision. (Remember the initial 0 and 1 are not stored anyway.) Because we know that eventually the 01111 will flip to 10000, or vice versa. If the string length goes past 16 (or what have you) bits, then the algorithm hits a wall with no way to recover. So instead, keep a counter of the length that will be output when we know if it's 0111... or 1000..., and store the 16 bits after that.


 * So suppose at the current step, we are storing three variables as unsigned ints: n, left, and right. Then not counting the bits already output, the endpoints are 216*(2n-1) + left, and 2n+16 + right. i.e., 01111leftbits and 10000rightbits. Note that left can be greater than right.


 * Next suppose that the next encoded symbol reduces the interval to a/c to b/c of the current one. (The data model must use fractions.)


 * The current interval has a width of 216 + right - left, or w = right + not(left) + 1, which is always positive. Then the new interval is left := left + (w div c) * a, to right := left + (w div c) * b, with adjustments still to be made depending on overflow in the calculations. Step 1 Case a) The calculation of left overflows, then the left end has moved above the half mark. Output 10...0 and set n to 0. Case b) The calculation of right does not overflow, then the right point has moved below the half mark. Output 01...1 and set n to 0. Case c) left does not overflow and right does. Do nothing. Step 2) In cases a or b, output the common starting digits and shift left. Step 3) If (0)111/(1)000 occurs, shift left while incrementing n.


 * I'm not really sure if all the above details are correct; there may be some off by one errors or the like. I'm sure the correct code can be found. I just don't have it in front of me. Scottcraig 05:28, 23 Dec 2004 (UTC)


 * In the first table under the heading: Interval reduced to eight-bit precision (as fractions), the denominators in the fractions are 256. Should they not be 255? 60.234.137.171 19:43, 18 March 2007 (UTC)

111 in sources of inefficiency wrong
I think that the 111 is wrong. The binary 0.00110 is the shortest sufficient stream ((5/27 + 6/27)/2 in binary). (Of course you cut off the leading 0 and you get 001100 later). I guess the 111 suggests 0.00111 but this would be wrong, too.

Le me explain why 0.00110:

Let's say 1 = NEUTRAL, 2 = NEGATIVE, 3 = END OF STREAM and the string to compress is "123" (as in the article)

No input (Input 0): You have no clue about any numbers after the dot. In worst case it is 0.1111111... which goes asymptotically towards 1. Therefore 0 => [0,1)

Input 0.0: You have no clue about any numbers that follow. In worst case it is 0.0111111... which goes asymptotically towards dec(0.1) = 0.5. Therefore 0.0 => [0, 0.5) Now we can say that the first word will be 1 or 2 but not 3 which had the interval [0.66,1)

Input 0.00: You have no clue about any numbers that follow. In worst case it is 0.00111111... which goes asymptotically towards dec(0.01) = 0.25. Therefore 0.00 => [0, 0.25) Now we can say that the first word will be 1 for sure. the second word could still be 1,2 or 3, because [0, 0.25) contains [0,1/9), [1/9,2/9) and a part of [2/9,3/9)

Input 0.001: You have no clue about any numbers that follow. In worst case it is as before 0.00111111... which goes asymptotically towards dec(0.01) = 0.25. Therefore 0.001 => [dec(0,001), 0.25) = [0.125, 0.25) Now the second must be 2 or 3 because the interval doesn't contain [0,1/9)

Input 0.0011: You have no clue about any numbers that follow. In worst case it is as before 0.00111111... which goes asymptotically towards dec(0.01) = 0.25. Therefore 0.0011 => [dec(0,0011), 0.25) = [0.1875, 0.25) This is still not sufficient to know the second word because it still contains [1/9,2/9) and a part of [2/9,3/9)

Input 0.00110: You have no clue about any numbers that follow. In worst case it is 0.00110111... which goes asymptotically towards dec(0.00111) = 0.21875. Therefore 0.00110 => [dec(0,00110), 0.25) = [0.1875, 0.21875) Now it is in [1/9,2/9) and the second word is 2

Now if we go for the third word then [0.1875, 0.21875) is also exclusively in [5/27,6/27) = [0.1851...,0.2222...). Therefore we don't need more bits to know that the third word is 3

0.00111 would require a lot more bits by the way because then you would need to shrink [0.2185, 0.25) to [0.2185, 0.2222) to include the word 3

Now we cut the leading 0. because it is always 0. and we get 00111 which is 5 bits! — Preceding unsigned comment added by 78.34.236.143 (talk) 20:49, 8 April 2012 (UTC)

Encoding shorter than entropy is misleading
The article used to contain this text:

...we could have encoded the same message in the binary fraction .1000101 (equivalent to .5390625 decimal) at a cost of only 7 bits. This is actually shorter than the information content, or entropy of our message, which with a probability of 0.6% has an entropy of approximately 7.381 bits.

I have changed this because it is misleading. The fact is that giving the binary fraction as ".1000101" is ambiguous; these same bits can start any binary fraction in the range [.1000101, .1000110). For instance, both sequences "10001010" and "10001011" would start with the same bits, but one would decode to "NEUTRAL, NEGATIVE, END" while the other decodes to "NEUTRAL, END".

This is a subtle problem. We, as humans looking at the fraction ".1000101", know that all the subsequent bits must be zeoro. How do we know that? Because the symbol following the final "1" is a quote mark, which indicates that the fraction ends at that point. We have cheated, and implicitly added an extra symbol to the encoding. In a computer, there are two ways to deal with this problem:
 * 1) Add enough bits to make the fraction unambiguous.  In this case, adding another zero bit makes the encoding unambiguous, in that subsequent bits don't affect the message.  Now our encoding is 8 bits, which is more than the message's entropy.
 * 2) Encode the length of the stream in out-of-band data.  This is an acceptable real-world solution, since filesystems do store file sizes in their metadata, but now the metadata must be included in theoretical discussions of message length.

In either case, Claude Shannon rests happily in his grave. --Doradus 19:43, Dec 21, 2004 (UTC)


 * Of course the encoding of a particular message can be shorter than its entropy. The title of this section is just wrong.


 * Shannon's result is that no encoding scheme can encode every message shorter than its entropy. This does not preclude the possibility that some are shorter, as long as some are not.


 * Fair enough. I have renamed this section, and removed the contentious statements from my explanation above.  But what you describe is not what happened here.  I think my explanation given above is correct, and it has nothing to do with decimal representation.  --Doradus 20:41, Dec 21, 2004 (UTC)


 * On the other hand, you have uncovered a problem with the article. The ambiguity you describe comes from performing the encoding in decimal before converting to binary. I would prefer to do everything in binary. I think this would be much clearer, and provide the motivation for arithmetic coding in the first place. But others feel that requiring familiarity with binary would be too burdensome for our audience. As a result, the description lacks motivation, and has problems like the one you found. Scottcraig 20:15, 21 Dec 2004 (UTC)


 * Ooops. Thanks for catching that; I hadn't realized that "10001011" would decode to a different symbol stream than "10001010".  Perhaps you could add something to the article about how a decoder knows it's added enough digits to make the end result truly un-ambiguous?


 * At the risk of making myself look even more like a dunce than I do, I also have to question your statement that a message's encoding can't be shorter than its entropy. That is certainly true over the set of all messages, that the total of their encodings will always be more than the total of their entropies.  But consider a set of symbols such that one symbol, let's call it A, has a probability of just less than .5, and the least probable symbol still has a probability greater than (.5 - p(A)).  (Hope I got that notation right...)


 * Anyways, under those conditions, those symbol probabilities would produce a Huffman tree that assigns A a one-bit code -- and yet the entropy of A must be greater than 1 bit, since its probability is less than .5. Every time an A appears, then, the cost of the encoding goes up by 1, but the total entropy of the message goes up by more than 1.  With that being the case, doesn't it follow that to create a message whose encoding was shorter than its entropy, all we would need to do is take any message which did not meet that criteria and add A's until it did? -- Antaeus Feldspar 20:59, 21 Dec 2004 (UTC)


 * Agreed. I was wrong to say that no message can ever be encoded in fewer bits than its entropy.  --Doradus 21:12, Dec 21, 2004 (UTC)

The relationship between arithmetic coding and Huffman
Could Antaeus Feldspar give a reference to e.g. a scientific paper that proves "it has been shown that Huffman is just a specialized case of arithmetic coding"? It is clear to me that in the case of the probabilities of the symbols being 2^(-n) n=(1,...), then Huffman and the arithmetic coder get the same results. But this is not showing that Huffman is a special case of the arithmetic coding?! I don't believe that "Huffman is a special case of arithmetic coding" but that there are cases were both algorithms work identically?? -- stef@nstrahl.de 18:10, 16 Feb 2005 (CET)


 * No, I do not have a reference in the literature for this. My understanding that Huffman had been at some point formally proven to be a specialized case of arithmetic coding was based on years of subscribing to the Usenet newsgroup comp.compression.  I would be unable at this point to find the exact post(s) where that claim was made.  If you feel you have a better phrasing for the relationship between arithmetic coding and Huffman coding, by all means present it for consideration. -- Antaeus Feldspar 00:54, 17 Feb 2005 (UTC)


 * Thanks for the reference to comp.compression. If found where Charles Bloom gives a reference to his paper Binary Range Encoding - A New Entropy Encoder in which he joins Shannon-Fano, Huffman, and arithmetic Coders into a theoretical framework. I haven't read it yet but I think this will give the prove. I'll report  more details if I have done so. -- stef@nstrahl.de 22:01, 16 Feb 2005 (CET)


 * That probably is it. Bloom was/is one of the most prolific and most knowledgeable c.c posters and I always tried to keep up with his posts. -- Antaeus Feldspar 00:35, 19 Feb 2005 (UTC)



There is a basic flaw to this article. Arithmetic Coding is not a data compression scheme, just a method of encapsulating the result of applying a data prediction mechanism to a data stream. The beauty of it is that it separates out the encoding problem entirely, leaving the focus on the prediction mechanism. For a given model (as it is called in the text) arithmetic encoding will achieve (aside from small implementation costs) 100% of the entropy of any data stream **with respect to the model**.

You can encode "War and Peace" down to a single bit plus a stop code if you use arithmetic coding, with "War and Peace" as the prediction model. There is no global lowest degree of compression achievable ... only a lower bound for a given data stream wrt a given model.

My honours thesis in 1980 was on data compression and I showed that Huffman coding and LZW could both be viewed as a prediction model plus arithmetic coding. The original algorithms result in a discretized version of the output of an arithmetic coding approach. {Paul McGee]Pmcgee2 (talk) 05:11, 22 December 2007 (UTC)

Hi I wonder how these figures were obtained. Perhaps an explanation is needed too? * 000: 85.7%   * 001, 010, 100: 4.5% each * 011, 101, 110: .24% each * 111: 0.0125%--Deeflex (talk) 14:47, 21 February 2009 (UTC)

I was wondering how the "1.3 bits per 3 symbols" metric was obtained. When I compute a Huffman tree for 8 symbols using a frequency model similar to that above (where one symbol has a frequency larger than all others combined)I obtain an average length of 3.5 bits per symbol. —Preceding unsigned comment added by 75.39.112.166 (talk) 20:48, 1 February 2010 (UTC)

Range coding and arithmetic coding
The article currently mentions range coding (which is a good thing for it to do), but it does seem to support what I understand to be some common misconceptions about range coding in relation to arithmetic coding.

I've already expounded on this in Talk:Range encoding, but the key points are:-


 * 1) Range coding seems to be a different interpretation of the same thing, rather than being a different thing itself.
 * 2) As I understand it, all range coders (decoders/encoders) are the corresponding arithmetic coders, and vice-versa. (It's just a matter of what you interpret the actual range/arithmetic code as being.  You could interpret it as being an arithmetic code in the usual way, or you could interpret it as being the common prefix of a range of natural numbers (non-negative integers), in the usual range coding way.)
 * 3) Differences in renormalisation are implementational, and there's no reason why byte-based renormalisation can't be applied in implementations of arithmetic coding.
 * 4) Although I am not a lawyer, it does seem that the 'myth' that arithmetic-coding-applicable patents don't apply to range coding relies on range coding being different to arithmetic coding. I strongly suspect it's a myth, a rumour that keeps getting (unwisely) propagated.  But, as I said, I am not a lawyer.  More on this below.

--Simon G Best 17:24, 20 July 2006 (UTC)

Range coding and arithmetic coding patents
The article says:-

"Range encoding, unlike arithmetic coding, is generally believed not to be covered by any company's patents."

Source? It's a rumour I've come across quite a number of times, now, but I strongly suspect it's a myth (as range coding and arithmetic coding do seem to be the same thing (just interpreted in two, different ways)). I'm certainly doubtful that it's "generally believed", as there seems to be no shortage of people who look at range coding, and conclude that it's really just arithmetic coding interpreted a different way. (In my experience, this kind of observation seems to be made, by someone or other, whenever range coding comes up on the net. (And no, not just by me :o) ).)  "Rumoured" might be better than "generally believed".

I should emphasise, of course, that I am not a lawyer.

--Simon G Best 17:24, 20 July 2006 (UTC)

About patent validity. For me it looks like that some patents mentioned in the article are not valid any more (17 years from grated date or 20 years from filind date) + some patens are expiring very soon. During years 2008. —Preceding unsigned comment added by 88.114.252.64 (talk) 20:34, 11 May 2008 (UTC)

The whole patent section is incorrect and needs to be redone. The .alt cryptography FAQ is pretty old and somewhat outdated on this issue. The paper in which range coding was introduced was dated 1979, so range coding is believed to be patent free. The equivalence of range coding and arithmetic coding and the fact that range coding was invented by at latest 1979 when the paper introducing it was published, invalidates many of the arithmetic code patents. Agalmic (talk) 19:55, 1 November 2008 (UTC)

Changes to this subsection of the article
(I've added a subsection heading for this, so that it doesn't seem to be part of the preceding subsection on patents. --Simon G Best 21:38, 20 July 2006 (UTC))

I've now edited the subsection of the article on range encoding to clarify and expand upon the relationship between range coding and arithmetic coding. It was a bit of a rewrite. The article on range encoding needs to be brought into line with these changes, which I may well do. It's just too hot and humid here, right now. Especially humid.

--Simon G Best 21:34, 20 July 2006 (UTC)

Applications
What common programs/file formats/standards use Arithmetic coding? --Apoc2400 08:18, 10 October 2006 (UTC)


 * bzip (ok, not exactly common, sadly), PAQ, most PPM implementations, DjVu, JBIG/JBIG2, JPEG 2000, H.264/MPEG-4 AVC... --Piet Delport 01:21, 11 October 2006 (UTC)

History
The history of AC would be quite dandy. When it started development, first implementations, how it has developed over the years to improve, who were the major contributors (such as Jorma Rissanen and Glen G. Langdon Jr.), etc. Spodi (talk) 11:41, 30 December 2007 (UTC)

combinatorics
I found myself often confuzed when permutations and variations and combinations are all called combinations. (Lets say you have apple, banana and pear. There are 27 variations, 6 permutations and 1 combination of 3 of those 3 fruits.) Is there any particular reason for doing so? 84.16.123.194 (talk) 10:59, 27 September 2008 (UTC)

Patent Section Incorrect
This article needs to be brought into agreement with the range encoding article. Arithmetic coding and range coding are mathematically equivalent and range coding was introduced in an article in the year 1979. Therefore the article should not suggest that the patent situation precludes the implementation of arithmetic coding without licensing. Only specific implementations of arithmetic coding necessary to implement certain standards are covered by patents. The general arithmetic coding algorithm however was prior art by the time these patents were issued. Agalmic (talk) 20:02, 1 November 2008 (UTC)
 * This was mentioned a few times. Without getting an IP lawyer involved, I'm uncertain of just what the scope of the arithmetic coding patent is - I think it might be more accurate to say that its scope hasn't been tested in court. Dcoetzee 22:46, 2 November 2008 (UTC)

Also, right now the first patent is marked as expired. Aren't many of the other patents expired as well? For example, the second one was granted in 1981, and by my count should have expired in 2001.. 130.236.55.37 (talk) 10:59, 2 February 2010 (UTC)

It looks like all the patents have expired. Is this the case, meaning that arithmetic coding is now open? — Preceding unsigned comment added by 73.32.145.62 (talk) 14:10, 13 October 2015 (UTC)

minor copyediting
I sincerely hope that I have not altered this article in any adverse way as pertaining to the technical information contained herein. I am unfamiliar with this subject and would ask someone else more knowledgeable to read my edit, and check that I have done some good. If you judge otherwise, feel free to revert me with no hard feelings.--Paraballo (talk) 03:42, 16 May 2009 (UTC)

Invalid of obsolete links
Some of the links on this page are obsolete or with restricted access, e.g. an article "An Introduction to Arithmetic Coding, IBM J. RES. DEVELOP. VOL. 28, No 2, March 1984" in "Range encoding" section which requires a valid login or an article "Arithmetic coding" in "References" section which does not seem to be right.

Radekg1000 (talk) 23:48, 25 November 2010 (UTC)

Suggestions
This article is better than the last time I looked, but it could still be a lot better. A couple of things that jumped out at me were:

1) In the introductory paragraph it states "Normally, a string of characters such as the words "hello there" is represented using a fixed number of bits per character, as in the ASCII code" - This page is about a specific, quite advanced form of data compression and does not need to spell out this simple concept. The pages for entropy, lossless compress etc... are linked right beforehand and are enough for uninformed users to follow.

2) In the explanatory section the article states "A more efficient solution is to represent the sequence as a rational number between 0 and 1 in base 3, where each digit represents a symbol. For example, the sequence "ABBCAB" could become 0.0112013" - without any explanation of how this is performed. An example and some detail come much later in the article but I think this idea, which is fundamental to the principle of arithmetic coding, should be explained in much more detail here. — Preceding unsigned comment added by 91.143.178.131 (talk) 17:09, 19 October 2011 (UTC)

Theoretical Limit...
Could someone look at this equation please, as it doesn't balance. It's in the theoretical limit section. The LHS appears to be correct.
 * $$n \log_2(n) - \sum_{i=1}^A f_i \log_2(f_i) = -\left[\sum_{i=1}^A p_i \log_2(p_i)\right] * n$$

UKWikiGuy (talk) 23:31, 30 May 2013 (UTC)

Asymmetric numeral systems (ANS) - combining speed of Huffman with accuracy of arithmetic?
There is lots of going on about this alternative to arithmetic coding, which have similar complexity to Huffman coding, but handles fractional bits of information - have accuracy like arithmetic coding.

Some implementations: https://github.com/Cyan4973/FiniteStateEntropy https://github.com/rygorous/ryg_rans Blogs: http://fastcompression.blogspot.fr/ http://fgiesen.wordpress.com/2014/02/02/rans-notes/ http://cbloomrants.blogspot.com/

Maybe it should be mentioned somewhere in the article? — Preceding unsigned comment added by 128.211.178.20 (talk) 21:56, 4 February 2014 (UTC)
 * No. This is blatant self-promotion; see Talk:Huffman coding.  Additions about this to other articles violated WP:SOAP, WP:FORUM, WP:NOR, WP:RS, and WP:notability. Calbaer (talk) 00:14, 7 June 2014 (UTC)

What does this even mean?!
"The cumulative frequency is the total of all frequencies below it in a frequency distribution (a running total of frequencies)."

What does that even mean? it's already listed under total frequency, yet the values are different...

Can we get someone who isn't trying to jerk off their ego by using so much jargon to rewrite the article? — Preceding unsigned comment added by 2601:405:4202:C7F0:E0FA:1C06:932F:9C6D (talk) 20:12, 4 September 2016 (UTC)
 * Where is the Jargon? A Shortfall Of Gravitas (talk) 18:15, 4 September 2018 (UTC)

"The cumulative frequency" is the name of column "is the" means we explain what it is "total of" means sum

"all frequencies below it in a frequency distribution" or the "a running total of frequencies" means all numbers in column "frequency" what are above the line you're looking at

A | 1 | 0 (as nothing is above A) | B | 2 | 0 + 1 (as A is above B) | C | 3 | 0 + 1 + 2 (as both A and B are above C) | — Preceding unsigned comment added by 160.83.30.185 (talk) 16:48, 17 April 2017 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 1 one external link on Arithmetic coding. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20030228094950/http://planetmath.org:80/encyclopedia/ArithmeticEncoding.html to http://planetmath.org/encyclopedia/ArithmeticEncoding.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 21:58, 17 October 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 2 external links on Arithmetic coding. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20071222053346/http://www.inference.phy.cam.ac.uk/mackay/itila/book.html to http://www.inference.phy.cam.ac.uk/mackay/itila/book.html
 * Corrected formatting/usage for http://planetmath.org/encyclopedia/ArithmeticEncoding.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 02:16, 9 July 2017 (UTC)

Error in radix example in "Arithmetic coding as a generalized change of radix"
Hi,

after working on radix based stuff for a while, I just noticed this example and that it is actually wrong. The radix wouldn't be chosen based on the size of the string (just check, that any value from 4 to 5 in the thusly implied 6 symbol alphabet would never be used. The radix should actually be 4 (or exactly 2 bits). So the radix encoded value of the string "DABDDB" would be 6*2 = 12 bits.

Best regards, Dresdenboy (talk) 09:30, 28 August 2020 (UTC)

Pasco
I'm surprised to not find Rich Pasco's name in the article at all. His 1976 dissertation laid out the algorithms. Most (or at least many) works on arithmetic coding attribute it him, along with Rissanen who published part of it earlier in 1976. Pasco referenced Rissanen saying "One algorithm of the family was developed independently by Rissanen [1976]. It shifts the code element to the most significant end of the accumulator, using a pointer obtained by addition and exponentiation. We shall now compare the alternatives in the three choices, and see that it is preferable to shift the code element rather than the accumulator, and to add code elements to the least significant end of the accumulator." I'll work on adding something about that. Dicklyon (talk) 22:47, 11 September 2020 (UTC)