Talk:Zipf's law

Needs simple leading statement
The article needs to start with a very simple leading statement, so that anyone stumbling onto it can know what it is about. The discussion can then move onto more technical stuff.
 * Provided one. --Jorge Stolfi (talk) 01:24, 9 May 2023 (UTC)

True that "the" is twice as common as the next one?
Is it true that the word "the" does indeed occur about twice as often as the next common English word? The rest of the article seems to allow for some proportionality constants. AxelBoldt


 * It's not true, so I replaced it with a statement about Shakespeare's plays. AxelBoldt
 * It is approximately true, see the example of the Brown corpus. The normalization constant applies to both words, so the ratio is 1/1 to 1/2, that is 2:1. --Jorge Stolfi (talk) 01:28, 9 May 2023 (UTC)

Random typing
The main article claimed that ''the frequency distribution of words generated by random typing follows Zipf's law''. I doubt that very much. For one thing, if you type randomly, all words of length one will be equally likely, all words of length 2 will be equally likely, and so on. Or am I missing something? Maybe we should perform a little Perl experiment. AxelBoldt


 * I tried this Python code, and plotted the results in a log-log plot. The early ranks are a bit stepped, but the overall pattern fits Zipf's law rather well. The Anome


 * Could you repeat the experiment with all letters and the space getting the same probability? That's at least what I thought off when I heard "random typing". AxelBoldt


 * When I was told of the "random typing" experiment, the "typing" part was more important than "random". If you sit at a keyboard and randomly type, you have a much higher chance of hitting certain keys than others because your fingers tend to like certain positions. Also, the human brain is a pattern-matching machine, so it works in patterns. If you look at what you type, you might notice you tend to type the same sequences of letters over and over.
 * For a brief historical note, this theory was used in cracking one-time-pad ciphers. Humans typed the pads, so it was possible to guess at the probability of future ciphers by knowing past ones. It is like playing the lottery knowing that a 6 has an 80% chance of being the first number while a 2 has a 5% chance. If you knew the percentage chance for each number and each position, you would have a much greater chance of winning over time. Kainaw 13:59, 24 Sep 2004 (UTC)

I can't right now, but I'll give you the reason for the skewed probabilities -- the space is by far the most common character in English, and other chars have different probabilities -- I wanted to model that. The Anome
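Since the original code is no longer shown, here is a minimal sketch of the kind of experiment being described. The character set and weights below are illustrative assumptions (space at about 1/5, unequal letter frequencies), not measured English frequencies:

```python
import collections
import math
import random

# Hypothetical, roughly English-like character probabilities: the space
# is the most common character (~1/5) and the letters are unequal.
# These exact weights are assumptions for illustration only.
chars = list(" etaoinshrdlu")
weights = [0.20, 0.11, 0.09, 0.08, 0.07, 0.07,
           0.06, 0.06, 0.05, 0.05, 0.04, 0.04, 0.03]

random.seed(1)
text = "".join(random.choices(chars, weights=weights, k=200_000))
words = [w for w in text.split(" ") if w]
counts = collections.Counter(words)

# Rank-frequency data: Zipf's law predicts a roughly straight log-log line.
freqs = sorted(counts.values(), reverse=True)
xs = [math.log(r) for r in range(1, len(freqs) + 1)]
ys = [math.log(f) for f in freqs]

# Least-squares slope of the log-log plot.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
print("distinct words:", len(freqs), "log-log slope:", round(slope, 2))
```

Plotting (xs, ys) would show the stepped early ranks The Anome describes, with an overall negative slope.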

The main article claimed that ''the frequency distribution of words generated by random typing follows Zipf's law.'' AxelBoldt wrote: ''I doubt that very much. For one thing, if you type randomly, all words of length one will be equally likely, all words of length 2 will be equally likely and so on. Or am I missing something?''

Yes, you're missing something.

It doesn't *exactly* match Zipf's law. But then, no real measurement exactly matches Zipf's law -- there's always "measurement noise". It does come pretty close. In most English text, about 1/5 of all the characters are space characters. If we randomly type the space and 4 other letters (with equal letter frequencies), then we expect words to have one of these discrete probabilities:
 * 1/4 * 1/5: each of the 4 single-letter words
 * 1/4 * 1/4 * 1/5: each of the 4*4 two-letter words
 * 1/4 * 1/4 * 1/4 * 1/5: each of the 4*4*4 three-letter words
 * ...
 * (1/5)*(1/4)^n: each of the 4^n n-letter words.

This is a stair-step graph, as you pointed out. However, if we plot it on a log-log graph, we get

 x = log( 4^n / 2 ) = n * log(4) - log(2)
 y = log( 1/5 * (1/4)^n ) = -n * log(4) - log(5)

which is pretty close to a straight line (and therefore a Zipf distribution), with slope m = &Delta;y / &Delta;x = -1.

You get the same stair-step on top of a straight line no matter how many letters (plus space) you use, with equal letter frequencies. If the letter frequencies are unequal (but still memoryless), I think that rounds off the corners and makes things even closer to a Zipf distribution. --DavidCary 01:54, 12 Feb 2005 (UTC)
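The stair-step computation above can be reproduced exactly, without any simulation. This sketch places each step at its mid-rank and checks that the slopes between consecutive steps approach -1, as argued:

```python
import math

# Exact rank-frequency "stair-step" for random typing over 4 equally
# likely letters plus a space typed with probability 1/5: there are
# 4**n words of length n, each with probability (1/5) * (1/4)**n.
points = []          # (log mid-rank, log probability), one per step
rank_before = 0
for n in range(1, 10):
    num_words = 4 ** n
    prob = (1 / 5) * (1 / 4) ** n
    mid_rank = rank_before + num_words / 2
    points.append((math.log(mid_rank), math.log(prob)))
    rank_before += num_words

# Slopes between consecutive step midpoints; they approach -1.
slopes = [(y2 - y1) / (x2 - x1)
          for (x1, y1), (x2, y2) in zip(points, points[1:])]
print([round(s, 3) for s in slopes])
```

The early slopes are a bit shallower (the stepped region AxelBoldt worried about), but they converge to -1 quickly, matching the Δy/Δx = -1 argument.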

The problem here seems to be arising from the ambiguity of the phrase "random typing". To avoid this I'll replace the phrase with "random sampling of a text". Jekyll

It's already gone, never mind. Jekyll 14:45, 18 November 2005 (UTC)


 * this is a pretty minor example of Zipf's law. A much more important one is that the frequency of occurrence of words in natural language conform to Zipf's law. This is in the original 1949 reference. Callivert (talk) 11:29, 24 March 2008 (UTC)

I did some study for university on Zipf's Law. A (in my opinion) very clever and better explanation for Zipf's law is presented by Baek et al. I do not have the time to add this information. In case anyone can do this: http://stacks.iop.org/1367-2630/13/i=4/a=043004 and https://doi.org/10.1038/474164a — Preceding unsigned comment added by 2A02:810D:4740:5520:E1D3:6110:57E:C5B7 (talk) 16:14, 11 December 2017 (UTC)

Why ?
Moved from main article:


 * We need an explanation here: why do these distributions follow Zipf's law?


 * No we don't. Zipf's law is empirical, not theoretical.  We don't know why it works.  But even without a theory, even the simplest experiments that try to model a society of independent actors consistently turn it up!--BJT

Well, empirical facts have to be explained too. It's not enough to simply state that the moon always shows us the same side; you have to give the reason if you try to understand the world. It's the same here. If Zipfian distributions show up in a variety of situations, then there must be some underlying principle which generates them. I doubt very much that "we don't know" that principle. AxelBoldt


 * I agree - every theory begins with empirical evidence. The theory models an explanation to fit those facts. I'm sure someone has tried to come up with an explanation? 70.93.249.46


 * We need an explanation here: why do these distributions follow Zipf's law?

An excellent question. However, I doubt there is a single cause that can explain every occurrence of Zipf's law. (For some distributions, such as wealth distribution, the cause of the distribution is controversial.)

''Well, empirical facts have to be explained too. It's not enough to simply state that the moon always shows us the same side; you have to give the reason if you try to understand the world.''

Good point. However, sometimes we don't yet know the cause of some empirical facts -- we can't yet give a good explanation. In those cases, I would prefer the Wikipedia article to bluntly tell me "we don't know yet" rather than try to dance around that fact.

While it is true that Zipf's law is empirical, I agree with AxelBoldt that it is useful to have an interpretation of it. The most obvious place to look is the book that Zipf himself wrote in which he linked his observation to the Principle of Least-Effort, kind of an application of Conservation of Energy to human behavior.

I've just measured the Polish Wikipedia page access distribution using Apache logs (so they had to be heavily Perl-scripted) for about 2 weeks of late July, only for main namespace articles, and excluding the Main Page. For the most part it seems to be following Zipf's law with b about 0.5, except at both ends, where it behaves a bit weirdly (which was to be expected). Now why did I get a constant so grossly different from the constant stated here for the English Wikipedia?

Some possibilities: Taw 01:38, 4 Aug 2003 (UTC)
 * Polish and English Wikipedias really have different Zipf's factors
 * It was due to my perlscripting
 * The measurement given here for the English Wikipedia is wrong for some reason, such as measuring only the top 100 articles rather than all of them.

I think we may be fast approaching the point when merging this article with Zipf-Mandelbrot law would be appropriate, along the way doing some reorganizing of the article. Michael Hardy 22:57, 6 Dec 2003 (UTC)

Although the reason is not well understood, mechanisms that bring about the Zipf distribution have been suggested by physicists. Power laws tend to crop up in systems where the entities (words in this case) are not independent, but interact locally. The choice of a word isn't random, nor does it follow a mechanistic prescription - the choice of a word depends strongly on what other words have already been chosen in the same sentence/paragraph. I think these speculations should be mentioned in the article as a side note, for the sake of completeness. 137.222.40.132 12:45, 17 October 2005 (UTC)

It is meaningful to ask why in general a particular distribution is found in nature - the passage above is a good start; I'd like more clarification. For example, the normal distribution arises when the outcome is caused by a large number of minor factors, none of which predominates. The bimodal distribution arises when there are a large number of minor factors coupled with one predominant factor. The Poisson distribution arises when an event is the consequence of a large number of rare events converging. Etc. For Zipf's distribution, I would like to know: why does interdependence of events lead to it?

Another interesting example of Zipf's law is the distribution of 25,000,000 organic compounds among the possible shapes that their ring structures can take (download PDF or HTML). Half of all the compounds have just 143 shapes (out of nearly 850,000 possible shapes). The authors refer to a "rich-get-richer" reason: when we are speaking or writing, the probability that we will use a particular word is proportional to the number of times that we have already used it. The particular reason in organic chemistry is that it is quicker and cheaper to synthesize new compounds if you can buy the "parts" commercially, or if syntheses for the "parts" have already been published; thus the most common shapes tend to be used more.

In these word-list examples we also have to do with a fundamental fact of linguistics: Every language has a few dozen "structural" words, mostly articles, conjunctions, and prepositions, which have to be used to achieve proper syntax (note the three most popular words in English). This can skew the distribution of the most popular words.

This would also apply to populations of cities. A large city has many opportunities for work and leisure, so it will attract new inhabitants and keep its current ones. On the whole, one would expect that the probability that new inhabitants will come would be, very roughly, proportional to the existing population.

So the reason seems to be that prior use encourages further use. Almost banal.

--Solo Owl (talk) 17:26, 7 September 2008 (UTC)
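The "rich-get-richer" mechanism described above is essentially the Simon/Yule preferential-attachment process, and it can be simulated in a few lines. The innovation rate below is an arbitrary illustrative choice, not a fitted value:

```python
import collections
import random

random.seed(0)
alpha = 0.05        # probability of coining a brand-new word (illustrative)
tokens = [0]        # the "text" so far, as word ids
next_id = 1
for _ in range(100_000):
    if random.random() < alpha:
        tokens.append(next_id)                # introduce a new word
        next_id += 1
    else:
        # reuse an old word with probability proportional to its past use
        tokens.append(random.choice(tokens))

counts = collections.Counter(tokens)
freqs = sorted(counts.values(), reverse=True)
# Under Zipf's law f(r) ~ C/r, the ratio f(1)/f(r) grows roughly like r.
print("distinct words:", len(freqs))
print("f(1)/f(10):", round(freqs[0] / freqs[9], 1),
      "f(1)/f(100):", round(freqs[0] / freqs[99], 1))
```

Sampling a random element of the running sequence is exactly "probability proportional to the number of times we have already used it", and the resulting rank-frequency curve is close to Zipfian.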

Consistency of variables in text
In the examples section, the variable quoted in each case is b. However, this variable is not used anywhere else. Some consistency throughout the article would be nice (and less confusing!). &mdash; 130.209.6.41 17:01, 1 Jun 2004 (UTC)


 * I agree - please explain what is b ? \Mikez 10:00, 8 Jun 2004 (UTC)


 * I was about to add something about this as well... is it what is called s in the discussion of formulas? It's not clear from the text. -- pne 14:01, 8 Jun 2004 (UTC)


 * Well, I've seen Zipf's Law stated as $$f_n=[\mbox{constant}]/n^b.$$ So I'm pretty sure that s and b are the same thing. I'm changing b to s in the "Examples..." section. -- Aparajit 06:01, Jun 24, 2004 (UTC)


 * Can someone check the values of b / s given in the examples? Especially the word frequency example.  I took the data for word frequencies in Hamlet, and fitted a line to the log-log plot.  This gave a slope of more like 1.1, rather than the 0.5 figure quoted here.  Taking the merged frequencies over the complete set of plays gives a value which gets towards 1.3.  This would agree more with the origin of Zipf's law, which is that the frequency of the i'th word in a written text is proportional to 1/i.  The value of 0.5 seems much too small to match this observation.  Graham.
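Anyone wanting to repeat Graham's check can use a simple least-squares fit on the log-log rank-frequency data. The sketch below verifies the fitting routine on synthetic data drawn exactly from f(r) = C/r^1.1 (the constant and exponent are chosen only for this sanity check, not taken from Hamlet):

```python
import math

def zipf_slope(freqs):
    """Least-squares slope of log(frequency) vs log(rank);
    -slope estimates the Zipf exponent s."""
    freqs = sorted(freqs, reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
           sum((x - mx) ** 2 for x in xs)

# Sanity check on synthetic data that follows f(r) = C / r**1.1 exactly:
freqs = [int(1_000_000 / r ** 1.1) for r in range(1, 2001)]
print(round(-zipf_slope(freqs), 2))   # recovers s close to 1.1
```

Feeding `zipf_slope` the actual word counts from Hamlet would settle whether the exponent is nearer 0.5 or 1.1.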

Linked site doesn't exist
It seems to me that the special page listing the most popular pages no longer exists, but it's used as an example on this page.

Has this page simply moved or do we need to get a new example?

reported constants in Examples section
The examples section reports values of s < 1 as resulting from analysis of Wikipedia page view data. The earlier discussion correctly notes that such values do not yield a valid probability distribution. What gives? Perhaps (s - 1) is being reported?


 * Or it could be that that value of s is right for a moderately large (hundreds?) finite number of pages. That seems to happen with some usenet posting statistics. Michael Hardy 22:07, 28 Jun 2004 (UTC)
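Michael Hardy's point about a finite number of pages can be made concrete: with finite support the normalizing constant is just a finite sum (a generalized harmonic number), so any positive s, including s < 1, yields a valid distribution:

```python
# For finite N the Zipf pmf p(k) = (1/k**s) / H(N, s) is a valid
# probability distribution for any s > 0, even s < 1; only the
# infinite-support (zeta) case requires s > 1.
def zipf_pmf(k, s, N):
    H = sum(1 / n ** s for n in range(1, N + 1))  # generalized harmonic number
    return (1 / k ** s) / H

N, s = 1000, 0.5
total = sum(zipf_pmf(k, s, N) for k in range(1, N + 1))
print(total)   # sums to 1 up to rounding, despite s < 1
```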

I miss reference to Zipf's (other) law: the principle of least effort.

s = 1?
I plotted Shakespeare's word frequency lists and the top 5000 words in Wikipedia: where did the old value of s ~ 0.5 come from? -- Nichtich 00:27, 23 Jun 2005 (UTC)

Sources needed for examples
Section 3 ("Examples of collections approximately obeying Zipf's law") has a bunch of examples with no further explanation or reference (Shakespeare excluded). That's highly undesirable, so let's get some sources. By the way, I find the final point (notes in a musical performance) very questionable. I imagine it would depend very much on the type of music. EldKatt (Talk) 13:20, 19 July 2005 (UTC)


 * It might not be a bad idea to give examples from Zipf's book. Also see the article by Richard Perline in the February 2005 issue of Statistical Science. Michael Hardy 22:28, 19 July 2005 (UTC)

Too technical?
Rd232 added the "technical" template to the article. I've moved it here per template. Paul August &#9742; 03:49, 27 November 2005 (UTC)


 * There have been numerous edits to the article since the template was first added almost a year ago. Also, there has been no discussion of what about the article is too technical so I've removed the template.  Feel free to put it back, but if you do, please leave some comments as to what you find is too technical and some suggestions as to how to improve the article.  Lunch 02:25, 21 November 2006 (UTC)

Does Wikipedia traffic obey Zipf's law?
Yes, apparently, with an exponent of 0.5. See Does Wikipedia traffic obey Zipf's law? for more. -- The Anome 22:45, 20 September 2006 (UTC)


 * Why 0.5? And what else has that exponent? I've got 0.5 in some genomic data; I don't know why it's 0.5 and not 1.0 .... 67.198.37.16 (talk) 18:17, 24 February 2020 (UTC)

Wikipedia's Zipf law
Just a plot of English Wikipedia word frequencies: http://oc-co.org/?p=79


 * Is this plot available to Wikipedia - i.e. free content? It would look good in the article.
 * Yes, it is released under LGPL by me, the author :) --  Victor Grishchenko


 * I downloaded it, tagged it as LGPL with you as author, and put it into the article - please check it out and make corrections if needed. This is an excellent demonstration of Zipf's law (and its limitations). Thanks! PAR 15:00, 29 November 2006 (UTC)

--

This article had 107 occurrences of "the" and 68 of "of" last time I checked.

-- —Preceding unsigned comment added by 200.198.220.131 (talk) 13:35, 5 October 2009 (UTC)

Now this is interesting. It is, however, not a true power law, as Grishchenko shows quite clearly by superposing blue and green lines. That is, English Wikipedia is not an example of Zipf’s law. The tail (the lower righthand part of the curve) is not typical of power laws.

The first part of the plot, for the 8000 or so most common words, does follow a power law, with exponent slightly greater than 1, just as we would expect from Zipf’s Law. The rest of the plot, for another million different words in English Wikipedia, follows a power law with exponent approximately 2; this part has the tail that we look for in power laws.

My guess is that in writing encyclopedia articles, one must select from the 8000 most common words just to create readable prose. Hence the log-log plot with slope ~1.

An encyclopedia, by its nature, is also going to contain hundreds of thousands of technical terms and proper names, which will be used infrequently in the encyclopedia as a whole. Hence the log-log plot with a steeper slope. Why should this slope be ~2? Does anyone know?

Does Zipf’s law apply to individual Wikipedia articles? Do other encyclopedias behave the same?

--Solo Owl (talk) 17:59, 7 September 2008 (UTC)
 * It's a general principle that words drop off more quickly than Zipf's law predicts. If we looked at word n-grams I think we would see something closer. Dcoetzee 21:27, 31 October 2008 (UTC)

+ 1 paper related to Wikipedia's Zipf law:
 * Index wiki database: design and experiments. Krizhanovsky A. A. In: FLINS'08, Corpus Linguistics'08, AIS/CAD'08, 2008. -- AKA MBG (talk) 17:52, 3 November 2008 (UTC)

On "Zipf, Power-laws, and Pareto - a ranking tutorial"
I've found two doubtful places in the tutorial by L. Adamic (ext. link N3). Probably I've misread something...

First, "(a = 1.17)" regarding to Fig.1b must be a typo; the slope is clearly -2 or so.

Second, it is not clear whether Fig. 2a is a cumulative or a disjoint histogram. To the best of my knowledge, a Zipf distribution binned disjointly into logarithmic bins that way must have slope = -1, not -2. I.e., if every bin catches the items whose popularity resides in the range $$[c^{i}:c^{i+1})$$. Just to verify it, I did a log2-log2 graph of log2-binned word frequencies compiled from Wikipedia. I.e., y is the log2 of the number of words mentioned $$2^x$$ to $$2^{x+1}-1$$ times in the whole Wikipedia. Although the curve is not that simple, it shows slope = -1, especially for the more frequent words.

Any thoughts? Gritzko


 * Yes, it looks like a typo; a=1.17 certainly is not right. Regarding Fig 2a, it is cumulative (the vertical axis is "proportion of sites", which will have an intercept at (1,1)). Zipf's law does not specify an exponent of -1, just that it is some negative constant. It happens to be close to -1 for word frequency, but maybe it's closer to -2 for the AOL user data. PAR 14:20, 1 January 2007 (UTC)


 * "As demonstrated with the AOL data, in the case b = 1, the power-law exponent a = 2.", i.e. b is "close to unity" in the case of AOL user data. I had some doubts whether Fig 2a is cumulative because at x=1 y seems to be slightly less than 1. Probably, it is just a rendering glitch. Thanks! -- Gritzko

k = 0 in support?
I'm not sure if k=0 should be included in the support... the pmf is not well defined there, as 1/(k^s) = 1/0, which diverges to +inf. Krzkrz 08:56, 3 May 2007 (UTC)

biographical information
I was surprised that there wasn't even a brief note at the beginning of the article on who Zipf is (was?). I think it's sort of nice to see that before you get into the technical stuff. Jdrice8 05:38, 14 October 2007 (UTC)


 * That was the brief second paragraph of the article. Now I've made it into a parenthesis in the first sentence, set off by commas. Michael Hardy 01:58, 15 October 2007 (UTC)

Word length
I'm sure I've come across, in a number of places, the term Zipf's law used to refer to the inverse relation between word length and frequency. This may well be a misuse, but isn't it common enough to be mentioned, with a link to whatever the correct term is? Peter jackson (talk) 14:43, 15 September 2008 (UTC)

Same here. I wish the page explained that up front, but I don't know enough to write it. — Preceding unsigned comment added by 76.94.255.117 (talk) 20:48, 14 July 2012 (UTC)

This page supposedly has a quote from Zipf: http://link.springer.com/content/pdf/10.1007%2F978-1-4020-4068-9_13.pdf

“that the magnitude of words tends, on the whole, to stand in an inverse (not necessarily proportionate) relationship to the number of occurrences” — Preceding unsigned comment added by 216.239.45.72 (talk) 19:43, 16 May 2013 (UTC)

The word-length sense of the term "Zipf's law" is commonly recognized in the psycholinguistics and HCI communities. It would certainly not be difficult to find this usage in the literature of those fields. I don't see how this could be considered a "misuse". It should be handled as a separate sense of the term.

Lead
The lead must be rephrased: defining Zipf's law in terms of a Zipfian distribution is no help to the reader who must actually learn. Srnec (talk) 15:11, 31 October 2008 (UTC) —Preceding unsigned comment added by 81.185.244.190 (talk) 21:28, 8 January 2009 (UTC)


 * I was just coming by to say the same thing. The law itself should be defined in the lede; defining it in terms of something derived from it does no one any good. It's been three and a half years and no one's done anything about it? 174.57.203.45 (talk) 19:12, 1 June 2012 (UTC)


 * Fixed now, I hope. --Jorge Stolfi (talk) 01:32, 9 May 2023 (UTC)

Nonsense? Err Nevermind
The article currently states the following:


 * That Zipfian distributions arise in randomly-generated texts with no linguistic structure suggests that in linguistic contexts, the law may be a statistical artifact.[2]

I don't get it; this sounds like patent nonsense. When one generates a random text, one must choose words with some random frequency or distribution. What distribution is used? The result of random generation should show the same distribution as the random number generator: so if one generates random text using a Zipfian distribution, the result will be Zipfian. So this is a tautology. Surely something different is being implied in citation 2. However, this text, as written, is nonsense. linas (talk) 22:36, 14 March 2009 (UTC)


 * Ah. The one-sentence summary was misleading. I modified the article text so that it makes sense. The "random text" is actually a random string, chosen from an alphabet of N letters, with one of the letters being a blank space. The blank space acts as a word separator, and the letter frequencies are uniformly distributed. linas (talk) 22:59, 14 March 2009 (UTC)
 * As the text is now, it's still (or again?) nonsense. It claims that the Zipfian distribution is a consequence of using letters to spell words. Just a few sentences above, the article says that words in the corpus are in Zipfian distribution. This has nothing to do with how words are encoded, and the distribution would exist regardless of what, if anything, is used to represent them. 193.77.151.137 (talk) 16:07, 5 April 2009 (UTC)

Frequency of words in English
The article originally stated,

In English, the frequencies of the approximately 1000 most-frequently-used words are approximately proportional to $$1/{n^s}$$ where s is just slightly more than one.[citation needed]

This is misleading, not just because the citation is missing, but also because it seems to me that English word frequencies may well follow the maximum entropy principle: their distribution maximises the entropy under the constraint that the probabilities decrease. Zipf's law, for any $$s>1$$, does not maximise the entropy, because it is possible to define distributions with probabilities that decrease even more slowly. For example, $$P(n) = 1/\log(n+1) - 1/\log(n+2)$$ is a very slow one.

I therefore rewrote this bit somewhat. —Preceding unsigned comment added by 131.111.20.201 (talk) 09:52, 1 May 2009 (UTC)
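The slowly-decreasing example P(n) = 1/log(n+1) - 1/log(n+2) does sum to 1 if the logarithm is taken base 2 (an assumption; the comment above does not fix the base), because the series telescopes:

```python
import math

# P(n) = 1/log2(n+1) - 1/log2(n+2) telescopes: the partial sum up to N
# is 1/log2(2) - 1/log2(N+2) = 1 - 1/log2(N+2), which tends to 1 only
# very slowly -- a far heavier tail than any Zipf law with s > 1.
def P(n):
    return 1 / math.log2(n + 1) - 1 / math.log2(n + 2)

N = 100_000
partial = sum(P(n) for n in range(1, N + 1))
closed_form = 1 - 1 / math.log2(N + 2)
print(round(partial, 6), round(closed_form, 6))  # the two values agree
```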

Problems
I think there is an error in the example log-log graph of Wikipedia word frequency vs. rank. The two different lines of color cyan and blue cannot both represent 1/x^2, as they are labelled. I suspect that one is 1/x^3. Whoever has access to the code that generated the graph, please fix it!!!

—Preceding unsigned comment added by 98.222.58.129 (talk) 04:45, 8 January 2010 (UTC)


 * I agree the cyan plot is wrong, but I don't think it can be 1/x^3 since it has the same slope as 1/x^2. It is more like 10110/x^2.  I am going to ping the creator on this - it definitely needs fixing or removing.  Spinning  Spark  17:01, 4 June 2011 (UTC)


 * Actually, all the guide plots are up the creek: they should all be going through the point (1,1). I suspect the author means $$\mathcal{O} f(x)$$ rather than $$ \sim f(x)$$  Spinning  Spark  17:22, 4 June 2011 (UTC)

The section entitled "Statistical Explanation" is clearly nonsense. Equal probability for selecting each letter means that the top of the rank order will be 26 equally probable single letter words. "b" will not occur twice as often as "a". What is Wentian Li talking about??? Clearly it is not what is said here.

—Preceding unsigned comment added by 98.222.58.129 (talk) 04:56, 8 January 2010 (UTC)

citation needed
I don't know how to add a citation needed link. It's needed for the parenthetical, (although the law holds much more strongly to natural language than to random texts)

sbump (talk) 21:42, 19 March 2010 (UTC)


 * Type {{citation needed}} after the statement; it will look like this: [citation needed]. Cheers, — sligocki (talk) 03:21, 21 March 2010 (UTC)

Redundancy
Although the Shannon-Weaver entropy is listed in the table here, I think it is of fundamentally greater importance and instructional value to list the REDUNDANCY, which in Zipf's law (s=1) is CONSTANT (and slightly less than 3). Credit to Heinz von Foerster for deriving this from fundamental principles.

98.222.58.129 (talk) 14:04, 19 February 2010 (UTC)

Dr. Ioannides's comment on this article
Dr. Ioannides has reviewed this Wikipedia page, and provided us with the following comments to improve its quality:

"Entry ignores the massive literature in urban economics, which has received enormous popularity especially after Paul Krugman drew renewed attention to it, namely that city size are Pareto, and often close Zipf distributed, in many countries and historical periods. Thus, I think it would be very important and more useful if the entry made references to some representative studies. Here are two: Gabaix, Xavier. "Zipf's Law for Cities: An Explanation." The Quarterly Journal of Economics, Vol. 114, No. 3 (Aug., 1999). And, more recently, Rossi-Hansberg, Esteban, and Mark L. J. Wright. Urban Structure and Growth April 2007, Review of Economic Studies, 74:2, 597-624 and  Ioannides, Yannis M., and  Spyros Skouras. “US City Size Distribution: Robustly Pareto, but Only in the Tail.” Journal of Urban Economics,  73, No. 1 (January,  2013) 18–29."

We hope Wikipedians on this talk page can take advantage of these comments and improve the quality of the article accordingly.

Dr. Ioannides has published scholarly research which seems to be relevant to this Wikipedia article:


 * Reference : Yannis M. Ioannides & Spyros Skouras, 2009. "Gibrat's Law for (All) Cities: A Rejoinder," Discussion Papers Series, Department of Economics, Tufts University 0740, Department of Economics, Tufts University.

ExpertIdeasBot (talk) 04:53, 16 June 2016 (UTC)


 * The article Power law seems like a better fit for everything that is described by a power law. 67.198.37.16 (talk) 18:22, 24 February 2020 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Zipf's law. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive http://web.archive.org/web/20021018011011/http://planetmath.org:80/encyclopedia/ZipfsLaw.html to http://planetmath.org/encyclopedia/ZipfsLaw.html

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 22:13, 20 July 2016 (UTC)

Zipf distribution or Zipfian distribution?
Both the phrases "Zipf distribution" and "Zipfian distribution" seem interchangeable but which is preferred?

80.194.75.37 (talk) 09:39, 27 January 2017 (UTC)

Should it say signal, not symbol?
The bit about information theory in the Applications section seems like it makes more sense if the word 'symbol' is changed to 'signal.' I don't know anything about the topic, though so I don't want to change it myself.
 * The word 'symbol' is generally used in information theory - I have added '(event, signal)'. 188.146.143.213 (talk) 09:26, 27 April 2017 (UTC)


 * The section was removed pending clarification and sourcing; see below. --Jorge Stolfi (talk) 01:33, 9 May 2023 (UTC)

Relation to the Zeta distribution
Several times, in both this article and Zeta distribution, it is stated that while they appear similar and are often referred to by the same name, they aren't related distributions. However, given the probability mass functions of both distributions, I don't see why they aren't related.

The pmf of Zipf(s,N) is $$f_{Zipf}(k;s,N)=\frac{1/k^s}{\sum\limits_{n=1}^N (1/n^s)}$$, while the pmf of Zeta(s) is $$ f_{\zeta}(k;s) = \frac {1/k^s}{\zeta(s)} $$.

Now, the Riemann zeta function has a very simple form when $$ s > 1 $$, being $$ \zeta(s)= \sum\limits_{n=1}^\infty (1/n^s) $$ (from Riemann zeta function).

This tells me, at least, that the pmf of Zeta(s) is simply the pmf of Zipf(s,N) in the limit of infinitely many elements, $$ \lim_{N\to\infty} f_{Zipf}(k;s,N) = f_{\zeta} $$.

Therefore, I am wondering if anyone has a source or (sketch of) proof that shows they aren't related in this manner, so we can cite it - or correct the article to imply that they are in fact directly related through the aforementioned limit. EpicScizor (talk) 13:56, 13 March 2021 (UTC)
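The limit claimed above can at least be confirmed numerically: as N grows, the Zipf(s, N) pmf converges to the Zeta(s) pmf. Here it is checked at s = 2, where ζ(2) = π²/6 is known in closed form:

```python
import math

def zipf_pmf(k, s, N):
    """pmf of Zipf(s, N): 1/k**s normalized by the generalized harmonic number."""
    return (1 / k ** s) / sum(1 / n ** s for n in range(1, N + 1))

def zeta_pmf_s2(k):
    """pmf of Zeta(2), using the closed form zeta(2) = pi**2 / 6."""
    return (1 / k ** 2) / (math.pi ** 2 / 6)

k, s = 3, 2
for N in (10, 100, 10_000):
    print(N, zipf_pmf(k, s, N))
print("zeta:", zeta_pmf_s2(k))   # the Zipf values converge to this
```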

Removed unsourced section "Information theory"
The following unsourced and unclear section was deleted:
 * Information theory
 * In information theory, a symbol (event, signal) of probability $$p$$ contains $$-\log_2(p)$$ bits of information. Hence, Zipf's law for natural numbers, $$\Pr(x) \approx 1/x$$, is equivalent to number $$x$$ containing $$\log_2(x)$$ bits of information. To add information from a symbol of probability $$p$$ into information already stored in a natural number $$x$$, we should go to $$x'$$ such that $$\log_2(x') \approx \log_2(x) + \log_2(1/p)$$, or equivalently $$x' \approx x/p$$. For instance, in the standard binary system we would have $$x' = 2x + s$$, which is optimal for the $$\Pr(s=0) = \Pr(s=1) = 1/2$$ probability distribution. Using the $$x' \approx x/p$$ rule for a general probability distribution is the basis of the asymmetric numeral systems family of entropy coding methods used in data compression, whose state distribution is also governed by Zipf's law.

Is it original research? Jorge Stolfi (talk) 01:22, 9 May 2023 (UTC)
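For what it's worth, the x' ≈ x/p rule in the quoted text is easy to illustrate in the uniform binary case it mentions. This sketch only illustrates that one quoted claim (appending a bit via x' = 2x + s grows log2(x) by about 1 per symbol, and is reversible); it is not a source for the section:

```python
import math

# With Pr(s=0) = Pr(s=1) = 1/2, appending a bit via x' = 2x + s
# multiplies the state by ~2 = 1/p, so the state gains about
# -log2(1/2) = 1 bit of information per symbol.
x = 1
bits = [1, 0, 1, 1, 0, 0, 1]
for s in bits:
    x = 2 * x + s            # x' = x/p with p = 1/2, plus the symbol
print(x, math.log2(x))       # log2(x) grew by ~1 per encoded bit

# Decoding reverses the steps: the most recent bit is x % 2.
decoded = []
while x > 1:
    decoded.append(x % 2)
    x //= 2
print(decoded[::-1] == bits)
```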

Add a Non-Logarithmic graph (in the intro to the page preferably)
There is a reason the most popular YouTube video on Zipf's law did not use a logarithmic graph. When you graph Zipf's law on a log scale, it's just a straight line, and that is not understandable at a glance. A basic 1:1 graph is understandable without having to read the scale on the axes (most don't even use axes, it's so easy). A curve that rises sharply at one end is more understandable to most people than anything with a logarithmic scale.

I ask we add 1 simple graph at the beginning so that a commoner can understand the curve easily.

Non-logarithmic scales are readable instantly by everyone (logarithmic scales are only readable by educated people, and even they have to check the axes). Diox8tony (talk) 17:34, 5 June 2023 (UTC)

The first plot seems to be wrong.
Frequency as a function of frequency rank should be monotonically decreasing, which is not the case in the first plot of "word frequency vs frequency rank in War & Peace" (this plot --> https://en.wikipedia.org/wiki/Zipf's_law?/media/File:Zipf%27s_law_on_War_and_Peace.png#/media/File:Zipf's_law_on_War_and_Peace.png)

This plot isn't extracted from the source cited in the caption (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4176592/pdf/nihms579165.pdf), but it is similar to other plots shown in that source. However, all of the plots in the source are NOT monotonically decreasing, so it seems that they aren't "frequency as a function of frequency rank" plots.

I've never edited wikipedia before so I don't feel confident enough to propose the deletion of the plot. Qchenevier (talk) 15:13, 13 August 2023 (UTC)

Genlangs
I have removed the section on genlangs. The only citation was the original paper, which is neither notable nor peer reviewed, and is of extremely low quality. Tristanjlroberts (talk) 16:42, 23 August 2023 (UTC)