Talk:Coupon collector's problem

O-notation
Regarding the numerical example in the first paragraph of this article: The mathematical analysis of the problem reveals that the expected number of trials needed grows as O(nlog(n)). For example, when n = 50 it takes about 225 samples to collect all 50 coupons. Although the O-notation is correct as it is, I think it would be better to be consistent with the following article and write n ln n + \gamma n + O(1). Especially since the numerical example is otherwise not reproducible. Unfortunately I do not know how to edit the very first paragraph of a page -- and couldn't find further information on this question. Can you help? (Netzwerkerin (talk) 22:08, 24 May 2009 (UTC))


 * To edit the very first paragraph of a page, click the "edit" button at the top of the page – the third of the buttons "article • talk • edit • ..." etc. (Or, go to My Preferences -> Gadgets, and in "User interface gadgets", turn on the option to "Add an [edit] link for the lead section of a page".) Shreevatsa (talk) 22:16, 24 May 2009 (UTC)

What does that notation even mean? What math course should I have taken to even know what O(stuff) means? 207.189.226.76 (talk) 20:36, 9 July 2012 (UTC)
 * See Big_O_notation. It's usually taught in computer science courses on theory or algorithms, not math courses.  — Preceding unsigned comment added by Acertain (talk • contribs) 18:21, 14 November 2012 (UTC)

Different probabilities
There should be a section on the solution to the problem if not all coupons are equally distributed. 173.63.91.121 (talk) 00:49, 4 September 2009 (UTC)

Problem definition
I think the problem definition could be a bit clearer. It says you are collecting random coupons with replacement. Then you are asked how many coupons you are expected to draw before collecting all n of them.

It's very easy to misunderstand this (as I did): You draw one coupon, then you put it back, which means you now have 0 coupons. You can keep drawing them, but if you keep putting them back, you'll never collect more than 1. A better definition of the problem would be:

Given n coupons, how many coupons do you expect you need to draw with replacement before having drawn each coupon at least once. —Preceding unsigned comment added by 131.155.59.211 (talk) 11:30, 21 January 2010 (UTC)

I agree with this. The problem defn. on the page was not clear to me at all. Magicmike (talk) 19:42, 22 February 2011 (UTC)

Merger Proposal
Coupon collector's problem (generating function approach) could be integrated under a section Solutions. I just don't understand any of it to be able to advise... Haruth (talk) 22:06, 9 August 2011 (UTC)
 * I'd prefer it be transwikied to wikiversity or wikibooks or something. Or maybe just deleted. I like having proofs in articles when they are short, to the point, and most importantly illuminating, and I think the parts already in the "Solution" section are good, but to me this one looks more like a long tedious calculation, written in a non-encyclopedic textbook-like style. —David Eppstein (talk) 23:09, 9 August 2011 (UTC)


 * Actually I'd rather keep it as it is, though style and form could be improved. Since problem is well known under this name, I c no reason not to have an article. Even if you argue the current article('s style) doesn't feel particularly encyclopedic, the content nevertheless is. Hence there is no treason to transfer it to wikiversity or wikibooks.
 * The integration as an example or application into a more promeinent article is an option but i don't really like that one either. Since we need to keep at least a Redirect anyhow (so that the problem can easily find by its name) and furthermore such an integration would messup the proper matching with interwikis.
 * Generally I see no issue with having (small) entries for common or well known recreational, educational, historical math problems as long as there's a proper name for it and they are covered in literature, i.e. can be properly sourced. Keep in mind WP is many things to many people and I'd assume this article certainly has its readership.
 * So all in all style and form improvements - yes, but no reason for an integration or a removal.--Kmhkmh (talk) 01:19, 10 August 2011 (UTC)
 * P.S. I didn't really look at target article for the merger before. Given that it is just about the coupon collector's problem as well and has no interwikis my comment regarding the integration doesn't really apply. Hence yes both articles should be merged indeed.--Kmhkmh (talk) 09:36, 10 August 2011 (UTC)
 * I agree with David. This kind of detailed proof is useless.  In fact, I would vote for deletion.   Mhym (talk) 07:51, 10 August 2011 (UTC)
 * Again useless to who? But more importantly from useless assessment of the of the proof doesn't follow one for the article as a whole. We still want to provide the information what the coupon collector problem is.--Kmhkmh (talk) 09:21, 10 August 2011 (UTC)


 * I didn't think anyone was talking about deleting this article, it's notable and fairly reasonably written. The generating function proof article though has no indication of notability and no citation and just looks like an exercise somebody did. I don't see anything there that is worth keeping. So I also agree with David Eppstein, transwiki or delete the generating function approach article. It is quite common to use such an approach to problems like this, if someone bothers to come up with a citation they can stick some basic stuff here about it but any details of the proof should be left to the citation. Without a citation I wouldn't even bother mentioning the approach. Dmcq (talk) 15:39, 11 August 2011 (UTC)
 * Ah ok, then I've misread this conversation as I was primarily concerned with this article. As far as generating function approach article is concerned I'm fine with a transfer and/or deletion.--Kmhkmh (talk) 16:00, 11 August 2011 (UTC)
 * For what it's worth, I found the generating function proof quite useful. While I understand that it might not fit with Wikipedia's policies, it should be moved to somewhere more appropriate, not deleted. 72.79.217.78 (talk) 14:46, 31 October 2011 (UTC)

LOL, this issue is very funny. Anyway, we should bring the article to here. --Yodamgod (talk) 15:02, 22 July 2012 (UTC)

I'd concur with a merge - at least of a type. This page talks about approximations a lot. There is a fairly simple explicit exact solution for the distribution function etc. The indicated page talks to it, but isn't very clearly presented. If there is time and effort to present clearly, should be an improvement. Not sure that there is enough interest here to do so. Paul Erickson Oct 10, 2013 — Preceding unsigned comment added by 140.101.84.1 (talk) 16:43, 10 October 2013 (UTC)

I have some experience in generating functions. It is my opinion that it is a rare enough field that 1) the exercise was probably not from a citable source, but is original work and 2) referencing David Eppstein's comment about "having proofs in articles when they are short, to the point, and most importantly illuminating", I don't think a generating function approach will be that for most people. I offer a third option. The main body of the proof could remain on its own page (cleaned up of course) and a "stub" with a few highlights from the proof and a "see the main article" link could be placed in the main article for the Coupon Collector's Problem that points back to the generating functions proof page. If the decision is made to retain the proof, I'd be willing to spend time cleaning it up and clarifying things. I'd hate to do all that work, only for the community to realize points 1) and 2) above and decide to delete or move it to a specialized site where people wouldn't mind it as is, or have plenty of specialized eyes available to clean it up. --Yoda of Borg (talk) 02:26, 20 April 2014 (UTC)

n(log(n))
I keep getting either 85 if I use (50 x (log_base_10 of 50)), or 196 if I use (50 x (log_natural of 50)), so how do you get 225? 71.139.160.159 (talk) 17:20, 28 April 2012 (UTC)
 * There are lower order terms that you're leaving out, particularly the γn term. Using the natural log, (50 ln 50) + 50γ + 1/2 ≈ 224.96. —David Eppstein (talk) 18:50, 28 April 2012 (UTC)
 * Note that γ here (which looks like y or \/ on some displays) is $$\gamma \approx 0.5772156649$$, the Euler–Mascheroni constant. —WhackTheWiki (talk) 19:01, 9 February 2014 (UTC)

On a related note, I find the use of "log" in place of "ln" referring to the natural log rather than log_10 to be confusing throughout the article. In my personal opinion it would help a lot if the notation were standardized.128.200.169.180 (talk) 18:53, 1 April 2013 (UTC)rlrogers
 * Log is the standard notation, for research mathematics. I realize there's an ISO standard that recommends ln, but it also recommends using lg for something other than its usual meaning of the binary logarithm, so I don't put a lot of credit in it. I agree that we shouldn't inconsistently mix log and ln as we currently do, though. —David Eppstein (talk) 20:18, 1 April 2013 (UTC)
 * In order to promote clarity in the face of differing standards, I propose that "loge" be used, as it is unambiguous. Currently "ln" is used for the natural log on both the pages for Harmonic Number and Euler–Mascheroni constant, so to have it as "log" here is particularly confusing.50.233.92.246 (talk) 19:56, 1 March 2019 (UTC)
 * If you're going to promote nonstandard notation that nobody else ever uses anywhere, why not go whole hog instead of continuing to be so abbreviated and cryptic? We could just use $$\operatorname{NaturalLogarithm}(x)$$. —David Eppstein (talk) 22:36, 1 March 2019 (UTC)

Coupon ?
Is this BE usage. In ordinary AE understanding you collect coupons by finding discount ads and clipping them. Having more than one per product is only rarely accepted. "Collecting" several of those to make a set isn't a common scheme. You sometimes encounter competitions where collecting various game pieces is required to enter to win. But I haven't seen those referred to as coupons in AE. Is there another word for this? I think the definition phrase at the start of the article needs to be tightened up. 99.11.160.111 (talk) 03:13, 24 August 2012 (UTC)

Real world examples
I was directed to this problem today after asking a question on a forum. The question concerned purchasing six packs of Pabst Blue Ribbon bottles, whose caps each represent one of 52 standard playing cards. Turns out the expected number of beers you would have to buy is 235.978 to get all 52, which works out to 39.33 six packs. A quick Monte-Carlo simulation showed that the expected number drops to around 226 beers if you assume that no individual six pack contains duplicates which seems to be the case so far. Just thought this was a fun example, of course there are many other real world examples of this problem as it is defined in real world terms. Perhaps eventually there could be a section on the wiki page of specific examples such as this? I love this problem!

Mark and recapture
No, this is not a related problem. In fact, it implicitly assumes that there is a large number of birds, while a CCP talks about capturing all coupons. I will rv again. Please let other editors decide. Thank you. Mhym (talk) 22:45, 18 August 2013 (UTC)
 * Of course it's not the same problem, but the very fact that you can describe it in comparison with CCP makes me think they are sufficiently related to warrant a link for people who are looking for the right formulation of their problem. I believe it is "tangentially related" and deserves a mention there as per MOS:SEEALSO. OK, I won't do it myself, but would encourage someone else to do so. --A3 nm (talk)

Tail estimates
I'm not familiar with how and/or when to do stuff, so I'll let this be done by someone more familiar with contributing to wikipedia. But I wanted to inform that the book Randomized Algorithms from 1995 would be a good citiation for the section about tail estimates. — Preceding unsigned comment added by 2A00:1398:9:FB00:6236:DDFF:FE0E:B05F (talk) 16:00, 28 January 2014 (UTC)

Answer chapter
The Answer chapter seems to be in bad shape.

Second Column
The table's second column contains useless and misleading information, as already clarified earlier in the article, the number of tries per coupon is not constant, but increases with each successive coupon obtained, so this column isn't contributing anything to the understanding of the problem, and the information is abundant as it is already implicated by the first and third column. I suggest removing this column.

Third Column
The third column provides the ceiling of the expected total number of tries, but claims this to be the expected total number of tries itself. I see no justification for using the ceiling function here, the results may be rounded (to the nearest) to an agreeable number of decimal digits, but anything else lacks a reasonable foundation. I suggest using the true values without a ceiling operation. No idea how many decimal places should be shown, what is usual in the wikipedia?

MarioVX (talk) 14:36, 1 May 2015 (UTC)

Solution section: $$c$$
The Solution section attempts to derive various equations and it makes use of the variable c, but c is randomly introduced and is never defined. At least not prior to the end of the section 50.35.103.217 (talk) 19:54, 25 July 2018 (UTC)


 * It's just an arbitrary value, and didn't need to be defined as such. –Deacon Vorbis (carbon &bull; videos) 20:19, 25 July 2018 (UTC)

Too technical
This article is not friendly to anyone that is unfamiliar with higher math; it should be using prose to explain the problem and formulas can be used to demonstrate it, but it shouldn't just have headings and jump into math. It should introduce the solution. Why is it "Calculating the expectation" or the "Calculating the variance"? The article expects that the reader understands that this is something that should be done, but it is technical knowledge to know that. The article should use prose to introduce the problem and the approach that is being taken for the solution, rather than just presenting it like a proof. Where does the body even come up with the conclusion presented in the lead that the expected number of trials is $$\Theta(n\log(n))$$? —Ost (talk) 19:14, 18 March 2019 (UTC)


 * "Higher math" is kind of subjective. Much of the material here should be understandable by someone having taken some (first or second year) undergraduate-level probability courses.  Appealing to a wider audience is always ideal, but there are limits to which that can be done.  Sure, the presentation is a bit bare-bones, but there is a gentle explanation of the problem in the lead.  If you think the organization isn't ideal, that's fair – perhaps it could do with a rigorous statement of the problem after the lead as well.  But these are all organizational issues; I don't think it will really make the article any less technical.  –Deacon Vorbis (carbon &bull; videos) 21:51, 18 March 2019 (UTC)
 * Oh, and to add, the asymptotic form of the solution is derived in the "Calculating the expectation" section. It's presented slightly differently with a bit more detail, but the statement in the lead is just a simplified restatement of that.  The numerical example and footnote 2 adds to that a bit.  –Deacon Vorbis (carbon &bull; videos) 21:56, 18 March 2019 (UTC)
 * I agree that the "higher math" is subjective, but I also I think it's unrealistic to expect that readers on Wikipedia would have any undergraduate classes, let alone some in probability. The technical note specifies to clarify "without removing the technical details", so I think it does encompass organization to make details more approachable. Our difference of opinion may be largely editorial based on Wikipedia practices. Because a lead summarizes an article, anything in it should be in the body of the article. While this lead may be a relatively concise summary of this problem and solution, it is not a summary of this article. The technical problem is linked to this because it becomes unclear to casual readers how the information in the lead it connected to the rest of the article. I'd expect a problem section that introduces the problem and what "mathematical analysis" will be used for the solution, and the solution section should explicitly make mention of the final big O notation. Conversely, the lead should at least make passing reference to the extensions and generalizations section (even if it's just to say that they exist). —Ost (talk) 14:41, 19 March 2019 (UTC)
 * I agree that the "higher math" is subjective, but I also I think it's unrealistic to expect that readers on Wikipedia would have any undergraduate classes, let alone some in probability. The technical note specifies to clarify "without removing the technical details", so I think it does encompass organization to make details more approachable. Our difference of opinion may be largely editorial based on Wikipedia practices. Because a lead summarizes an article, anything in it should be in the body of the article. While this lead may be a relatively concise summary of this problem and solution, it is not a summary of this article. The technical problem is linked to this because it becomes unclear to casual readers how the information in the lead it connected to the rest of the article. I'd expect a problem section that introduces the problem and what "mathematical analysis" will be used for the solution, and the solution section should explicitly make mention of the final big O notation. Conversely, the lead should at least make passing reference to the extensions and generalizations section (even if it's just to say that they exist). —Ost (talk) 14:41, 19 March 2019 (UTC)

Truly bad introduction
The sentence in the introduction:

"The mathematical analysis of the problem reveals that the expected number of trials needed grows as $$\Theta(n\log(n))$$.[a]"

will be completely incomprehensible to almost everyone. The footnote [a] reads:

"[a] Here and throughout this article, "log" refers to the natural logarithm rather than a logarithm to some other base. The use of Θ here invokes big O notation."

is not helpful. What does "invokes" mean here???

If the introduction requires a footnote to make sense of it, then it is written poorly.

This is ten times worse if — as is the case here — the footnote itself requires a footnote for anyone to understand it. 2601:200:C000:1A0:F974:8F91:D7D4:65C (talk) 17:22, 12 September 2021 (UTC)

Distribution function
It is stated that "[The coupon collector's problem] asks the following question: If each box of a brand of cereals contains a coupon, and there are n different types of coupons, what is the probability that more than t boxes need to be bought to collect all n coupons?" However, this question is not answered in the solution section. For the sake of completeness, it would be great if somebody could add the CDF of the number of draws. I tried to write it myself, but couldn't meet quality standards and got reverted. --Gaahsbgz (talk) 14:20, 30 December 2021 (UTC)