Wikipedia:Reference desk/Archives/Mathematics/2016 March 24

= March 24 =

Birthday paradox on random articles
If the case of the birthday problem is considered, given that there are some five million articles on Wikipedia, what is the probability of me getting two same articles at random, say, in one thousand articles? Or, conversely, how many pages do I have to open before the probability reaches 99.9%? The Average Wikipedian (talk) 15:47, 24 March 2016 (UTC)
 * From that article:

According to the approximation, the same approach can be applied to any number of "people" and "days". If rather than 365 days there are d, if there are n persons, and if n ≪ d, then using the same approach as above we achieve the result that if p(n, d) is the probability that at least two out of n people share the same birthday from a set of d available days, then:


 * $$\begin{align}

p(n, d) & \approx 1-e^{\frac{-n (n-1)}{2d}} \\ & \approx 1-e^{\frac{-n^2}{2d}}. \end{align}$$
 * For your questions, you hit 99.9% in the vicinity 8300 articles, and with 1000 articles the probability is around 9.5%. (It happens that 1000 is very much the right unit here, since the right scaling happens when you look at about the square root of the total number of articles.)
 * --JBL (talk) 16:06, 24 March 2016 (UTC)


 * how many tries would it take to hit every article in a row with no repeats via the random button? that number would have to be unbelievably huge..68.48.241.158 (talk) 18:54, 24 March 2016 (UTC)
 * probability of such, that is..as of course it's possible to happen on one's first attempt..68.48.241.158 (talk) 18:56, 24 March 2016 (UTC)
 * That would be 5 million factorial - which is an ungodly large number. Far, far to big to write down.  If there were just one million articles, the result would be about five and a half million digits long...for 5 million, much MUCH larger! SteveBaker (talk) 19:45, 24 March 2016 (UTC)
 * it's not too big to write down in some kind of notation, is it? there's an article on large numbers...with 'power towers' or something...isn't 10^(5,000,000) five million digits long??68.48.241.158 (talk) 20:20, 24 March 2016 (UTC)
 * If you're free to choose any notation, then it's just 5000000! . The exclamation point is part of the notation, not me exclaiming.
 * If you want to use decimal notation, then it might be too big to write down by hand, unless you want to make it a career, but it's certainly not too big to write down mechanically. It's obviously less than 50000005000000, which in turn is less than (107)5000000, which has only 35 million digits.  Well, maybe 35 million and one.  Heck, that's probably shorter than the US tax code. --Trovatore (talk) 18:47, 25 March 2016 (UTC)
 * If there are n articles, the probability of n randomly selected articles having no duplicates is n!/nn. Using Stirling's approximation that's roughly e−n. That may mean that the expected number of clicks before you get a run of n distinct articles is roughly en, but I'm not sure, because the runs of length n are not independent. (For example, when you get the same article twice in a row, there's no chance of "winning" for the next n−2 clicks, and that will happen roughly every n clicks.) Unfortunately, searching for keywords related to this problem has only turned up versions of the coupon collector's problem. -- BenRG (talk) 21:00, 24 March 2016 (UTC)