Wikipedia:Reference desk/Science/Birthday probability question

This came up at my conscientious objector place. We know that if there is a group of over 365 people, at least two people will have the same birthday. But if we pose the question "backwards", i.e. how large a group of people has to be so that every day of the year is a birthday of someone? The correct answer, of course, is "infinite", as there is nothing preventing, for example, everyone from being born on the same day.

But given the number of people, what is the probability of every day in the year being someone's birthday? For 1 to 364 people, it is 0, i.e. such a thing is impossible. For exactly 365 people, it is 1/(365!), i.e. 1 divided by the factorial of 365. But what is the probability for larger groups? (For simplicity, we ignore leap years.) &mdash; J I P | Talk 16:38, 27 October 2005 (UTC)


 * Your math basis is incorrect. The probability that 365 people have distinct birthdays is 365!/365^365. (1/365! is the probability that you take 365 people with distinct birthdays and, picking them one at a time, correctly pick them in birthday order). Let's work with smaller numbers: assume a 3-sided coin (it's more interesting than a two-sided, but the numbers are small). Leaving order intact, there are 3*3*3 or 27 combinations of flips with results a, b, and c.  Of these, only 3! (abc, acb, bac, bca, cab, cba) have all 3 results, for 3!/3^3 probability. For a fourth flip, there are 3^4 or 81 combinations. Every 3-ple satisfying 3 flips will work with any fourth value (that's 6*3), and every 3-ple with 2 distinct values will satisfy for one particular fourth value (that's 18*1), a grand total of 36 working combinations. For a fifth, all the working 4-tuples with any (36*3), plus all the failed 2-item 3-tuples with the needed third value (18*2*1), plus the 1-item 3-tuples with a distinct fourth value plus the needed fifth (3*2*1), a grand total of 150/243. If somebody else can come along and make this into a series... you're better with symbols than me. &mdash; Lomn | Talk / RfC 17:56, 27 October 2005 (UTC)


 * I've got a recursive solution, but I'm not sure how (or if it's possible) to get a closed formula. Let $$f(M,N)$$ be the number of ways of distributing M distinct balls among N distinct bins, with each bin containing at least one ball; and let $$g(M,N)$$ be the number of ways of distributing M distinct balls among N distinct bins, possibly having one or more bins empty.  (Throughout this, consider "balls" and "bins" to always refer to distinct balls, such as individually numbered balls, and distinct bins, so I don't have to keep specifying "distinct.")


 * Thus, the probability you are looking for is $$\frac{f(M,N)}{g(M,N)}$$. If you want to know the probability of having every day represented in a group of 1000 people, you are looking for $$\frac{f(1000,365)}{g(1000,365)}$$.


 * g is easy: $$g(M,N)=N^M$$.


 * f is the difficult one. One way to compute the number of ways of distributing M balls among N bins, with at least one ball in each bin, then subtracting the number of ways of distributing M balls among N bins, with at least one bin being empty.


 * Let's consider 7 balls placed into 4 bins. The number of ways of distributing them, with no restriction, is $$g(7,4) = 4^7 = 16384$$.  Now, we need to subtract the number of those ways which leave one or more bins empty.  Let's consider the separate cases of exactly one bin empty, exactly two bins empty, and exactly three bins empty.


 * The number of ways of placing 7 balls into 4 bins, leaving exactly 1 specific bin empty, is $$f(7,3)$$. So the number of ways to place 7 balls into 4 bins, leaving exactly 1 (but not any specific bin) empty, is $$4 \cdot f(7,3)$$.


 * Similarly, to place 7 balls into 4 bins, leaving 2 specifc bins (and no others) empty, is $$f(7,2)$$. Since there are 6 different ways to choose 2 bins to remain empty, the number of ways to place 7 balls in 4 bins, leaving exactly 2 (but any 2) empty is $$6 \cdot f(7,2)$$.


 * Likewise, there are $$4 \cdot f(7,1)$$ ways to place 7 balls in 4 bins with exactly three of them empty. Of course, $$f(7,1)=1$$.


 * Thus, $$f(7,4)=4^7-4 \cdot f(7,3)-6 \cdot f(7,2)-4 \cdot f(7,1)$$


 * To generalize:


 * $$f(M,1)=1$$


 * $$f(M,N)=N^M-\sum_{i=1}^{N-1} {N \choose i} f(M,i)$$ for $$N>1$$, where $${N \choose i} = \frac{N!}{i!(N-i)!}$$


 * It's easier to see with an example; let's look at 7 people and 4 distinct birthdates, so we're looking for $$\frac{f(7,4)}{g(7,4)}$$.


 * $$g(7,4)=4^7=16384$$


 * $$f(7,1)=1$$


 * $$f(7,2)=2^7-{2 \choose 1}f(7,1)=128-2 \cdot 1=126$$


 * $$f(7,3)=3^7-{3 \choose 2}f(7,2)-{3 \choose 1}f(7,1)=2187-3 \cdot 126-3 \cdot 1=1806$$


 * $$f(7,4)=4^7-{4 \choose 3}f(7,3)-{4 \choose 2}f(7,2)-{4 \choose 1}f(7,1)=16384-4 \cdot 1806-6 \cdot 126-4 \cdot 1=8400$$


 * Thus, the probability we are looking for is $$\frac{8400}{16384}\approx0.51$$


 * Chuck 20:09, 27 October 2005 (UTC)

A non-recursive solution
A naive application of the even-odd rule gives
 * $$P(m,n) = \sum_{k=0}^{n} (-1)^k {n \choose k} \left( \frac{n-k}{n} \right)^m$$

where P(m,n) is the probability of m people having all of n possible birthdays. At least for P(4,7) this formula gives the same answer as above, 525/1024 = 8400/16384, so I'm fairly confident it's right. (Note that the k=0 term is always 1, and the k=n term always 0 unless m=0.) I'd test it on larger numbers if I had a calculator that could evaluate binomial coefficients directly. —Ilmari Karonen 23:54, 27 October 2005 (UTC)


 * To explain that formula a little: $${n \choose k}$$ is obviously the number of ways to choose k days out of n, while $$\left( \frac{n-k}{n} \right)^m$$ is the probability that no birthday falls on any of k given days out of n. The even-odd rule then says that $$-\sum_{k=1}^{n} (-1)^k {n \choose k} \left( \frac{n-k}{n} \right)^m$$ is the probability of some day being no-one's birthday.  Subtracting that from 1 gives P(m,n) above. —Ilmari Karonen 00:10, 28 October 2005 (UTC)

I went ahead and wrote a Perl program to calculate this. The core function is: sub p { my ($m, $n) = @_; my $p = Math::BigRat->new(1); my $c = Math::BigRat->new(1); for my $k (1 .. $n) { $c *= ($n+1-$k); $c /= $k; $p += Math::BigRat->new($n-$k."/".$n)**$m * $c * ($k%2?-1:1); }     return $p; } The use of  is necessary, since otherwise one gets roundoff errors pretty quickly. I've done some optimization to avoid repeated calculations, but it's still quite slow. I'm running the calculation for P(365,365) while I'm writing this... —Ilmari Karonen 01:15, 28 October 2005 (UTC)


 * Just so you know, P(365,365) ≈ 1.4549552156187. Took me just under an hour to compute.  —Ilmari Karonen 02:28, 28 October 2005 (UTC)

Overnight, I realized that one needs to extend the sum to k=n (and note that 00=1) in order to get sensible answers for m=0. Admittedly this is a somewhat academic issue, since the m=0 case is trivial anyway, and for all other values of m that term is zero. I've nonetheless made the change to the formula and code above.

With the modification above, there's a really nice proof based on the construction of Pascal's triangle that P(0,n) = δ0n. I'm still trying to figure out a similar proof for m>0. —Ilmari Karonen 10:02, 28 October 2005 (UTC)


 * I just realized that the computation can be made much faster by taking the common factor 1/nm out of the loop. This (and a few other minor optimizations) gives the following improved routine:

sub p { my ($m, $n) = @_; my $p = Math::BigRat->new(0); my $c = Math::BigRat->new($n); for my $k (1 .. $n) { $c *= ($n-$k); $c /= $k; $p += Math::BigRat->new($n-$k)**($m-1) * $c * ($k%2?-1:1); }     return 1 + $p / Math::BigRat->new($n)**$m; }
 * With this code computing P(365,365) only takes about 15 seconds! I now feel stupid for not figuring this out sooner. —Ilmari Karonen 16:01, 28 October 2005 (UTC)

This problem arose in a discussion I had with some friends, where we were trying to work out the smallest number of friends you would need so that the probability of all your friends' birthdays covering every day of the year was greater than 50%. I implemented the above formula in PARI-GP with the suggested performance improvements ? m=2286; ? 1.*(365!/(365^m))*sum(k=0,365, (-1)^k* ( (365-k)^m )/ ( (k!)*(365-k)! )) %5 = 0.49941417128185166777177972497844837801

? m=2287; ? 1.*(365!/(365^m))*sum(k=0,365, (-1)^k* ( (365-k)^m )/ ( (k!)*(365-k)! )) %7 = 0.50037078393694674453459225997117661900

? m=2288 ? 1.*(365!/(365^m))*sum(k=0,365, (-1)^k* ( (365-k)^m )/ ( (k!)*(365-k)! )) %9 = 0.50132657144840513253848992281687139974
 * }

So the answer is 2287.

Random Guy: Sooooooo....the answer is there is someone's birthday everyday? cuz this math is kinda nose bleeding

Using an example to test above answers
Assume there are only 3 days within a year, and now assume there are three persons in total. What is the probability of these three persons's birthday being distributed in these three different days that make sure there is at least one person having birthday in each day? —Kangrinboqe Ning

Using the fomulae below:


 * $$f(M,N)=N^M-\sum_{i=1}^{N-1} {N \choose i} f(M,i)$$ for $$N>1$$, where $${N \choose i} = \frac{N!}{i!(N-i)!}$$

We can get the result is $$f(3,3)=3^3-3 \cdot f(3,2)-3 \cdot f(3,1)$$. The result is 6.

But if we use the enumeration method, we can get the result as below:

A, B, C

A, C, B

B, C, A

B, A, C

C, A, B

C, B, A

So the probability is 6/27. The above answer is correct according to the above example. Of course, the judge of correctness of the answer is finally based on the mathematical theory rather than just several simple cases.

I also get a different formulae with the same result comparable to the above two methods:


 * $$P(n,k) = (k)^n - ({k \choose k-1}(k-1)^n - {k \choose k-1}(1)^n + {k \choose 1}(1)^n)$$

where P(n,k) is the probability of n people having all of k possible birthdays.

Back to the birthday problem, assume there are 365 days within each year, and ignore of leap year. Then the probability is:


 * $$P(n,365) = (365)^n - ({365 \choose 364}(364)^n - {365 \choose 364}(1)^n + {364 \choose 1}(1)^n)$$

In calculating p(n, 365), we need to exclude the case of "n persons having the birthday in the same day" that are counted repeatly. So we need to subtract the $$ {365 \choose 364}(1)^n $$ from the $$ {365 \choose 364}(364)^n $$, and then add $${364 \choose 1}(1)^n$$