Wikipedia:Reference desk/Archives/Mathematics/2023 March 20

= March 20 =

Statistics: 365 days in a year.
Say you sample 730 random elementary school students, then list out their birthdays. That is not enough to ensure that every day of the year is someone's birthday, even though 365 divides evenly into 730. Every time 2 people share a birthday, which is highly probable, some day of the year is left with nobody's birthday. But if you sample a million students, then that is enough for every day of the year to be someone's birthday. What would be the minimum number needed so that between 99% and 100% of the time, every day of the year is someone's birthday? For a sample population of a million people, statistics should be able to tell you roughly the lowest-low and highest-high numbers of birthdays falling on a single day. I'm looking for the point where every day of the year is pretty much guaranteed to have at least 1 birthday. Let's make this a non-leap year for this question; we can just multiply or divide by 4 if we need to convert to include leap years. Thanks. 2601:249:8200:A640:A1C0:243E:3417:215C (talk) 22:39, 20 March 2023 (UTC).


 * See Coupon collector's problem (though birthdays are not uniformly distributed throughout the year; see UK births by day of year, averaged over 1995 to 2014). catslash (talk) 23:33, 20 March 2023 (UTC)
 * $$\Theta(n\log(n))$$ --> I'm not getting 225 when I compute 50 log(50). What is theta? 2601:249:8200:A640:A1C0:243E:3417:215C (talk) 00:20, 21 March 2023 (UTC).
 * The use of Theta notation is somewhat unfortunate (it only describes how the expected number grows with n, ignoring constant factors and lower-order terms); I'm not sure it answers your question anyway. Let's simplify things a bit by putting the problem in terms of coin flips: how many times do you have to flip a coin before you get Tails? Eventually you will get Tails, but it's impossible to say exactly how long it will take. You could theoretically get 20 Heads in a row; the chance is less than one in a million, but it is possible. With about a one in a billion chance you could get 30 Heads in a row. There is no number of flips you can name ahead of time that would absolutely guarantee at least one Tails, since for any N there's always a 1 in $$2^N$$ chance of getting all Heads. You can extend this to dice: how many rolls do you need to get a 6? Again, for a given N there's always a small chance of getting only 1-5 in N rolls. In this case the probability is $$(5/6)^N$$, which gets smaller for larger values of N but never becomes 0. The problem with birthdays is similar, but instead of 2 possible outcomes as with coins, or 6 possible outcomes as with dice, there are 365 possible outcomes. (We're ignoring leap years.) What is the probability of not finding the birthday Dec 31 if you get the birthdays of 730 people? It's $$(364/365)^{730}\approx 13\%.$$ The article talks about the expected number, which is basically the average over many tries, but the expected value doesn't say anything about minimum or maximum values. You have to make the question more precise to get a precise answer. We don't know what "pretty much guaranteed" means to you: 95% probability? 99.5%? 99.95%? If you mean 100%, then there is no answer; there will always be a small chance you will miss a date. --RDBury (talk) 08:01, 21 March 2023 (UTC)
 * The OP asks for the number of students needed for a confidence of 99%. Let $$n$$ be the number of slots (days in a year, or types of coupons). We repeatedly draw a random item and put it in the corresponding slot. We can consider the random variable $$e^{(n)}_r,$$ being the number of slots that are still empty after $$r$$ independent draws (with replacement). Then $$e^{(n)}_0 = n$$ and $$\max(0,n-r)\le e^{(n)}_r\le n-1$$ if $$r>0.$$ The question is to find a value of $$r$$ such that $$~\text{Pr}[e^{(n)}_r>0]\le 1-p,$$ where $$p$$ represents a confidence level. If I understand our article correctly, it gives the upper bound $$~\text{Pr}[e^{(n)}_r>0]\le n(1-\tfrac{1}n)^r<n\,e^{-r/n}$$ (each slot is still empty after $$r$$ draws with probability $$(1-\tfrac{1}n)^r,$$ and the probability that at least one of the $$n$$ slots is empty is at most the sum of these). So the condition is guaranteed if $$n\,e^{-r/n}\le 1-p,$$ that is, if $$r\ge -n\log\frac{1-p}{n}$$ (natural logarithm). Using $$n=365,$$ this results in
 * $$r\ge 2994$$ for $$p=0.9$$;
 * $$r\ge 3247$$ for $$p=0.95$$;
 * $$r\ge 3835$$ for $$p=0.99$$;
 * $$r\ge 4675$$ for $$p=0.999.$$
 * Monte Carlo experiments suggest these bounds are pretty sharp. So the answer to the question is about 3835 students, perhaps a few fewer. --Lambiam 11:59, 21 March 2023 (UTC)
 * I missed the sentence with "between 99-100%". --RDBury (talk) 16:15, 21 March 2023 (UTC)
 * Wait, so there is no 100%? Can't we just plug in 100 for p in Lambiam's formula? 2601:249:8200:A640:D092:7611:F9F:9A9F (talk) 22:15, 21 March 2023 (UTC).
 * No. Even if you take $$r=8000000000,$$ about the size of Earth's population, you get $$p=0.999999...9990...$$ with a string of 9518780 nines followed by a bunch of zeroes: incredibly close to, but still smaller than, $$1.$$ If you try to use $$p=1,$$ you run into the problem that $$\frac{1-p}n=0,$$ and $$\log 0$$ is undefined. A much simpler version of the issue is how often you need to throw a fair coin to get heads. Even after a run of one googolplex tails in a row (replace "googolplex" by any huge number you fancy), the next toss has a fifty-fifty chance of again being tails: there is no point where heads is guaranteed. --Lambiam 01:23, 22 March 2023 (UTC)
 * Oh, you're right, I should have said 1 instead of 100. So this is what we talked about in Calculus II, convergent vs. divergent. 2601:249:8200:A640:D092:7611:F9F:9A9F (talk) 01:35, 22 March 2023 (UTC).
 * The mathematical concepts of convergence and divergence are normally used for a series, as in the question of March 21. Here I do not see a series to which we can apply it. You probably mean the related but more generally applicable concept of a limit. Here we have, in mathematical notation, $$\lim_{r\to \infty}p=1,$$ in which the limit $$1$$ is approached from below but never reached. --Lambiam 08:05, 22 March 2023 (UTC)
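The 225 that the OP mentions appears to be the exact expected number of draws for $$n=50$$, namely $$nH_n$$, where $$H_n = 1 + \tfrac12 + \cdots + \tfrac1n$$ is the $$n$$-th harmonic number; $$\Theta(n\log n)$$ only describes how that expectation grows. A minimal Python sketch, assuming all $$n$$ outcomes are equally likely (function names are illustrative, not taken from the article), compares the exact value with $$n\ln n$$ and with the refinement $$n\ln n + \gamma n + \tfrac12$$, where $$\gamma\approx 0.5772$$ is the Euler-Mascheroni constant:

```python
import math


def expected_draws(n: int) -> float:
    """Exact expected number of uniform draws needed to fill all n slots: n * H_n."""
    harmonic = sum(1 / k for k in range(1, n + 1))
    return n * harmonic


def n_log_n(n: int) -> float:
    """The leading term n * ln(n) that Theta(n log n) refers to."""
    return n * math.log(n)


EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

for n in (50, 365):
    exact = expected_draws(n)
    leading = n_log_n(n)
    refined = leading + EULER_GAMMA * n + 0.5  # asymptotic expansion of n * H_n
    print(f"n={n}: n*H_n={exact:.2f}  n*ln(n)={leading:.2f}  refined={refined:.2f}")
```

For $$n=50$$ this gives $$nH_{50}\approx 224.96$$ against $$50\ln 50\approx 195.6$$, so the gap the OP noticed comes from the terms that $$\Theta$$ leaves out.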
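RDBury's coin, dice and Dec 31 figures all come from one formula: if each trial has $$k$$ equally likely outcomes, a particular outcome is missed in $$N$$ independent trials with probability $$(1-\tfrac1k)^N.$$ A small Python sketch (the helper name `prob_never_seen` is hypothetical) evaluates it for each case:

```python
def prob_never_seen(outcomes: int, trials: int) -> float:
    """Probability that one particular outcome never shows up in `trials`
    independent trials, each with `outcomes` equally likely results."""
    return (1 - 1 / outcomes) ** trials


print(prob_never_seen(2, 20))     # 20 Heads in a row: below one in a million
print(prob_never_seen(2, 30))     # 30 Heads in a row: roughly one in a billion
print(prob_never_seen(6, 10))     # no 6 in 10 rolls of a die
print(prob_never_seen(365, 730))  # nobody among 730 people born on Dec 31
```

The Dec 31 case comes out at about 0.135, the "about 13%" quoted in the thread; increasing the number of trials shrinks every value, but none of them ever reaches 0.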
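Lambiam's table can be reproduced by solving $$n\,e^{-r/n}\le 1-p$$ for the smallest integer $$r$$. The Python sketch below (helper names are hypothetical) does that and, as a variant, also solves the slightly sharper condition $$n(1-\tfrac1n)^r\le 1-p$$ from the first inequality of the same bound, which can only lower the required $$r$$. Both assume independent, uniformly distributed birthdays.

```python
import math


def union_bound_draws(n: int, p: float) -> int:
    """Smallest integer r with n * exp(-r/n) <= 1 - p, i.e. r >= -n * ln((1 - p) / n)."""
    return math.ceil(-n * math.log((1 - p) / n))


def sharper_union_bound_draws(n: int, p: float) -> int:
    """Smallest integer r with n * (1 - 1/n)**r <= 1 - p."""
    # Both logarithms are negative, so dividing flips the inequality into r >= ratio.
    return math.ceil(math.log((1 - p) / n) / math.log1p(-1 / n))


for p in (0.9, 0.95, 0.99, 0.999):
    print(p, union_bound_draws(365, p), sharper_union_bound_draws(365, p))
```

The first column should match the values 2994, 3247, 3835 and 4675 listed in the thread; the second column is never larger, consistent with Lambiam's remark that slightly fewer students may suffice.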
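The Monte Carlo check Lambiam mentions can be sketched in Python as follows. It assumes, unlike real data (see catslash's note on UK births), that birthdays are independent and uniform over 365 days; `coverage_probability` is a hypothetical helper, and the trial counts and seed are arbitrary choices.

```python
import random


def coverage_probability(n: int, r: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[every one of n days is someone's birthday]
    when r birthdays are drawn independently and uniformly at random."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    days = range(n)
    hits = 0
    for _ in range(trials):
        birthdays = rng.choices(days, k=r)
        if len(set(birthdays)) == n:  # no day left without a birthday
            hits += 1
    return hits / trials


print(coverage_probability(365, 3835, trials=1000))
print(coverage_probability(365, 730, trials=100))
```

With 1000 trials the estimate for 3835 students should land near 0.99, while with 730 students a fully covered year essentially never occurs. More trials narrow the estimate at the cost of run time.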
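The count of nines for $$r=8000000000$$ follows from inverting the formula $$r = -n\log\frac{1-p}{n}$$ into $$1-p = n\,e^{-r/n}$$, whose base-10 logarithm is $$\log_{10} n - \frac{r}{n\ln 10}.$$ Since that formula comes from an upper bound on the failure probability, the true $$p$$ is at least this close to 1. The Python sketch works with the logarithm because $$e^{-r/n}$$ underflows to zero in floating point; variable names are illustrative.

```python
import math

n = 365             # days in a non-leap year
r = 8_000_000_000   # roughly Earth's population

# 1 - p = n * exp(-r / n); exp(-r / n) underflows to 0.0, so use log10(1 - p).
log10_one_minus_p = math.log10(n) - (r / n) / math.log(10)
# Number of leading 9s in the decimal expansion of 1 - n * exp(-r / n).
leading_nines = math.floor(-log10_one_minus_p)
print(leading_nines)
```

This prints 9518780, the count given in the thread. As $$r$$ grows the number of nines grows without bound, yet $$p$$ never reaches 1.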

Btw, is my question a probability problem or a statistics problem? I think of probability problems as ones whose final answer is a number between 0 and 1, or between 0% and 100%. Although this includes a 0-to-1 problem, the final answer is not such a number, so is this actually a statistics problem? 2601:249:8200:A640:58CC:9380:9CA3:6AD (talk) 19:06, 22 March 2023 (UTC).


 * The stuff above is all an application of pure probability theory. It would become statistics if you collected the data of 3000 students, noticed that none was born on the 1st of April, and wondered if that was suspiciously unusual. A statistician would then tell you (using probability theory) that even when nothing is out of the ordinary, it is not at all strange to find that one or more days are no one's birthday. This is in fact more likely to happen than, after throwing two dice, finding that their values add up to exactly 10. (A probability theorist worth their salt could also tell you that, but when the issues become more complicated you want to be able to lean on the expertise of a good statistician.) --Lambiam 21:05, 22 March 2023 (UTC)
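Under the same uniform-birthday assumption, the probability that 3000 students leave at least one day without a birthday can be computed exactly by inclusion-exclusion (a technique not used in the thread): the number of ways to spread $$r$$ people over $$n$$ days so that every day is used is $$\sum_{k=0}^{n}(-1)^k\binom{n}{k}(n-k)^r.$$ The Python sketch below (helper name hypothetical) uses exact integer arithmetic to avoid cancellation and prints the result next to the two-dice probabilities for totals of 9 and 10.

```python
from math import comb


def prob_some_day_missing(n: int, r: int) -> float:
    """Exact Pr[at least one of n equally likely days is nobody's birthday among r people]."""
    # Inclusion-exclusion count of assignments that use every day (surjections);
    # exact integers avoid the cancellation that floats would suffer here.
    onto = sum((-1) ** k * comb(n, k) * (n - k) ** r for k in range(n + 1))
    return 1 - onto / n ** r  # int / int yields a correctly rounded float


p_missing = prob_some_day_missing(365, 3000)
p_dice_9 = 4 / 36   # (3,6), (4,5), (5,4), (6,3)
p_dice_10 = 3 / 36  # (4,6), (5,5), (6,4)
print(f"some day missing: {p_missing:.4f}")
print(f"two dice total 9: {p_dice_9:.4f}, total 10: {p_dice_10:.4f}")
```

The missing-day probability comes out at roughly 0.093, which lies between $$3/36\approx 0.083$$ (total 10) and $$4/36\approx 0.111$$ (total 9).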

Hey Lambiam and RDBury, okay, so a sample of 3835 students of the same age is needed for a 99% chance that every day of the year is someone's birthday, or equivalently a 1% chance that there is a day of the year on which nobody in that population has a birthday. But is that only for 1 of the 365 days being nobody's birthday? Then what would it be for 2 of the 365 days being nobody's birthday? 2601:249:8200:A640:283A:8CF0:E1A3:D2FD (talk) 23:43, 24 March 2023 (UTC).


 * The 1% chance includes all numbers from 1 up to 364 days that are nobody's birthday. So it also includes the unlikely case that all 3835 students were born on the 1st of April, an event that has a probability of about $$4\times10^{-9827}.$$ 365 birthday-less days can even be included, but since this is impossible, its probability is 0. --Lambiam 00:42, 25 March 2023 (UTC)
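The split of the roughly 1% failure probability into exact numbers of birthday-less days, which the OP asks about, can be computed by choosing which $$j$$ days stay empty and counting the ways to cover the remaining $$n-j$$ days. The Python sketch below does this with inclusion-exclusion in exact integers (the function name is hypothetical), again assuming independent, uniformly distributed birthdays.

```python
from math import comb


def empty_day_counts(n: int, r: int) -> list[int]:
    """counts[j] = number of ways to assign r people to n days so that exactly
    j days are nobody's birthday (exact integers, inclusion-exclusion)."""
    powers = [m ** r for m in range(n + 1)]  # m**r for every m, computed once
    counts = []
    for j in range(n + 1):
        m = n - j  # the days that must each receive at least one birthday
        onto = sum((-1) ** i * comb(m, i) * powers[m - i] for i in range(m + 1))
        counts.append(comb(n, j) * onto)  # choose which j days stay empty
    return counts


n, r = 365, 3835
counts = empty_day_counts(n, r)
total = n ** r
for j in range(4):
    # probability that exactly j days of the year are nobody's birthday
    print(j, counts[j] / total)
print("at least one empty day:", 1 - counts[0] / total)
```

The counts over all $$j$$ add up to exactly $$n^r$$, and for 3835 students nearly all of the failure probability comes from exactly one empty day, with two or more empty days far rarer.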
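The figure for all 3835 students sharing the 1st of April is $$(1/365)^{3835}$$, far below the smallest positive floating-point number, so a Python sketch (variable names illustrative) computes its base-10 logarithm instead and splits it into mantissa and exponent:

```python
import math

n, r = 365, 3835
# (1/365)**3835 underflows to 0.0 as a float, so use its base-10 logarithm.
log10_p = -r * math.log10(n)
exponent = math.floor(log10_p)          # power of ten
mantissa = 10 ** (log10_p - exponent)   # leading factor in [1, 10)
print(f"about {mantissa:.1f} x 10^{exponent}")
```

This gives about $$4\times10^{-9827}$$, in line with the value in the thread.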