User:Heptalogos

Boy or Girl paradox
The Boy or Girl problem is a well-known example in probability theory:
 * 1. A two-child family is chosen at random, and one of its two children, chosen at random, is a boy. What is the probability that the other child is a girl?
 * 2. A two-child family with at least one boy is chosen at random. What is the probability that the family has a girl?

Investigation of these questions reveals that their answers are very different:
 * in the first case, there are two equally probable possibilities: the other child is a boy or a girl, so the answer is 1/2.
 * in the second case, there are three equally probable ways in which at least one child can be a boy: only the older one, only the younger one, or both. Two of these three include a girl, so the answer is 2/3.

In order to define probabilities, the assumption is made that the ratio of boys to girls is exactly 50:50, and that the sex of each child is independent of the sex of the other.
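Assuming each child is independently a boy or a girl with probability 1/2, each of the four ordered families (older child, younger child) has probability 1/4. The minimal Python sketch below enumerates them and scales the probabilities to 10,000 families, reproducing the 2500 / 2500 / 5000 split used in the explanations; the function name `family_counts` is an illustrative choice, not part of the original analysis.

```python
from fractions import Fraction
from itertools import product

def family_counts(total=10_000):
    """Expected number of families of each type among `total` two-child families,
    assuming each child is independently a boy (B) or a girl (G) with probability 1/2."""
    counts = {"GG": 0, "BB": 0, "mixed": 0}
    for older, younger in product("BG", repeat=2):
        weight = Fraction(1, 4) * total  # each ordered pair has probability 1/4
        if older == younger:
            counts[older + younger] += weight  # "BB" or "GG"
        else:
            counts["mixed"] += weight          # "BG" and "GB" both count as mixed
    return counts

print({kind: int(n) for kind, n in family_counts().items()})
# {'GG': 2500, 'BB': 2500, 'mixed': 5000}
```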

First question

 * 1. Let's analyze the problem by dividing it into three events:
 * a. Create a sample space of randomly chosen two-child families
 * b. Reduce the sample space to families whose randomly chosen child is a boy
 * c. Determine the share of mixed families in the reduced sample space

Explanation:
 * A. From 10,000 families, 2500 will have only girls, 2500 will have only boys, and 5000 will have a boy and a girl. (10,000)
 * B. Choosing one child at random in every family and keeping only the families in which a boy was chosen cuts the sample space in half: all 2500 two-boy families remain, plus half of the 5000 mixed families. (5000)
 * C. Since the chosen child is a boy, the other child is a girl only in the mixed families, 2500 of which remain. (2500)
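Steps A to C can be checked by enumerating every (family, chosen child) combination with exact fractions. This is a sketch under the stated assumptions: each ordered family has probability 1/4, and the chosen child is the older or the younger one with probability 1/2 each. The names `first_question` and `TOTAL` are hypothetical.

```python
from fractions import Fraction
from itertools import product

TOTAL = 10_000  # number of families in the sample space (step A)

def first_question():
    b = Fraction(0)  # families whose randomly chosen child is a boy (step B)
    c = Fraction(0)  # of those, families whose other child is a girl (step C)
    for family in product("BG", repeat=2):      # (older, younger), probability 1/4 each
        for chosen in (0, 1):                    # which child is picked, probability 1/2 each
            weight = Fraction(1, 4) * Fraction(1, 2) * TOTAL
            other = family[1 - chosen]
            if family[chosen] == "B":
                b += weight
                if other == "G":
                    c += weight
    return b, c, c / b

b, c, p = first_question()
print(b, c, p)  # 5000 2500 1/2
```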


 * Simple approach

P(c) = C/B = 2500/5000 = 1/2


 * Bayesian approach

$$P( c \mid b ) = \frac{ P( c \cap b ) }{ P( b ) } = \frac{ \frac{2500}{10{,}000} }{ \frac{5000}{10{,}000} } = \frac{1}{2} $$

Second question

 * 2. Let's analyze the problem by dividing it into three events:
 * a. Create a sample space of randomly chosen two-child families
 * b. Reduce the sample space to families with at least one boy
 * c. Determine the share of families with a girl in the reduced sample space

Explanation:
 * A. From 10,000 families, 2500 will have only girls, 2500 will have only boys, and 5000 will have a boy and a girl. (10,000)
 * B. Removing the 2500 all-girl families leaves the families with at least one boy. (7500)
 * C. In the reduced sample space, only mixed families have girls. (5000)
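The same exact enumeration, adapted to the second question, only needs the four ordered families, since no child is singled out. A sketch assuming each ordered family has probability 1/4; `second_question` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

TOTAL = 10_000  # number of families in the sample space (step A)

def second_question():
    b = Fraction(0)  # families with at least one boy (step B)
    c = Fraction(0)  # of those, families that also have a girl (step C)
    for family in product("BG", repeat=2):  # (older, younger), probability 1/4 each
        weight = Fraction(1, 4) * TOTAL
        if "B" in family:
            b += weight
            if "G" in family:
                c += weight
    return b, c, c / b

b, c, p = second_question()
print(b, c, p)  # 7500 5000 2/3
```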


 * Simple approach

P(c) = C/B = 5000/7500 = 2/3


 * Bayesian approach

$$P( c \mid b ) = \frac{ P( c \cap b ) }{ P( b ) } = \frac{ \frac{5000}{10{,}000} }{ \frac{7500}{10{,}000} } = \frac{2}{3} $$
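Both answers can also be estimated by simulation, applying the two selection rules to the same randomly generated families. This Monte Carlo sketch assumes independent 50:50 sexes; the family count and seed are arbitrary choices, and `simulate` is a hypothetical name.

```python
import random

def simulate(n_families=100_000, seed=1):
    """Estimate the answers to questions 1 and 2 from the same simulated families."""
    rng = random.Random(seed)
    q1_kept = q1_girl = 0  # question 1: the randomly chosen child is a boy
    q2_kept = q2_girl = 0  # question 2: at least one boy
    for _ in range(n_families):
        family = (rng.choice("BG"), rng.choice("BG"))  # (older, younger)
        chosen = rng.randrange(2)                       # index of the randomly chosen child
        if family[chosen] == "B":
            q1_kept += 1
            q1_girl += int(family[1 - chosen] == "G")
        if "B" in family:
            q2_kept += 1
            q2_girl += int("G" in family)
    return q1_girl / q1_kept, q2_girl / q2_kept

p1, p2 = simulate()
print(f"question 1: {p1:.3f}, question 2: {p2:.3f}")  # close to 0.5 and 0.667
```

The exact digits depend on the seed, but the two estimates stay clearly apart, because question 1 discards half of the mixed families while question 2 keeps all of them.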

Third question
Does the additional bit of information that the boy's name is Jacob change anything? In order to define probabilities, the assumption is made that 2% (1/50) of all boys are called Jacob, independently of the name of any sibling.
 * 3. A random two-child family with at least one boy named Jacob is chosen. What is the probability that it has a girl?

Explanation:
 * A. From 10,000 families, 2500 will have only girls, 2500 will have only boys, and 5000 will have a boy and a girl. (10,000)
 * B. Families with at least one boy (7500).
 * J. Families with at least one boy called Jacob: 99 two-boy families plus 100 mixed families. (199)
 * C. In the reduced sample space, only mixed families have girls. (100)
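A Monte Carlo sketch of the third question, assuming independent 50:50 sexes and that each boy is independently named Jacob with probability 1/50. Because the name is rare, many families are simulated to keep enough qualifying ones; `simulate_jacob`, the family count and the seed are illustrative choices.

```python
import random

def simulate_jacob(n_families=400_000, p_jacob=0.02, seed=2):
    """Estimate P(family has a girl | at least one boy named Jacob)."""
    rng = random.Random(seed)
    kept = with_girl = 0
    for _ in range(n_families):
        children = [rng.choice("BG") for _ in range(2)]
        # a family qualifies if at least one of its boys is named Jacob
        has_jacob = False
        for child in children:
            if child == "B" and rng.random() < p_jacob:
                has_jacob = True
        if has_jacob:
            kept += 1
            with_girl += int("G" in children)
    return with_girl / kept

estimate = simulate_jacob()
print(f"{estimate:.3f} (exact value 100/199 = {100 / 199:.3f})")
```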


 * 2 boys
{| class="wikitable"
! boy 1 !! P(b1) !! boy 2 !! P(b2) !! P(b1)P(b2)
|-
| Jacob || 0.02 || not-Jacob || 0.98 || 0.0196
|-
| not-Jacob || 0.98 || Jacob || 0.02 || 0.0196
|-
| Jacob || 0.02 || Jacob || 0.02 || 0.0004
|-
! total !! !! !! !! 0.0396
|}
 * 0.0396 × 2500 = 99


 * mixed (boy & girl)
 * 0.02 × 5000 = 100 (a mixed family has exactly one boy, so it qualifies whenever that boy is named Jacob, with probability 1/50)
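The table and both counts can be reproduced with exact fractions. This sketch assumes, as above, that each boy is independently named Jacob with probability 1/50; `two_boy_rows` and `P_JACOB` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

P_JACOB = Fraction(1, 50)  # assumed share of boys named Jacob

def two_boy_rows():
    """(boy 1, boy 2, probability) for every name pair with at least one Jacob."""
    prob = {"Jacob": P_JACOB, "not-Jacob": 1 - P_JACOB}
    return [(b1, b2, prob[b1] * prob[b2])
            for b1, b2 in product(prob, repeat=2)
            if "Jacob" in (b1, b2)]

rows = two_boy_rows()
for b1, b2, pr in rows:
    print(f"{b1:<10} {b2:<10} {float(pr):.4f}")

total = sum(pr for _, _, pr in rows)  # 99/2500 = 0.0396
two_boys = total * 2500               # two-boy families with at least one Jacob
mixed = P_JACOB * 5000                # mixed families whose only boy is Jacob
print(two_boys, mixed, mixed / (two_boys + mixed))  # 99 100 100/199
```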


 * Simple approach

P(c) = C/J = 100/199


 * Bayesian approach

$$P( c \mid j ) = \frac{ P( c \cap j ) }{ P( j ) } = \frac{ \frac{100}{10{,}000} }{ \frac{199}{10{,}000} } = \frac{100}{199} $$
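The same calculation can be written for a general probability p that a boy is named Jacob. This is an illustrative generalization that keeps the assumption of independently assigned names: a mixed family (probability 1/2) qualifies with probability p, and a two-boy family (probability 1/4) qualifies unless neither boy is Jacob.

$$P( c \mid j ) = \frac{ \frac{1}{2} p }{ \frac{1}{2} p + \frac{1}{4} \left( 1 - (1-p)^2 \right) } = \frac{ 2p }{ 2p + 2p - p^2 } = \frac{2}{4 - p} $$

For p = 1/50 this gives 2 / (199/50) = 100/199, matching the counts above.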

So if the probability of a boy being named Jacob is 1 in 50, then the probability that the family has a girl is 100/199, or roughly 50% (about 50.25%). This value depends on the popularity of the name. At the extreme, if all boys were given the same name, being named Jacob would provide no more information than being a boy, and the probability that the family has a girl would still be 2/3. As the name becomes rarer, the two-Jacob case becomes negligible: a two-boy family is then almost exactly twice as likely as a mixed family to contain a Jacob, which balances the mixed families being twice as numerous, and the probability of the family having a girl approaches the limit of 50%.
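To see how the answer moves with the popularity of the name, the counting can be repeated for several values of p, the assumed probability that a boy is named Jacob (names assigned independently). The function name `prob_girl_given_jacob` and the chosen values of p are illustrative.

```python
from fractions import Fraction

def prob_girl_given_jacob(p, total=10_000):
    """P(family has a girl | at least one boy named Jacob) for an assumed
    probability p that a boy is named Jacob; algebraically equal to 2 / (4 - p)."""
    mixed = Fraction(total, 2) * p                      # the only boy is Jacob
    two_boys = Fraction(total, 4) * (1 - (1 - p) ** 2)  # at least one of two boys is Jacob
    return mixed / (mixed + two_boys)

for p in (Fraction(1), Fraction(1, 2), Fraction(1, 50), Fraction(1, 10_000)):
    q = prob_girl_given_jacob(p)
    print(f"p = {p}: {q} = {float(q):.4f}")
```

The printed values fall from 2/3 (every boy is Jacob) through 100/199 (p = 1/50) toward 1/2 as the name becomes rarer, matching the limits described above.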

If we further assume that parents never give two children the same name, we can eliminate {Jacob, Jacob}, leaving 198 possible events; it would then appear that the probability of the family having a girl is 100/198, or 50/99. However, the one family that would have had two Jacobs now has a Jacob and a boy not named Jacob, so there are 50 (rather than 49) occurrences of {Jacob, boy not Jacob}. The total is again 199, and the probability of a girl is 100/199, just as before.
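A sketch of this counting under one reading of the "no duplicate names" assumption, which is an illustrative model rather than something the text specifies: the older boy is named Jacob with probability 1/50, and the younger boy is named Jacob with probability 1/50 only if his older brother is not Jacob. All constant names are hypothetical.

```python
from fractions import Fraction

P_JACOB = Fraction(1, 50)  # assumed share of boys named Jacob
TWO_BOY = 2500             # two-boy families out of 10,000
MIXED = 5000               # mixed families out of 10,000

# Hypothetical sibling model: the younger boy can only be Jacob if the older one is not.
jacob_then_other = TWO_BOY * P_JACOB                  # older Jacob, younger not Jacob
other_then_jacob = TWO_BOY * (1 - P_JACOB) * P_JACOB  # older not Jacob, younger Jacob
two_boys_with_jacob = jacob_then_other + other_then_jacob
mixed_with_jacob = MIXED * P_JACOB                    # mixed families are unaffected

print(jacob_then_other, other_then_jacob, two_boys_with_jacob)  # 50 49 99
print(mixed_with_jacob / (mixed_with_jacob + two_boys_with_jacob))  # 100/199
```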

Conclusion

 * Many people coming across this paradox for the first time will agree with the answer to the first question but may be confused by the answer to the second. The difference comes from the size of the sample space: the first question keeps only half of the mixed families (2500 of 5000), because in a mixed family the randomly chosen child is the boy only half the time. The second question only requires at least one boy, so all 5000 mixed families stay in the sample space.


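 * The Bayesian approach adds little in these situations: it divides both counts by the same total (10,000) and then takes their ratio, which gives exactly the same result as the simple approach.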