Wikipedia:Reference desk/Archives/Mathematics/2023 July 7

= July 7 =

Regression to the mean
Regression to the mean is a pretty old concept, but I couldn't find something in the article that I thought would be mentioned there. Am I missing something? The article gives an estimate of 2/3 for the regression coefficient for height. Assuming the next generation has the same distribution as their parents and the distribution for each child is the same, I work out the standard deviation of the heights of the children in each family as sqrt(1 - (2/3)^2), or about 3/4 of the overall standard deviation. Hope I've got that right! Does this come under something else perhaps? NadVolum (talk) 22:39, 7 July 2023 (UTC)
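The single-parent arithmetic in the question can be written as a short sketch. It assumes, as the question does, that a child's z-height is c times one parent's z-height plus an independent residual, and that the child generation has the same (unit) variance as the parents. The function name is hypothetical.

```python
import math


def residual_sd_single_parent(c: float) -> float:
    """Within-family SD, in units of the population SD, for the one-parent model.

    Assumed model: child = c * parent + r, with r independent of parent and
    Var(child) = Var(parent) = 1, so 1 = c^2 + Var(r) and Var(r) = 1 - c^2.
    """
    return math.sqrt(1.0 - c ** 2)


print(round(residual_sd_single_parent(2 / 3), 3))  # 0.745, i.e. about 3/4
```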


 * Let's assume, to keep this tractable, that the heights of any two parents are iid random variables with the same distribution as the population. Also, for simplicity, introduce the concept of z-height, which is a linear transformation of height $$h$$ to z-height $$z=(h-\mu)/\sigma,$$ so that the z-height can be treated as having the standard normal distribution. Also, let's simply define the mid-parent height as the arithmetic mean $$(z_\text{♂}+z_\text{♀})/2$$ of the two heights of the parents of a child. Let $$c$$ be the linear regression coefficient of child height (the dependent variable) with respect to mid-parent height, so we can write the child height as $$\zeta=c(z_\text{♂}+z_\text{♀})/2+r,$$ in which the random variable $$r$$ is the residual. By definition, the expected value of $$r$$ is $$0$$; moreover, it may be assumed to be independent of the mid-parent height. Now $$\operatorname{Var}(\zeta)=c^2(\operatorname{Var}(z_\text{♂}){+}\operatorname{Var}(z_\text{♀}))/4+\operatorname{Var}(r).$$ We have defined z-height such that $$\operatorname{Var}(z_\text{♂})=\operatorname{Var}(z_\text{♀})=1.$$ If the height-reproducing process is stationary (the z-height $$\zeta$$ of offspring also has the standard normal distribution), also $$\operatorname{Var}(\zeta)=1.$$ This then implies that $$\operatorname{Var}(r)=1-c^2/2.$$ Specifically, when $$c=2/3,$$ this comes out as $$7/9,$$ not as $$1-(2/3)^2=5/9.$$ --Lambiam 12:03, 8 July 2023 (UTC)
 * Yes, I'd in effect assumed a single parent with the original distribution rather than two parents drawn at random from it. Two parents would make the standard deviation of height for the children nearly 90% of that of the population as a whole, which I think is a bit surprising. But don't you think this is interesting enough that it is surprising people don't seem to have made the calculation, never mind shown how it works out in a case with two parents like this? And for height, I must admit it does seem to me that the pairing of heights is fairly random, rather than tall people marrying tall ones and short marrying short! In other cases, like batters hitting in a second season compared to the first, there would not be two parents, but there may be other factors and they may be interesting. NadVolum (talk) 12:41, 8 July 2023 (UTC)
 * It is interesting, but see WP:NOR. If we orient the implication in the direction cause → effect, it may be better to interpret the relation as $$\operatorname{Var}(z)=\operatorname{Var}(r)/(1-c^2/2),$$ which gives the steady-state population variance in terms of the regression coefficient and the residual variance. BTW, many studies have found that preferences for similar height in mating selection are reflected in a correlation between the heights of actual couples. --Lambiam 14:02, 8 July 2023 (UTC)
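The two-parent derivation, Var(r) = 1 - c^2/2, can be checked with a small Monte Carlo sketch. It assumes, as in the reply, random mating (the parents' z-heights are iid standard normal) and a residual independent of the mid-parent height; taking that residual to be normal, and all function names, are illustrative assumptions. The closed-form within-family standard deviation sqrt(7/9) is the "nearly 90%" figure from the follow-up, and the simulated child variance and fitted slope should land near 1 and c.

```python
import math
import random
import statistics


def residual_variance_two_parents(c: float) -> float:
    # Var(zeta) = c^2 * (Var(z_m) + Var(z_f)) / 4 + Var(r) = c^2 / 2 + Var(r);
    # stationarity (Var(zeta) = 1) then gives Var(r) = 1 - c^2 / 2.
    return 1.0 - c ** 2 / 2


def sample_cov(xs, ys):
    # Unbiased sample covariance; sample_cov(xs, xs) is the sample variance.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def simulate(c: float, n: int = 100_000, seed: int = 1):
    """Return (sample variance of child z-height, fitted slope on mid-parent)."""
    rng = random.Random(seed)
    sd_r = math.sqrt(residual_variance_two_parents(c))
    mids, children = [], []
    for _ in range(n):
        z_m = rng.gauss(0.0, 1.0)  # father's z-height
        z_f = rng.gauss(0.0, 1.0)  # mother's z-height, independent (random mating)
        mid = (z_m + z_f) / 2      # mid-parent z-height
        mids.append(mid)
        children.append(c * mid + rng.gauss(0.0, sd_r))  # assumed normal residual
    slope = sample_cov(mids, children) / sample_cov(mids, mids)
    return sample_cov(children, children), slope


c = 2 / 3
print(round(residual_variance_two_parents(c), 4))             # 0.7778 (= 7/9)
print(round(math.sqrt(residual_variance_two_parents(c)), 3))  # 0.882
var_child, slope = simulate(c)
print(round(var_child, 3), round(slope, 3))  # should be close to 1 and 0.667
```

Simulating the mid-parent slope as well as the child variance confirms both halves of the model at once: the residual variance that keeps the population stationary, and the regression coefficient that is recovered from the simulated data.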
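The cause → effect reading, Var(z) = Var(r)/(1 - c^2/2), can be illustrated by iterating the generation-to-generation recursion V_next = (c^2/2) V + Var(r) from an arbitrary starting variance. Because c^2/2 < 1, each step shrinks the distance to the fixed point by that factor. This sketch assumes c and Var(r) stay constant across generations and that mating is random; the starting variance of 4.0 and the function names are hypothetical.

```python
def next_generation_variance(var_parents: float, c: float, var_r: float) -> float:
    # Two iid parents with variance var_parents: Var(mid-parent) = var_parents / 2,
    # so Var(child) = c^2 * var_parents / 2 + Var(r).
    return c ** 2 * var_parents / 2 + var_r


def steady_state_variance(c: float, var_r: float) -> float:
    # Fixed point of V = (c^2 / 2) * V + var_r, valid when c^2 / 2 < 1.
    return var_r / (1 - c ** 2 / 2)


c, var_r = 2 / 3, 7 / 9
var = 4.0  # hypothetical starting population variance
for _ in range(30):
    var = next_generation_variance(var, c, var_r)

# The iterated variance and the closed-form fixed point should agree (both near 1).
print(round(var, 6), round(steady_state_variance(c, var_r), 6))
```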