User:Tenoc.1776/StatProbSet2

http://www.codecogs.com/latex/eqneditor.php http://de.wikibooks.org/wiki/Mathematik:_Statistik:_Normalverteilung

39,41,47,58,65,37,37,49,56,59,62,36,48,52,64,29,44,47,49,52,53,54,72,50,50

1a
$$A(x)=\frac{39+41+47+...+72+50+50}{25}=50$$

$$\sigma(x)=\sqrt{\frac{(39-50)^{2}+(41-50)^{2}+...+(50-50)^{2}+(50-50)^{2}}{25}}=10$$
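
The two values above can be checked numerically. The following is a minimal sketch; the helper names `mean` and `population_sd` are chosen here for illustration, and the SD divides by n (not n - 1), as in the formula above.

```python
import math

# The 25 scores from the problem statement.
scores = [39, 41, 47, 58, 65, 37, 37, 49, 56, 59, 62, 36, 48,
          52, 64, 29, 44, 47, 49, 52, 53, 54, 72, 50, 50]

def mean(values):
    """Arithmetic mean A(x)."""
    return sum(values) / len(values)

def population_sd(values):
    """Standard deviation with divisor n, matching sigma(x) above."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))

print(mean(scores), population_sd(scores))
```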

1b
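Expected number of scores within 1.25 SDs of the mean, assuming the scores are approximately normally distributed:

$$25 * P(|x - A(x)| < 1.25* \sigma(x)) = 25 * (1-2*P(x - A(x) > 1.25* \sigma(x)))= 25*(1 - 2 * 0.1056) \approx 25* 0.7887 \approx 19.72$$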
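
The tail probability can also be computed instead of read from a table. This sketch assumes a normal model for the scores and uses the error function from Python's standard library; the function names are illustrative.

```python
import math

def standard_normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_within(n, k):
    """Expected number of n normal observations within k SDs of the mean."""
    upper_tail = 1.0 - standard_normal_cdf(k)
    return n * (1.0 - 2.0 * upper_tail)

print(round(expected_within(25, 1.25), 2))
```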

1c
The number of scores between 37.5 and 62.5 is 18 (the seven scores 29, 36, 37, 37, 64, 65 and 72 lie outside this interval).
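
The count can be checked directly against the data. A minimal sketch, with the mean 50 and SD 10 from 1a hard-coded and an illustrative helper `count_within`:

```python
# The 25 scores from the problem statement.
scores = [39, 41, 47, 58, 65, 37, 37, 49, 56, 59, 62, 36, 48,
          52, 64, 29, 44, 47, 49, 52, 53, 54, 72, 50, 50]

MEAN = 50  # from 1a
SD = 10    # from 1a

def count_within(values, center, half_width):
    """Count the values strictly closer to center than half_width."""
    return sum(1 for v in values if abs(v - center) < half_width)

# 1.25 SDs around the mean is the interval from 37.5 to 62.5.
print(count_within(scores, MEAN, 1.25 * SD))
```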

2a
$$ y = \alpha + \beta x$$

$$\beta = r \frac{\sigma_y}{\sigma_x}$$

$$\beta =0.25*\frac{\$26000}{\$39000}=\frac{1}{6} \approx 0.17$$

$$\alpha=\bar y-\beta \bar x =\bar y- r \frac{\sigma_y}{\sigma_x} \bar x$$

$$\alpha = \$33000 - \frac{1}{6} * \$54000 = \$24000 $$

$$ y = \$24000 + \frac{1}{6} x$$
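
The same arithmetic as a short sketch: the function `regression_from_summary` is a name introduced here for illustration, and it only needs the five summary statistics used above.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) of the regression line of y on x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Summary statistics from 2a (amounts in dollars).
intercept, slope = regression_from_summary(
    mean_x=54000, mean_y=33000, sd_x=39000, sd_y=26000, r=0.25)
print(round(intercept), round(slope, 4))
```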

2b
$$\bar x^*=\bar x * 1.1 = \$59400$$

$$\bar y^*=\bar y * 1.1 = \$36300$$

$$\sigma_{x^*}=\sigma_x * 1.1 = \$42900$$

$$\sigma_{y^*}=\sigma_y * 1.1 = \$28600$$

$$r^* = r = 0.25$$

$$ y^* = \alpha^* + \beta^* x^*$$

$$\beta^* = r^* \frac{\sigma_{y^*}}{\sigma_{x^*}}$$

$$\beta^* =0.25*\frac{\$28600}{\$42900} = \frac{1}{6} \approx 0.17$$

$$\alpha^*=\bar y^* -\beta^* \bar x^* =\bar y^* - r^* \frac{\sigma_{y^*}}{\sigma_{x^*}} \bar x^*$$

$$\alpha^* = \$36300 - \frac{1}{6} * \$59400= \$26400 $$

$$ y^* = \$26400 + \frac{1}{6} x^*$$
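
The effect of the rescaling can be sketched in general terms: multiplying both variables by the same positive factor scales the means and SDs but leaves r unchanged, so the slope stays the same and the intercept is multiplied by the factor. The function names below are illustrative.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (intercept, slope) of the regression line of y on x."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

def rescaled_regression(mean_x, mean_y, sd_x, sd_y, r, factor):
    """Regression line after multiplying both variables by factor > 0."""
    # Means and SDs scale with the factor; the correlation r does not change.
    return regression_from_summary(
        mean_x * factor, mean_y * factor, sd_x * factor, sd_y * factor, r)

original = regression_from_summary(54000, 33000, 39000, 26000, 0.25)
rescaled = rescaled_regression(54000, 33000, 39000, 26000, 0.25, 1.1)
print(original, rescaled)
```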

3a/b
While any increasing line shows the positive association between lead levels and IQ, an arbitrary increasing line is not automatically the best approximation of the relationship and can be misleading (for example, if it is too steep). The regression line is the most accurate summary of the weak positive association in the data: by the linear least squares method, it minimizes the sum of the squared vertical distances between the observed values and the values predicted by the line.

3c/d/e
Any line he uses (including the regression line) will have a nonzero r.m.s. error unless the correlation is perfect, because every line is only an approximation. With the weak positive association in the data, the correlation is not perfect. As pointed out above, the regression line by definition has the smallest r.m.s. error.
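
The claim about the r.m.s. error can be illustrated with a small sketch. The data below are hypothetical, invented only for this illustration (they are not the lead and IQ data), and the helper names are likewise assumptions.

```python
import math

def fit_least_squares(xs, ys):
    """Return (intercept, slope) of the least squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    slope = cov / var_x
    return mean_y - slope * mean_x, slope

def rms_error(xs, ys, intercept, slope):
    """Root mean square of the vertical distances to the given line."""
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return math.sqrt(sum(e * e for e in residuals) / len(residuals))

# Hypothetical data with a weak positive association (illustration only).
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 5, 3]

intercept, slope = fit_least_squares(xs, ys)
regression_rms = rms_error(xs, ys, intercept, slope)

# An arbitrarily chosen steeper line through the point of averages (3.5, 3).
steeper_rms = rms_error(xs, ys, -0.5, 1.0)

print(regression_rms < steeper_rms)
```

Neither error is zero here, because the hypothetical points do not lie on a single line.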

4a
The term εi is an unobserved random variable that represents noise in the data set, i.e. effects outside the two measured variables that influence the relationship between them.

4b
$$ \beta  = \frac{\frac{1}{n} \sum (x_{i}-\bar{x})(y_{i}-\bar{y}) }{\frac{1}{n} \sum (x_{i}-\bar{x})^2 } = \frac{ \operatorname{Cov}[x,y] }{ \operatorname{Var}[x] } = \frac{ r \sigma_x \sigma_y }{ \sigma_x^2 } = r \frac{\sigma_y}{\sigma_x} $$

$$q.e.d.$$
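
As a numerical sanity check of the identity (not a replacement for the derivation), the two expressions for the slope can be compared on a small data set. The data are hypothetical and the function name is an assumption made for this sketch.

```python
import math

def population_moments(xs, ys):
    """Return (Var[x], Var[y], Cov[x, y]), each with divisor n."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    return var_x, var_y, cov

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5, 6]
ys = [3, 1, 4, 2, 5, 3]

var_x, var_y, cov = population_moments(xs, ys)
r = cov / math.sqrt(var_x * var_y)

slope_from_cov = cov / var_x                               # Cov[x,y] / Var[x]
slope_from_r = r * math.sqrt(var_y) / math.sqrt(var_x)     # r * sigma_y / sigma_x

print(math.isclose(slope_from_cov, slope_from_r))
```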

4c/d
$$\bar y=10$$

$$\bar x=6.25$$

$$\sigma_y \approx 1.87$$

$$\sigma_x \approx 1.48$$

$$Cov_{x,y} = \frac{11}{4} = 2.75 $$

$$r_{x,y}= \frac{Cov_{x,y}}{\sigma_x \sigma_y} \approx 0.99$$

$$\beta_2 = r \frac{\sigma_y}{\sigma_x} \approx 1.26$$

$$\beta_1 = \bar y -\beta_2 \bar x \approx 2.13$$