User:Ygraigarw/sandbox

Fitting the regression line
Suppose there are n data points $$\{(x_i, y_i)\}$$, where i = 1, 2, …, n, generated from the model:


 * $$ y = \alpha + \beta x + \varepsilon, \,$$

where $$\alpha$$ and $$\beta$$ are unknown constants, $$x$$ is a known explanatory variable, and $$\varepsilon$$ and $$y$$ are random variables.
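To make the model concrete, the minimal Python sketch below draws synthetic data from it. The helper name `simulate`, the uniform range used for $$x$$, the Gaussian errors, and the parameter values are illustrative assumptions only; the model statement itself does not fix any of them.

```python
import random


def simulate(n, alpha, beta, sigma, seed=0):
    """Draw n points (x_i, y_i) from y = alpha + beta * x + eps.

    Assumptions (for illustration only): x is drawn uniformly on [0, 10]
    and eps ~ N(0, sigma^2), independently for each point.
    """
    rng = random.Random(seed)                               # reproducible generator
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]         # known explanatory values
    eps = [rng.gauss(0.0, sigma) for _ in range(n)]         # unobserved random errors
    ys = [alpha + beta * x + e for x, e in zip(xs, eps)]    # observed random responses
    return xs, ys


# Hypothetical parameter values chosen only for the example.
xs, ys = simulate(n=50, alpha=2.0, beta=0.5, sigma=1.0)
```

With `sigma=0.0` every simulated point falls exactly on the line $$y = \alpha + \beta x$$, which is a convenient check that the data-generating step is wired correctly.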

The goal is to find the equation of the straight line $$y = \alpha + \beta x$$ that provides a "best" fit for the observed data points, each of which satisfies:


 * $$ y_i = \alpha + \beta x_i + \varepsilon_i, \quad i = 1, \ldots, n. \,$$

Here "best" is understood as the line that minimizes the sum of squared residuals of the linear regression model. In other words, the estimates $$\hat{\alpha}$$ and $$\hat{\beta}$$ solve the following minimization problem:


 * $$\text{Find }\min_{\alpha,\,\beta}Q(\alpha,\beta),\text{ where } Q(\alpha,\beta) = \sum_{i=1}^n\hat{\varepsilon}_i^{\,2} = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2\ $$
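Setting $$\partial Q/\partial\alpha = \partial Q/\partial\beta = 0$$ gives the standard closed-form solution $$\hat{\beta} = \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) \big/ \sum_{i=1}^n (x_i - \bar{x})^2$$ and $$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$, provided the $$x_i$$ are not all equal. The minimal Python sketch below evaluates $$Q(\alpha,\beta)$$ and this closed form; the function names `Q` and `fit_ols` and the five data points are hypothetical, chosen only for illustration.

```python
def Q(alpha, beta, xs, ys):
    """Sum of squared residuals of the candidate line y = alpha + beta * x."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))


def fit_ols(xs, ys):
    """Closed-form minimiser (alpha_hat, beta_hat) of Q over alpha and beta."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Centred cross-product and sum of squares of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        # With all x_i equal, the slope is not identifiable.
        raise ValueError("all x values are equal; beta cannot be estimated")
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical data lying close to a line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
a, b = fit_ols(xs, ys)  # a ≈ 1.04, b ≈ 1.99
```

Because $$Q$$ is a convex quadratic in $$(\alpha, \beta)$$, nudging either estimate away from the fitted values should increase `Q(a, b, xs, ys)`, which is a quick sanity check that the closed form really is the minimizer.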

$$