
The Method of Least Squares

=Problem Statement=

Given $$n$$ 2-dimensional data points $$(x_1,y_1),(x_2,y_2),\ldots ,(x_n,y_n)$$ we will look at how to use the method of least squares to determine the coefficients $$a_0,a_1, a_2, \ldots, a_k $$ of the polynomial $$P_k$$ which fits our data the best.

$$ P_k(x) = a_kx^k + a_{k-1}x^{k-1} + \cdots + a_1x + a_0 $$

But what does it mean for $$P_k(x)$$ to be a best fit? The following applet from the popular website Khan Academy gives a nice illustration of what it means for a line to be a "best fitting line":

Fitting A Line To Data

=History=



It is difficult to determine the exact origins of The Method of Least Squares due to its simplicity, practicality and numerous applications. A common use for least squares was to fit a function modelling the path or orbit of a celestial body: measurements taken by observing the night sky were used to predict that path or orbit. One of the earliest and most thorough treatments of the method of least squares comes from Adrien-Marie Legendre in his work "Nouvelles méthodes pour la détermination des orbites des comètes".



Of course, such an important mathematical method is not so easily attributed to one person, and there is much dispute over the original discoverer of the method. In the paper "Gauss and the Invention of Least Squares" Stephen M. Stigler writes "Adrien Marie Legendre published the method in 1805, an American, Robert Adrain, published the method in late 1808 or early 1809, and Carl Friedrich Gauss published the method in 1809".

We may never know who discovered the method first. It is well known, however, that Carl Friedrich Gauss extended the idea of least squares by assuming an error term that follows a Gaussian distribution. This is the familiar least squares that is used so often today.

=Geometry of Least Squares=

==Two-Dimensional==


Consider a set of $$n$$ data points $$(x_i,y_i) \in \mathbb{R}^2 $$ for $$i=1,2,\ldots ,n $$.

Let us consider the problem of fitting a straight line to this data. Assume that there exists some straight line given by:

$$ y = a_1 x + a_0 $$

We call the vertical distance between this hypothetical line and a given data point $$(x_i,y_i)$$ the error or residual. We denote the $$i^{th}$$ error as

$$ e_i = y_i - (a_1 x_i + a_0) $$

Then in order to determine the "line of best fit" we seek to minimize the total, or sum, of all squared errors.

$$ E = \sum_{i=1}^{n} \left( y_i - (a_1 x_i + a_0) \right)^2 $$
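
As a concrete illustration, the short Python sketch below evaluates $$E$$ for two candidate lines; the data points and coefficients are hypothetical, chosen only to show the calculation.

<syntaxhighlight lang="python">
def sum_of_squared_errors(xs, ys, a1, a0):
    """Return E = sum over i of (y_i - (a1*x_i + a0))**2."""
    return sum((y - (a1 * x + a0)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data points (x_i, y_i), roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

# A line close to the data gives a much smaller E than a poor guess.
print(sum_of_squared_errors(xs, ys, a1=2.0, a0=0.0))
print(sum_of_squared_errors(xs, ys, a1=0.5, a0=1.0))
</syntaxhighlight>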

But why exactly do we want to do this? Consider the following interactive diagram made by Bill Finzer using the program The Geometer's Sketchpad.

==Multi-Dimensional==


Consider a set of $$n$$ data points $$(x_{1,i},x_{2,i}, \ldots, x_{p,i}) \in \mathbb{R}^p $$ for $$i=1,2,\ldots ,n $$.

Now, since our data is $$p$$-dimensional, instead of fitting a line as we did when we had 2-dimensional data, we will fit a 2-dimensional surface to this data. In the picture on the left, this surface is represented by a green plane.

We can see two fitted vectors lying in this span, along with the corresponding observed values.

The deviation of an observed value from its fitted value, the residual, is shown as a dotted line.
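
To make this picture concrete, here is a small NumPy sketch; the design matrix and observations are hypothetical, chosen only for illustration. The fitted vector is the projection of the observations onto the span of the columns, and the residual is orthogonal to that span.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical design matrix X (first column of ones for the intercept)
# and observation vector y, used only to illustrate the geometry.
X = np.array([[1.0, 0.5, 1.2],
              [1.0, 1.5, 0.7],
              [1.0, 2.5, 1.9],
              [1.0, 3.5, 2.4]])
y = np.array([1.1, 2.0, 3.2, 4.1])

# Least-squares fit: y is projected onto the span of the columns of X.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ coef       # fitted values lie in the span of the columns of X
residual = y - y_hat   # the dotted line in the picture: observed minus fitted

# The residual is orthogonal to each column of X (zero up to rounding error).
print(X.T @ residual)
</syntaxhighlight>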

=Linear Least Squares=



For linear least squares the problem statement reduces to finding a line of best fit for $$n$$ 2-dimensional data points. The line is given by

$$ P_1(x) = a_1x + a_0 $$

We have $$n$$ errors $$ e_i = y_i - (a_1 x_i + a_0) $$

We let $$E$$ be the total sum of squared errors.

$$ E = \sum_{i=1}^{n} \left( y_i - (a_1 x_i + a_0) \right)^2 $$

Then in order to determine the "line of best fit" we seek to minimize this sum of squared errors. We note that $$E$$ is a function of the two parameters $$a_1,a_0$$; thus in order to minimize this function we take partial derivatives with respect to each of the parameters and solve for their values when the derivatives are equal to 0. Thus we get two equations:

$$\begin{align} \frac{\partial E}{\partial a_1} &= -2\sum_{i=1}^{n} x_i \left( y_i - (a_1 x_i + a_0) \right) = 0 \\[6pt] \frac{\partial E}{\partial a_0} &= -2\sum_{i=1}^{n} \left( y_i - (a_1 x_i + a_0) \right) = 0 \end{align}$$

so solving these equations simultaneously for the coefficients gives:

$$ \hat a_1 = \frac{ \sum_{i=1}^{n}{x_{i}y_{i}} - \frac1n \sum_{i=1}^{n}{x_{i}}\sum_{i=1}^{n}{y_{i}}}{\sum_{i=1}^{n}{x_{i}^2} - \frac1n (\sum_{i=1}^{n}{x_{i}})^2 } $$

$$ \hat a_0 = \bar{y} - \hat a_1\bar{x} $$
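
The following Python sketch evaluates these closed-form estimates on a small hypothetical data set and cross-checks them against NumPy's polyfit; the data values are assumptions used purely for illustration.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical data, roughly following y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 2.9, 5.1, 7.2, 8.8])
n = len(x)

# Closed-form estimate of a_1 from the formula above.
a1_hat = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / \
         (np.sum(x ** 2) - np.sum(x) ** 2 / n)
# a_0 follows from the sample means.
a0_hat = np.mean(y) - a1_hat * np.mean(x)

print(a1_hat, a0_hat)
print(np.polyfit(x, y, deg=1))  # returns [a_1, a_0], highest power first
</syntaxhighlight>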

=Polynomial Least Squares=



The concept of fitting a $$k^{th}$$ order polynomial is exactly the same as in the linear case. The course text presents all of the long equations, but here we will just give an overview.

For the general polynomial $$P_k(x)$$ given by:

$$ P_k(x) = a_kx^k + a_{k-1}x^{k-1} + \cdots + a_1x + a_0 $$

We may estimate the coefficients in a similar way as for the linear case. We want to minimize the least squares error:

$$ E = \sum_{i=1}^{n} ( y_i - P_k(x_i) )^2 $$

$$ E = \sum_{i=1}^{n} \left( y_i - (a_k x_i^k + a_{k-1} x_i^{k-1} + \cdots + a_1 x_i + a_0) \right)^2 $$

We do this by taking the partial derivative of $$E$$ with respect to each of the parameters $$a_0, a_1, \ldots, a_k $$. Thus we obtain a system of $$k+1$$ normal equations. We set each of these derivatives equal to 0 and then solve for the parameters. The solutions are those that minimize the sum of squared errors, and we denote them $$\hat{a_0}, \hat{a_1}, \ldots, \hat{a_k} $$.
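
As a sketch of how this can be carried out numerically, the Python snippet below builds a Vandermonde matrix for a hypothetical data set, solves the normal equations directly, and compares the result with NumPy's polyfit; the data and the chosen order $$k$$ are assumptions used only for illustration.

<syntaxhighlight lang="python">
import numpy as np

# Hypothetical data that roughly follows a quadratic trend.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
y = np.array([1.0, 1.3, 2.1, 3.4, 5.2, 7.9, 10.8])
k = 2  # order of the polynomial P_k

# Vandermonde matrix: column j holds x_i^j, so column j multiplies a_j.
V = np.vander(x, N=k + 1, increasing=True)

# Normal equations (V^T V) a = V^T y; the solution minimizes E.
a_hat = np.linalg.solve(V.T @ V, V.T @ y)

print(a_hat)                          # [a_0, a_1, ..., a_k]
print(np.polyfit(x, y, deg=k)[::-1])  # same coefficients, reversed to match
</syntaxhighlight>

Solving the normal equations directly is fine for small $$k$$; for higher orders the system becomes ill-conditioned, which is one reason library routines typically use a QR or SVD factorization instead.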

=Applications=



Least squares is used extensively in the social and physical sciences. Physicists, engineers, psychologists, and managers use least squares to fit functions to data, to obtain summaries and estimates of the data, and to make predictions. Consider a problem related to social science taken from the book Applied Multivariate Statistical Analysis.

A social scientist has collected data on $$n=50$$ salespeople for two variables, $$x=$$ "Sales Growth" and $$y=$$ "Mathematics Test". We perform the method of least squares to estimate the parameters of the line which fits the data best. When we plot this line through the data we see that it is increasing. However, it would be incorrect to conclude that learning more mathematics will improve selling ability. The graph only shows that there is a correlation between mathematics and sales; it does not express the cause of the relationship.

=Conclusion=

The Method of Least Squares is one of my favorite bits of maths. It is super useful, and so simple that even a child can understand it. The Method of Least Squares was invented in order to approximate the "best" result given a series of results. So, if we are making some kind of measurement with an imprecise instrument, instead of buying a better instrument we can take several measurements and use the method of least squares to estimate the best measurement, at absolutely no additional cost! Nowadays, wherever we find data and information, the method of least squares is usually present as well.

=References=