Wikipedia:Reference desk/Archives/Mathematics/2014 November 28

= November 28 =

Analysis
My boss said that we need a program that periodically checks a database sequence's current number, compares it with previous values and estimates when the sequence is going to be exceeded. We both know that for one value this is impossible, and for two values it can be linearly extrapolated from their difference. But how should it be done for more than two values? Should I try to average throughout every value measured so far or just concentrate on the latest values? How does one extrapolate from the difference between more than two values anyway? J I P | Talk 16:03, 28 November 2014 (UTC)


 * The method depends mainly on the nature of the sequence. If you expect the value to oscillate around some otherwise constant mean value, then you extrapolate the mean of several recent values. If you expect some linear trend, then you do a linear regression. Generally you need a model of the sequence's behavior: you fit the model to the data by adjusting its parameters, then read the future values calculated from the model. See Extrapolation. --CiaPan (talk) 16:42, 28 November 2014 (UTC)


 * I'm afraid I didn't understand very much from the linear regression article. Is there some sort of beginners' guide to it available? J I P | Talk 17:54, 28 November 2014 (UTC)
 * You may find it worthwhile to look at exponential smoothing, which considers a weighted moving average of all past data values, the weights diminishing as the data ages. The aim is to respond to real underlying changes but ignore random ones (impossible to achieve exactly, of course, but due analysis of past data can give better discrimination between the two effects). Various refinements to the basic technique exist, e.g. reviewing/setting the value of the smoothing constant by considering past behaviour, and it can (like all practical forecasting methods) get very complicated, but if it looks a possibility for your needs, in terms of quality and practicability, you can always get deeper into it. →86.171.209.142 (talk) 19:16, 29 November 2014 (UTC)
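As a rough illustration of the exponential smoothing idea above, here is a minimal Python sketch. It rests on assumptions not stated in the thread: the smoothing is applied to the observed growth rate between consecutive checks (rather than to the raw values), the smoothing constant `smoothing=0.3` is an arbitrary default, and the function names `smoothed_rate` and `estimate_exhaustion_time` are made up for this example.

```python
def smoothed_rate(times, values, smoothing=0.3):
    """Exponentially smoothed growth rate (value units per time unit).

    Assumption: times are strictly increasing and 0 < smoothing <= 1.
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two (time, value) observations")
    rate = None
    for i in range(1, len(times)):
        # Growth rate observed between two consecutive checks.
        observed = (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        if rate is None:
            rate = observed  # the first observed rate seeds the smoother
        else:
            # Basic exponential smoothing: older rates fade geometrically.
            rate = smoothing * observed + (1 - smoothing) * rate
    return rate


def estimate_exhaustion_time(times, values, limit, smoothing=0.3):
    """Project the time at which the sequence reaches `limit`.

    Returns None when the smoothed rate is not positive, because no
    exhaustion time can be projected in that case.
    """
    rate = smoothed_rate(times, values, smoothing)
    if rate <= 0:
        return None
    return times[-1] + (limit - values[-1]) / rate
```

For example, `estimate_exhaustion_time([0, 1, 2, 3], [0, 10, 20, 30], limit=100)` projects forward from the last observation at the smoothed rate. A larger `smoothing` reacts faster to real changes in growth but is also more easily thrown off by random fluctuations.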


 * Probably the least complicated way to extrapolate from more than two points is to apply simple linear regression to the last N values for some predetermined value of N. If you let the x values represent the time of observation, and y values denote the values observed, then it fits a line through the values and you can use this to estimate y values at future times.  More specifically:


 * $$ y_\text{estimated} = \alpha + \beta\, x$$
 * Where for N value pairs $$(x_i, y_i)$$:
 * $$\begin{align}\beta & = \frac{ \sum_{i=1}^{N} (x_{i}-\bar{x})(y_{i}-\bar{y}) }{ \sum_{i=1}^{N} (x_{i}-\bar{x})^2 } \\ \alpha & = \bar{y} - \beta\,\bar{x}\end{align}$$
 * That's easy to do, though it won't necessarily be very accurate if the data isn't fairly linear. Dragons flight (talk) 05:50, 2 December 2014 (UTC)
 * What do $$\bar{x}$$ and $$\bar{y}$$ mean here? The mean of $$x_i$$ and $$y_i$$ respectively? J I P | Talk 07:21, 3 December 2014 (UTC)
 * Yes, the respective means. Dragons flight (talk) 17:00, 3 December 2014 (UTC)
 * It is also possible to replace the means with the coordinates of any point that you want the estimating line to pass through, i.e. substitute $$(\bar{x}, \bar{y}) \rightarrow (x_\text{required}, y_\text{required})$$ everywhere in the formulas for $$\alpha$$ and $$\beta$$. For example, if you are projecting the near future, you might get better results by requiring that the line pass exactly through the most recent value. How well that works is likely to depend on how noisy the data is. If the data has a lot of short-term noise on it, then you are probably better off using the means. Dragons flight (talk) 17:18, 3 December 2014 (UTC)
 * Thanks! This looks fairly usable. I'll keep this in mind once my boss gives me a go-ahead to actually start implementing the program. J I P | Talk 19:31, 3 December 2014 (UTC)
 * You're welcome. :-)  Dragons flight (talk) 19:52, 3 December 2014 (UTC)
 * LaTeX minor formatting fix to make a small vertical gap between equations. --CiaPan (talk) 10:04, 3 December 2014 (UTC)
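The regression formulas discussed in this thread can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the function names `fit_line` and `time_to_reach`, the optional `through` argument (the "required point" variant) and the idea of solving for the time at which the line reaches a given `limit` are all introduced here for the example and are not taken from any particular library.

```python
def fit_line(xs, ys, through=None):
    """Return (alpha, beta) for the line y = alpha + beta * x.

    With through=None this is simple linear regression about the means.
    With through=(x_req, y_req) the means are replaced by that point,
    so the fitted line is forced to pass through it.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    if through is None:
        x0, y0 = sum(xs) / n, sum(ys) / n  # the means x-bar and y-bar
    else:
        x0, y0 = through
    sxx = sum((x - x0) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values give no spread to fit a slope")
    sxy = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = y0 - beta * x0
    return alpha, beta


def time_to_reach(limit, alpha, beta):
    """Solve alpha + beta * x = limit for x; None if the line is not rising."""
    if beta <= 0:
        return None
    return (limit - alpha) / beta
```

To use only the last N observations, pass slices such as `fit_line(xs[-n:], ys[-n:])`. To force the line through the most recent value, pass `through=(xs[-1], ys[-1])`. As noted in the thread, which variant works better is likely to depend on how noisy the data is.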