Wikipedia:Reference desk/Archives/Mathematics/2020 July 29

= July 29 =

What are some of the benefits of fitting the relationship between two variables to an equation for a straight line?
174.63.21.117 (talk) 17:03, 29 July 2020 (UTC)


 * To start, idleness is the beginning of all vice, so it gives you something to do when you have nothing better to do. :) I assume that the question is, more precisely: when fitting a curve to two variables from a given data set, what are the advantages of selecting a straight line as the model, rather than some other curve (exponential, sinusoidal, ...)? The answer may depend on the goal you are trying to achieve by this curve fitting. One advantage of the linear model over other models is its mathematical simplicity. But that is not much good if the linear model is not appropriate to start with. In many cases, you can know on a priori grounds that a linear model is not appropriate. If you are looking at the onset of an epidemic and want to extrapolate its development for the next week, the most reasonable model is an exponential curve. If you are looking at seasonal variations of some variable and wish to fit them to a curve, use the first few terms of a Fourier series. In many cases the data set is one that was obtained by observation, and the values of the variables may be affected by fluctuations that depend on other variables, which it would be too difficult or not worth the effort to incorporate in the model. So then the idea is that there is an idealized curve and that the data set, when plotted, is somewhat randomly scattered around the curve: some points are higher, some are lower, but you are aiming for a smooth curve that takes the middle road through the cloud of points. Now if you zoom in sufficiently on a smooth curve, it will locally begin to resemble a straight line. So if there is indeed a relationship between the variables that can be modelled as random scatter around a smooth curve, and the variation in the values of the variables is relatively small so that only a small part of the curve is relevant, then it becomes reasonable to simplify matters and approximate that small part by a straight line.
Another issue is statistical hypothesis testing, specifically testing the hypothesis that an apparent relationship between two variables is unlikely to be the result of random fluctuations. By using more complicated models, such as higher-degree polynomials, you can always get a very good or even perfect fit. But if you go that way, you lose the ability to use a powerful test: the more parameters there are in the model, the better the fit has to be for it to be deemed significantly better than random. --Lambiam 20:24, 29 July 2020 (UTC)
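
The point about higher-degree polynomials can be illustrated with a small sketch. It is not part of the reply above; the helper names (`fit_line`, `lagrange_eval`, `sum_sq_residuals`), the sample size and the random seed are all assumptions chosen for illustration. The data are pure noise with no relationship to x at all, yet a polynomial with as many coefficients as there are points passes through every one of them, so a perfect fit by itself says nothing about whether a relationship exists.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def lagrange_eval(xs, ys, t):
    """Evaluate at t the unique polynomial of degree < n through all n points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (t - xj) / (xi - xj)
        total += term
    return total


def sum_sq_residuals(predict, xs, ys):
    """Sum of squared differences between predictions and observed values."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys))


random.seed(0)  # arbitrary seed, only so that reruns give the same noise
xs = [float(i) for i in range(8)]
ys = [random.gauss(0.0, 1.0) for _ in xs]  # pure noise, unrelated to x

a, b = fit_line(xs, ys)
line_rss = sum_sq_residuals(lambda x: a + b * x, xs, ys)
poly_rss = sum_sq_residuals(lambda x: lagrange_eval(xs, ys, x), xs, ys)

print("2-parameter line, residual sum of squares:", line_rss)
print("8-parameter polynomial, residual sum of squares:", poly_rss)
```

The straight line leaves a clearly nonzero residual, while the 8-parameter polynomial reproduces the noise exactly. That is the sense in which a model with more parameters has to fit much better before the fit counts as evidence of anything.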


 * As a minor addition to this, though straight lines have an infinite variety of position and slope, there is only one kind of straightness, so they are easier to identify than particular non-straight curves. That is why testing for a linear fit via transformed data can be helpful. A power law becomes linear after taking the logarithm of both variables; an exponential one does after taking the logarithm of the dependent variable. → 2A00:23C6:AA08:E500:B4DA:1B0B:8941:ABF8 (talk) 14:21, 30 July 2020 (UTC)
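
The two transformations just mentioned can be sketched as follows. This is an illustration under assumed values, not something from the reply: the constants 3, 2, 2 and 0.5 and the helper `fit_line` are made up for the example, and the data are noise-free so that the recovered parameters are easy to check. For a power law y = c·x^k, regressing log y on log x should give slope k and intercept log c; for an exponential y = c·e^(r·x), regressing log y on x should give slope r and intercept log c.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


xs = [float(i) for i in range(1, 11)]  # positive x, so log(x) is defined

# Power law y = 3 * x**2: take the logarithm of BOTH variables.
power_ys = [3.0 * x ** 2 for x in xs]
log_c, k = fit_line([math.log(x) for x in xs],
                    [math.log(y) for y in power_ys])
print("power law: exponent ~", k, " constant ~", math.exp(log_c))

# Exponential y = 2 * exp(0.5 * x): take the logarithm of y only.
exp_ys = [2.0 * math.exp(0.5 * x) for x in xs]
log_c2, r = fit_line(xs, [math.log(y) for y in exp_ys])
print("exponential: rate ~", r, " constant ~", math.exp(log_c2))
```

With real, noisy data the transformed points would only be roughly collinear, and the log transform also changes how the errors are weighted, so this is a quick diagnostic rather than a replacement for fitting the nonlinear model directly.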