Wikipedia:Reference desk/Archives/Mathematics/2007 April 15

= April 15 =

Math question
Re linear relationships and Correlation coefficients and regression lines:

Given a series of data, how would I construct the regression line (the specific problem I'm tackling gives a table of a random sample of 10 subjects):

The questions are "Write down the equation of the regression line of shoe size (y) on height (x), giving your answer in the form y = mx + c" and "Find the correlation coefficient". How do I find them? Thanks -- Neutralitytalk 19:37, 15 April 2007 (UTC)
 * We don't do homework, but Least squares kind of gives the whole thing away.  x42bn6  Talk 21:56, 15 April 2007 (UTC)
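For readers of the archive, a minimal sketch of the least-squares calculation that reply points to. The table from the question is not reproduced here, so the heights and shoe sizes below are made-up illustration values, and the function name `regression_line` is just a label chosen for this sketch. It uses the usual formulas m = Sxy/Sxx, c = mean(y) - m·mean(x) and r = Sxy/sqrt(Sxx·Syy).

```python
import math

def regression_line(xs, ys):
    """Return (m, c, r): slope and intercept of the least-squares
    line y = m*x + c, plus Pearson's correlation coefficient r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r

# Hypothetical sample of 10 subjects (NOT the data from the question)
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
shoe_sizes = [4, 5, 5, 6, 7, 8, 8, 9, 10, 11]

m, c, r = regression_line(heights, shoe_sizes)
print(f"y = {m:.3f}x + {c:.3f}, r = {r:.3f}")
```

The same function can be checked on points that lie exactly on a line, where r should come out as 1.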

Line of Best Fit
1. Can a line of best fit be a curve?

2. What is the best way to draw a line of best fit by hand after plotting the scatter points? —The preceding unsigned comment was added by Vertciel (talk • contribs) 21:29, 15 April 2007 (UTC).
 * Yes. If you have a scatter plot that looks awfully like y = x², for example, then you might consider that the relationship is quadratic instead of trying to put a straight line through the plot.
 * Regression analysis helps to calculate the line of best fit (see linear regression, for example). But the quickest way is to kind of do it yourself: plot a line you think is about right, and then shift and rotate the line until the distances between the points and the line seem to be smallest, which is what regression does.  The best part is that the line will be the line of best fit if you calculate it using regression (whether or not that line is good for the data is another story).  An example: Linear least squares.   x42bn6  Talk 21:49, 15 April 2007 (UTC)
 * You might also want to look at the section above - it is discussing vaguely the same thing.  x42bn6  Talk 21:56, 15 April 2007 (UTC)
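A small sketch of the point about data that looks like y = x²: fit both a straight line and a quadratic by least squares and compare the leftover error. The scatter points are invented for illustration, and using NumPy's `Polynomial.fit` is just one convenient choice, not something the replies above prescribe.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical scatter points lying close to y = x^2
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.array([9.2, 3.9, 1.1, -0.1, 0.8, 4.1, 9.0])

def sum_sq_residuals(degree):
    """Least-squares fit of the given degree; return the sum of squared residuals."""
    fit = Polynomial.fit(x, y, degree)
    return float(np.sum((fit(x) - y) ** 2))

sse_line = sum_sq_residuals(1)       # straight line
sse_quadratic = sum_sq_residuals(2)  # parabola

print(f"straight line: SSE = {sse_line:.2f}")
print(f"quadratic:     SSE = {sse_quadratic:.2f}")
```

For points like these the straight line leaves a far larger error than the parabola, which is the numerical version of "the data looks like a curve and not a line".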

- Thanks for your reply. Could you explain when a line of best fit would be better drawn as a curve rather than a straight line?
 * Put simply, when the data looks like a curve and not a line. --YbborTalk Survey! 01:18, 16 April 2007 (UTC)


 * BTW, I am not a statistician, but I have had to deal with certain statistics models before. Before you start to fit any curves you can think of to your data, you may want to get some intuition about why the data ought to fit a curve and which curve it ought to fit. And, unless you have a huge number of unbiased points, it's best to fit to fewer parameters, not more, or you may be guilty of overfitting. But other than those comments, the one other thing you could do is try them and see what fits better.  But even if the curve of your choice fits your samples better, that does not mean it will better fit the population as a whole.  Another reason to stick with the linear regression would be to follow the KISS Principle.  Root4(one) 04:33, 16 April 2007 (UTC)
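To make the overfitting remark concrete, here is a rough sketch under stated assumptions: the "population" is a made-up linear trend with random noise, a small sample is drawn from it, and polynomials of degree 1 and degree 7 are fitted to that sample. The numbers (12 sample points, noise level, degrees) are arbitrary choices for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def sample(n):
    """Draw n points from a hypothetical population: y = 2x + 1 plus noise."""
    x = rng.uniform(0.0, 10.0, n)
    y = 2.0 * x + 1.0 + rng.normal(0.0, 2.0, n)
    return x, y

x_train, y_train = sample(12)    # the small sample we fit to
x_new, y_new = sample(1000)      # fresh points from the same population

def mse(model, x, y):
    """Mean squared error of a fitted model on the points (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 7):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_new, y_new))
    print(f"degree {degree}: sample MSE = {results[degree][0]:.2f}, "
          f"fresh-data MSE = {results[degree][1]:.2f}")
```

The degree-7 fit can never do worse than the line on the sample itself, because it has the line as a special case. How it does on the fresh points is the figure to look at, and it is not guaranteed to improve just because the sample error went down.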


 * I wouldn't call a curve a line (except for the exceptional case when the curve is a line). There is some material in Curve fitting and Nonlinear regression.
 * Note that in standard linear regression using ordinary least squares what is being minimized is the vertical discrepancy between the data points and the line, and not the (Euclidean) distance – which is scale-sensitive. This does make a difference; "hand-drawn" lines for best "optical" fit tend to have a higher slope than lines resulting from computation. (See the issue raised at Reference desk/Archives/Mathematics/2007 January 28 and my response near the end.)
 * There is no simple answer to the question on when to use a linear model and when a nonlinear model (and if so, which one). In many cases where social scientists use linear regression, the assumption that a linear model is appropriate is totally unwarranted. The problem with nonlinear models is that there are so many to choose from that you can always find one that gives good fit, even if your data are the outcomes of the lottery. But if physicists had always stuck to linear models, the most important laws of physics would not have been found. If you can tell us what it is you are trying to do (source of the data, purpose of having a curve fit to it), we might be able to give more specific advice. --Lambiam Talk  05:03, 16 April 2007 (UTC)
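The difference between vertical and Euclidean distance can be sketched numerically. The code below compares the ordinary least-squares slope with the slope of the line minimizing perpendicular distances (orthogonal regression), on the assumption that a line drawn by eye behaves roughly like the latter. The data are invented, the function names are labels for this sketch only, and the closed-form orthogonal slope used here requires a nonzero Sxy.

```python
import math

def centered_sums(xs, ys):
    """Return Sxx, Syy, Sxy: sums of squares and products about the means."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxx, syy, sxy

def ols_slope(xs, ys):
    """Slope minimizing the squared vertical distances."""
    sxx, _, sxy = centered_sums(xs, ys)
    return sxy / sxx

def orthogonal_slope(xs, ys):
    """Slope minimizing the squared perpendicular distances (needs Sxy != 0)."""
    sxx, syy, sxy = centered_sums(xs, ys)
    d = syy - sxx
    return (d + math.sqrt(d * d + 4 * sxy * sxy)) / (2 * sxy)

# Hypothetical scatter points
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 2.9, 4.2, 3.8, 6.1, 5.7, 7.4, 7.2]
print("vertical (OLS) slope:", ols_slope(xs, ys))
print("perpendicular slope: ", orthogonal_slope(xs, ys))

# Change the unit of y (multiply by 10): the OLS slope simply scales by 10,
# while the perpendicular-distance slope need not.
ys10 = [10 * y for y in ys]
print("OLS slope, rescaled y:          ", ols_slope(xs, ys10))
print("perpendicular slope, rescaled y:", orthogonal_slope(xs, ys10))
```

The perpendicular-distance slope is never flatter than the OLS slope in absolute value (this follows from the Cauchy-Schwarz inequality), which is consistent with the remark about hand-drawn lines coming out steeper.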


 * I use gnuplot for this kind of problem. You can put in your data and some candidate functions, and it will perform least-squares fitting for you. The graphic approach invites you to "play" with the candidate functions, until you find something simple with a good fit. And let me chime in: Please show us the data. —The preceding unsigned comment was added by 84.187.59.188 (talk) 23:02, 16 April 2007 (UTC).