Wikipedia:Reference desk/Archives/Mathematics/2013 August 9

= August 9 =

Difference of averages versus average of differences
I was recently listening to a news report about sleep and the full moon, and they reported the findings of a study as "on nights with a full moon, people went to sleep five minutes later and slept 20 minutes less". I wondered why they didn't phrase it as "on nights with a full moon, people went to sleep five minutes later and got up 15 minutes sooner". I then realized the two formulations might not be identical, as depending on the underlying distributions and correlations between the two, the mean of the durations might not match the difference between the mean start point and mean stop point. Am I correct in this deduction, or are the two always equivalent, regardless of the distribution? What would be the properties of the start and stop time distributions which would force it to be true (e.g. if the start/stop times are normally distributed, does that restrict the mean of the difference)? - For simplicity's sake I'm conceptualizing the problem as everyone having identical normal start/stop times, and individual variation only occurs in the full moon start/stop/duration, but if relaxing that assumption changes things, I'd be happy to hear how. -- 71.35.121.78 (talk) 04:54, 9 August 2013 (UTC)


 * Is this related to Simpson's paradox ? StuRat (talk) 04:59, 9 August 2013 (UTC)


 * There would be a mismatch only if there was a different sample size, i.e. if it wasn't the same number of people who reported start time, stop time and duration. sum(x)/n - sum(y)/n = sum(x-y)/n regardless of distribution.→31.53.1.113 (talk) 08:01, 9 August 2013 (UTC)


 * You're also assuming that the total sleep time is start-finish, and ignoring the possibility of interrupted sleep. MChesterMC (talk) 08:23, 9 August 2013 (UTC)
 * It is a fact that if X and Y have expectations ("means") E(X) and E(Y), then E(X+Y) = E(X) + E(Y) whether or not X and Y are independent. So the mean duration always equals the difference between the mean stop time and the mean start time (provided the relevant means exist, which is pretty much beyond doubt). HTH, 09:14, 9 August 2013 (UTC)
 * You all seem to be assuming that this is a question to which mathematics can be applied... My personal guess would be that the data for the study was obtained by letting people answer questionnaires. In other words, I suspect that the times the people fell asleep are simply assumed to be the times they gave as answers to a questionnaire question, and likewise for how long they slept. However, you cannot use the answers to these two questions to decide what the answer would be to the question of when they woke up. People are not obliged to answer in a mathematically consistent way. (Besides, as MChesterMC noted, they also are not obliged to sleep uninterrupted.) JoergenB (talk) 01:34, 11 August 2013 (UTC)


 * Thanks all. The use of the definition of mean and the expectation value phrasing had me slapping my forehead at the simplicity of the answer. The heart of my question was a pure mathematical one: under what conditions is the difference in averages the same as the average of differences. The study was simply what got me thinking on those lines. I agree that the choice of phrasing was down to the particulars of how they conducted the study. -- 71.35.121.78 (talk) 18:32, 11 August 2013 (UTC)
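
The identity discussed in this thread, sum(x)/n - sum(y)/n = sum(x-y)/n, can be checked numerically. The following is only a minimal sketch: the numbers are made-up illustrative values (minutes after 22:00), not data from the study, and the helper names `mean` and `compare_means` are hypothetical. It also illustrates the mismatch that can appear when start and stop times come from different sets of respondents.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def compare_means(starts, stops):
    """Return (mean of the paired differences, difference of the two means)."""
    durations = [stop - start for start, stop in zip(starts, stops)]
    return mean(durations), mean(stops) - mean(starts)


# Made-up illustrative data: times in minutes after 22:00, one pair per person.
starts = [0, 10, 20, 30]
stops = [480, 470, 500, 530]

# Same people in both samples: the two quantities agree.
mean_of_diffs, diff_of_means = compare_means(starts, stops)

# Different samples: only the first two people report a stop time,
# but the mean start time is still taken over all four.
partial_stops = stops[:2]
mismatched = mean(partial_stops) - mean(starts)
paired_only = mean([b - a for a, b in zip(starts[:2], partial_stops)])

print(mean_of_diffs, diff_of_means)  # equal
print(mismatched, paired_only)       # differ
```

With paired data the equality holds for any distribution and any correlation between start and stop times; it only relies on both averages being taken over the same n observations.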

Polynomial regression vs. Legendre polynomials
Given some data $$(x_i,y_i), i=1,\ldots,N$$, one can find a polynomial of given order $$n<N$$ which is the best fit to this data, in the sense of minimising residuals. Doing this in a numerically stable way is apparently somewhat tricky.

On the other hand, one can construct a piecewise constant or perhaps piecewise linear function from the data $$(x_i,y_i)$$, and project this function on the first $$n$$ Legendre polynomials. The result is also a best-fit polynomial, though in a slightly different sense.

Is anything known about the relationship between these 2 best-fit polynomials? Is there a good reason to prefer one over the other? 176.62.208.227 (talk) 13:26, 9 August 2013 (UTC)
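
As a starting point for comparing the two constructions, here is a minimal sketch in Python with NumPy. It rests on assumptions not stated in the question: the $$x_i$$ are sorted, distinct, rescaled so that the first and last points are -1 and 1, and the piecewise linear interpolant is the function being projected. The function names `least_squares_fit` and `projection_fit` are hypothetical. Both return coefficients in the Legendre basis, so they can be compared directly.

```python
import numpy as np
from numpy.polynomial import legendre as L


def least_squares_fit(x, y, degree):
    """Legendre-basis coefficients of the discrete least-squares polynomial."""
    return L.legfit(x, y, degree)


def projection_fit(x, y, degree):
    """L2 projection of the piecewise linear interpolant onto P_0..P_degree.

    Assumes x is sorted with x[0] == -1 and x[-1] == 1.
    """
    # On each segment the integrand (linear function times P_k) is a polynomial
    # of degree <= degree + 1, so a Gauss-Legendre rule with m nodes
    # (exact up to degree 2*m - 1) integrates it exactly.
    m = (degree + 3) // 2
    nodes, weights = L.leggauss(m)
    coeffs = np.zeros(degree + 1)
    for a, b, ya, yb in zip(x[:-1], x[1:], y[:-1], y[1:]):
        half = 0.5 * (b - a)
        t = 0.5 * (a + b) + half * nodes        # quadrature nodes mapped to [a, b]
        f = ya + (yb - ya) * (t - a) / (b - a)  # interpolant values at the nodes
        for k in range(degree + 1):
            pk = L.legval(t, [0] * k + [1])     # P_k evaluated at the nodes
            coeffs[k] += half * np.sum(weights * f * pk)
    # Normalisation: the integral of P_k^2 over [-1, 1] is 2 / (2k + 1).
    return coeffs * (2 * np.arange(degree + 1) + 1) / 2


x = np.linspace(-1.0, 1.0, 5)

# Data lying exactly on a straight line: both constructions recover the line.
line_ls = least_squares_fit(x, 2 + 3 * x, 3)
line_pr = projection_fit(x, 2 + 3 * x, 3)

# Data sampled from x**2: least squares reproduces x**2 exactly, while the
# projection fits the interpolant, which lies above x**2 between the nodes.
quad_ls = least_squares_fit(x, x**2, 2)
quad_pr = projection_fit(x, x**2, 2)
print(quad_ls, quad_pr)
```

In this sketch the two results agree when the data lie on a line and differ for curved data, because the projection minimises an integral over the whole interval of the error against the interpolant, whereas least squares minimises the sum of squared residuals at the data points only.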