Wikipedia:Reference desk/Archives/Mathematics/2015 November 12

= November 12 =

Test for significance
Fstoppers did a comparison of cameras, asking people to say which image looked the best. The results were:


 * 590 Camera 1
 * 701 Camera 2
 * 384 Camera 3
 * 856 no difference

How can you test this for statistical significance (taking into account the ones that saw no difference)? Bubba73 You talkin' to me? 00:50, 12 November 2015 (UTC)


 * The Pearson chi-square test is your friend here. I would suggest analysing the respondents who expressed a preference.  In R the test is as follows:

 > o <- c(590, 701, 384)
 > e <- mean(o)
 > e
 [1] 558.3333
 > sum( (o-e)^2/e )
 [1] 92.68418
 > pchisq(92.68, df=2, lower.tail=F)
 [1] 7.495381e-21

Very certainly significant. HTH, Robinh (talk) 03:14, 12 November 2015 (UTC)
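The R session above is a Pearson chi-square goodness-of-fit test against a uniform null (equal expected counts for the three cameras). As a sketch, the same arithmetic in standard-library Python, using the fact that for df = 2 the chi-square upper tail has the closed form exp(-x/2):

```python
import math

# Counts for the respondents who expressed a preference (from the survey above).
observed = [590, 701, 384]
expected = sum(observed) / len(observed)   # 558.33... under the "no difference" null

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((o - expected) ** 2 / expected for o in observed)

# With df = 2 the chi-square survival function reduces to exp(-x/2),
# so no statistics library is needed for the p-value.
p_value = math.exp(-chi2 / 2)

print(round(chi2, 5))   # 92.68418, matching the R output
print(p_value)          # ~7.5e-21
```

This matches the R result: the preferences among those who chose a camera are wildly inconsistent with a uniform split.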


 * Thanks, but "I would suggest analysing the respondents who expressed a preference." - that ignores the 33.8% that saw no difference. Is that valid?  As an extreme example, suppose that there were two cameras being compared, 1000 people, 4 preferred #1, 10 preferred #2, and 986 had no preference.  I don't think you could ignore the 986.  Bubba73 You talkin' to me? 03:44, 12 November 2015 (UTC)
 * Hello. Of course you could ignore the 986 and focus on the 14.  Think about the millions of people who didn't answer your question; it's OK to ignore them.  Why not ignore the 986 and make inferences about the subpopulation who express a preference from your 14 observations? Robinh (talk) 08:30, 12 November 2015 (UTC)
 * You can always say "98.6% of my sample expressed no preference", so you're not totally ignoring them. You're just more interested in the respondents who did express a preference. HTH, Robinh (talk) 08:37, 12 November 2015 (UTC)
 * If you ignore the 986 and use only the 14, you won't get valid results. Bubba73 You talkin' to me? 18:13, 12 November 2015 (UTC)
 * What do you mean by "valid"? You will get a valid answer, perhaps for a different question than the one you intended. -- Meni Rosenfeld (talk) 18:29, 12 November 2015 (UTC)
 * What I mean is, given the data, including the ones that saw no difference, are the results statistically significant. Bubba73 You talkin' to me? 19:00, 12 November 2015 (UTC)


 * Note that "statistical significance" has a specific meaning which is distinct from practical significance. If you have, say, 100 people preferring #1, 200 people preferring #2, and 1,000,000 people having no preference, there is no contradiction between the effect size being extremely small, and the fact that what little effect there is is highly statistically significant. -- Meni Rosenfeld (talk) 09:58, 12 November 2015 (UTC)
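Meni's hypothetical numbers can be checked directly. A stdlib-only Python sketch (the counts are his illustration, not survey data): an exact two-sided binomial test among the 300 respondents who expressed a preference is overwhelmingly significant, even though almost nobody sees a difference.

```python
import math

# Hypothetical counts from the reply above: 100 prefer #1, 200 prefer #2,
# 1,000,000 see no difference.
prefer_1, prefer_2, no_pref = 100, 200, 1_000_000
n = prefer_1 + prefer_2                    # 300 respondents with a preference

# Exact two-sided binomial test of "both cameras equally likely to be
# preferred" (p = 0.5), restricted to those who expressed a preference.
# By symmetry, the two-sided p-value is twice the lower tail at min(k, n-k).
k = min(prefer_1, prefer_2)
p_value = 2 * sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n

# Fraction of all respondents who see any difference at all.
effect_share = n / (prefer_1 + prefer_2 + no_pref)

print(p_value < 1e-6)    # True: highly statistically significant
print(effect_share)      # ~0.0003: fewer than 0.03% of people see a difference
```

So a minuscule effect size and an extreme level of statistical significance coexist without contradiction.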

In order for "significance" to make sense you need to formulate a null hypothesis. Then you can compute whether or not your data significantly rejects that null hypothesis. Bo Jacoby (talk) 08:41, 12 November 2015 (UTC).
 * The null hypothesis is that there is no difference, but how do you take into account the ~33% that saw no difference? Bubba73 You talkin' to me? 18:16, 12 November 2015 (UTC)
 * The existence of people who see no difference is perfectly consistent with there being a difference. I personally can't see a difference between basketball and football, but there still is one.
 * The opposite is not true - the existence of people who see a difference (beyond what random variation would allow) is not consistent with the lack of a difference. If there truly is no difference, there can't be people who see one.
 * Consider my example above - there is an obvious and highly significant difference. But the difference is extremely small - 99.999% of people can't see it. -- Meni Rosenfeld (talk) 18:29, 12 November 2015 (UTC)


 * Here is the original data and study. Their conclusion is that most people can't tell the difference.  Is that valid, given the entire data?  Bubba73 You talkin' to me? 19:02, 12 November 2015 (UTC)
 * Note that "most people can't tell the difference" and "there's no difference" are very different null hypotheses. Robinh's calculations clearly show the latter is rejected.
 * As for the former, I'd approach it like this - if at most half the people can see the difference, then at least 1266 of the 2531 respondents cannot. Of those, 856 admitted to seeing no difference, so the remaining 410 chose a camera at random, leaving at most 1265 who chose one based on actual preference (I'm assuming the test was done properly, with unlabeled examples in random order). The data can easily be explained with the assumption that those 410 were distributed equally between the cameras, and the rest indicated an actual preference. So the data does not refute the hypothesis.
 * The data doesn't refute the reverse, either. If I understand their experiment correctly, this kind of experiment cannot refute it. If they really want to "conclude" that most people can't tell the difference (rather than just fail to refute it), they'll have to do something like asking the same person several times to choose a photograph, and check his responses for consistency. -- Meni Rosenfeld (talk) 19:54, 12 November 2015 (UTC)
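The decomposition in the reply above can be checked mechanically. A Python sketch (the split of 1266 non-seers into 856 who said so plus 410 uniform random guessers follows that reply; the rounding of 2531/2 up to 1266 is an assumption for the worst case):

```python
# Survey counts from the thread.
counts = {"Camera 1": 590, "Camera 2": 701, "Camera 3": 384}
no_difference = 856
total = sum(counts.values()) + no_difference       # 2531 respondents

# Hypothesis: at least half the respondents cannot see a difference.
cannot_see = total // 2 + 1                        # 1266
guessers = cannot_see - no_difference              # 410 guessed a camera anyway
guess_per_camera = guessers / 3                    # ~136.7 if guesses are uniform

# Implied genuine preferences after removing the random guesses.
genuine = {cam: n - guess_per_camera for cam, n in counts.items()}

print(all(v > 0 for v in genuine.values()))   # True: every implied count is positive
print(round(sum(genuine.values())))           # 1265 genuine choosers remain
```

Since every implied preference count stays positive, the observed data is consistent with the hypothesis that half the respondents cannot tell the cameras apart, which is exactly why the survey fails to refute it.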


 * As far as I can tell they presented the images in the same order to everyone. (The survey is still up, and I don't think the survey software supports randomizing the camera numbers, and they identified the numbers with camera makers when presenting the results.) I wouldn't be surprised to see a large bias for some numbers over others even if they'd used three copies of the same image, so I don't think the chi-square test proves much. But I suppose we could treat this as a theoretical exercise about a hypothetical well-conducted survey. -- BenRG (talk) 20:25, 12 November 2015 (UTC)
 * Has the bias for some numbers over others been tested? Bo Jacoby (talk) 05:16, 14 November 2015 (UTC).


 * Yes, it appears that everyone saw the same page. One of my points was to ask whether the data really says what they claim it does. You hit upon the other point too - "a theoretical exercise about a hypothetical well-conducted survey" in which you have a category of respondents who don't see a difference. You can't ignore them, you can't treat them as a fourth category (like the other three), and you can't divide them evenly among the other three categories. Bubba73 You talkin' to me? 16:45, 17 November 2015 (UTC)