Talk:P-hacking

New Version
This is a suggested new version of this article. The goal is to describe p-hacking more precisely in order to make the concept more understandable to a wider audience:

Researchers usually want to find and publish positive results, e.g., that a new drug is more effective than older drugs. This desire often lures them into manipulating the relevant test data so that the published version of their findings shows a statistically significant result. Manipulations might include starting over by discarding old test data and generating a new set, rewording the test objectives, adding new test data to older data, deleting outliers, and so on. These researchers are usually not being deliberately dishonest; rather, they just want to tailor the test data to achieve a favorable result.

"P-hacking," says Simonsohn, "is trying multiple things until you get the desired result."

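"P" here refers to the p-value, which is the probability of obtaining test data at least as extreme as the data actually observed, assuming that the null hypothesis is true. It is not the probability that the null hypothesis is true given the test data. The researchers' goal, of course, is to achieve a small p-value (conventionally below 0.05) and, thus, to argue that the data are hard to reconcile with the null hypothesis and that an alternative hypothesis, e.g., the new drug is better, is supported.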
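To make the p-value concrete, here is a minimal sketch, assuming a simple yes/no outcome (say, the number of patients who improved) and a null hypothesis under which each patient improves with probability 0.5. The p-value is read here as the probability, computed under the null hypothesis, of a result at least as extreme as the one observed. The function name and the 0.5 default are illustrative assumptions, not part of the proposed article text.

```python
from math import comb

def one_sided_p_value(successes, trials, null_rate=0.5):
    """Probability of seeing `successes` or more out of `trials`
    if the null hypothesis (success rate == null_rate) were true."""
    return sum(
        comb(trials, k) * null_rate**k * (1 - null_rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical example: 8 of 10 patients improved.
p = one_sided_p_value(8, 10)
print(f"p-value = {p:.4f}")
```

Note that the calculation starts by assuming the null hypothesis and asks how surprising the data would be; it says nothing directly about how probable the null hypothesis itself is.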

The problem with p-hacking is that it creates an illusion of statistical significance. If a well-trained statistician reviews all of the test evidence, not just the final p-hacked evidence, he or she is likely to assert that the p-value should be bigger once every attempted analysis is taken into account, i.e., the evidence for the desired alternative hypothesis is actually weaker. And future testing is likely to reveal that the claimed positive result was actually non-existent.
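The illusion of significance can be illustrated with a small simulation. This is only a sketch under stated assumptions: both groups are drawn from the same normal distribution (so the null hypothesis is true by construction), the variance is treated as known so a simple z-test applies, and "trying multiple things" is modelled as measuring several independent outcomes and reporting only the smallest p-value. All function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean

def two_sided_p(group_a, group_b):
    """z-test p-value for a difference in means, assuming equal group
    sizes and a known variance of 1 in both groups."""
    n = len(group_a)
    z = (mean(group_a) - mean(group_b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

def false_positive_rate(outcomes_tried, studies=2000, n=30, alpha=0.05, seed=0):
    """Fraction of simulated studies that report p < alpha even though
    there is no real effect, when only the best of several tries is kept."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(studies):
        p_values = []
        for _ in range(outcomes_tried):
            a = [rng.gauss(0, 1) for _ in range(n)]  # "treatment" group
            b = [rng.gauss(0, 1) for _ in range(n)]  # "control" group
            p_values.append(two_sided_p(a, b))
        if min(p_values) < alpha:  # report only the most favorable try
            hits += 1
    return hits / studies

print("one analysis, reported as-is:", false_positive_rate(1))
print("best of ten analyses:       ", false_positive_rate(10))
```

With a single pre-planned analysis, the false-positive rate should stay near the nominal 5%. With ten independent tries it should approach 1 - 0.95^10, roughly 40%, which is why the honestly computed p-value for the whole procedure is much larger than the one that gets published.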