posted by Blackmoore on Monday November 17 2014, @07:30PM
from the whats-the-chance-this-makes-the-front-page dept.

Research articles published in scientific journals are routinely backed by studies which show statistical significance with a p-value below 0.05 (or sometimes 0.01): if the null hypothesis were true, data at least as extreme as those observed would turn up less than five (or one) percent of the time. The implication usually drawn is that the alternative hypothesis proposed by the researchers is therefore all but certain to be correct. It was noticed a long time ago, however, that it's quite possible for an incorrect or even outlandish hypothesis to be validated as statistically significant, for the same reason that an avid poker player will occasionally draw three of a kind in five-card draw: low-probability events sometimes happen, as the simulation sketched below illustrates. Or, as Richard Feynman once told a roomful of Caltech freshmen:

“You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
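How easily chance alone clears the 5 percent bar can be seen with a minimal simulation (a sketch for illustration only; the two-sample t-test, the group size of 30 and the 10,000 repetitions are arbitrary choices, not anything taken from the essay):

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)
    n_trials, n_per_group, alpha = 10_000, 30, 0.05

    false_positives = 0
    for _ in range(n_trials):
        # Both groups come from the same distribution, so every "effect" is pure noise.
        a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
        if ttest_ind(a, b).pvalue < alpha:
            false_positives += 1

    print(f"'Significant' results with no real effect: {false_positives / n_trials:.1%}")
    # Prints roughly 5% -- exactly the rate the threshold allows, study by study.

Run enough experiments, or read enough license plates, and a few of them are bound to look remarkable.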

Andrew Gelman and Eric Loken take a crack at the conundrum of statistical significance in an essay published in American Scientist. The authors note that the problem of false statistical significance is well known in the scientific community, and that it is particularly likely to surface when researchers slice and dice their data every which way until they find something that appears to be statistically significant, a phenomenon that has come to be known as "p-hacking".
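The arithmetic behind p-hacking is brutal: test twenty independent slices of pure noise and the chance that at least one clears p < 0.05 is about 1 - 0.95^20, or roughly 64 percent. A hedged sketch of the effect (my own code, not the authors'; the twenty subgroups and group sizes are invented):

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    n_studies, n_per_group, n_subgroups, alpha = 5_000, 30, 20, 0.05

    lucky_studies = 0
    for _ in range(n_studies):
        for _ in range(n_subgroups):
            # Every subgroup comparison is noise; the "researcher" stops at the first hit.
            a = rng.normal(size=n_per_group)
            b = rng.normal(size=n_per_group)
            if ttest_ind(a, b).pvalue < alpha:
                lucky_studies += 1
                break

    print(f"Studies with at least one 'significant' subgroup: {lucky_studies / n_studies:.1%}")
    # Roughly 64%, even though no effect exists anywhere in the data.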

Even when researchers are well-intentioned and do not (consciously) engage in p-hacking, though, studies are susceptible to false significance. The authors discuss several examples of published papers that claimed rather questionable results based on experimental data. In one, a Cornell professor found evidence of extra-sensory perception (ESP) in college students viewing erotic images (but not non-erotic ones). In another, researchers studying the effect of menstrual cycles on female voting patterns concluded that single women were more liberal (and hence more likely to vote for Barack Obama in 2012) during ovulation, while married women were more conservative (and more likely to vote for Mitt Romney) during ovulation. The effect of the menstrual cycle was claimed to be huge (a 17 percentage point swing among conservative women). Gelman, who happens to have studied the behavior of American voters in some detail, called B.S. on this result.

Gelman and Loken pick apart these studies, noting that the researchers could have just as convincingly concocted opposite or different hypotheses if the data had turned another way.

The authors conclude:

We are hardly the first to express concern over the use of p-values to justify scientific claims, or to point out that multiple comparisons invalidate p-values. Our contribution is simply to note that because the justification for p-values lies in what would have happened across multiple data sets, it is relevant to consider whether any choices in analysis and interpretation are data dependent and would have been different given other possible data.
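Their point about data-dependent choices can be made concrete with a small simulation (my own, not from the paper): only one hypothesis test is ever reported, but which comparison gets tested is picked after looking at the data, so the nominal 5 percent error rate no longer holds.

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(2)
    n_studies, n, alpha = 5_000, 40, 0.05

    false_positives = 0
    for _ in range(n_studies):
        outcome = rng.normal(size=n)               # pure-noise outcome variable
        splits = rng.integers(0, 2, size=(5, n))   # five plausible ways to divide the sample
        # Eyeball the data: keep the split showing the biggest apparent difference...
        gaps = [abs(outcome[s == 1].mean() - outcome[s == 0].mean()) for s in splits]
        best = splits[int(np.argmax(gaps))]
        # ...then report a single, seemingly pre-planned test on that split.
        if ttest_ind(outcome[best == 1], outcome[best == 0]).pvalue < alpha:
            false_positives += 1

    print(f"Error rate of the one reported test: {false_positives / n_studies:.1%}")
    # Well above the advertised 5%, with no explicit p-hacking anywhere.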

So it turns out that even some of the 53 percent of statistics that weren't made up on the spot may be on slippery ground.

 
  • (Score: 2) by zeigerpuppy (1298) on Tuesday November 18 2014, @12:31AM (#117031)

    One of the biggest problems with the methodology of p-values relates to its underlying assumptions.
    These amount to the frequentist position that there is some fixed, true distribution that we approximate by sampling. Of course our samples are incomplete, and the more times we look, the more likely we are to find false correlations.
    The main problem with this approach is that we have a limited ability to integrate past knowledge; in fact, we are statistically penalized for resampling.
    This is where Bayesian statistics can help: in this approach we are encouraged to keep sampling and to build up hypotheses that are tested against new data (a toy example of this kind of updating is sketched below).
    I would argue the Bayesian framework is also better for resolving scientific conflicts in which opposing camps of researchers refuse to compare their hypotheses properly and resort to further subsampling.

    Some further discussion here http://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english [stackexchange.com]
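    As a toy illustration of the updating the comment describes (my sketch, not the commenter's; the batch counts are invented data), a Beta prior over an unknown success rate absorbs each new batch of observations, so earlier evidence is carried forward rather than penalized at every new look:

        from scipy.stats import beta

        a, b = 1, 1                            # flat Beta(1, 1) prior over the unknown rate
        batches = [(7, 3), (12, 8), (55, 45)]  # (successes, failures) at each look -- invented data

        for successes, failures in batches:
            a, b = a + successes, b + failures  # conjugate update: today's posterior is tomorrow's prior
            lo, hi = beta.ppf([0.025, 0.975], a, b)
            print(f"After {a + b - 2} observations: posterior mean {a / (a + b):.3f}, "
                  f"95% credible interval ({lo:.3f}, {hi:.3f})")

    Each batch simply tightens the credible interval; nothing has to be discarded or corrected for having looked at the data before.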
