from the whats-the-chance-this-makes-the-front-page dept.
Research articles published in scientific journals are routinely backed by studies which show statistical significance with a p-value less than 0.05 (or sometimes 0.01), i.e. less than a five (or one) percent probability of observing data at least that extreme if the null hypothesis were true. This is routinely misread as implying a greater than 95 (or 99) percent chance that the alternative hypothesis proposed by the researchers is correct. It was noticed a long time ago, however, that it's quite possible for an incorrect or even outlandish hypothesis to be validated as statistically significant, for the same reason that an avid poker player will occasionally draw three of a kind in five-card draw: low-probability events sometimes happen. Or, as Richard Feynman once told a roomful of Caltech freshmen:
“You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
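To make the point concrete, here is a minimal sketch in Python (the sample sizes and distributions are invented for illustration): run many experiments in which the null hypothesis is true by construction, and roughly five percent of them will still come out "significant" at the 0.05 threshold.

    # Even when there is no effect at all, about 5% of experiments
    # will clear the p < 0.05 bar purely by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, n_samples = 10_000, 30
    false_positives = 0
    for _ in range(n_experiments):
        # Two groups drawn from the SAME distribution: no real effect.
        a = rng.normal(0.0, 1.0, n_samples)
        b = rng.normal(0.0, 1.0, n_samples)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1
    print(false_positives / n_experiments)  # prints roughly 0.05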
Andrew Gelman and Eric Loken take a crack at the conundrum of statistical significance in an essay published in American Scientist. The authors note that the problem of false statistical significance is well known in the scientific community, and is particularly likely to surface if researchers slice and dice their data every which way until they find something that appears to be statistically significant, a phenomenon which has come to be known as "p-hacking".
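A rough illustration of how the slicing and dicing inflates significance (Python again, with twenty arbitrary splits standing in for the "every which way"): with twenty independent looks at pure noise, the chance that at least one comparison clears p < 0.05 is about 1 - 0.95^20, or roughly 64 percent.

    # p-hacking sketch: one noise dataset, many arbitrary subgroup splits.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    outcome = rng.normal(0.0, 1.0, 200)        # pure noise "measurement"
    splits = rng.integers(0, 2, (20, 200))     # 20 arbitrary binary subgroupings
    for i, split in enumerate(splits):
        _, p = stats.ttest_ind(outcome[split == 0], outcome[split == 1])
        if p < 0.05:
            print(f"split {i}: p = {p:.3f}  <-- 'publishable'")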
Even when researchers are well-intentioned and do not (consciously) engage in p-hacking, though, studies are susceptible to false significance. The authors discuss several examples of published papers that claimed rather questionable results based on experimental data. In one, a Cornell professor found evidence of extra-sensory perception (ESP) in college students when they visualized erotic images (but not non-erotic ones). In another, researchers studying the effect of menstrual cycles on female voting patterns concluded that single women were more liberal (and hence more likely to vote for Barack Obama in 2012) during ovulation, while married women were more conservative (and more likely to vote for Mitt Romney) during ovulation. The effect of the menstrual cycle was reported to be huge (a 17 percentage point swing among conservative women). Gelman, who happens to have studied the behavior of American voters in some detail, called B.S. on this result.
Gelman and Loken pick apart these studies, noting that the researchers could have just as convincingly concocted opposite or different hypotheses if the data had turned another way.
The authors conclude:
We are hardly the first to express concern over the use of p-values to justify scientific claims, or to point out that multiple comparisons invalidate p-values. Our contribution is simply to note that because the justification for p-values lies in what would have happened across multiple data sets, it is relevant to consider whether any choices in analysis and interpretation are data dependent and would have been different given other possible data.
So it turns out that even some of the 53 percent of statistics that weren't made up on the spot may be on slippery ground.
(Score: 1) by Anonoob on Monday November 17 2014, @07:52PM
Go and read a textbook: p-value does not mean what you think it means. ('Nuff said, at least to 5% significance.)
(Score: 0) by Anonymous Coward on Monday November 17 2014, @07:56PM
The p-values at least discourage reporting data that is so shitty that it can't be hacked to be significant.
Replication of results by independent groups, or future research that builds on the initial study, is a better measure of what is real.
(Score: 2) by danmars on Monday November 17 2014, @08:02PM
Yet another article that says picking and choosing conclusions based on a data set is not science. This article suggests preregistration to solve the cherry-picking-of-conclusions problem, and pre-publication replication of results to solve the 0.05 problem. Both are reasonable suggestions, but (I predict) neither will happen unless they are a condition of funding for the experiment or for publication.
I'd lean toward the replication requirement, as long as it's by an independent third party, and you need to replicate twice if the study is not pre-registered.
Basically, statistics work best if we understand and compensate for known types of errors. This article reiterates that such actions are necessary and that these types of errors matter. Please correct me if I'm wrong.
(Score: 0) by Anonymous Coward on Tuesday November 18 2014, @04:32PM
Yup, get data, make hypothesis based on the data, get DIFFERENT data, test the hypothesis.
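That is essentially a held-out confirmation. A minimal sketch of the discipline described (Python; the data and the single hypothesis are invented for illustration):

    # Explore one half of the data, confirm on the other half.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    data = rng.normal(0.0, 1.0, 400)
    explore, confirm = data[:200], data[200:]

    # Phase 1: inspect the exploratory half however you like,
    # then commit to ONE hypothesis (here: "the mean is not zero").
    print("exploratory mean:", explore.mean())

    # Phase 2: test that single, pre-stated hypothesis on fresh data.
    _, p = stats.ttest_1samp(confirm, 0.0)
    print("confirmatory p-value:", p)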
(Score: 2) by edIII on Monday November 17 2014, @08:46PM
LOL. This is what we need. Money being spent on stupid crap to predict whether or not women on their period will vote for a candidate.
It's like we need Sally Struthers to eat a small child and look for her shadow this February to predict a Democratic or Republican candidate in the White House.
Technically, lunchtime is at any moment. It's just a wave function.
(Score: 2) by opinionated_science on Monday November 17 2014, @09:50PM
If the population at large were well educated in statistics (mathematics will do), there would be a lot less poverty, because the objective evaluation of opportunity would be possible.
So no more lotteries, scratch cards, online poker. Probably a bunch of secular converts when they realise how made up dogma manipulates them due to a phenomenally unlikely .
But I'm not holding my breath...
(Score: 3, Insightful) by edIII on Tuesday November 18 2014, @01:02AM
Are the odds equally phenomenally unlikely that I will figure out what you meant to say?
But I'm not holding my breath... ;)
Also:
Lotteries & Scratch Cards - I play the lottery. Often. "You can't win, if you don't play". I'll stop playing the lottery when I stop hitting the damn power/mega ball so much. They tease me horribly. Scratch cards have remained enjoyable to me as entertainment because they *USE* the math to engineer the statistics so that I am winning the $500 prizes often enough. Either that, or I am a lucky bastard and you should throw me off a bridge. Regardless of whatever mental deficiencies I possess, they are selling me a product I am willing to buy for entertainment purposes.
This all works because the lottery is ostensibly for the public good and education, right? So why would I object to paying a little more in taxes, for good causes, when it has a known mathematical probability of delivering me enough wealth that I become one of the shiny happy people, meet Jennifer Lawrence (you had a head accident, like in 50 First Dates or something), and copulate like rabbits on a private island someplace?
Don't deny me my dreams
Poker - Now this just sounds like you got your ass handed to you, since Poker is only *partly* about the "luck". It's also about psychology and more than a little math, but mostly about mind fucking people. For money. It's like being paid as a psychologist, without the degree and moral frameworks. If you believe in the statistics that much... can we get together and play some cards?
Moreover, men do not go to bars because they know the odds. They know the odds, and they are still in the bars. I expect more of the same.
Technically, lunchtime is at any moment. It's just a wave function.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @09:20PM
This is the foundation of "social science", and only the "rigorous" among them even bother this much.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @09:40PM
This is just a hit job on scientists.
NOT EVEN NEWS
(Score: 2) by gidds on Monday November 17 2014, @09:57PM
Obligatory xkcd: Significant [xkcd.com]
(Score: 2) by maxwell demon on Tuesday November 18 2014, @08:34PM
Rarely is an xkcd as significant to the story as this one.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by TrumpetPower! on Monday November 17 2014, @10:25PM
Another way to look at it... if you're using a p-value of 0.05 as your standard, then even if everything is perfect and ideal, you'd still expect about 5% of the true-null effects you test to show up as bogus "findings".
To me, a p-value should be an initial filter. It should not be used to determine what is and isn't presumably really real, but, rather, to toss out the worst of the chaff so you can focus your attention on the remaining few percent that at least have a good chance of being what you're looking for.
But, until you've done the thorough validation, you either shouldn't be publishing stuff that passes a p-value test at all, or you should be publishing the entire list with a note to others: "This is the space we've done the preliminary search on; you probably don't need to bother with the stuff that failed the test, but we think there might be something promising in the remainder."
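To put rough numbers on that filter (every figure here is invented for illustration): screen 1,000 candidate effects of which 100 are real, assume a typical test power of 0.8, and about a third of what passes the filter is still chaff.

    # Back-of-the-envelope filter arithmetic (all numbers made up).
    candidates, real = 1000, 100
    alpha, power = 0.05, 0.8                      # threshold, test power

    true_hits = real * power                      # 80 real effects pass
    false_hits = (candidates - real) * alpha      # 45 null effects pass anyway
    print(false_hits / (true_hits + false_hits))  # ~0.36 of "findings" are bogus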
b&
All but God can prove this sentence true.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @10:44PM
That's why studies need to be repeated with greater sample sizes and sound methodology, by independent groups.
(Score: 2) by zeigerpuppy on Tuesday November 18 2014, @12:31AM
One of the biggest problems with the methodology of p-values relates to the underlying assumptions.
These are essentially the frequentist position that there is some true, fixed distribution which we approximate by sampling. Of course our samples are incomplete, and the more times we look, the more chance we will find false correlations.
The main problem with this approach is that we have a limited ability to integrate past knowledge; in fact, we are statistically penalized for resampling.
This is where Bayesian statistics can help. In this approach we are encouraged to sample and build up hypotheses that are tested against new data.
I would argue the Bayesian framework is also better for resolving scientific conflicts where opposing camps of researchers refuse to adequately compare their hypotheses and resort to further sub-sampling.
Some further discussion here http://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english [stackexchange.com]
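A toy sketch of that "integrate past knowledge" point (Python; the coin-flip setting and batch counts are invented for illustration): each new dataset updates the posterior, rather than the inference starting from scratch every time.

    # Bayesian updating with a conjugate Beta prior on a coin's bias.
    from scipy import stats

    a, b = 1, 1                              # uniform Beta(1, 1) prior
    batches = [(7, 3), (12, 8), (55, 45)]    # (heads, tails) from successive studies

    for heads, tails in batches:
        a, b = a + heads, b + tails          # past knowledge carries forward
        posterior = stats.beta(a, b)
        print(f"posterior mean = {posterior.mean():.3f}")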
(Score: 2) by TGV on Tuesday November 18 2014, @01:17PM
From Nature: http://www.nature.com/news/scientific-method-statistical-errors-1.14700 [nature.com]
The best quote for me is:
To ignore [the odds that a real effect was there in the first place] would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.
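The arithmetic behind that quote is easy to sketch (Python; the power and prior values are invented for illustration): the less plausible the hypothesis going in, the smaller the chance that a "significant" result reflects a real effect.

    # How prior plausibility changes what a significant result means.
    alpha, power = 0.05, 0.8

    for prior in (0.5, 0.1, 0.001):    # plausible ... implausible hypothesis
        p_real = power * prior         # real effect, detected
        p_false = alpha * (1 - prior)  # null effect, passing by chance
        print(f"prior {prior}: P(real | significant) = "
              f"{p_real / (p_real + p_false):.2f}")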
(Score: 2) by maxwell demon on Tuesday November 18 2014, @09:11PM
Best summarized in the old saying: extraordinary claims require extraordinary evidence.
However, the problem is exactly what counts as an "implausible" hypothesis. A homoeopath will not consider homoeopathy implausible, for example.
The Tao of math: The numbers you can count are not the real numbers.