from the whats-the-chance-this-makes-the-front-page dept.
Research articles published in scientific journals are routinely backed by studies which show statistical significance with a p-value less than 0.05 (or sometimes 0.01), i.e. less than a five (or one) percent probability of observing data at least that extreme if the null hypothesis were true. This is routinely misread as implying a greater than 95 (or 99) percent chance that the alternative hypothesis proposed by the researchers is correct. It was noticed a long time ago, however, that it's quite possible for an incorrect or even outlandish hypothesis to be validated as statistically significant, for the same reason that an avid poker player will occasionally draw three of a kind in five-card draw: low-probability events sometimes happen. Or, as Richard Feynman once told a roomful of Caltech freshmen:
“You know, the most amazing thing happened to me tonight... I saw a car with the license plate ARW 357. Can you imagine? Of all the millions of license plates in the state, what was the chance that I would see that particular one tonight? Amazing!”
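To make the point concrete, here is a minimal sketch in Python (the sample sizes and distributions are invented for illustration): run many experiments in which the null hypothesis is true by construction, and roughly five percent of them will still come out "significant" at the 0.05 threshold.

    # Even when there is no effect at all, about 5% of experiments
    # will clear the p < 0.05 bar purely by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_experiments, n_samples = 10_000, 30
    false_positives = 0
    for _ in range(n_experiments):
        # Two groups drawn from the SAME distribution: no real effect.
        a = rng.normal(0.0, 1.0, n_samples)
        b = rng.normal(0.0, 1.0, n_samples)
        _, p = stats.ttest_ind(a, b)
        if p < 0.05:
            false_positives += 1
    print(false_positives / n_experiments)  # prints roughly 0.05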
Andrew Gelman and Eric Loken take a crack at the conundrum of statistical significance in an essay published in American Scientist. The authors note that the problem of false statistical significance is well known in the scientific community, and is particularly likely to surface if researchers slice and dice their data every which way until they find something that appears to be statistically significant, a phenomenon which has come to be known as "p-hacking".
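A rough illustration of how the slicing and dicing inflates significance (Python again, with twenty arbitrary splits standing in for the "every which way"): with twenty independent looks at pure noise, the chance that at least one comparison clears p < 0.05 is about 1 - 0.95^20, or roughly 64 percent.

    # p-hacking sketch: one noise dataset, many arbitrary subgroup splits.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    outcome = rng.normal(0.0, 1.0, 200)        # pure noise "measurement"
    splits = rng.integers(0, 2, (20, 200))     # 20 arbitrary binary subgroupings
    for i, split in enumerate(splits):
        _, p = stats.ttest_ind(outcome[split == 0], outcome[split == 1])
        if p < 0.05:
            print(f"split {i}: p = {p:.3f}  <-- 'publishable'")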
Even when researchers are well-intentioned and do not (consciously) engage in p-hacking, though, studies are susceptible to false significance. The authors discuss several examples of published papers that claimed rather questionable results based on experimental data. In one, a Cornell professor found evidence of extra-sensory perception (ESP) in college students when they visualized erotic images (but not non-erotic ones). In another, researchers studying the effect of menstrual cycles on female voting patterns concluded that single women were more liberal (and hence more likely to vote for Barack Obama in 2012) during ovulation, while married women were more conservative (and more likely to vote for Mitt Romney) during ovulation. The effect of the menstrual cycle was reported to be huge (a 17 percentage point swing among conservative women). Gelman, who happens to have studied the behavior of American voters in some detail, called B.S. on this result.
Gelman and Loken pick apart these studies, noting that the researchers could have just as convincingly concocted opposite or different hypotheses if the data had turned another way.
The authors conclude:
We are hardly the first to express concern over the use of p-values to justify scientific claims, or to point out that multiple comparisons invalidate p-values. Our contribution is simply to note that because the justification for p-values lies in what would have happened across multiple data sets, it is relevant to consider whether any choices in analysis and interpretation are data dependent and would have been different given other possible data.
So it turns out that even some of the 53 percent of statistics that weren't made up on the spot may be on slippery ground.
(Score: 1) by Anonoob on Monday November 17 2014, @07:52PM
Go and read a textbook: p-value does not mean what you think it means. ('Nuff said, at least to 5% significance.)
(Score: 0) by Anonymous Coward on Monday November 17 2014, @07:56PM
The p-values at least discourage reporting data that is so shitty that it can't be hacked to be significant.
Replication of results by independent groups, or future research that builds on the initial study, is a better measure of what is real.
(Score: 2) by danmars on Monday November 17 2014, @08:02PM
Yet another article that says picking and choosing conclusions based on a data set is not science. This article suggests preregistration to solve the cherry-picking-of-conclusions problem, and pre-publication replication of results to solve the 0.05 problem. Both are reasonable suggestions, but (I predict) neither will happen unless they are a condition of funding for the experiment or for publication.
I'd lean toward the replication requirement, as long as it's by an independent third party, and you need to replicate twice if the study is not pre-registered.
Basically, statistics work best if we understand and compensate for known types of errors. This article reiterates that such actions are necessary and that these types of errors matter. Please correct me if I'm wrong.
(Score: 0) by Anonymous Coward on Tuesday November 18 2014, @04:32PM
Yup, get data, make hypothesis based on the data, get DIFFERENT data, test the hypothesis.
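That is essentially a held-out confirmation. A minimal sketch of the discipline described (Python; the data and the single hypothesis are invented for illustration):

    # Explore one half of the data, confirm on the other half.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    data = rng.normal(0.0, 1.0, 400)
    explore, confirm = data[:200], data[200:]

    # Phase 1: inspect the exploratory half however you like,
    # then commit to ONE hypothesis (here: "the mean is not zero").
    print("exploratory mean:", explore.mean())

    # Phase 2: test that single, pre-stated hypothesis on fresh data.
    _, p = stats.ttest_1samp(confirm, 0.0)
    print("confirmatory p-value:", p)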
(Score: 2) by edIII on Monday November 17 2014, @08:46PM
LOL. This is what we need. Money being spent on stupid crap to predict whether or not women on their period will vote for a candidate.
It's like we need Sally Struthers to eat a small child and look for her shadow this February to predict a Democratic or Republican candidate in the White House.
Technically, lunchtime is at any moment. It's just a wave function.
(Score: 2) by opinionated_science on Monday November 17 2014, @09:50PM
If the population at large were well educated in statistics (mathematics will do), there would be a lot less poverty, because the objective evaluation of opportunity would be possible.
So no more lotteries, scratch cards, online poker. Probably a bunch of secular converts when they realise how made up dogma manipulates them due to a phenomenally unlikely .
But I'm not holding my breath...
(Score: 3, Insightful) by edIII on Tuesday November 18 2014, @01:02AM
Are the odds equally phenomenally unlikely that I will figure out what you meant to say?
But I'm not holding my breath... ;)
Also:
Lotteries & Scratch Cards - I play the lottery. Often. "You can't win, if you don't play". I'll stop playing the lottery when I stop hitting the damn power/mega ball so much. They tease me horribly. Scratch cards have remained enjoyable to me as entertainment because they *USE* the math to engineer the statistics so that I am winning the $500 prizes often enough. Either that, or I am a lucky bastard and you should throw me off a bridge. Regardless of whatever mental deficiencies I possess, they are selling me a product I am willing to buy for entertainment purposes.
This all works because the lottery is ostensibly for the public good and education, right? So why would I object to paying a little more in taxes, for good causes, when it has a known mathematical probability of delivering me enough wealth that I become one of the shiny happy people, meet Jennifer Lawrence (you had a head accident, like in 50 First Dates or something), and copulate like rabbits on a private island someplace?
Don't deny me my dreams
Poker - Now this just sounds like you got your ass handed to you, since Poker is only *partly* about the "luck". It's also about psychology and more than a little math, but mostly about mind fucking people. For money. It's like being paid as a psychologist, without the degree and moral frameworks. If you believe in the statistics that much... can we get together and play some cards?
Moreover, men do not go to bars because they know the odds. They know the odds, and they are still in the bars. I expect more of the same.
Technically, lunchtime is at any moment. It's just a wave function.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @09:20PM
This is the foundation of "social science", and only the "rigorous" among them even bother this much.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @09:40PM
This is just a hit job on scientists.
NOT EVEN NEWS
(Score: 2) by gidds on Monday November 17 2014, @09:57PM
Obligatory xkcd: Significant [xkcd.com]
(Score: 2) by maxwell demon on Tuesday November 18 2014, @08:34PM
Rarely is an xkcd as significant to the story as this one.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by TrumpetPower! on Monday November 17 2014, @10:25PM
Another way to look at it... if you're using a p-value of 0.05 as your standard, then even if everything is perfect and ideal, you'd still expect about 5% of the true-null effects you test to show up as bogus "findings".
To me, a p-value should be an initial filter. It should not be used to determine what is and isn't presumably really real, but, rather, to toss out the worst of the chaff so you can focus your attention on the remaining few percent that at least have a good chance of being what you're looking for.
But, until you've done the thorough validation, you either shouldn't be publishing stuff that passes a p-value test at all, or you should be publishing the entire list with a note to others: "This is the space we've done the preliminary search on; you probably don't need to bother with the stuff that failed the test, but we think there might be something promising in the remainder."
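To put rough numbers on that filter (every figure here is invented for illustration): screen 1,000 candidate effects of which 100 are real, assume a typical test power of 0.8, and about a third of what passes the filter is still chaff.

    # Back-of-the-envelope filter arithmetic (all numbers made up).
    candidates, real = 1000, 100
    alpha, power = 0.05, 0.8                      # threshold, test power

    true_hits = real * power                      # 80 real effects pass
    false_hits = (candidates - real) * alpha      # 45 null effects pass anyway
    print(false_hits / (true_hits + false_hits))  # ~0.36 of "findings" are bogus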
b&
All but God can prove this sentence true.
(Score: 0) by Anonymous Coward on Monday November 17 2014, @10:44PM
That's why studies need to be repeated with greater sample sizes and sound methodology, by independent groups.
(Score: 2) by zeigerpuppy on Tuesday November 18 2014, @12:31AM
One of the biggest problems with the methodology of p-values relates to the underlying assumptions.
These are essentially the frequentist position that there is some true, fixed distribution which we approximate by sampling. Of course our samples are incomplete, and the more times we look, the more chance we will find false correlations.
The main problem with this approach is that we have a limited ability to integrate past knowledge; in fact, we are statistically penalized for resampling.
This is where Bayesian statistics can help. In this approach we are encouraged to sample and build up hypotheses that are tested against new data.
I would argue the Bayesian framework is also better for resolving scientific conflicts where opposing camps of researchers refuse to adequately compare their hypotheses and resort to further sub-sampling.
Some further discussion here http://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english [stackexchange.com]
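A toy sketch of that "integrate past knowledge" point (Python; the coin-flip setting and batch counts are invented for illustration): each new dataset updates the posterior, rather than the inference starting from scratch every time.

    # Bayesian updating with a conjugate Beta prior on a coin's bias.
    from scipy import stats

    a, b = 1, 1                              # uniform Beta(1, 1) prior
    batches = [(7, 3), (12, 8), (55, 45)]    # (heads, tails) from successive studies

    for heads, tails in batches:
        a, b = a + heads, b + tails          # past knowledge carries forward
        posterior = stats.beta(a, b)
        print(f"posterior mean = {posterior.mean():.3f}")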
(Score: 2) by TGV on Tuesday November 18 2014, @01:17PM
From Nature: http://www.nature.com/news/scientific-method-statistical-errors-1.14700 [nature.com]
The best quote for me is:
To ignore [the odds that a real effect was there in the first place] would be like waking up with a headache and concluding that you have a rare brain tumour — possible, but so unlikely that it requires a lot more evidence to supersede an everyday explanation such as an allergic reaction. The more implausible the hypothesis — telepathy, aliens, homeopathy — the greater the chance that an exciting finding is a false alarm, no matter what the P value is.
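The arithmetic behind that quote is easy to sketch (Python; the power and prior values are invented for illustration): the less plausible the hypothesis going in, the smaller the chance that a "significant" result reflects a real effect.

    # How prior plausibility changes what a significant result means.
    alpha, power = 0.05, 0.8

    for prior in (0.5, 0.1, 0.001):    # plausible ... implausible hypothesis
        p_real = power * prior         # real effect, detected
        p_false = alpha * (1 - prior)  # null effect, passing by chance
        print(f"prior {prior}: P(real | significant) = "
              f"{p_real / (p_real + p_false):.2f}")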
(Score: 2) by maxwell demon on Tuesday November 18 2014, @09:11PM
Best summarized in the old saying: extraordinary claims require extraordinary evidence.
However, the problem is exactly what counts as an "implausible" hypothesis. A homoeopath will not consider homoeopathy implausible, for example.
The Tao of math: The numbers you can count are not the real numbers.