from the finding-significance dept.
Psychologist Daniël Lakens disagrees with a proposal to redefine statistical significance to require a 0.005 p-value, and has crowdsourced an alternative set of recommendations with 87 co-authors:
Psychologist Daniël Lakens of Eindhoven University of Technology in the Netherlands is known for speaking his mind, and after he read an article titled "Redefine Statistical Significance" on 22 July 2017, Lakens didn't pull any punches: "Very disappointed such a large group of smart people would give such horribly bad advice," he tweeted.
In the paper, posted on the preprint server PsyArXiv, 70 prominent scientists argued in favor of lowering a widely used threshold for statistical significance in experimental studies: The so-called p-value should be below 0.005 instead of the accepted 0.05, as a way to reduce the rate of false positive findings and improve the reproducibility of science. Lakens, 37, thought it was a disastrous idea. A lower α, or significance level, would require much bigger sample sizes, making many studies impossible. Besides. he says, "Why prescribe a single p-value, when science is so diverse?"
Lakens and others will soon publish their own paper to propose an alternative; it was accepted on Monday by Nature Human Behaviour, which published the original paper proposing a lower threshold in September 2017. The content won't come as a big surprise—a preprint has been up on PsyArXiv for 4 months—but the paper is unique for the way it came about: from 100 scientists around the world, from big names to Ph.D. students, and even a few nonacademics writing and editing in a Google document for 2 months.
Lakens says he wanted to make the initiative as democratic as possible: "I just allowed anyone who wanted to join and did not approach any famous scientists."
Statistician Valen Johnson and 71 other researchers have proposed a redefinition of statistical significance in order to cut down on irreproducible results, especially those in the biomedical sciences. They propose "to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005" in a preprint article that will be published in an upcoming issue of Nature Human Behavior:
A megateam of reproducibility-minded scientists is renewing a controversial proposal to raise the standard for statistical significance in research studies. They want researchers to dump the long-standing use of a probability value (p-value) of less than 0.05 as the gold standard for significant results, and replace it with the much stiffer p-value threshold of 0.005.
Backers of the change, which has been floated before, say it could dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible. And they note that researchers in some fields, including genome analysis, have already made a similar switch with beneficial results.
"If we're going to be in a world where the research community expects some strict cutoff ... it's better that that threshold be .005 than .05. That's an improvement over the status quo," says behavioral economist Daniel Benjamin of the University of Southern California in Los Angeles, first author on the new paper, which was posted 22 July as a preprint article [open, DOI: 10.17605/OSF.IO/MKY9J] [DX] on PsyArXiv and is slated for an upcoming issue of Nature Human Behavior. "It seemed like this was something that was doable and easy, and had worked in other fields."
But other scientists reject the idea of any absolute threshold for significance. And some biomedical researchers worry the approach could needlessly drive up the costs of drug trials. "I can't be very enthusiastic about it," says biostatistician Stephen Senn of the Luxembourg Institute of Health in Strassen. "I don't think they've really worked out the practical implications of what they're talking about."
They have proposed a P-value of 0.005 because it corresponds to Bayes factors between approximately 14 and 26 in favor of H1 (the alternative hypothesis), indicating "substantial" to "strong" evidence, and because it would reduce the false positive rate to levels they have judged to be reasonable "in many fields".
Is this good enough? Is it a good start?