Psychologist Daniël Lakens disagrees with a proposal to redefine statistical significance to require a 0.005 p-value, and has crowdsourced an alternative set of recommendations with 87 co-authors:
Psychologist Daniël Lakens of Eindhoven University of Technology in the Netherlands is known for speaking his mind, and after he read an article titled "Redefine Statistical Significance" on 22 July 2017, Lakens didn't pull any punches: "Very disappointed such a large group of smart people would give such horribly bad advice," he tweeted.
In the paper, posted on the preprint server PsyArXiv, 70 prominent scientists argued in favor of lowering a widely used threshold for statistical significance in experimental studies: The so-called p-value should be below 0.005 instead of the accepted 0.05, as a way to reduce the rate of false positive findings and improve the reproducibility of science. Lakens, 37, thought it was a disastrous idea. A lower α, or significance level, would require much bigger sample sizes, making many studies impossible. Besides, he says, "Why prescribe a single p-value, when science is so diverse?"
Lakens and others will soon publish their own paper to propose an alternative; it was accepted on Monday by Nature Human Behaviour, which published the original paper proposing a lower threshold in September 2017. The content won't come as a big surprise—a preprint has been up on PsyArXiv for 4 months—but the paper is unique for the way it came about: from 100 scientists around the world, from big names to Ph.D. students, and even a few nonacademics writing and editing in a Google document for 2 months.
Lakens says he wanted to make the initiative as democratic as possible: "I just allowed anyone who wanted to join and did not approach any famous scientists."
(Score: 5, Informative) by AthanasiusKircher on Sunday January 28 2018, @10:33PM
From page 15 of the preprint:
This all sounds eminently reasonable. Focusing on a single statistical parameter is never a good way to determine whether a result is meaningful. People can p-hack at 0.005 just as they have at 0.05. If you think that's harder, you haven't thought about how easy it is in psychology when you're measuring a bunch of parameters and just have to find a few more interesting ways to create combinations of data points that are hackable. I've seen plenty of studies that have claimed p thresholds of 0.005 or 0.001 or even stricter, but it's clear upon closer examination that they got those results through a combination of stuff like p-hacking, bad data collection, bad interpretation, biasing (consciously or unconsciously) the experimental design or the calculation of results, etc.
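The multiple-comparisons point above has a simple arithmetic core: if each null test has a false positive rate of α, the chance that at least one of k independent tests crosses the threshold is 1 − (1 − α)^k. A minimal illustrative sketch (not from the article; the function name and the choices of k are mine):

```python
def family_false_positive_rate(alpha: float, k: int) -> float:
    """Probability that at least one of k independent null tests
    comes out 'significant' at level alpha: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k

for alpha in (0.05, 0.005):
    for k in (1, 10, 50):
        print(f"alpha={alpha}, {k} tests -> "
              f"{family_false_positive_rate(alpha, k):.3f}")
```

Even at α = 0.005, fifty independent looks at pure noise give roughly a one-in-five chance of some "significant" result, which is why a stricter threshold alone doesn't stop p-hacking when many parameters and combinations are in play.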
So, focusing on a variety of stats that may or may not have particular relevance to a given situation is good. Even better is the call for review of statistical standards BEFORE data collection. If you set your thresholds for significance, outline exactly how you plan to collect and manipulate the data, etc. IN ADVANCE, it's a lot harder (short of outright fabrication of data) to "massage" things to find something of supposed "significance." (And please note that a lot of this is likely done unintentionally: people just don't have an intuitive sense of how different stats or types of analysis may suddenly alter the thresholds, or the ease with which they can appear to have an interesting result.)
Sure, it's hard to set up these sorts of things for thorough statistical review in advance for an exploratory study where you're not quite sure what you may find. But in that case, you can be honest about how vague the results are -- and then follow-up studies with more rigor can be designed if some preliminary finding seems worthwhile. The point is being transparent about how the statistical standards were created for a particular study and how they were then applied to the data and interpreted.