
posted by Fnord666 on Sunday January 28 2018, @08:52PM   Printer-friendly
from the finding-significance dept.

Psychologist Daniël Lakens disagrees with a proposal to redefine statistical significance to require a 0.005 p-value, and has crowdsourced an alternative set of recommendations with 87 co-authors:

Psychologist Daniël Lakens of Eindhoven University of Technology in the Netherlands is known for speaking his mind, and after he read an article titled "Redefine Statistical Significance" on 22 July 2017, Lakens didn't pull any punches: "Very disappointed such a large group of smart people would give such horribly bad advice," he tweeted.

In the paper, posted on the preprint server PsyArXiv, 70 prominent scientists argued in favor of lowering a widely used threshold for statistical significance in experimental studies: The so-called p-value should be below 0.005 instead of the accepted 0.05, as a way to reduce the rate of false positive findings and improve the reproducibility of science. Lakens, 37, thought it was a disastrous idea. A lower α, or significance level, would require much bigger sample sizes, making many studies impossible. Besides, he says, "Why prescribe a single p-value, when science is so diverse?"
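To put the sample-size objection in concrete terms, here is a minimal sketch using statsmodels' power analysis for a two-sample t-test. The effect size (Cohen's d = 0.4) and target power (0.8) are illustrative assumptions, not numbers from either paper:

```python
# Sketch: per-group sample size needed for a two-sample t-test
# at alpha = 0.05 versus alpha = 0.005 (illustrative parameters).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for alpha in (0.05, 0.005):
    n = analysis.solve_power(effect_size=0.4,        # assumed Cohen's d
                             alpha=alpha,
                             power=0.8,
                             alternative='two-sided')
    print(f"alpha = {alpha}: about {n:.0f} subjects per group")
```

Under these assumptions, moving from 0.05 to 0.005 raises the requirement from roughly 100 to roughly 170 subjects per group, which is the kind of cost Lakens has in mind.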

Lakens and others will soon publish their own paper to propose an alternative; it was accepted on Monday by Nature Human Behaviour, which published the original paper proposing a lower threshold in September 2017. The content won't come as a big surprise—a preprint has been up on PsyArXiv for 4 months—but the paper is unique for the way it came about: from 100 scientists around the world, from big names to Ph.D. students, and even a few nonacademics writing and editing in a Google document for 2 months.

Lakens says he wanted to make the initiative as democratic as possible: "I just allowed anyone who wanted to join and did not approach any famous scientists."


Original Submission

 
  • (Score: 5, Informative) by AthanasiusKircher (5291) on Sunday January 28 2018, @10:33PM (#629617) Journal

    From page 15 of the preprint:

    We have two key recommendations. First, we recommend that the label “statistically significant” should no longer be used. Instead, researchers should provide more meaningful interpretations of the theoretical or practical relevance of their results. Second, authors should transparently specify—and justify—their design choices. Depending on their choice of statistical approach, these may include the alpha level, the null and alternative models, assumed prior odds, statistical power for a specified effect size of interest, the sample size, and/or the desired accuracy of estimation. We do not endorse a single value for any design parameter, but instead propose that authors justify their choices before data are collected. Fellow researchers can then evaluate these decisions, ideally also prior to data collection, for example, by reviewing a Registered Report submission. Providing researchers (and reviewers) with accessible information about ways to justify (and evaluate) design choices, tailored to specific research areas, will improve current research practices.

    This all sounds eminently reasonable. Focusing on a single statistical parameter is never a good way to determine whether a result is meaningful. People can p-hack at 0.005 just as they do at 0.05. If you think that's harder, consider how easy it is in psychology, where you're measuring a bunch of parameters and just have to find a few more interesting ways to combine data points into something hackable. I've seen plenty of studies that claimed p thresholds of 0.005 or 0.001 or even lower, but on closer examination it's clear they got those results through some combination of p-hacking, bad data collection, bad interpretation, biasing (whether conscious or unconscious) of the experimental design or the calculation of results, etc.
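    A quick simulation backs this up. This is a minimal sketch with made-up parameters (20 outcome measures, two groups of 50, no true effect anywhere), not a model of any real study: it just counts how often at least one measure clears each threshold by chance.

```python
# Sketch: chance of a "significant" finding when many outcome
# measures are tested on pure-noise data (no true effects).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_measures, n_subjects = 10_000, 20, 50

hits_05, hits_005 = 0, 0
for _ in range(n_studies):
    a = rng.normal(size=(n_subjects, n_measures))  # group A, all noise
    b = rng.normal(size=(n_subjects, n_measures))  # group B, all noise
    p = stats.ttest_ind(a, b).pvalue               # one p per measure
    hits_05  += p.min() < 0.05
    hits_005 += p.min() < 0.005

print(f"at least one p < 0.05:  {hits_05 / n_studies:.0%} of null studies")
print(f"at least one p < 0.005: {hits_005 / n_studies:.0%} of null studies")
```

    Even at the stricter cutoff, roughly one null study in ten still produces a reportable-looking result from these 20 measures alone, before any creative combining of variables.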

    So, focusing on a variety of statistics, whichever are relevant to the particular situation, is good. Even better is the call for review of statistical standards BEFORE data collection. If you set your thresholds for significance or whatever, outline exactly how you plan to collect and manipulate the data, etc. IN ADVANCE, it's a lot harder (short of outright fabrication of data) to "massage" things to find something of supposed "significance." (And please note that a lot of this is likely done unintentionally: people just don't have an intuitive sense of how different stats or types of analysis may suddenly alter the thresholds, or the ease with which they can appear to have an interesting result.)
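    One concrete way results get "massaged" without fabricating anything is optional stopping: testing after every batch of subjects and halting the moment p dips below the threshold. Pre-specifying the sample size rules this out. The sketch below uses arbitrary parameters (batches of 10, a cap of 200 subjects per group) on pure-noise data.

```python
# Sketch: optional stopping on null data. The "researcher" peeks
# after every batch and stops as soon as p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, batch, max_n = 5_000, 10, 200

false_positives = 0
for _ in range(n_studies):
    a, b = [], []
    while len(a) < max_n:
        a.extend(rng.normal(size=batch))  # no true group difference
        b.extend(rng.normal(size=batch))
        if stats.ttest_ind(a, b).pvalue < 0.05:
            false_positives += 1
            break

print(f"false-positive rate with peeking: {false_positives / n_studies:.0%}")
```

    With up to 20 looks at the data, the false-positive rate lands well above the nominal 5%; a fixed, pre-registered sample size keeps it at 5% by construction.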

    Sure, it's hard to set up these sorts of things for thorough statistical review in advance for an exploratory study where you're not quite sure what you may find. But in that case, you can be honest about how vague the results are -- and then follow-up studies with more rigor can be designed if some preliminary finding seems worthwhile. The point is being transparent about how the statistical standards were created for a particular study and how they were then applied to the data and interpreted.
