
posted by martyb on Wednesday July 26 2017, @10:39AM   Printer-friendly
from the probably-a-good-idea dept.

Statistician Valen Johnson and 71 other researchers have proposed a redefinition of statistical significance in order to cut down on irreproducible results, especially those in the biomedical sciences. They propose "to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005" in a preprint article that will be published in an upcoming issue of Nature Human Behavior:

A megateam of reproducibility-minded scientists is renewing a controversial proposal to raise the standard for statistical significance in research studies. They want researchers to dump the long-standing use of a probability value (p-value) of less than 0.05 as the gold standard for significant results, and replace it with the much stiffer p-value threshold of 0.005.

Backers of the change, which has been floated before, say it could dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible. And they note that researchers in some fields, including genome analysis, have already made a similar switch with beneficial results.

"If we're going to be in a world where the research community expects some strict cutoff ... it's better that that threshold be .005 than .05. That's an improvement over the status quo," says behavioral economist Daniel Benjamin of the University of Southern California in Los Angeles, first author on the new paper, which was posted 22 July as a preprint article [open, DOI: 10.17605/OSF.IO/MKY9J] [DX] on PsyArXiv and is slated for an upcoming issue of Nature Human Behavior. "It seemed like this was something that was doable and easy, and had worked in other fields."

But other scientists reject the idea of any absolute threshold for significance. And some biomedical researchers worry the approach could needlessly drive up the costs of drug trials. "I can't be very enthusiastic about it," says biostatistician Stephen Senn of the Luxembourg Institute of Health in Strassen. "I don't think they've really worked out the practical implications of what they're talking about."

They have proposed a P-value of 0.005 because it corresponds to Bayes factors between approximately 14 and 26 in favor of H1 (the alternative hypothesis), indicating "substantial" to "strong" evidence, and because it would reduce the false positive rate to levels they have judged to be reasonable "in many fields".
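For a rough sense of where the "14" end of that range comes from, here is a minimal sketch in Python (not necessarily the authors' exact calibration, which considers several prior choices) of the well-known Sellke-Bayarri-Berger upper bound -1/(e·p·ln p) on the Bayes factor implied by a p-value:

import math

def bayes_factor_bound(p):
    # -1 / (e * p * ln p): the Sellke-Bayarri-Berger upper bound on the
    # Bayes factor in favour of H1 implied by a p-value; valid for p < 1/e.
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound is only defined for 0 < p < 1/e")
    return -1.0 / (math.e * p * math.log(p))

for p in (0.05, 0.005):
    print(f"p = {p:.3f} -> Bayes factor bound ~ {bayes_factor_bound(p):.1f}")
# p = 0.050 -> Bayes factor bound ~ 2.5
# p = 0.005 -> Bayes factor bound ~ 13.9

At the current 0.05 threshold the implied evidence tops out around 2.5 to 1; at 0.005 it is roughly 14 to 1, consistent with the lower end of the range the authors cite.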

Is this good enough? Is it a good start?

OSF project page. If you have trouble downloading the PDF, use this link.


Original Submission

 
  • (Score: 3, Insightful) by Virindi on Wednesday July 26 2017, @11:44AM (5 children)

    by Virindi (3484) on Wednesday July 26 2017, @11:44AM (#544598)

    And worse, working with preexisting data makes it very tempting to fit your hypothesis to the data. Same with the general disdain for "our hypothesis was disproved" papers.

    That's the real problem that needs to be addressed and changing p-values does not directly address it. The probability of SOME pattern appearing in random noise is high, and people are picking their theory to fit that pattern. Then they are using statistical methods based on a "formulate hypothesis"->"gather data"->"check against hypothesis" model. Big Data is the worst for this, it seems.
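    (To put a number on that: here is a minimal sketch, assuming a hypothetical setup in which 20 unrelated noise variables are each tested against a pure-noise outcome at the 0.05 threshold.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_subjects, n_variables, n_experiments = 100, 20, 2000

    hits = 0
    for _ in range(n_experiments):
        outcome = rng.normal(size=n_subjects)                # pure-noise outcome
        noise = rng.normal(size=(n_subjects, n_variables))   # 20 unrelated predictors
        pvals = [stats.pearsonr(noise[:, j], outcome)[1]     # p-value of each correlation
                 for j in range(n_variables)]
        hits += min(pvals) < 0.05                            # did any "pattern" appear?

    print(f"At least one spurious p < 0.05 in {hits / n_experiments:.0%} of experiments")
    # roughly 64%, i.e. about 1 - 0.95**20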

  • (Score: 2) by FakeBeldin on Wednesday July 26 2017, @12:26PM (3 children)

    by FakeBeldin (3360) on Wednesday July 26 2017, @12:26PM (#544605) Journal

    The model you quote seems apt.
    My worry is that nowadays it more often seems to be:

    gather data -> formulate hypothesis -> investigate data -> adapt hypothesis to investigation -> check hypothesis -> publish

    Validation sets are too often used to formulate the hypothesis.
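    A minimal sketch of that failure mode (hypothetical pure-noise data): "formulate" the hypothesis by picking the best-looking of 20 predictors on one half of the data, then check whether it survives on a held-out confirmation half.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n, k = 200, 20
    X = rng.normal(size=(n, k))    # 20 candidate predictors, all pure noise
    y = rng.normal(size=n)         # outcome, also pure noise

    explore, confirm = slice(0, n // 2), slice(n // 2, n)

    # "Formulate" the hypothesis on the exploration half: pick the best predictor.
    p_explore = [stats.pearsonr(X[explore, j], y[explore])[1] for j in range(k)]
    best = int(np.argmin(p_explore))

    # Check the same hypothesis on the held-out confirmation half.
    p_confirm = stats.pearsonr(X[confirm, best], y[confirm])[1]

    print(f"exploration half:  best of {k} predictors has p = {p_explore[best]:.3f}")
    print(f"confirmation half: the same predictor has p = {p_confirm:.3f}")
    # The cherry-picked p-value often looks "significant"; the held-out one rarely does.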

    • (Score: 1) by Virindi on Wednesday July 26 2017, @02:19PM

      by Virindi (3484) on Wednesday July 26 2017, @02:19PM (#544644)

      Yep, that's what I was saying :)

      It's lazy mode.

      Then of course there is the whole other category of "models which we can't properly test so we just rely on care and the authors being at a good institution", which is a similar problem.

    • (Score: 2) by cafebabe on Wednesday July 26 2017, @02:41PM (1 child)

      by cafebabe (894) on Wednesday July 26 2017, @02:41PM (#544653) Journal

      It would be an improvement if multiple theories were proposed and theories which didn't fit were discarded. This may appear less honed but tweaking a hypothesis prior to publication is akin to one of Rudyard Kipling's Just So Stories. Science should have predictive power and be falsifiable. If there is nothing to predict and nothing to falsify then it isn't science.

      --
      1702845791×2
      • (Score: 0) by Anonymous Coward on Wednesday July 26 2017, @03:58PM

        by Anonymous Coward on Wednesday July 26 2017, @03:58PM (#544694)

        It would be an improvement if multiple theories were proposed and theories which didn't fit were discarded.

        Improvement? Without that you have no science.

  • (Score: 2) by maxwell demon on Wednesday July 26 2017, @07:07PM

    by maxwell demon (1608) on Wednesday July 26 2017, @07:07PM (#544804) Journal

    And worse, working with preexisting data makes it very tempting to fit your hypothesis to the data. Same with the general disdain for "our hypothesis was disproved" papers.

    On the other hand, you do want some means against "we invent a wild hypothesis just in order to promptly disprove it". You don't want articles like:

    Watching Doctor Who does not cause broken legs

    Are you more likely to break your leg if you regularly watch Doctor Who? Comparing the number of fractures among watchers of Doctor Who with those among watchers of Star Trek or Babylon 5 showed no correlation. The comparison between Star Trek and Babylon 5 watchers is inconclusive; more research is required.

    --
    The Tao of math: The numbers you can count are not the real numbers.