
posted by martyb on Wednesday July 26 2017, @10:39AM   Printer-friendly
from the probably-a-good-idea dept.

Statistician Valen Johnson and 71 other researchers have proposed a redefinition of statistical significance in order to cut down on irreproducible results, especially those in the biomedical sciences. They propose "to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005" in a preprint article that will be published in an upcoming issue of Nature Human Behavior:

A megateam of reproducibility-minded scientists is renewing a controversial proposal to raise the standard for statistical significance in research studies. They want researchers to dump the long-standing use of a probability value (p-value) of less than 0.05 as the gold standard for significant results, and replace it with the much stiffer p-value threshold of 0.005.

Backers of the change, which has been floated before, say it could dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible. And they note that researchers in some fields, including genome analysis, have already made a similar switch with beneficial results.

"If we're going to be in a world where the research community expects some strict cutoff ... it's better that that threshold be .005 than .05. That's an improvement over the status quo," says behavioral economist Daniel Benjamin of the University of Southern California in Los Angeles, first author on the new paper, which was posted 22 July as a preprint article [open, DOI: 10.17605/OSF.IO/MKY9J] [DX] on PsyArXiv and is slated for an upcoming issue of Nature Human Behavior. "It seemed like this was something that was doable and easy, and had worked in other fields."

But other scientists reject the idea of any absolute threshold for significance. And some biomedical researchers worry the approach could needlessly drive up the costs of drug trials. "I can't be very enthusiastic about it," says biostatistician Stephen Senn of the Luxembourg Institute of Health in Strassen. "I don't think they've really worked out the practical implications of what they're talking about."

They have proposed a P-value of 0.005 because it corresponds to Bayes factors between approximately 14 and 26 in favor of H1 (the alternative hypothesis), indicating "substantial" to "strong" evidence, and because it would reduce the false positive rate to levels they have judged to be reasonable "in many fields".
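For intuition, here is an illustrative sketch in R (not from the paper; the 1:10 prior odds and 80% power below are assumptions chosen for illustration). The first function is the Sellke–Bayarri–Berger upper bound on the Bayes factor implied by a p-value, which reproduces the "14" end of the quoted range; the second computes the share of significant findings expected to be false positives at a given threshold.

    # Upper bound on the Bayes factor in favor of H1 implied by a p-value
    # (Sellke, Bayarri & Berger 2001): BF <= 1 / (-e * p * log(p)), for p < 1/e.
    bf_bound <- function(p) 1 / (-exp(1) * p * log(p))
    bf_bound(0.05)   # ~2.5  : weak evidence at the current threshold
    bf_bound(0.005)  # ~13.9 : the "14" end of the quoted 14-26 range

    # Share of "significant" results that are false positives, assuming
    # 1:10 prior odds that a tested effect is real and 80% power (assumptions).
    fdr <- function(alpha, power = 0.8, prior_odds = 1/10) {
      p1 <- prior_odds / (1 + prior_odds)   # probability the tested effect is real
      alpha * (1 - p1) / (alpha * (1 - p1) + power * p1)
    }
    fdr(0.05)   # ~0.38 : more than a third of positive findings are false
    fdr(0.005)  # ~0.06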

Is this good enough? Is it a good start?

OSF project page. If you have trouble downloading the PDF, use this link.


Original Submission

 
  • (Score: 2) by RamiK on Wednesday July 26 2017, @11:38AM (3 children)

    by RamiK (1813) on Wednesday July 26 2017, @11:38AM (#544594)

    If anything, it will only make things worse, as seeking cures instead of treatments will become even harder to justify financially. Moreover, it will shift the risk-profit equation for hiding negative findings and inflating positive results in the wrong direction: already, failing an experiment at human trials means over $20 million down the drain. By making trials more expensive through a tenfold increase in sample size, the risks of lying stay the same while the potential losses only increase.

    Overall, the only reasonable solutions I've heard so far have been requiring the disclosure of research funding and of the results of all failed experiments, regardless of NDAs.

    --
    compiling...
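    For scale on the sample-size point above, a quick check with base R's power.t.test, under illustrative assumptions of a medium effect (0.5 SD) and 80% power:

    # Required n per group for a two-sample, two-sided t-test at 80% power.
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.05,  power = 0.8)$n  # ~64
    power.t.test(delta = 0.5, sd = 1, sig.level = 0.005, power = 0.8)$n  # ~108

    Under these assumptions the multiplier is roughly 1.7x rather than tenfold, though actual trial costs scale with more than raw sample size.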
  • (Score: 1, Informative) by Anonymous Coward on Wednesday July 26 2017, @02:32PM

    by Anonymous Coward on Wednesday July 26 2017, @02:32PM (#544650)

    I don't see a problem with that at all. Vioxx was approved and wound up costing the company that produced it billions of dollars in lawsuits. The same goes for Johnson & Johnson's talcum powder products.

    Companies somehow find the money for the research; we might as well make them improve their standards to the point where they work. Ultimately, calling it a failure when a trial produces a result different from the one you expected is ignorant. That's not failure; that's science functioning as it's supposed to. Failure is taking that data and repackaging it as a new paper rather than as a new hypothesis to make predictions from and test.

    Medical research is crap research for other reasons. Changing from 5% to 0.5% isn't going to make studies that much harder to conduct. Medical research is going to be crap because there are few longitudinal studies of the long-term effects of drugs and treatments, and because it's considered unethical to withhold treatments that are believed effective. The result is that our understanding of whether treatments work, and whether they're even safe, is extremely limited.

  • (Score: 0) by Anonymous Coward on Thursday July 27 2017, @07:41AM (1 child)

    by Anonymous Coward on Thursday July 27 2017, @07:41AM (#545044)

    THIS IS THE WHOLE POINT. By the time you get to human testing, if you're not extremely confident you can hit a 99.5% threshold, you shouldn't be testing that product.

    I'm fine with it encouraging falsification. The nice thing is that the tighter the threshold for significance becomes, the more evident any falsification becomes, which can result in actions against companies. E.g., imagine the threshold for significance were 0.000000001. It's reasonably safe to say that anything that hits that threshold is either genuine or the 'scientists' behind it faked their data. The current 95% leaves a lot of room for plausible deniability.

    In my opinion, professional medical research companies are not a great thing. Human longevity has largely been improved by relatively simple things like better hygienic habits and food cleanliness, plus some medicines that could hit pretty much any threshold of significance, like penicillin, anesthetics, and certain high-reliability, low-side-effect vaccines like smallpox. And then there's perhaps the biggest reason for the increase in longevity: peace. War kills people not just in combat, but through the disruption of civil order, including food production and distribution. We live in an era of unprecedented peace relative to the past.

    • (Score: 0) by Anonymous Coward on Thursday July 27 2017, @08:12AM

      by Anonymous Coward on Thursday July 27 2017, @08:12AM (#545064)

      E.g., imagine the threshold for significance were 0.000000001. It's reasonably safe to say that anything that hits that threshold is either genuine or the 'scientists' behind it faked their data.

      The problem isn't whether they detected something "genuine"; it is whether they are detecting something anyone should care about. For example, you can download a database of p-values from https://github.com/jtleek/tidypvals [github.com] and then sort it by p-value:

      # install once with: devtools::install_github("jtleek/tidypvals")
      library(tidypvals)                  # provides the `allp` data frame of published p-values
      allp <- allp[allp$pvalue > 0, ]     # drop p-values recorded as exactly zero
      allp <- allp[order(allp$pvalue), ]  # sort ascending, smallest first
      head(allp, 20)                      # the twenty smallest p-values

      Here is the top hit. It is a totally meaningless p-value, because it "detects" that if you plug different information into different equations, they won't give you exactly the same answer:

      Table 4 shows the McFadden’s and McKelvey and Zavoina pseudo-r2 values for the empty and full models with and without distance to TSS for suggestively and significantly trait-associated SNPs. The logistic regression model without the distance to TSS for the significantly associated SNPs explained 11-25% of the observed variance, which was an increase of 4-11% when compared to the empty model, which only included the effects of the genotyping arrays. An ANOVA test, using a chi-squared test, showed the difference between the two models to be significant (Deviance = 1501.00, P-value = 3.13 × 10^-309).

      https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3600003/ [nih.gov]

      BTW, that p-value (3.13e-309) is much smaller than your 1e-9; roughly 12,000 p-values in the database are smaller than that. The vast majority of p-values are meaningless like this; there is no reason to care at all about what their value is.
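      That figure is easy to check directly against the same allp data frame loaded above:

      # Count published p-values below the hypothetical 1e-9 threshold.
      sum(allp$pvalue < 1e-9, na.rm = TRUE)   # roughly 12,000 in the database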