
posted by martyb on Wednesday July 26 2017, @10:39AM
from the probably-a-good-idea dept.

Statistician Valen Johnson and 71 other researchers have proposed a redefinition of statistical significance in order to cut down on irreproducible results, especially those in the biomedical sciences. They propose "to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005" in a preprint article that will be published in an upcoming issue of Nature Human Behavior:

A megateam of reproducibility-minded scientists is renewing a controversial proposal to raise the standard for statistical significance in research studies. They want researchers to dump the long-standing use of a probability value (p-value) of less than 0.05 as the gold standard for significant results, and replace it with the much stiffer p-value threshold of 0.005.

Backers of the change, which has been floated before, say it could dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible. And they note that researchers in some fields, including genome analysis, have already made a similar switch with beneficial results.

"If we're going to be in a world where the research community expects some strict cutoff ... it's better that that threshold be .005 than .05. That's an improvement over the status quo," says behavioral economist Daniel Benjamin of the University of Southern California in Los Angeles, first author on the new paper, which was posted 22 July as a preprint article [open, DOI: 10.17605/OSF.IO/MKY9J] [DX] on PsyArXiv and is slated for an upcoming issue of Nature Human Behavior. "It seemed like this was something that was doable and easy, and had worked in other fields."

But other scientists reject the idea of any absolute threshold for significance. And some biomedical researchers worry the approach could needlessly drive up the costs of drug trials. "I can't be very enthusiastic about it," says biostatistician Stephen Senn of the Luxembourg Institute of Health in Strassen. "I don't think they've really worked out the practical implications of what they're talking about."

They have proposed a P-value of 0.005 because it corresponds to Bayes factors between approximately 14 and 26 in favor of H1 (the alternative hypothesis), indicating "substantial" to "strong" evidence, and because it would reduce the false positive rate to levels they have judged to be reasonable "in many fields".
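
For a rough sense of where figures like these come from, one widely used calibration is the Vovk–Sellke bound, which caps the Bayes factor implied by a p-value at 1/(-e·p·ln p). Below is a minimal sketch of that bound; it is a standard formula, not necessarily the exact computation in the paper, which averages over several choices of prior on H1.

```python
import math

def vovk_sellke_bound(p):
    """Maximum Bayes factor in favor of H1 implied by a p-value:
    1 / (-e * p * ln p), valid for 0 < p < 1/e."""
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("bound is only defined for 0 < p < 1/e")
    return 1.0 / (-math.e * p * math.log(p))

for p in (0.05, 0.01, 0.005):
    print(f"p = {p:<5}  max Bayes factor ~ {vovk_sellke_bound(p):.1f}")

# p = 0.05   max Bayes factor ~ 2.5
# p = 0.01   max Bayes factor ~ 8.0
# p = 0.005  max Bayes factor ~ 13.9
```

The ~14 figure matches the lower end of the quoted 14–26 range; the upper end comes from other choices of prior on the alternative hypothesis.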

Is this good enough? Is it a good start?

OSF project page. If you have trouble downloading the PDF, use this link.


Original Submission

 
  • (Score: 0) by Anonymous Coward on Wednesday July 26 2017, @02:26PM (3 children)

    by Anonymous Coward on Wednesday July 26 2017, @02:26PM (#544647)

    Not really. Increasing the amount of data you're crunching doesn't guarantee better results if the data itself is crap. For example, it doesn't really matter how many shoe sizes you've collected if you're trying to determine the kind of paintings somebody likes. The two things are effectively completely dissimilar and as such, you're not going to get a meaningful result. It gets even worse when you start combining more and more things.

    The data sciences are getting to be a cargo cult where companies keep collecting more and more data hoping to figure out what to do with it, but not paying attention to other issues like contamination.

    Tightening the threshold reduces the noise because it means you need a stronger result before something gets reported. Yes, it somewhat discourages new research, but let's be honest about the way people have used new findings to justify all sorts of things, only to find out later that it was a fluke or a mistake. You can still do new research; the problem is that, like replication experiments, it's not sexy, so it can be a challenge to get funding for it, even though it's a terribly important part of the process.

    The other thing this does is slow the pace of advancement somewhat, since we need to be more certain than with the current recommended value. But let's be honest: for the most part we're at a point where we can afford to slow research down in order to get results that are an order of magnitude more reliable. What we can't afford is a pile of unreliable science where we can't even tell what's right.
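
One way to see the noise-reduction argument in numbers is to simulate a batch of studies in which there is no real effect and count how many clear each cutoff. A minimal sketch follows (hypothetical two-group comparisons on pure noise; the sample sizes and study count are arbitrary assumptions, not anything from the paper):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_studies, n_per_group = 10_000, 50

hits_05 = hits_005 = 0
for _ in range(n_studies):
    # Both groups are drawn from the same distribution, so any "effect" is noise.
    a = rng.normal(size=n_per_group)
    b = rng.normal(size=n_per_group)
    p = stats.ttest_ind(a, b).pvalue
    hits_05 += p < 0.05
    hits_005 += p < 0.005

print(f"false positives at p < 0.05:  {hits_05 / n_studies:.3f}")   # ~0.05
print(f"false positives at p < 0.005: {hits_005 / n_studies:.3f}")  # ~0.005
```

By construction the false-positive rate tracks the cutoff; the paper's stronger claim is that when true effects are only a minority of the hypotheses tested, the stricter cutoff also sharply reduces the fraction of published "discoveries" that are false.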

  • (Score: 2) by cafebabe on Wednesday July 26 2017, @03:31PM (2 children)

    by cafebabe (894) on Wednesday July 26 2017, @03:31PM (#544675) Journal

    The two things are effectively completely dissimilar and as such, you're not going to get a meaningful result. It gets even worse when you start combining more and more things.

    Someone may have to correct my figures but, as I understand it, accuracy is proportional to the square root of the number of samples. So doubling accuracy requires quadrupling the number of samples. (Workload increases by a factor of four to gain one additional bit of accuracy.) Improving accuracy by a factor of 10 requires more than three such quadruplings of sample data. With far less effort (and cost), it is easier to collect more variables. Any cross-correlation may be completely random, but the opportunities to find a pattern grow as O(n^2). If any correlation meets an arbitrary standard, it is a positive result to publish, even if it cannot be replicated (the sketch below puts rough numbers on this).

    --
    1702845791×2
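
Putting rough numbers on the O(n^2) point above: with k unrelated variables there are k(k-1)/2 pairs to test, so even pure noise yields "significant" correlations at a fixed cutoff. A minimal sketch, using arbitrary assumed sample and variable counts and purely synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_samples, n_vars = 100, 40            # 40 unrelated variables -> 780 pairs
data = rng.normal(size=(n_samples, n_vars))

pairs = hits_05 = hits_005 = 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        _, p = stats.pearsonr(data[:, i], data[:, j])
        pairs += 1
        hits_05 += p < 0.05
        hits_005 += p < 0.005

print(f"{pairs} pairwise correlations tested among pure-noise variables")
print(f"'significant' at p < 0.05:  {hits_05}")   # expect roughly 39 by chance
print(f"'significant' at p < 0.005: {hits_005}")  # expect roughly 4
```
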
    • (Score: 0) by Anonymous Coward on Wednesday July 26 2017, @03:40PM

      by Anonymous Coward on Wednesday July 26 2017, @03:40PM (#544681)

      You're talking about precision, not accuracy.

      If your input data are, for some reason, skewed to give a misleading result, then a larger data set will not improve accuracy. You will, with greater precision, zero in on your skewed answer.
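
The precision-versus-accuracy distinction is easy to see numerically: with a biased sampling procedure, the uncertainty shrinks as 1/sqrt(n), but it shrinks around the wrong value. A minimal sketch with made-up numbers and an assumed constant bias:

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean, bias = 10.0, 2.0   # the sampling procedure systematically overshoots

for n in (100, 10_000, 1_000_000):
    sample = rng.normal(loc=true_mean + bias, scale=5.0, size=n)
    estimate = sample.mean()
    stderr = sample.std(ddof=1) / np.sqrt(n)   # precision improves as 1/sqrt(n)
    print(f"n = {n:>9,}  estimate = {estimate:6.3f} +/- {stderr:.3f}  "
          f"(true mean = {true_mean})")

# The +/- term shrinks toward zero, but the estimate converges to ~12, not 10.
```
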

    • (Score: 0) by Anonymous Coward on Wednesday July 26 2017, @05:43PM

      by Anonymous Coward on Wednesday July 26 2017, @05:43PM (#544758)

      If you have biased sampling, data that's not applicable, or just plain weird data, adding more of it won't help.

      You have to have a decent model and decent data to have any hope of drawing a meaningful conclusion. The Stanford Prison Experiment never replicated because they happened to recruit more psychopaths than you'd find in a normal population. The study was fine, but adding more data points would only have helped if they weren't selecting from a population with an abnormal number of psychopaths; otherwise they'd just get the same results with more decimal places.