
posted by martyb on Friday April 19 2019, @06:34PM
from the significant-change dept.

In science, the success of an experiment is often determined by a measure called "statistical significance." A result is considered to be "significant" if the difference observed in the experiment between groups (of people, plants, animals and so on) would be very unlikely if no difference actually exists. The common cutoff for "very unlikely" is that you'd see a difference as big or bigger only 5 percent of the time if it wasn't really there — a cutoff that might seem, at first blush, very strict.
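To make the 5 percent cutoff concrete, here is a minimal simulation sketch (not from the article; the group size of 30, the normal model and the use of a t-test are illustrative assumptions), written in Python with NumPy and SciPy. It runs many experiments in which no real difference exists and counts how often one still comes out "significant" at p < 0.05.

```python
# Sketch only: simulate experiments where NO true difference exists and
# count how often p < 0.05 turns up anyway. The group size (30) and the
# normal model are illustrative assumptions, not taken from the article.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    # Both groups are drawn from the same distribution, so any observed
    # difference between them is pure noise.
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)
    _, p_value = ttest_ind(group_a, group_b)
    if p_value < 0.05:
        false_positives += 1

# Prints roughly 0.05: by construction, the cutoff waves through about
# 5 percent of pure-noise experiments as "significant".
print(false_positives / n_experiments)
```

By construction, about 5 percent of pure-noise experiments clear the bar; that is what the cutoff means in practice.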

It sounds esoteric, but statistical significance has been used to draw a bright line between experimental success and failure. Achieving an experimental result with statistical significance often determines if a scientist's paper gets published or if further research gets funded. That makes the measure far too important in deciding research priorities, statisticians say, and so it's time to throw it in the trash.

More than 800 statisticians and scientists are calling for an end to judging studies by statistical significance in a March 20 comment published in Nature. An accompanying March 20 special issue of the American Statistician makes the manifesto crystal clear in its introduction: "'statistically significant' — don't say it and don't use it."

There is good reason to want to scrap statistical significance. But with so much research now built around the concept, it's unclear how — or with what other measures — the scientific community could replace it. The American Statistician offers a full 43 articles exploring what scientific life might look like without this measure in the mix.

Statistical Significance

Is it time for "p ≤ 0.05" to be abandoned or changed?


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Insightful) by ikanreed (3164) Subscriber Badge on Friday April 19 2019, @06:49PM (#832244) Journal (2 children)

    Decentralize.

    I've read the back and forth on this. The biggest contingent just wants to make "chance of happening randomly" less relevant. But it's still an analysis you'll want to do: you say "Oh wow, I found an effect size of 100% in this sample", then it turns out your sample size is 3, and the p analysis shows a result that extreme could happen by chance within your distribution 1 time out of 5 (a back-of-envelope sketch of this follows the comment).

    The main problem we have is that "significant" is frequently not significant in the real, intuitive sense: it doesn't tell us anything predictive.

    My opinion is that the thing to do is to state hypotheses up front, before any analysis or data collection is done. That would do more to cure p-hacking than any amount of stricture about which kinds of analysis are "good enough".

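    A back-of-envelope Python sketch of the small-sample point above (not part of the comment; the 50/50 null and the counts are invented for illustration): with only three observations, even a "100% effect" in which every observation points the same way is quite likely under a dull null, nowhere near the 0.05 cutoff. The exact figure, 1 in 4 here versus the 1 in 5 in the comment, depends entirely on the assumed null distribution.

    ```python
    # Back-of-envelope sketch: assume a null model in which each of 3
    # independent observations goes "up" or "down" with equal odds.
    # These numbers are invented for illustration, not ikanreed's data.
    n = 3            # tiny sample, as in the comment
    p_null = 0.5     # assumed 50/50 null per observation

    # Chance that all n observations point the same way purely by accident.
    p_one_sided = p_null ** n        # 0.125
    p_two_sided = 2 * p_one_sided    # 0.25, i.e. roughly 1 time in 4

    print(f"all {n} agree, one direction:    {p_one_sided:.3f}")
    print(f"all {n} agree, either direction: {p_two_sided:.3f}")
    ```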
  • (Score: 0) by Anonymous Coward on Friday April 19 2019, @07:57PM (#832263)

    What does "happen randomly" mean? There are always multiple models of "random chance" available, derived from different assumptions: e.g., binomial vs. Poisson binomial (a small sketch of the difference follows this comment).

    https://en.m.wikipedia.org/wiki/Binomial_distribution [wikipedia.org]
    https://en.m.wikipedia.org/wiki/Poisson_binomial_distribution [wikipedia.org]

    You are testing the validity of the assumptions, not chance.
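    An illustrative Python sketch of that point (not part of the comment; the per-trial probabilities below are invented): a plain binomial null and a Poisson binomial null with the same average success rate assign different probabilities to the same observed count, so the p-value you compute reflects which chance model you assumed.

    ```python
    # Illustrative sketch: the same observed number of "successes" gets
    # different tail probabilities under two different null models.
    # The per-trial probabilities are invented for illustration.
    import numpy as np
    from scipy.stats import binom

    # Poisson binomial null: independent trials with UNEQUAL success chances.
    trial_probs = [0.1, 0.3, 0.5, 0.7, 0.9]

    def poisson_binomial_pmf(probs):
        """Exact PMF of the number of successes, built up one trial at a time."""
        pmf = np.zeros(len(probs) + 1)
        pmf[0] = 1.0
        for p in probs:
            # Either this trial fails (count unchanged, weight 1-p) or it
            # succeeds (count shifted up by one, weight p).
            pmf[1:] = pmf[1:] * (1 - p) + pmf[:-1] * p
            pmf[0] *= 1 - p
        return pmf

    n = len(trial_probs)
    p_avg = sum(trial_probs) / n            # 0.5: matched-mean plain binomial

    pb = poisson_binomial_pmf(trial_probs)
    bi = binom.pmf(np.arange(n + 1), n, p_avg)

    for k in range(n + 1):
        print(f"P({k} successes): Poisson binomial {pb[k]:.3f} | binomial {bi[k]:.3f}")

    # Same expected number of successes, different tails: the p-value for an
    # observed count depends on which model of "chance" you assumed.
    ```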

  • (Score: 5, Insightful) by jb (338) on Saturday April 20 2019, @01:07AM (#832389)

    My opinion is that the thing to do is to state hypotheses up front, before any analysis or data collection is done. That would do more to cure p-hacking than any amount of stricture about which kinds of analysis are "good enough".

    Precisely. And that's the way things were for decades (or even centuries, depending on which field of science you're interested in), until the current fad of "junk science" took off.

    The problem is right there in the opening sentence of TFS:

    In science, the success of an experiment is often determined by a measure called "statistical significance."

    When I was at university (a long time ago), doing that would have earned me a fail. It was drummed into us over & over again that inferential statistics (of any kind) are only useful as a sanity check, after the fact. Reversing the order of things (by just running a bunch of stats on an existing data set then manufacturing a hypothesis afterwards to fit the strongest statistical result) was regarded, quite rightly, as cheating, since such "results" are meaningless.

    There's nothing wrong with using suitable statistics to help confirm the validity of a properly designed experiment after its completion. But using them to come up with a proposition to "test" (it's no test at all by then) is more akin to astrology than science...
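    To make the astrology comparison concrete, here is a hedged simulation sketch (not from the comment; the counts of candidate hypotheses and subjects are arbitrary assumptions), in Python with NumPy and SciPy. On pure-noise data, hypothesising after the fact and keeping whichever test happens to clear p < 0.05 produces a spurious "finding" in well over half of the simulated studies, versus the nominal 5 percent for a single hypothesis fixed up front.

    ```python
    # Sketch of the failure mode described above (all numbers are invented):
    # generate pure-noise data, "test" many candidate hypotheses after the
    # fact, and report the first one that clears p < 0.05.
    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    n_studies = 2_000
    n_candidate_hypotheses = 20    # outcomes shopped through after the data are in
    n_per_group = 30

    studies_claiming_an_effect = 0
    for _ in range(n_studies):
        for _ in range(n_candidate_hypotheses):
            a = rng.normal(size=n_per_group)   # pure noise: no real effect anywhere
            b = rng.normal(size=n_per_group)
            if ttest_ind(a, b).pvalue < 0.05:
                studies_claiming_an_effect += 1   # "significant", write it up
                break

    # With one pre-stated hypothesis this would be ~5%; shopping across 20
    # post-hoc candidates pushes it toward 1 - 0.95**20, i.e. about 64%.
    print(studies_claiming_an_effect / n_studies)
    ```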