SoylentNews is people

posted by Fnord666 on Sunday January 28 2018, @08:52PM
from the finding-significance dept.

Psychologist Daniël Lakens disagrees with a proposal to redefine statistical significance to require a 0.005 p-value, and has crowdsourced an alternative set of recommendations with 87 co-authors:

Psychologist Daniël Lakens of Eindhoven University of Technology in the Netherlands is known for speaking his mind, and after he read an article titled "Redefine Statistical Significance" on 22 July 2017, Lakens didn't pull any punches: "Very disappointed such a large group of smart people would give such horribly bad advice," he tweeted.

In the paper, posted on the preprint server PsyArXiv, 70 prominent scientists argued in favor of lowering a widely used threshold for statistical significance in experimental studies: The so-called p-value should be below 0.005 instead of the accepted 0.05, as a way to reduce the rate of false positive findings and improve the reproducibility of science. Lakens, 37, thought it was a disastrous idea. A lower α, or significance level, would require much bigger sample sizes, making many studies impossible. Besides, he says, "Why prescribe a single p-value, when science is so diverse?"
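
To put a rough number on the sample-size concern, here is a minimal sketch using the standard normal-approximation formula for a two-sided, two-sample comparison. The 80% power target and the standardized effect size of 0.5 are illustrative assumptions, not figures from either paper, and `n_per_group` is a hypothetical helper name.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(alpha, power=0.80, effect_size=0.5):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the
    standardized effect size. power and effect_size are assumed values.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_05 = n_per_group(0.05)
n_005 = n_per_group(0.005)
print(f"alpha=0.05:  {n_05} per group")
print(f"alpha=0.005: {n_005} per group")
print(f"ratio: {n_005 / n_05:.2f}")
```

Under these assumptions the stricter threshold needs roughly 70% more participants per group; the exact figure shifts with the assumed power and effect size.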

Lakens and others will soon publish their own paper to propose an alternative; it was accepted on Monday by Nature Human Behaviour, which published the original paper proposing a lower threshold in September 2017. The content won't come as a big surprise—a preprint has been up on PsyArXiv for 4 months—but the paper is unique for the way it came about: from 100 scientists around the world, from big names to Ph.D. students, and even a few nonacademics writing and editing in a Google document for 2 months.

Lakens says he wanted to make the initiative as democratic as possible: "I just allowed anyone who wanted to join and did not approach any famous scientists."



Related Stories

Scientists Propose Change in P-Value Threshold for Statistical Significance 38 comments

Statistician Valen Johnson and 71 other researchers have proposed a redefinition of statistical significance in order to cut down on irreproducible results, especially those in the biomedical sciences. They propose "to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005" in a preprint article that will be published in an upcoming issue of Nature Human Behaviour:

A megateam of reproducibility-minded scientists is renewing a controversial proposal to raise the standard for statistical significance in research studies. They want researchers to dump the long-standing use of a probability value (p-value) of less than 0.05 as the gold standard for significant results, and replace it with the much stiffer p-value threshold of 0.005.

Backers of the change, which has been floated before, say it could dramatically reduce the reporting of false-positive results—studies that claim to find an effect when there is none—and so make more studies reproducible. And they note that researchers in some fields, including genome analysis, have already made a similar switch with beneficial results.

"If we're going to be in a world where the research community expects some strict cutoff ... it's better that that threshold be .005 than .05. That's an improvement over the status quo," says behavioral economist Daniel Benjamin of the University of Southern California in Los Angeles, first author on the new paper, which was posted 22 July as a preprint article [open, DOI: 10.17605/OSF.IO/MKY9J] [DX] on PsyArXiv and is slated for an upcoming issue of Nature Human Behavior. "It seemed like this was something that was doable and easy, and had worked in other fields."

But other scientists reject the idea of any absolute threshold for significance. And some biomedical researchers worry the approach could needlessly drive up the costs of drug trials. "I can't be very enthusiastic about it," says biostatistician Stephen Senn of the Luxembourg Institute of Health in Strassen. "I don't think they've really worked out the practical implications of what they're talking about."

They have proposed a P-value of 0.005 because it corresponds to Bayes factors between approximately 14 and 26 in favor of H1 (the alternative hypothesis), indicating "substantial" to "strong" evidence, and because it would reduce the false positive rate to levels they have judged to be reasonable "in many fields".

Is this good enough? Is it a good start?




This discussion has been archived. No new comments can be posted.
  • (Score: -1, Flamebait) by Anonymous Coward on Sunday January 28 2018, @09:04PM (2 children)

    by Anonymous Coward on Sunday January 28 2018, @09:04PM (#629571)

    A p-value less than 3.0 is completely worthless. I mean, statistics is just a bunch of voodoo anyway, just like economics. And psychology? Well, let's not go there...

    • (Score: 1, Funny) by Anonymous Coward on Sunday January 28 2018, @11:45PM (1 child)

      by Anonymous Coward on Sunday January 28 2018, @11:45PM (#629634)

      Flamebait?! What, you can't say fuck on the internet anymore? What's the matter with you people?

      Oh! I get it. I was modded by a statistician. Getting a bit sensitive are we?

      • (Score: 2) by kazzie on Monday January 29 2018, @08:08AM

        by kazzie (5309) Subscriber Badge on Monday January 29 2018, @08:08AM (#629741)

        The probability that you were modded down by a statistician is... er... let's not go there.

  • (Score: 1, Troll) by Anonymous Coward on Sunday January 28 2018, @09:08PM (6 children)

    by Anonymous Coward on Sunday January 28 2018, @09:08PM (#629572)

    Stats are a super delicate beast. But if you had the quantitative mind to work proper stats, you wouldn't be doing psych/social science in the first place, would you?

    • (Score: 4, Touché) by MichaelDavidCrawford on Monday January 29 2018, @12:05AM (5 children)

      by MichaelDavidCrawford (2339) <mdcrawford@gmail.com> on Monday January 29 2018, @12:05AM (#629640) Homepage Journal

      Without a doubt psych and social will one day be quantitative

      Consider the origin of chemistry

      --
      My United States Social Security Number Is 518-92-8663
      • (Score: 0) by Anonymous Coward on Monday January 29 2018, @12:23AM (4 children)

        by Anonymous Coward on Monday January 29 2018, @12:23AM (#629646)

        Origin of chemistry is cooking. What's your point?

        • (Score: 3, Informative) by MichaelDavidCrawford on Monday January 29 2018, @01:21AM (2 children)

          by MichaelDavidCrawford (2339) <mdcrawford@gmail.com> on Monday January 29 2018, @01:21AM (#629662) Homepage Journal

          Tkdfrbjf gb

          --
          My United States Social Security Number Is 518-92-8663
          • (Score: -1, Troll) by Anonymous Coward on Monday January 29 2018, @06:52AM (1 child)

            by Anonymous Coward on Monday January 29 2018, @06:52AM (#629729)

            People here at SN indulge you because of your mental illness, but that schtick is done. Get back on med or GTFO.

            • (Score: 1, Insightful) by Anonymous Coward on Monday January 29 2018, @07:27PM

              by Anonymous Coward on Monday January 29 2018, @07:27PM (#629979)

              Actually "Tkdfrbjf gb" is about all one can say to an idiot with little knowledge of history but an insufferable ego that thinks it has all the answers. I found it to be quite appropriate, insightful even.

        • (Score: 2) by darkfeline on Tuesday January 30 2018, @04:17AM

          by darkfeline (1030) on Tuesday January 30 2018, @04:17AM (#630182) Homepage

          The origin of chemistry is alchemy.

  • (Score: -1, Flamebait) by Anonymous Coward on Sunday January 28 2018, @09:29PM

    by Anonymous Coward on Sunday January 28 2018, @09:29PM (#629584)

    "I just allowed anyone who wanted to join and did not approach any famous scientists."

    So he did not approach the Runaway1956? Major methodological malfeasance, Monty!!

  • (Score: 4, Interesting) by Anonymous Coward on Sunday January 28 2018, @09:57PM (3 children)

    by Anonymous Coward on Sunday January 28 2018, @09:57PM (#629598)

    I've seen them all: PhD students, post-docs, P.I.s and professors failing at statistics... When asked about it, you often get the answer "but researcher x is also doing it like that". The same applies here... the p-value isn't that important. Statistics is a tool: a process to analyse your data, to get to know it, to see where the errors are in your data, and to see whether you could apply methods to get around those flaws. The statistical tests are done during that analysis; they are not a result of it. They first have to convince YOU, as a scientist, that the collected data is usable and trustworthy enough to support your hypothesis. The calculated p-value gives YOU a clue about how much to trust that data; it is not a cut-off point for accepting it or not.

    • (Score: 3, Interesting) by bradley13 on Monday January 29 2018, @09:00AM (2 children)

      by bradley13 (3053) Subscriber Badge on Monday January 29 2018, @09:00AM (#629750) Homepage Journal

      Exactly: statistics is a tool. p < 0.05 is just fine for most things, as long as you defined your hypothesis ahead of time. The problem comes with things like p-hacking, where people run all sorts of random correlations, find one that just happens to be p < 0.05, and claim that as a useful result. Setting the standard threshold to p < 0.005 won't change abuses like that - it just makes them a bit more difficult. However, for legitimate researchers who know what they're doing, it will make life unnecessarily more difficult.
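
The multiple-comparisons effect described here can be illustrated with a small simulation. It is only a sketch under stated assumptions: every outcome is pure noise (no real effect), each hypothetical study checks 20 independent outcomes on 30 subjects, and a simple z-test with known unit variance stands in for whatever test a real study would use. The function name and all parameters are made up for illustration.

```python
import random
from statistics import NormalDist


def fraction_with_false_positive(alpha, n_tests=20, n_subjects=30,
                                 n_studies=1000, seed=0):
    """Fraction of all-noise studies that find at least one p < alpha."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    studies_with_hit = 0
    for _ in range(n_studies):
        for _ in range(n_tests):
            # Pure noise: the true mean is 0, so any "finding" is a false positive.
            sample = [rng.gauss(0, 1) for _ in range(n_subjects)]
            z = (sum(sample) / n_subjects) * n_subjects ** 0.5
            p = 2 * (1 - phi(abs(z)))  # two-sided p-value
            if p < alpha:
                studies_with_hit += 1
                break  # one "significant" result is enough to report
    return studies_with_hit / n_studies


rate_05 = fraction_with_false_positive(0.05)
rate_005 = fraction_with_false_positive(0.005)
print(f"at least one p < 0.05:  {rate_05:.1%} of studies")
print(f"at least one p < 0.005: {rate_005:.1%} of studies")
```

With 20 independent looks at noise, theory predicts at least one "hit" in about 64% of studies at 0.05 and about 10% at 0.005 (1 - (1 - α)^20), so the stricter threshold lowers the rate but does not remove the problem.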

      --
      Everyone is somebody else's weirdo.
      • (Score: 0) by Anonymous Coward on Monday January 29 2018, @09:42AM (1 child)

        by Anonymous Coward on Monday January 29 2018, @09:42AM (#629756)

        There aren't many good researchers in the social sciences, anyway. You can tell because the good ones will have little else to say than, "More research is needed." The social sciences try to measure things that are entirely subjective and largely unprovable (like how "happy" someone is), and therefore can almost never be truly conclusive. But even the honest researchers will have their studies misrepresented by the media to serve some ridiculous agenda, so we're doomed.

        • (Score: 3, Funny) by TheRaven on Monday January 29 2018, @11:47AM

          by TheRaven (270) on Monday January 29 2018, @11:47AM (#629785) Journal
          That's not limited to social sciences. In computer science, our statistical errors are largely irrelevant because our experimental errors are often an order of magnitude higher. Different cache sizes, code layouts, small changes to pipeline structures, and so on can have a ±50% effect on the results, so arguing about a 0.5% error from misapplied statistics is largely irrelevant. And, yes, I do get cranky every time I read a CGO paper that shows a speedup that's well within the margins of experimental error and doesn't even try to apply statistics (we ran the experiment 10 times, discarded the outliers with no explanation, and present the mean with no indication of distribution, and our algorithm gives a 15% speedup on average, computed by subtracting the before and after values from each benchmark in a cherry-picked subset of SPEC and averaging them).
          --
          sudo mod me up
  • (Score: 3, Funny) by Anonymous Coward on Sunday January 28 2018, @10:02PM (2 children)

    by Anonymous Coward on Sunday January 28 2018, @10:02PM (#629602)

    Are 100 scientists from around the world a large enough sample size?

    • (Score: 2, Offtopic) by realDonaldTrump on Monday January 29 2018, @12:07AM

      by realDonaldTrump (6614) Subscriber Badge on Monday January 29 2018, @12:07AM (#629641) Homepage Journal

      Daniël from Holland is disappointed. He doesn't know about our Senate. Our Senate is a joke. We have 100 individuals in there, they're not loyal to their parties. They don't represent the will of the American people. And they can't make up their own minds! So we have a PART TIME government, it doesn't work on the weekends. Or Monday mornings.

      The polls, they say I have the most loyal people. Did you ever see that? Where I could stand in the middle of the Senate and shoot somebody and I wouldn’t lose any voters, okay? It’s, like, incredible.

      --
      Text TRUMP to 88022 to join the 🚂 #TrumpTrain [facebook.com]
    • (Score: 1) by DECbot on Monday January 29 2018, @04:14PM

      by DECbot (832) on Monday January 29 2018, @04:14PM (#629852) Journal

      I have fuzzy recollections of a paper involving "grumpy old men," "damn kids," and "my lawn!" that concluded that a sample size of 1 is the minimum necessary to represent the views of like-minded individuals. The real challenge was proving they were all like-minded. Tell me, these 100 scientists, what do they think about their lawn?

      --
      cats~$ sudo chown -R us /home/base
  • (Score: 5, Informative) by AthanasiusKircher on Sunday January 28 2018, @10:33PM

    by AthanasiusKircher (5291) Subscriber Badge on Sunday January 28 2018, @10:33PM (#629617) Journal

    From page 15 of the preprint:

    We have two key recommendations. First, we recommend that the label “statistically significant” should no longer be used. Instead, researchers should provide more meaningful interpretations of the theoretical or practical relevance of their results. Second, authors should transparently specify—and justify—their design choices. Depending on their choice of statistical approach, these may include the alpha level, the null and alternative models, assumed prior odds, statistical power for a specified effect size of interest, the sample size, and/or the desired accuracy of estimation. We do not endorse a single value for any design parameter, but instead propose that authors justify their choices before data are collected. Fellow researchers can then evaluate these decisions, ideally also prior to data collection, for example, by reviewing a Registered Report submission. Providing researchers (and reviewers) with accessible information about ways to justify (and evaluate) design choices, tailored to specific research areas, will improve current research practices.

    This all sounds eminently reasonable. Focus on a single statistical parameter is never a good thing to determine whether a result is meaningful. People can p-hack at 0.005 just as they have at 0.05. If you think that's harder, you haven't thought about how easy it is in psychology when you're measuring a bunch of parameters and now just have to find a few more interesting ways to create combinations of data points that are hackable. I've seen plenty of studies which have claimed p thresholds of 0.005 or 0.001 or even more, but it's clear upon closer examination that they got those results through a combination of stuff like p-hacking, bad data collection, bad interpretation, biasing (either conscious or unconscious) the experimental design or calculation of results, etc.

    So, focusing on a variety of stats that may or may not have particular relevance to a particular situation is good. Even better is the call for review of statistical standards BEFORE data collection. If you set your thresholds for significance or whatever, outline exactly how you plan to collect and manipulate the data, etc. IN ADVANCE, it's a lot harder (short of outright fabrication of data) to "massage" things to find something of supposed "significance." (And please note that a lot of this is likely done unintentionally: people just don't have an intuitive sense of how different stats or types of analysis may suddenly alter the thresholds or ease with which they can appear to have an interesting result.)

    Sure, it's hard to set up these sorts of things for thorough statistical review in advance for an exploratory study where you're not quite sure what you may find. But in that case, you can be honest about how vague the results are -- and then follow-up studies with more rigor can be designed if some preliminary finding seems worthwhile. The point is being transparent about how the statistical standards were created for a particular study and how they were then applied to the data and interpreted.

  • (Score: 4, Insightful) by opinionated_science on Sunday January 28 2018, @11:34PM (6 children)

    by opinionated_science (4031) on Sunday January 28 2018, @11:34PM (#629630)

    When used properly, statistics are a beautiful tool to observe the world around us.

    But you must factor in the sample size, and the balance of probabilities that data is correct.

    If you don't know intimately what Bayesian or the Central limit theorems describe, quit trying to comment now - that is the ground floor in analysis...

    • (Score: 2) by deadstick on Monday January 29 2018, @12:12AM

      by deadstick (5110) on Monday January 29 2018, @12:12AM (#629642)

      Statistics is like dynamite. Use it properly, and you can move mountains. Use it improperly, and the mountain will come down on you.

    • (Score: 3, Interesting) by FatPhil on Monday January 29 2018, @12:27AM

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Monday January 29 2018, @12:27AM (#629648) Homepage
      But you don't need to know how to do any analysis if you never need to do any analysis because you've not passed the earlier hurdle that lets you have data to analyse! If you aren't familiar with Simpson's Paradox, and Reversion to Mean - you shouldn't even be collecting the data!
      --
      Life is a precious commodity. A wise investor would get rid of it when it has the highest value.
    • (Score: 5, Interesting) by requerdanos on Monday January 29 2018, @12:29AM (2 children)

      by requerdanos (5997) Subscriber Badge on Monday January 29 2018, @12:29AM (#629650) Journal

      If you don't know intimately what Bayesian or the Central limit theorems describe, quit trying to comment now - that is the ground floor in analysis...

      While based on solid information, this isn't necessarily good advice.

      There are disciplines involved here other than probability theory, even though probability theory is at the root of what's going on.

      Specifically:

      Reporters (from "credentialed journalist" all the way to "dude I have this great science blog") form a group that needs desperately to understand how to read a scientific study and interface with, be the recipients of, the information calculated through probability theory.

      Random idiots (from "I am pretty smart, and I like to weigh information carefully" all the way to "wow I better forward this clickbait to everyone just in case it's true") form another group, a further step removed, that need to learn how to call bullshit on the Reporters who say "New Study: Green Jelly Beans Linked To Acne [explainxkcd.com]" instead of blindly parroting what they say*.

      Also involved are everyone else (all over the spectrum) who might be affected, which includes just about everyone.

      I want to hear quality comments** from people who represent them, and from people who have useful advice for them.

      ----------
      *Which has resulted in large numbers of people believing headlines, in sequence, of "New Study: Coffee Bad For You," "New Study: Coffee Not Bad For You After All," "New Study: Coffee Bad For You," "New Study: Coffee Not Bad For You After All," "New Study: Coffee Bad For You," "New Study: Coffee Not Bad For You After All," "New Study: Coffee Bad For You," "New Study: Coffee Not Bad For You After All." Despite it being impossible for two opposite oversimplifications to be true in the same universe.

      ** No. If you have to ask, then that would not be a quality comment. Thank you.

      • (Score: 2) by tfried on Monday January 29 2018, @10:27AM

        by tfried (5534) on Monday January 29 2018, @10:27AM (#629762)

        There are disciplines involved here other than probability theory

        I second that, but I think you forgot the most important example: (Quasi-) experimental design. Statistical analysis may tell you that you (probably) have some non-random relation in your data, but it won't tell you why that relation is in your data. Is it what you think it is? Or is it just some source of bias in your data collection, your measurements not reflecting what you think they do, some correlation with a third variable that you forgot about (or chose to ignore), regression to the mean, temporary effects, ...

        Alpha errors are an annoying source of noise in the discourse, but at least it's relatively easy to identify any papers that are in danger, here (admittedly it may be harder to identify alpha error inflation due to excessive comparisons that are not necessarily reported). But I'll bet, even at today's "significance levels", alpha errors are outnumbered three to one by non-numerical screw-ups among published papers.

        Heck, at least today there is a tiny bit of awareness that maybe a published result should be taken with care, until confirmed by several independent people using independent measurements and designs. I'm afraid, all the current discussion will yield is a "solution" (be it stricter p-values or something, anything, else) that will make everybody feel safe and "correct", without actually helping at all (but probably raising the entry barriers to low budget independent studies).

      • (Score: 2) by opinionated_science on Monday January 29 2018, @02:31PM

        by opinionated_science (4031) on Monday January 29 2018, @02:31PM (#629807)

        The media has a problem with all maths - they often quote numbers with no reference to the mean and variance of the distribution.

        This is middle school level maths, and an interesting proxy for why so much bad stuff goes on, contrary to the data.

        The fact that many respectable journals can barely keep the statistics correct, suggests this is quite widespread...

    • (Score: 0) by Anonymous Coward on Monday January 29 2018, @07:39PM

      by Anonymous Coward on Monday January 29 2018, @07:39PM (#629986)

      Your username isn't joking around!!

  • (Score: 2) by Entropy on Monday January 29 2018, @01:27AM (2 children)

    by Entropy (4228) on Monday January 29 2018, @01:27AM (#629663)

    So someone couldn't find a statistically significant relationship between something that they really wanted to, and they try to re-define what statistically significant is instead of just accepting what they found. That's how I read that load of crap, anyway.

    • (Score: 2) by c0lo on Monday January 29 2018, @07:09AM

      by c0lo (156) on Monday January 29 2018, @07:09AM (#629733)

      Sounds like someone couldn't find a relationship..
      So someone couldn't find a statistically significant relationship between something that they really wanted to, and they try to re-define what statistically significant is instead of just accepting what they found. That's how I read that load of crap, anyway.

      Ah, it started so promising.
      Like a new hypothesis on the (inverse) correlation between the chance of the author to get a date IRL and the "number of useless scientific papers" or "time spent in science activism".
      What a letdown.

    • (Score: 0) by Anonymous Coward on Monday January 29 2018, @05:01PM

      by Anonymous Coward on Monday January 29 2018, @05:01PM (#629874)

      The 0.05 to 0.005 move was actually designed to do EXACTLY the opposite of this. That is, it was designed to make it harder to infer fake relationships. For comparison, the 5-sigma standard used in physics is 0.0000006, a much harder standard to meet.
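
For anyone who wants to check the 5-sigma figure, the normal tail probability can be computed directly. This is a quick sketch that assumes a standard normal test statistic; `tail_probability` is a hypothetical helper name.

```python
from math import erfc, sqrt


def tail_probability(sigmas, two_sided=False):
    """Probability of a standard normal deviating by at least `sigmas`."""
    one_sided = 0.5 * erfc(sigmas / sqrt(2))  # P(Z >= sigmas)
    return 2 * one_sided if two_sided else one_sided


p_one_sided = tail_probability(5)
p_two_sided = tail_probability(5, two_sided=True)
print(f"5 sigma, one-sided: {p_one_sided:.2e}")
print(f"5 sigma, two-sided: {p_two_sided:.2e}")
```

The 0.0000006 figure matches the two-sided tail; particle physics usually quotes the one-sided tail, which is roughly 0.0000003. Either way it is several orders of magnitude stricter than 0.005.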
