
posted by cmn32480 on Friday June 19 2015, @06:47AM   Printer-friendly
from the big-data-little-analysis dept.

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload


  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @07:44AM (#198146)

    I'm very much afraid your conclusion is correct, but the cause you name is far from the only one leading to that result. P-tweaking comes to mind as another significant problem, as well as observation (or publication) bias, and I'm sure a competent statistician could name ten more without even thinking.

  • (Score: 1, Interesting) by Anonymous Coward on Friday June 19 2015, @07:56AM (#198152)

    I'm not sure I explained my point fully. P-tweaking/hacking/whatever is a red herring. Even without that, the process of testing a null hypothesis makes no sense*. That is just more BS stacked on BS. Publication bias is similarly only a problem if the way you test your theories is by checking a null hypothesis. Require precise a priori predictions that are independently tested, and publication bias is no problem. There is no reason for people to publish every little thing they do, let alone do that while spreading misinformation by trying to make it seem important. This was all explained long ago for those who find the time to check:

    Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox". Philosophy of Science 34 (2): 103–115. doi:10.1086/288135
    http://mres.gmu.edu/pmwiki/uploads/Main/Meehl1967.pdf [gmu.edu]

    *If the hypothesis to be nullified is predicted by a theory then the concept is fine. The Meehl paper is about that.

    • (Score: 2, Insightful) by khallow (3766) Subscriber Badge on Friday June 19 2015, @10:04PM (#198453) Journal

      Even without that, the process of testing a null hypothesis makes no sense*.

      We don't always know enough to make theories. But collecting data is a much lower threshold. Ultimately, null hypothesis testing is about attempting to answer "Is there something interesting happening here?" One doesn't need a theory to see something of note.
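
      A minimal sketch of that screening use in Python (the group sizes, means, and the shift are invented for illustration, and scipy's two-sample t-test stands in for whatever test fits the data):

      # Null-hypothesis screening: no theory, just "do these two groups differ?"
      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(1)
      control = rng.normal(loc=100, scale=15, size=50)  # baseline group (invented numbers)
      treated = rng.normal(loc=108, scale=15, size=50)  # possibly shifted group (invented)

      t, p = stats.ttest_ind(treated, control)
      print(f"t = {t:.2f}, p = {p:.4f}")
      # A small p only flags "something of note may be happening here";
      # it supplies no theory of why.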

      • (Score: 0) by Anonymous Coward on Friday June 19 2015, @11:16PM (#198485)

        One doesn't need a theory to see something of note.

        One also doesn't need to test an arbitrary null hypothesis to see if a p-value is less than an arbitrary number. I am all for data collection and aggregation; for example, check out Project Tycho: http://www.tycho.pitt.edu/index.php [pitt.edu]. I love the philosophy behind it, the usefulness, everything. That is a major contribution to science without any null hypotheses being tested. Just as important are the contributions of the people who originally collected the data (CDC, NNDS, etc.). There is no need for anyone to decide "is there anything interesting". That comes later.

        • (Score: 2, Interesting) by khallow (3766) Subscriber Badge on Saturday June 20 2015, @02:09PM (#198676) Journal

          One also doesn't need to test an arbitrary null hypothesis to see if a p-value is less than an arbitrary number.

          But one can do that and sometimes useful things come as a result. Keep in mind that this sort of thing is the basis of a lot of "big data" research. You have a huge, incomprehensible pile of data and you want to find significant patterns in the data.

          • (Score: 0) by Anonymous Coward on Sunday June 21 2015, @07:20AM (#198982)

            But one can do that and sometimes useful things come as a result.

            I searched. I wasted so much of my life (~2 years) searching. You apparently know an obvious example. What is it?

            • (Score: 1) by khallow (3766) Subscriber Badge on Monday June 22 2015, @12:27AM (#199233) Journal

              You apparently know an obvious example.

              Marketing at a retail store. They have years of transactions by thousands of customers. While you can go through the effort of making formal hypotheses and then testing them, it's far more timely to start by filtering for correlations that break the null hypothesis and go from that point to find real patterns of buyer behavior.
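
              A rough sketch of that kind of screen in Python (the item names, basket data, and the planted bread/butter association are all invented; Pearson correlation with a Bonferroni cutoff is just one of several reasonable choices):

              # Screen item pairs for correlations unlikely under the null,
              # then hand the survivors to an analyst as candidate patterns.
              import numpy as np
              from scipy import stats

              rng = np.random.default_rng(2)
              items = ["bread", "butter", "beer", "diapers", "milk"]  # hypothetical catalog
              # rows = baskets, columns = items; 1 if the basket contains the item
              baskets = (rng.random((5000, len(items))) < 0.2).astype(int)
              # plant a bread -> butter association so the screen has something to find
              baskets[:, 1] |= baskets[:, 0] & (rng.random(5000) < 0.5).astype(int)

              candidates = []
              for i in range(len(items)):
                  for j in range(i + 1, len(items)):
                      r, p = stats.pearsonr(baskets[:, i], baskets[:, j])
                      if p < 0.05 / 10:  # Bonferroni over the 10 pairs screened
                          candidates.append((items[i], items[j], round(r, 2)))

              print(candidates)  # starting points for finding real buyer behavior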

      • (Score: 0) by Anonymous Coward on Friday June 19 2015, @11:27PM (#198489)

        I forgot to mention: All accurate data is interesting, if a detailed report of the methodology is included alongside.

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @08:15AM (#198156)

    Also, even without p-hacking, if you don't specify beforehand exactly when you'll stop the study and what your rules for outliers etc. will be, the p-value is uninterpretable even as a measure of the null hypothesis (which is usually pointless to test anyway; no, the two groups of people/rats/cells did not come from the same hypothetical infinite distribution):

    There are many different intentions for generating the space of possible t_null values and, hence, many different p values and confidence intervals for a single set of data.

    http://www.indiana.edu/~kruschke/articles/Kruschke2012JEPG.pdf [indiana.edu]
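
    A quick simulation of that optional-stopping problem (the batch size, cap, and alpha are arbitrary choices; both groups are drawn from the same distribution, so every rejection here is a false positive by construction):

    # Peek at the p-value after every batch and stop as soon as p < 0.05:
    # the long-run false positive rate climbs well above the nominal 5%.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    def peeking_trial(batch=10, max_n=200, alpha=0.05):
        a, b = [], []
        while len(a) < max_n:
            a.extend(rng.normal(0, 1, batch))  # both groups from N(0, 1)
            b.extend(rng.normal(0, 1, batch))
            if stats.ttest_ind(a, b).pvalue < alpha:
                return True  # "significant" -- a guaranteed false positive
        return False

    rate = sum(peeking_trial() for _ in range(1000)) / 1000
    print(f"False positive rate with optional stopping: {rate:.1%}")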

    I like Kruschke's idea of a ROPE (region of practical equivalence), but really it is of limited use except in cases where you are really, really confident that you are measuring the right thing.
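
    For illustration, a bare-bones stand-in for the ROPE decision rule (Kruschke's version compares a Bayesian posterior interval to the ROPE; a frequentist confidence interval is used here for brevity, and the effect data and the ±0.05 bounds are invented):

    # Declare a region of practical equivalence around zero and check
    # where the interval for the mean effect falls relative to it.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    effect = rng.normal(0.03, 0.1, size=200)  # per-subject effect estimates (invented)

    ci = stats.t.interval(0.95, len(effect) - 1,
                          loc=effect.mean(), scale=stats.sem(effect))
    rope = (-0.05, 0.05)  # "practically zero" region (assumed, domain-specific)

    if ci[1] < rope[0] or ci[0] > rope[1]:
        print("effect is practically non-zero")
    elif rope[0] < ci[0] and ci[1] < rope[1]:
        print("effect is practically equivalent to zero")
    else:
        print("undecided -- the interval overlaps the ROPE boundary")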