
posted by cmn32480 on Friday June 19 2015, @06:47AM   Printer-friendly
from the big-data-little-analysis dept.

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload


Original Submission

 
  • (Score: 2, Insightful) by khallow (3766) Subscriber Badge on Friday June 19 2015, @10:04PM (#198453) Journal

    Even without that, the process of testing a null hypothesis makes no sense*.

    We don't always know enough to make theories. But collecting data is a much lower threshold. Ultimately, null hypothesis testing is about attempting to answer "Is there something interesting happening here?" One doesn't need a theory to see something of note.
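The question "is there something interesting happening here?" can be made concrete without any theory at all, for instance with a permutation test (a minimal sketch with invented numbers, not anything from the article or this comment):

```python
import random
import statistics

random.seed(42)

# Hypothetical measurements from two groups (numbers made up for
# illustration).  The null hypothesis: the group labels don't matter.
group_a = [5.1, 5.5, 4.9, 5.8, 6.0, 5.3]
group_b = [6.2, 6.8, 6.5, 7.0, 6.4, 6.9]

observed = statistics.mean(group_b) - statistics.mean(group_a)

# Permutation test: shuffle the labels many times and count how often
# a difference at least as large arises by chance alone.
pooled = group_a + group_b
n = len(group_a)
trials = 10_000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[n:]) - statistics.mean(pooled[:n])
    if diff >= observed:
        extreme += 1

p_value = extreme / trials
print(f"observed difference: {observed:.2f}, p = {p_value:.4f}")
```

A small p-value here only flags the data as worth a closer look; it is not itself an explanation.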

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @11:16PM (#198485)

    One doesn't need a theory to see something of note.

    One also doesn't need to test an arbitrary null hypothesis to see if a p-value is less than an arbitrary number. I am all for data collection and aggregation; for example, check out Project Tycho: http://www.tycho.pitt.edu/index.php. I love the philosophy behind it, the usefulness, everything. That is a major contribution to science without any null hypotheses being tested. Just as important are the contributions of the people who originally collected the data (CDC, NNDS, etc.). No need for anyone to decide "is there anything interesting" up front. That comes later.

    • (Score: 2, Interesting) by khallow (3766) Subscriber Badge on Saturday June 20 2015, @02:09PM (#198676) Journal

      One also doesn't need to test an arbitrary null hypothesis to see if a p-value is less than an arbitrary number.

      But one can do that and sometimes useful things come as a result. Keep in mind that this sort of thing is the basis of a lot of "big data" research. You have a huge, incomprehensible pile of data and you want to find significant patterns in the data.

      • (Score: 0) by Anonymous Coward on Sunday June 21 2015, @07:20AM (#198982)

        But one can do that and sometimes useful things come as a result.

        I searched. I wasted so much of my life (~2 years) searching. You apparently know an obvious example. What is it?

        • (Score: 1) by khallow (3766) Subscriber Badge on Monday June 22 2015, @12:27AM (#199233) Journal

          You apparently know an obvious example.

          Marketing at a retail store. They have years of transactions by thousands of customers. While you can go through the effort of making formal hypotheses and then testing them, it's far more timely to start by filtering for correlations that break the null hypothesis and go from that point to find real patterns of buyer behavior.
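          The filtering step described above can be sketched in a few lines of Python (a toy illustration with synthetic data; the item names, purchase rates, and the permutation-based p-value are invented stand-ins, not anything from the comment):

```python
import random
import statistics

random.seed(0)

# Synthetic transaction table: True if a (hypothetical) customer
# bought the item.  Item names and rates are invented for illustration.
items = ["bread", "butter", "beer", "diapers", "batteries"]
n_customers = 200
data = {item: [random.random() < 0.3 for _ in range(n_customers)]
        for item in items}
# Plant one real pattern: butter purchases mostly track bread purchases.
data["butter"] = [b if random.random() < 0.8 else (random.random() < 0.3)
                  for b in data["bread"]]

def correlation(xs, ys):
    """Pearson correlation of two equal-length 0/1 (or numeric) lists."""
    xs, ys = [float(x) for x in xs], [float(y) for y in ys]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def perm_p(xs, ys, trials=500):
    """Permutation p-value: how often does shuffling one column produce
    a correlation at least as strong as the observed one?"""
    obs = abs(correlation(xs, ys))
    ys = list(ys)
    extreme = 0
    for _ in range(trials):
        random.shuffle(ys)
        if abs(correlation(xs, ys)) >= obs:
            extreme += 1
    return extreme / trials

# Screen every item pair; small p-values are leads worth a closer look,
# not conclusions -- with many pairs, some will pass by chance alone.
hits = []
for i in range(len(items)):
    for j in range(i + 1, len(items)):
        p = perm_p(data[items[i]], data[items[j]])
        if p < 0.01:
            hits.append((items[i], items[j]))
            print(items[i], items[j], "looks correlated (p < 0.01)")
```

The surviving pairs are candidates for the "real patterns of buyer behavior" step, not findings in themselves; with many pairs tested, some low p-values are expected by chance.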

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @11:27PM (#198489)

    I forgot to mention: All accurate data are interesting, if a detailed report of the methodology is included alongside.