
SoylentNews is people

posted by cmn32480 on Friday June 19 2015, @06:47AM
from the big-data-little-analysis dept.

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science," published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and the Royal Statistical Society, Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of the analytical skills needed to find meaning in those data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload


  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @06:18PM (#198350)

    I should also note that this issue of correlations was noted by Meehl many decades ago with regards to psychology (his area of expertise):

    Everyone familiar with psychological research knows that numerous “puzzling, unexpected” correlations pop up all the time, and that it requires only a moderate amount of motivation-plus-ingenuity to construct very plausible alternative theoretical explanations for them.

    These armchair considerations are borne out by the finding that in psychological and sociological investigations involving very large numbers of subjects, it is regularly found that almost all correlations or differences between means are statistically significant. See, for example, the papers by Bakan [1] and Nunnally [8]. Data currently being analyzed by Dr. David Lykken and myself, derived from a huge sample of over 55,000 Minnesota high school seniors, reveal statistically significant relationships in 91% of pairwise associations among a congeries of 45 miscellaneous variables such as sex, birth order, religious preference, number of siblings, vocational choice, club membership, college choice, mother’s education, dancing, interest in woodworking, liking for school, and the like. The 9% of non-significant associations are heavily concentrated among a small minority of variables having dubious reliability, or involving arbitrary groupings of non-homogeneous or non-monotonic sub-categories. The majority of variables exhibited significant relationships with all but three of the others, often at a very high confidence level (p < 10^-6).

    Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox". Philosophy of Science 34 (2): 103–115. doi:10.1086/288135 http://mres.gmu.edu/pmwiki/uploads/Main/Meehl1967.pdf [gmu.edu]
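
The effect Meehl describes can be illustrated with a rough sketch. This is not from the paper: the function name `correlation_p_value`, the example correlation of 0.01, and the comparison sample of 100 are assumptions chosen for illustration, and the p-value relies on a normal approximation to the t distribution (reasonable for large n, rough for small n).

```python
import math


def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Uses the standard test statistic t = r * sqrt((n - 2) / (1 - r^2)) and,
    as a simplifying assumption, treats t as standard normal.
    """
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    # Two-sided tail probability of a standard normal: P(|Z| > |t|).
    return math.erfc(abs(t) / math.sqrt(2.0))


if __name__ == "__main__":
    tiny_r = 0.01  # hypothetical, practically negligible correlation
    for n in (100, 55_000):
        p = correlation_p_value(tiny_r, n)
        print(f"n={n:>6}  r={tiny_r}  p~{p:.3f}")
```

Under these assumed inputs, the same r = 0.01 is nowhere near significant at n = 100 but falls below the conventional 0.05 threshold at n = 55,000, the sample size Meehl mentions.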

    Thinking about it more, the idea that correlations are rare is in conflict with the universal law of gravitation:

    Every point mass attracts every single other point mass by a force pointing along the line intersecting both points.

    https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation [wikipedia.org]

    So basing your reasoning off a method that assumes useful information can be gathered from the mere existence of a correlation is in conflict with both the evidence, and our most cherished and successful physical theories. It is pseudoscience.

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @09:47PM (#198447)

    What the fuck does gravity have to do with correlations? You're seeing all kinds of correlations that don't exist and working with lots of assumptions that don't match with reality, that must be why you're so confused about this.

    • (Score: 0) by Anonymous Coward on Friday June 19 2015, @10:06PM (#198454)

      The gravity example shows that it is commonly believed that everything in the universe affects everything else, however minutely. This idea has been put forth in other ways as well:

      "Everything is related to everything else, but near things are more related than distant things."

      https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography [wikipedia.org]

      I assure you, I am not the confused one.