SoylentNews is people

posted by cmn32480 on Friday June 19 2015, @06:47AM
from the big-data-little-analysis dept.

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload



 
  • (Score: 2) by VortexCortex (4067) on Friday June 19 2015, @05:02PM (#198321)

    What other theory would predict that my outlandish experiment was incorrect but the opposite presumption arising from skepticism of the claim? The null hypothesis is merely healthy skepticism being formalized to ensure experiments are disprovable.

    I'm not sure how the null hypothesis was explained to you, but all it really means is this: Correlation is not Causation. A null hypothesis is merely the companion to any hypothesis, stating that the hypothesis is false: that the causal relationship you're testing might not exist. Thus, to prove the conclusions drawn from a correlation (to test the prediction of a hypothesis), one must disprove the corresponding null hypothesis. This is how disprovability in science exists. In statistics, the null hypothesis is what forces you to consider and eliminate outside factors and to verify the strength of a correlation. One must demonstrate that the hypothesis is significantly more likely to be true than false; that the hypothesis is more likely true than its null hypothesis; that the correlation you've discovered is actually an indicator of causation and not of some other factor.

    For instance: My hypothesis is that stepping on cracks leads to broken backs. My experiment can show that whenever someone steps on a crack in the floor or pavement, a short time later someone's back gets broken within a radius proportional to population density. Should science now accept the Broken Back Crack hypothesis as a proven theory? Without having to disprove the null hypothesis I can conclude that stepping on cracks really does break backs, I can show you the statistics of the events' correlation as evidence, and I can lobby for legislation mandating my friends at Crack Stuffer's Inc. be employed immediately via government contract in order to prevent public harm. The companion null hypothesis would say: Stepping on cracks doesn't lead to broken backs. Shouldn't this have to be disproved, at least by eliminating other likely causes, before my original hypothesis is accepted as true? One way to disprove the null hypothesis is by proving an Alternate Hypothesis correct, e.g., an Alternate Hypothesis can be proposed that broken backs are caused by direct impact trauma to the spine, and it can be demonstrated that broken backs are far more strongly correlated with such trauma. Or perhaps the null hypothesis can be restated as, "Stepping on cracks is correlated as strongly to breaking of backs as it is to any other event in nature", and a series of Alternate Hypotheses would seek to demonstrate that "Stepping on cracks is as strongly or less correlated to broken backs as stepping on cracks is correlated to causing trees to fall, doors to shut, or a person to sneeze, fart or queef."

    A null hypothesis cannot prove a causal claim on its own; it exists only to disprove claims. My experiment showing that stepping on cracks is as likely to cause farts or queefs as to cause a broken back should not be considered to support the causal relation, nor should it result in legislation requiring that back injury precautions be made immediately available to anyone who reports an unfortunate gaseous expulsion or accidental crack stepping to their health care provider. Sometimes disproving a null hypothesis will lead to a discovery, but that discovery must then be tested again, and it will have its own null hypothesis to disprove.

    Without the null hypothesis needing to be disproved I can demonstrate the correlation between any two randomly selected events, and you can perform repeated experiments that will seem to verify my bogus claims. However, if the correlation is no stronger than between any other proposed cause or effect then the correlation demonstrated is statistically insignificant (it has not disproven the null hypothesis). All the null hypothesis means is: Correlation is not causation. It doesn't get much simpler than this.
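The idea of checking whether a correlation is "no stronger than between any other proposed cause or effect" can be sketched as a permutation test. This is only an illustration, not a method named in the comment: the data are made up, and `pearson_r` and `permutation_p_value` are hypothetical helper names. Shuffling one variable breaks any real link, so the fraction of shuffles that correlate at least as strongly as the observed data estimates how surprising the observation is under the null hypothesis.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=2000, seed=0):
    """Estimate how often shuffled data correlate at least as strongly as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    at_least_as_strong = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # destroys any pairing between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed - 1e-12:
            at_least_as_strong += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_strong + 1) / (n_permutations + 1)


# Made-up data (an assumption for illustration): cracks stepped on
# versus an outcome that tracks them exactly.
cracks = list(range(12))
outcome = [2 * c + 1 for c in cracks]
p_linked = permutation_p_value(cracks, outcome)
print(p_linked)
```

A small value here only says the pairing is unlikely to be chance; it says nothing about which variable, if either, causes the other.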

    Starting Score:    1  point
    Karma-Bonus Modifier   +1  

    Total Score:   2  
  • (Score: 3, Funny) by VortexCortex (4067) on Friday June 19 2015, @05:11PM (#198326)

    One way to disprove the null hypothesis is by proving an Alternate Hypothesis correct

    Should read: One way to disprove the hypothesis is by proving no Alternate Hypothesis is correct.

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @06:03PM (#198345)

    Thanks for the response, I will read your post in more detail later. For now I note that you seem to be assuming that correlations are rare (e.g., between stepping on cracks and broken backs). Common sense tells me that, where there are more cracks to step on, people are more likely to trip and break their backs. So I consider your null hypothesis to be implausible. Rejecting it offers no information about the validity of your original theory.

    This assumption that most things are uncorrelated is dead wrong in the case of living things. Everything is cycling daily, monthly, yearly, etc. (check out human birth rates, death rates). The default assumption should be that everything is correlated with everything else, if only via the earth's cycles.
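A small simulation can illustrate the shared-cycle point. It is a sketch under invented assumptions: two otherwise independent noisy series, ten years of daily values, and a yearly sine wave of amplitude 1 added to both. None of these numbers come from the discussion.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)
days = range(3650)  # ten years of daily observations (assumed)

# A yearly cycle that both series share, e.g. a seasonal effect.
cycle = [math.sin(2 * math.pi * d / 365) for d in days]

# Independent noise: by construction the two series have no direct link.
noise_a = [rng.gauss(0, 1) for _ in days]
noise_b = [rng.gauss(0, 1) for _ in days]

series_a = [c + e for c, e in zip(cycle, noise_a)]
series_b = [c + e for c, e in zip(cycle, noise_b)]

r_noise_only = pearson_r(noise_a, noise_b)
r_with_cycle = pearson_r(series_a, series_b)
print(r_noise_only, r_with_cycle)
```

With these settings the noise-only correlation should sit near zero, while the two series that share the cycle should correlate at roughly 0.3, even though neither influences the other.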

    Finding a correlation tells you nothing more than that you spent enough money to get enough data to see it. Your theory needs to predict a specific relationship with a certain shape/magnitude/whatever it is you think. A theory that can only predict "some relationship" is worthless.

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @06:18PM (#198350)

    I should also note that this issue of correlations was pointed out by Meehl many decades ago with regard to psychology (his area of expertise):

    Everyone familiar with psychological research knows that numerous “puzzling, unexpected” correlations pop up all the time, and that it requires only a moderate amount of motivation-plus-ingenuity to construct very plausible alternative theoretical explanations for them.

    These armchair considerations are borne out by the finding that in psychological and sociological investigations involving very large numbers of subjects, it is regularly found that almost all correlations or differences between means are statistically significant. See, for example, the papers by Bakan [1] and Nunnally [8]. Data currently being analyzed by Dr. David Lykken and myself, derived from a huge sample of over 55,000 Minnesota high school seniors, reveal statistically significant relationships in 91% of pairwise associations among a congeries of 45 miscellaneous variables such as sex, birth order, religious preference, number of siblings, vocational choice, club membership, college choice, mother’s education, dancing, interest in woodworking, liking for school, and the like. The 9% of non-significant associations are heavily concentrated among a small minority of variables having dubious reliability, or involving arbitrary groupings of non-homogeneous or non-monotonic sub-categories. The majority of variables exhibited significant relationships with all but three of the others, often at a very high confidence level (p < 10^-6).

    Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox". Philosophy of Science 34 (2): 103–115. doi:10.1086/288135 http://mres.gmu.edu/pmwiki/uploads/Main/Meehl1967.pdf [gmu.edu]
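To see why nearly every association can come out "significant" in a sample of that size, here is a rough calculation using the Fisher z approximation for testing a Pearson correlation against zero. This is a sketch, not Meehl's analysis: the function names are made up, and the approximation assumes roughly bivariate-normal data.

```python
import math
from statistics import NormalDist


def correlation_p_value(r, n):
    """Two-sided p-value for H0: true correlation is 0 (Fisher z approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def smallest_significant_r(n, alpha=0.05):
    """Smallest |r| that reaches two-sided significance at level alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


# Sample sizes: 55,000 is from the quoted passage; 100 is an arbitrary contrast.
for n in (100, 55_000):
    print(n, smallest_significant_r(n), correlation_p_value(0.01, n))
```

Under this approximation a sample of 100 needs |r| of about 0.2 to reach p < 0.05, while a sample of 55,000 needs less than 0.01, so a correlation far too weak to matter in practice still rejects the null hypothesis.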

    Thinking about it more, the idea that correlations are rare is in conflict with the universal law of gravitation:

    Every point mass attracts every single other point mass by a force pointing along the line intersecting both points.

    https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation [wikipedia.org]

    So basing your reasoning off a method that assumes useful information can be gathered from the mere existence of a correlation is in conflict with both the evidence, and our most cherished and successful physical theories. It is pseudoscience.

    • (Score: 0) by Anonymous Coward on Friday June 19 2015, @09:47PM (#198447)

      What the fuck does gravity have to do with correlations? You're seeing all kinds of correlations that don't exist and working with lots of assumptions that don't match with reality, that must be why you're so confused about this.

      • (Score: 0) by Anonymous Coward on Friday June 19 2015, @10:06PM (#198454)

        The gravity example shows that it is commonly believed that everything in the universe affects everything else, however minutely. This idea has been put forth in other ways as well:

        "Everything is related to everything else, but near things are more related than distant things."

        https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography [wikipedia.org]

        I assure you, I am not the confused one.