Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.
"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.
In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.
"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.
This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.
The original article came from phys.org.
[Related]: Big Data - Overload
(Score: 2) by VortexCortex on Friday June 19 2015, @05:02PM
What other theory would predict that my outlandish experiment was wrong, if not the opposite presumption arising from skepticism of the claim? The null hypothesis is merely healthy skepticism, formalized to ensure experiments are disprovable.
I'm not sure how the null hypothesis was explained to you, but all it really means is this: correlation is not causation. A null hypothesis is merely the companion to any hypothesis; it states that the hypothesis is false, that the causal relationship you're testing might not exist. Thus, to prove the conclusions drawn from a correlation (to test the prediction of a hypothesis), one must disprove the corresponding null hypothesis. This is how science stays disprovable. In statistics, the null hypothesis is what forces you to consider and eliminate other outside factors and to verify the strength of a correlation. One must demonstrate that the hypothesis is significantly more likely to be true than false; that the hypothesis is more true than its null hypothesis; that the correlation you've discovered is actually an indicator of causation and not the result of some other factor.
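A minimal sketch of what "disproving the null hypothesis" can look like for a correlation, assuming the null is simply "x and y are unrelated." It uses a permutation test, one standard technique among several and not necessarily what the commenter had in mind: shuffle one variable many times and count how often chance alone produces a correlation at least as strong as the observed one. The function names and toy data are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the null hypothesis that x and y are unrelated.

    Shuffling y breaks any real pairing with x, so the shuffled correlations show
    how large |r| gets by chance alone when the null hypothesis is true.
    """
    rng = random.Random(seed)
    r_obs = abs(pearson_r(x, y))
    shuffled = list(y)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= r_obs:
            as_extreme += 1
    # The +1 counts the observed pairing itself as one of the possible arrangements.
    return (as_extreme + 1) / (n_perm + 1)


# Hypothetical data: a perfectly linear relationship vs. one with zero sample correlation.
x = list(range(20))
print(permutation_p_value(x, [2 * v + 1 for v in x]))  # small: the null is rejected
print(permutation_p_value([1, 2, 3, 4, 5], [1, -1, 0, -1, 1]))  # 1.0: cannot reject
```

Note that rejecting this null only says the pairing is unlikely to be chance; it says nothing yet about which factor, if any, is the cause.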
For instance: my hypothesis is that stepping on cracks leads to broken backs. My experiment can show that whenever someone steps on a crack in the floor or pavement, someone's back gets broken a short time later within a radius proportional to population density. Should science now accept the Broken Back Crack hypothesis as a proven theory? Without having to disprove the null hypothesis, I can conclude that stepping on cracks really does break backs, I can show you the statistics of the events' correlation as evidence, and I can lobby for legislation mandating that my friends at Crack Stuffer's Inc. be employed immediately via government contract in order to prevent public harm. The companion null hypothesis would say: stepping on cracks doesn't lead to broken backs. Shouldn't it be required that this is disproved, at least by eliminating other likely causes, before my original hypothesis is accepted as true? One way to disprove the null hypothesis is by proving an Alternate Hypothesis correct, e.g., one could propose the Alternate Hypothesis that broken backs are caused by direct impact trauma to the spine and demonstrate that broken backs are far more strongly correlated with it. Or perhaps the null hypothesis can be restated as "Stepping on cracks is correlated as strongly to breaking of backs as it is to any other event in nature," and a series of Alternate Hypotheses would seek to demonstrate that "Stepping on cracks is as strongly or less correlated to broken backs as stepping on cracks is correlated to causing trees to fall, doors to shut, or a person to sneeze, fart or queef."
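The crack example can be simulated. In this sketch, every per-region count (crack steps, broken backs, sneezes) is assumed to be nothing more than population times a made-up rate, plus 5% noise, so none causes another. Crack steps then correlate just as strongly with sneezes as with broken backs, which is the comparison the restated null hypothesis calls for; dividing by population, one simple way to control for it, makes the crack/back correlation largely vanish. All names, rates, and the region count are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_regions(n_regions=50, seed=42):
    """Hypothetical per-region event counts that all scale with population.

    None of the events causes another; each is population * rate, with 5% noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_regions):
        pop = rng.uniform(1_000, 100_000)
        rows.append({
            "population": pop,
            "crack_steps": pop * 0.5 * (1 + rng.gauss(0, 0.05)),
            "broken_backs": pop * 0.001 * (1 + rng.gauss(0, 0.05)),
            "sneezes": pop * 0.2 * (1 + rng.gauss(0, 0.05)),
        })
    return rows


def column(rows, key, per_capita=False):
    return [row[key] / row["population"] if per_capita else row[key] for row in rows]


rows = simulate_regions()
cracks = column(rows, "crack_steps")
r_backs = pearson_r(cracks, column(rows, "broken_backs"))
r_sneezes = pearson_r(cracks, column(rows, "sneezes"))
# Dividing by population removes the shared driver; what remains is independent noise.
r_backs_per_capita = pearson_r(column(rows, "crack_steps", True), column(rows, "broken_backs", True))
print(f"cracks vs backs:   {r_backs:.3f}")
print(f"cracks vs sneezes: {r_sneezes:.3f}")
print(f"per capita:        {r_backs_per_capita:.3f}")
```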
A null hypothesis cannot prove a causal claim on its own; it exists only to disprove claims. My experiment showing that stepping on cracks is as likely to cause farts or queefs as to cause a broken back should not be considered support for the causal relation, nor should it result in legislation requiring that back-injury precautions be made immediately available to anyone who reports an unfortunate gaseous expulsion or accidental crack stepping to their health care provider. Sometimes disproving a null hypothesis will lead to a discovery, but that discovery must then be tested again, and it will have its own null hypothesis to disprove.
Without the null hypothesis needing to be disproved, I can demonstrate a correlation between any two randomly selected events, and you can perform repeated experiments that will seem to verify my bogus claims. However, if the correlation is no stronger than that between any other proposed cause or effect, then the correlation demonstrated is statistically insignificant (it has not disproven the null hypothesis). All the null hypothesis means is: correlation is not causation. It doesn't get much simpler than this.
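To see how a correlation between unrelated series can be "demonstrated," here is a sketch assuming two kinds of purely random data. Pairs of independent white-noise series almost never reach |r| > 0.5 at n = 100, but pairs of independent random walks (each value is the previous one plus a random step) do so quite often; this is the classic "nonsense correlation" studied by Yule. The threshold, series length, and trial count are arbitrary choices, and the exact fractions printed depend on the seed.

```python
import random
import statistics
from itertools import accumulate


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def white_noise(rng, n):
    return [rng.gauss(0, 1) for _ in range(n)]


def random_walk(rng, n):
    # Each value is the previous one plus an independent step: a trend with no cause.
    return list(accumulate(white_noise(rng, n)))


def share_of_strong_correlations(make_series, trials=200, n=100, threshold=0.5, seed=1):
    """Fraction of independently generated pairs whose |r| exceeds threshold."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if abs(pearson_r(make_series(rng, n), make_series(rng, n))) > threshold:
            hits += 1
    return hits / trials


print("white noise: ", share_of_strong_correlations(white_noise))
print("random walks:", share_of_strong_correlations(random_walk))
```

A significance test that assumes independent observations would flag many of those random-walk pairs, so the null distribution has to reflect how the data were actually generated.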
(Score: 3, Funny) by VortexCortex on Friday June 19 2015, @05:11PM
One way to disprove the null hypothesis is by proving an Alternate Hypothesis correct
Should read: One way to disprove the hypothesis is by proving no Alternate Hypothesis is correct.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @06:03PM
Thanks for the response; I will read your post in more detail later. For now, I note that you seem to be assuming that correlations are rare (e.g., between stepping on cracks and broken backs). Common sense tells me that, where there are more cracks to step on, people are more likely to trip and have broken backs. So I consider your null hypothesis to be implausible. Rejecting it offers no information about the validity of your original theory.
This assumption that most things are uncorrelated is dead wrong in the case of living things. Everything is cycling daily, monthly, yearly, etc. (check out human birth rates and death rates). The default assumption should be that everything is correlated with everything else, if only via the earth's cycles.
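A sketch of the point about shared cycles, assuming two synthetic monthly series that share nothing but a sine-shaped yearly cycle and otherwise carry independent noise. Their raw correlation is high; subtracting each calendar month's average (one simple way to deseasonalize) leaves a correlation near zero. The amplitudes, noise level, and ten-year span are made up for illustration.

```python
import math
import random
import statistics


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def monthly_series(rng, base, amplitude, years=10):
    """Hypothetical monthly counts: a yearly cycle plus independent noise."""
    return [
        base + amplitude * math.sin(2 * math.pi * month / 12) + rng.gauss(0, 1)
        for month in range(12 * years)
    ]


def remove_monthly_means(series):
    """Deseasonalize by subtracting the average of each calendar month."""
    means = [statistics.fmean(series[m::12]) for m in range(12)]
    return [value - means[i % 12] for i, value in enumerate(series)]


rng = random.Random(7)
series_a = monthly_series(rng, base=100, amplitude=10)
series_b = monthly_series(rng, base=50, amplitude=5)
raw_r = pearson_r(series_a, series_b)
deseasonalized_r = pearson_r(remove_monthly_means(series_a), remove_monthly_means(series_b))
print(f"raw r = {raw_r:.2f}, after removing the yearly cycle r = {deseasonalized_r:.2f}")
```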
Finding a correlation tells you nothing more than that you spent enough money to get enough data to see it. Your theory needs to predict a specific relationship with a certain shape/magnitude/whatever it is you think. A theory that can only predict "some relationship" is worthless.
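The claim that a big enough sample makes any correlation "visible" can be checked with a little arithmetic. This sketch holds a tiny observed correlation fixed (r = 0.02, a made-up value) and computes an approximate two-sided p-value for the null hypothesis of zero correlation using Fisher's z-transform, which is an approximation rather than an exact test. The same r goes from nowhere near significant at n = 100 to far below 0.05 at n = 100,000.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0, via Fisher's z-transform.

    Under H0, atanh(r) * sqrt(n - 3) is approximately standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


for n in (100, 1_000, 10_000, 100_000):
    print(f"n = {n:>7,}: p = {correlation_p_value(0.02, n):.2g}")
```

A small p-value here says only that the correlation is probably not exactly zero; it says nothing about whether the relationship is large, shaped the way a theory predicts, or causal.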
(Score: 0) by Anonymous Coward on Friday June 19 2015, @06:18PM
I should also mention that this issue of correlations was raised by Meehl many decades ago with regard to psychology (his area of expertise):
Meehl, Paul E. (1967). "Theory-Testing in Psychology and Physics: A Methodological Paradox". Philosophy of Science 34 (2): 103–115. doi:10.1086/288135 http://mres.gmu.edu/pmwiki/uploads/Main/Meehl1967.pdf [gmu.edu]
Thinking about it more, the idea that correlations are rare is in conflict with the universal law of gravitation:
https://en.wikipedia.org/wiki/Newton%27s_law_of_universal_gravitation [wikipedia.org]
So basing your reasoning on a method that assumes useful information can be gathered from the mere existence of a correlation is in conflict with both the evidence and our most cherished and successful physical theories. It is pseudoscience.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @09:47PM
What the fuck does gravity have to do with correlations? You're seeing all kinds of correlations that don't exist and working with lots of assumptions that don't match with reality, that must be why you're so confused about this.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @10:06PM
The gravity example shows that it is commonly believed that everything in the universe affects everything else, however minutely. This idea has been put forth in other ways as well:
https://en.wikipedia.org/wiki/Tobler%27s_first_law_of_geography [wikipedia.org]
I assure you, I am not the confused one.