Dramatic increases in data science education, coupled with robust, evidence-based data analysis practices, could stop the reproducibility and replication crisis in scientific research before it permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.
"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.
In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.
"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.
This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.
The original article came from phys.org.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @08:21AM
Why? The null hypothesis is the specific one. The alternative is so vague as to be useless... it could mean anything. Once again, this was dealt with long ago by Paul Meehl:
Meehl, P. E. (1978). Theoretical Risks and Tabular Asterisks: Sir Karl, Sir Ronald, and the Slow Progress of Soft Psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.200.7648&rep=rep1&type=pdf [psu.edu]
(Score: 0) by Anonymous Coward on Friday June 19 2015, @10:44AM
If you didn't know why, then there would be no point in checking it, would there?
I mean, nobody comes up with hypotheses by throwing dice or something. Or at least if someone does, I'd see that as a good reason not to trust them.
But of course, how to come up with that hypothesis is not the subject of a statistics class. So as far as statistics is concerned, that hypothesis is arbitrary.
It's just like how Newtonian mechanics will happily let you calculate the trajectory of a moon at any given position and speed. Whether it makes sense to calculate that trajectory is not a question for Newtonian mechanics. As far as Newtonian mechanics is concerned, that position and speed are arbitrary.
Of course, in practice you won't insert arbitrary values, but the values of the real moon, or the values of a satellite that you want to put there. But that's not the business of Newtonian mechanics; that's the business of whatever field uses Newtonian mechanics for its calculations. An astronomer will be interested in the orbit of an existing object, a space agency may be more interested in the possible orbits of a satellite, and an artillery expert couldn't care less about orbits but will want to calculate the trajectories of projectiles on Earth. All of them use Newtonian mechanics, but for each of them, which values make sense to put into it is different.
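The analogy can be made concrete with a toy calculation (the `projectile_range` helper below is illustrative, not something from the thread): the mechanics happily accepts any inputs, and deciding whether those inputs describe anything real is entirely the caller's problem.

```python
import math

def projectile_range(speed, angle_deg, g=9.81):
    """Horizontal range of a projectile launched over flat ground.

    The formula works for ANY speed and angle you feed it; whether
    those values describe a real object is not its concern.
    """
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g

# A sensible input (an artillery-style question)...
print(projectile_range(10.0, 45.0))   # 45 degrees maximizes range on flat ground
# ...and a physically dubious one -- the formula answers anyway.
print(projectile_range(10.0, 170.0))  # negative "range", garbage in, garbage out
```

Statistics is in the same position: the machinery computes an answer for whatever hypothesis and data you hand it, and judging whether the question was worth asking belongs to the field that supplied them.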
With statistics, it's the same. Statistics doesn't care where your hypotheses or your data come from. Whether a certain hypothesis makes sense is not a matter of statistics; it is a matter of whatever field the statistics is applied to. Asking statistics to answer that question is wrong, and therefore asking for that question to be answered in a statistics course is also wrong.
And as for your quote: if the theory tells me, for every month of several years, exactly the number of days it rains in that month, I'll certainly take the theory very seriously even if it cannot tell me the exact dates of the rain.
Yes, a prediction for just one month is not convincing. But that's not because it's statistics; it's because the data are insufficient.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @02:57PM
I meant: why is the alternative hypothesis the one I am interested in? Why not set the null hypothesis to that? That would make much more sense, but I was taught to do the opposite.
(Score: 0) by Anonymous Coward on Friday June 19 2015, @09:34PM
Because it's the one you want to test.
Because then it wouldn't be a "null" hypothesis. The "null" hypothesis is the hypothesis that the variable (or whatever it is you're looking at) has no effect, while the "alternative" hypothesis (the other, non-null one) is the hypothesis that it does have an effect.
Your teacher must have been terrible if you don't even understand what a null hypothesis is supposed to be.
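A minimal sketch of what testing against a "no effect" null looks like in practice, using a permutation test (the function name and the data below are illustrative, not from the thread): under the null hypothesis the group labels are exchangeable, so we ask how often relabelled data look at least as extreme as what was observed.

```python
import random
import statistics

def perm_test(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: the grouping has no effect, so labels are
    exchangeable.  The p-value is the fraction of random relabellings
    whose mean difference is at least as extreme as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        d = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if d >= observed:
            hits += 1
    return hits / n_perm

group = [5.0, 5.1, 4.9, 5.2, 4.8]
print(perm_test(group, [5.2, 5.0, 4.8, 5.3, 4.9]))  # similar groups: large p
print(perm_test(group, [6.0, 6.1, 5.9, 6.2, 5.8]))  # clearly shifted: small p
```

Note what rejection buys you: it says only that the no-effect story fits the data poorly. It does not, by itself, single out any particular alternative, which is exactly the vagueness the earlier comment complains about.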
(Score: 0) by Anonymous Coward on Friday June 19 2015, @10:03PM
The null hypothesis is the hypothesis to be nullified. It can be anything. In practice it has usually come to be "no effect, no correlation" (the so-called "nil null" hypothesis), and that is the problem. A nil null hypothesis is nearly always false; the only time it isn't is when you are looking for differences regarding something that does not exist (e.g. ESP). It is easy to show that two groups of people did not come from the same hypothetical infinite distribution: the infinite hypothetical distribution is not real, so the two groups could not have been sampled from it.
If you mess up the experiment, data analysis, or data entry, it will exaggerate this. If there are any differences at baseline, it will exaggerate the magnitude of the deviation from the nil null hypothesis. If there are non-causative correlations in play, it will exaggerate it. It is false to begin with and literally anything else will magnify that. Rejecting the nil null hypothesis contains no useful information. If you think it does, you are confused. This idea has caused mass confusion, just as Ronald Fisher predicted it would. And he is partly to blame, for popularizing the p-value!
Fisher, R. A. (1958). "The Nature of Probability" (PDF). Centennial Review 2: 261–274. http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf [york.ac.uk]
The guy who came up with this nil null idea is likely the same one who invented the ACT. It appears he was writing an introductory statistics textbook for educators in the late 1930s and conflated two different approaches to statistics (Neyman–Pearson's and Fisher's). I recommend anyone interested in science put some effort into learning about this.
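The claim that a nil null is essentially guaranteed to be rejected once the sample is large enough can be checked with a standard power calculation. This is a sketch under stated assumptions (a two-sided two-sample z-test on normal data; the function name is mine, not from the comment):

```python
from statistics import NormalDist

def power_vs_nil_null(delta, sigma, n, alpha=0.05):
    """Power of a two-sided two-sample z-test against the nil null
    ("true mean difference is zero"), when the real difference is
    `delta`, both groups have sd `sigma`, and each group has n samples.
    """
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)            # ~1.96 for alpha = 0.05
    z_shift = delta / (sigma * (2.0 / n) ** 0.5)  # drift of the test statistic
    # P(reject) = P(|Z + z_shift| > z_crit) under the true difference
    return (1 - nd.cdf(z_crit - z_shift)) + nd.cdf(-z_crit - z_shift)

# A true difference of 0.01 sd -- far too small to matter for anything:
print(power_vs_nil_null(0.01, 1.0, 1_000))      # near the 5% false-positive floor
print(power_vs_nil_null(0.01, 1.0, 1_000_000))  # rejection is almost certain
```

So with enough data, "p < 0.05 against the nil null" becomes nearly inevitable for any nonzero difference, however trivial, which is why rejecting it, on its own, carries so little information.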