
posted by cmn32480 on Friday June 19 2015, @06:47AM
from the big-data-little-analysis dept.

Dramatically expanded data science education, coupled with robust evidence-based data analysis practices, could stop the reproducibility and replication crisis in scientific research before it permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload


Original Submission

 
  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @02:57PM (#198260)

    I meant: why is the alternative hypothesis the one I am interested in? Why not set the null hypothesis to that instead? That would make much more sense, but I was taught to do the opposite.

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @09:34PM (#198441)

    > I meant: why is the alternative hypothesis the one I am interested in?

    Because it's the one you want to test.

    > Why not set the null hypothesis to that?

    Because then it wouldn't be a "null" hypothesis. The null hypothesis is that the variable (or whatever it is you're looking at) has no effect, while the "alternative" hypothesis (the other, non-null one) is that it does have an effect.
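
    Here's a minimal sketch of the distinction in Python (numpy/scipy; the group names and the 0.5 effect size are made-up illustrative numbers, not from any real study):

        # Null vs. alternative hypothesis in a two-sample t-test.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        control = rng.normal(loc=0.0, scale=1.0, size=50)  # H0 world: no effect
        treated = rng.normal(loc=0.5, scale=1.0, size=50)  # hypothetical shifted mean

        # H0 (null): both groups share the same mean (the variable has no effect).
        # H1 (alternative): the means differ (the effect you actually want to test).
        t_stat, p_value = stats.ttest_ind(control, treated)
        print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
        # A small p is evidence against H0; it is not direct proof of H1.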

    Your teacher must have been terrible if you don't even understand what a null hypothesis is supposed to be.

    • (Score: 0) by Anonymous Coward on Friday June 19 2015, @10:03PM (#198451)

      The null hypothesis is the hypothesis to be nullified. It can be anything. In practice it has come to be, usually, "no effect, no correlation" (this is called the "nil null" hypothesis), and that is the problem. The nil null hypothesis is nearly always false; the only time it isn't is when you are looking for differences regarding something that does not exist (e.g. ESP). It is easily proved that two groups of people did not come from the same hypothetical infinite distribution: the infinite hypothetical distribution is not real, therefore the two groups could not have been sampled from it.
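
      A quick simulation of that point (Python with numpy/scipy; the 0.02 "effect" is an arbitrary illustrative number): give two groups a trivially small real difference, and rejection of the nil null becomes near-certain once the sample is big enough.

          # The nil null is (almost) always false: any nonzero difference,
          # however meaningless, gets "detected" with enough data.
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(1)
          tiny_effect = 0.02  # practically meaningless mean difference

          for n in (100, 10_000, 1_000_000):
              a = rng.normal(0.0, 1.0, size=n)
              b = rng.normal(tiny_effect, 1.0, size=n)
              _, p = stats.ttest_ind(a, b)
              print(f"n = {n:>9,}: p = {p:.4g}")
          # p < 0.05 here tells you the nil null is false -- which it already
          # was by construction -- not that the difference matters.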

      If you mess up the experiment, the data analysis, or the data entry, that will exaggerate the apparent effect. If there are any differences at baseline, they will exaggerate the magnitude of the deviation from the nil null hypothesis. If there are non-causative correlations in play, they will exaggerate it too. The nil null is false to begin with, and literally anything else will magnify that. Rejecting it therefore carries no useful information; if you think it does, you are confused. This idea has caused mass confusion, just as Ronald Fisher predicted it would. And he is partly to blame, for popularizing the p-value!
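
      To see the point about non-causative correlations, here is another small sketch (numpy/scipy again; "age" and all the numbers are hypothetical): the two groups differ only through a confounder, yet the nil null is rejected decisively.

          # A confounder masquerading as a group "effect".
          import numpy as np
          from scipy import stats

          rng = np.random.default_rng(2)
          n = 500
          age = rng.normal(40, 10, size=n)            # hypothetical confounder
          in_group = age > 42                         # assignment correlated with age
          outcome = 0.05 * age + rng.normal(0, 1, n)  # driven by age, not by group

          _, p = stats.ttest_ind(outcome[in_group], outcome[~in_group])
          print(f"p = {p:.2e}")  # the group "difference" looks wildly significant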

      "We are quite in danger of sending highly trained and highly intelligent young men out into the world with tables of erroneous numbers under their arms, and with a dense fog in the place where their brains ought to be. In this century, of course, they will be working on guided missiles and advising the medical profession on the control of disease, and there is no limit to the extent to which they could impede every sort of national effort."

        Fisher, R. A. (1958). "The Nature of Probability" (PDF). Centennial Review 2: 261–274. http://www.york.ac.uk/depts/maths/histstat/fisher272.pdf [york.ac.uk]

      The guy who came up with this nil null idea is likely the same one who invented the ACT. It appears he was writing an introductory stats textbook for educators in the late 1930s and conflated two different approaches to statistics (Neyman-Pearson's and Fisher's). I recommend anyone interested in science put some effort into learning about this.