SoylentNews is people

posted by cmn32480 on Friday June 19 2015, @06:47AM
from the big-data-little-analysis dept.

Dramatic increases in data science education coupled with robust evidence-based data analysis practices could stop the scientific research reproducibility and replication crisis before the issue permanently damages science's credibility, asserts Roger D. Peng in an article in the newly released issue of Significance magazine.

"Much the same way that epidemiologist John Snow helped end a London cholera epidemic by convincing officials to remove the handle of an infected water pump, we have an opportunity to attack the crisis of scientific reproducibility at its source," wrote Peng, who is associate professor of biostatistics at the Johns Hopkins Bloomberg School of Public Health.

In his article titled "The Reproducibility Crisis in Science"—published in the June issue of Significance, a statistics-focused, public-oriented magazine published jointly by the American Statistical Association (ASA) and Royal Statistical Society—Peng attributes the crisis to the explosion in the amount of data available to researchers and their comparative lack of analytical skills necessary to find meaning in the data.

"Data follow us everywhere, and analyzing them has become essential for all kinds of decision-making. Yet, while our ability to generate data has grown dramatically, our ability to understand them has not developed at the same rate," he wrote.

This analytics shortcoming has led to some significant "public failings of reproducibility," as Peng describes them, across a range of scientific disciplines, including cancer genomics, clinical medicine and economics.

The original article came from phys.org.

[Related]: Big Data - Overload


This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @07:39AM (#198145)

    What got me was (paraphrasing): "our ability to generate data has grown, while our ability to analyse has not".

    This was inevitable, IMNSHO, as soon as "publish or perish" put the number of articles above their content when determining a prime motivating factor (namely: will a scientist's work put food on the table).

    You want lots of articles? What's the easiest way to get that?

    a) Write lots of articles proposing existing insights on a small, hard-to-generate data set (hint: will rightfully get you fired for copying others' work)

    b) Write lots of articles proposing new insights on a small, hard-to-generate data set (hint: this is actually extremely difficult, will get you fired for not writing lots of articles, rightfully or not)

    c) Generate (a.k.a. data mining) lots of articles proposing trivial insights on a small, hard-to-generate data set (hint: will rightfully get you fired for selling random fluctuations as insights)

    d) Write lots of articles proposing existing insights on huge, easily generated datasets (hint: same as for small datasets you will rightfully get fired, just a little later)

    e) Write lots of articles proposing new insights on huge, easily generated datasets (hint: this is actually even more extremely difficult than for small datasets - I'm looking at you CERN - but will not get you fired because the three organizations worldwide who are doing it actually have a clue. Unluckily, chances are you're not working for them so it's not putting food on *your* table)

    f) Generate (a.k.a. data mining) lots of articles proposing trivial insights on huge, easily generated datasets (hint: same as for small datasets, but will surprisingly *not* get you fired, because realizing that you're providing c)-with-more-data, which would get you fired, instead of b) or e), which everybody wants, would require an additional instance of e), which nobody does anyway: checking would be more work than the actual work, and as such is not cost-effective)

    So, which one is easiest for the average scientist, i.e. about three quarters of the total? Right. The option with the worst outcome.

    I even have a solution for you that sounds easy, although it most certainly has its own set of problems: dear MBAs, no matter what you learned at school, science has no relation to a production line. So please stop using methods invented for the latter to manage the former.

    CAN YOU GET THAT THROUGH YOUR F****** HEADS?!?! (hint: if you do, which I very much doubt, noticeable results will start showing up after 30-40 years of things getting better only very gradually)
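
    To make "selling random fluctuations as insights" from options c) and f) above concrete, here is a minimal sketch, not anything taken from Peng's article. It assumes a hypothetical study that tests many hypotheses on pure noise with a two-sided z-test (known standard deviation of 1) and counts how many come out "significant". The function name, group size and the 0.05 threshold are all illustrative choices.

```python
import random
from statistics import NormalDist, fmean


def count_spurious_findings(n_hypotheses=1000, n_per_group=50, alpha=0.05, seed=0):
    """Count 'significant' differences found between groups of pure noise.

    Every hypothesis compares two groups drawn from the SAME standard normal
    distribution, so any difference flagged as significant is a false positive.
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    std_normal = NormalDist()
    hits = 0
    for _ in range(n_hypotheses):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        # z-statistic for a difference of means with known sigma = 1:
        # the variance of the difference is 1/n + 1/n = 2/n.
        z = (fmean(group_a) - fmean(group_b)) / (2 / n_per_group) ** 0.5
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    total = 1000
    hits = count_spurious_findings(n_hypotheses=total)
    print(f"{hits} of {total} tests on pure noise look 'significant'")
```

    With a 0.05 threshold, roughly one test in twenty on pure noise is expected to pass, so the more hypotheses get mined from a dataset, the more publishable-looking "findings" turn up without any real effect behind them.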

  • (Score: 0) by Anonymous Coward on Friday June 19 2015, @07:54AM (#198150)

    Write one article on some data and submit it to many different journals over time. Keep changing the title and modifying the abstract and hope no one notices.

    I've known people who have made a career out of this.

    • (Score: 0) by Anonymous Coward on Friday June 19 2015, @07:57AM (#198153)

      That's c), albeit with the simplest imaginable data-mining algorithm.

  • (Score: 2) by kaszz (4211) on Friday June 19 2015, @02:54PM (#198258)

    dear MBAs, no matter what you learned at school, science has no relation to a production line. So please stop using methods invented for the latter to manage the former.

    We have no metric for customers thinking our products suck, so it must not be important... *droning away* ;-)

  • (Score: 1) by virens (5530) on Friday June 19 2015, @07:41PM (#198391)

    AC, you got the symptom right ("publish or perish"), but the rest of the post veers off course. The problem is that the academic system itself consists of business slime and corporate-type morons who want everything to be immediately applicable. They even made the TMT (Thirty Meter Telescope) a fucking corporation! That's the problem: the system itself accepts only business slime. It is disgusting to see all those Gantt charts, meetings, PowerPoint presentations and other typical corporate moronity.

    Judging a scientist by the sheer number of publications is like judging an artist by the sheer number of paintings he produced. If that were the standard, we would never have seen the Mona Lisa or the Sistine Chapel. And it only gets worse - that's why I desperately want out, oh I don't know, even to go as low as high-school teacher.

  • (Score: 1) by khallow (3766) on Friday June 19 2015, @11:03PM (#198481)

    dear MBAs, no matter what you learned at school, science has no relation to a production line.

    That is often false. Your CERN example is a research approach that looks a lot like a production line in several different ways.