
posted by CoolHand on Tuesday May 05 2015, @12:41AM   Printer-friendly
from the dwarf-gold-fever-infecting-standards dept.

A group of Cambridge (UK) computer scientists have set a new gold standard for openness and reproducibility in research by sharing the more than 200 GB of data and 20,000 lines of code behind their latest results - an unprecedented degree of openness in a peer-reviewed publication. The researchers hope that this new gold standard will be adopted by other fields, increasing the reliability of research results, especially for work which is publicly funded.

The researchers are presenting their results at a talk today at the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI) in Oakland, California.

In recent years there's been a great deal of discussion about so-called 'open access' publications - the idea that research publications, particularly those funded by public money, should be made publicly available.

Computer science has embraced open access more than many disciplines, with some publishers sub-licensing publications and allowing authors to publish them in open archives. However, as more and more corporations publish their research in academic journals, and as academics find themselves in a 'publish or perish' culture, the reliability of research results has come into question.

http://phys.org/news/2015-05-gold-standard.html

[Also Covered By]: http://www.eurekalert.org/pub_releases/2015-05/uoc-ngs043015.php

[Source]: http://www.cam.ac.uk/research/news/new-gold-standard-established-for-open-and-reproducible-research

  • (Score: 3, Interesting) by zeigerpuppy on Tuesday May 05 2015, @01:09AM

    by zeigerpuppy (1298) on Tuesday May 05 2015, @01:09AM (#178890)

    there are some excellent resources for reproducible research;
    particularly this book: https://www.crcpress.com/product/isbn/9781466572843 [crcpress.com]

    • (Score: 3, Insightful) by bradley13 on Tuesday May 05 2015, @06:18AM

      by bradley13 (3053) on Tuesday May 05 2015, @06:18AM (#178969) Homepage Journal

      I don't get it either. Back when I was active in research, I published my code and data online, for anyone to download. IIRC my entire research group at UT Austin did the same. Science is all about reproducible results; in computer science, reproducing results requires access to the underlying code and data.

      Publishing your code and data has been voluntary up to now. The only thing that needs to change: all serious journals need to require full disclosure and put the links right in the articles. Referees should check that the disclosed information is available and complete.

      One problem remains: people move around a lot, and leave a trail of data tied to old accounts that are eventually deleted, or at least no longer maintained. For published articles, a copy of the code and data should be hosted by the conference or by the publishers, so that it remains available long-term. Hey, the journals would actually be providing a tangible service, in return for their crazy prices!

      --
      Everyone is somebody else's weirdo.
  • (Score: 0) by Anonymous Coward on Tuesday May 05 2015, @01:19AM

    by Anonymous Coward on Tuesday May 05 2015, @01:19AM (#178894)

    Liars. You're all liars. All of you liars.

  • (Score: 0) by Anonymous Coward on Tuesday May 05 2015, @01:23AM

    by Anonymous Coward on Tuesday May 05 2015, @01:23AM (#178895)

    Reliable until they find the critical bug in those 20k lines of code!

    • (Score: 0) by Anonymous Coward on Tuesday May 05 2015, @05:55AM

      by Anonymous Coward on Tuesday May 05 2015, @05:55AM (#178966)

      That is exactly what it means.

  • (Score: 2) by Non Sequor on Tuesday May 05 2015, @01:54AM

    by Non Sequor (1005) on Tuesday May 05 2015, @01:54AM (#178901) Journal

    Analysis of reproducibility and data sharing in scientific studies.

    (For all I know that's the title. I didn't see an obvious link to the actual paper and the press release published by phys.org and Eurekalert doesn't mention the topic of the work. I have a suspicion that the university press relations staff asked the researchers to say something interesting about their work and what they came up with was that they released more data and code than any paper they could remember off the tops of their heads.)

    --
    Write your congressman. Tell him he sucks.
    • (Score: 2) by frojack on Tuesday May 05 2015, @02:59AM

      by frojack (1554) on Tuesday May 05 2015, @02:59AM (#178915) Journal

      The bottom link eventually states that they were studying data center efficiency.

      All in all, I'm not impressed. All that data is probably not transferable to any other data center, and any code written is probably spreadsheets (and 20,000 lines isn't much anyway). So to me it's not clear that any actual science was done, but even if there was, the data dump is useless to anyone else.

      I question the value of all that data and software, not only in this case but in most fields of study. The point of re-doing any study is seldom to repeat it step by step, hoping to find something different or to chance upon an error. 100 repetitions of a flawed protocol are not useful. Just about nobody is going to use any software developed from one study. (I've seen some scientists' software, and it wasn't pretty).

      Rather, you want to confirm the outcome or refute the findings. If you design your own study and confirm the result, that confirmation is much more valuable than repeating the prior study exactly, right down to using their software. If your results don't match, then you can look for differences, or for bugs in your design or in theirs.

      Clearly you can't repeat every study, due to the costs. In those cases, forensic examination of their data and their procedures, and even their software might be warranted.

      But surely not in an efficiency study of one data center.

      --
      No, you are mistaken. I've always had this sig.
  • (Score: 2) by kaszz on Tuesday May 05 2015, @02:47AM

    by kaszz (4211) on Tuesday May 05 2015, @02:47AM (#178912) Journal

    How have they solved the distribution of 200 GB of data?

    • (Score: 0) by Anonymous Coward on Tuesday May 05 2015, @03:03AM

      by Anonymous Coward on Tuesday May 05 2015, @03:03AM (#178920)

      We have this thing called the Internet.

    • (Score: 2) by Gravis on Tuesday May 05 2015, @03:15AM

      by Gravis (4596) on Tuesday May 05 2015, @03:15AM (#178931)

      No, somebody else did: BitTorrent.

      • (Score: 2) by frojack on Tuesday May 05 2015, @03:44AM

        by frojack (1554) on Tuesday May 05 2015, @03:44AM (#178943) Journal

        So, about 25 full-length 1080p HD movies' worth of useless data? Seriously, who the hell is going to re-seed that?

        --
        No, you are mistaken. I've always had this sig.
    • (Score: 3, Informative) by WhiteSpade on Tuesday May 05 2015, @08:24AM

      by WhiteSpade (301) on Tuesday May 05 2015, @08:24AM (#178993)

      I work in a neuroscience research lab, and we distribute our data (~400GB worth) using quite a few different methods [studyforrest.org], including git-annex [branchable.com]. Others in my lab are working to take git-annex one step further with datalad [github.com] (terrible name, I know) to make it easier for scientists to discover, use, manage, and share their data.
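
      For the curious, here is a rough sketch of what grabbing a dataset like ours can look like with datalad's Python API (the URL and file paths below are placeholders, not our real dataset, and the exact calls may differ a bit between datalad versions):

          # Sketch: install a published dataset, then fetch only the files you need.
          import datalad.api as dl

          # install() clones the dataset's structure and metadata; the large file
          # contents stay on the remote until you ask for them.
          ds = dl.install(source='https://example.org/some-dataset', path='some-dataset')

          # get() tells git-annex to actually download the listed content.
          ds.get('sub-01/anatomy')

      After that, the data sit in an ordinary git/git-annex repository, so history and provenance come along for free.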

      On the non-distribution side, we publish all our code, all our data, hell even the paper [f1000research.com] for the above data is open source on GitHub [github.com]. The makefile generates the paper with all figures.

      Our tools are all open source. Any tools we write ourselves (such as pymvpa [pymvpa.org]) are released as open source, and we do all our collection and analysis on NeuroDebian [debian.net], where we package and distribute neuroscience software for Debian/*buntu. You can say we believe in open science. ;-)

      I'm glad to see others publishing like this. It is unfortunately all too rare, but it's slowly becoming more and more common.

      ---Alex