SoylentNews is people

posted by janrinok on Wednesday March 18 2015, @06:51PM
from the data-is-power dept.

Large datasets and predictive analytics software are a fertile field for innovation, but while excellent open source tools such as SciPy and R are freely available, the datasets are not. A Computerworld article notes that large publicly available data collections are so scarce that a dataset Netflix released for a competition half a decade ago is still constantly used in computer science research.

Australia's government does provide an easy way to find, access and reuse some public datasets, but most public and private databases are siloed away from experimenters. The Open Data Handbook offers some guidelines for defining openness in data, but says little about how to drive organisations to make their datasets available.

So do we need a GPL for data, and if so, what would it look like?

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by kaszz (4211) on Wednesday March 18 2015, @07:39PM (#159555)

    Keeping really large datasets human-readable in their raw form may be hard, but one can document the format properly. And including the original dataset may be quite impractical.

    Some governments also publish datasets freely.

  • (Score: 2, Disagree) by TLA (5128) on Wednesday March 18 2015, @07:46PM (#159562)

    that's why I qualified it with "...where practical". :)

    --
    Excuse me, I think I need to reboot my horse. - NCommander