SoylentNews is people

posted by janrinok on Wednesday March 18 2015, @06:51PM
from the data-is-power dept.

Large datasets and predictive analytics software are a fertile field for innovation, but while excellent open source tools such as SciPy and R are freely available, the datasets are not. A Computerworld article notes that large publicly available data collections are so scarce that a dataset Netflix released for a competition half a decade ago is still used constantly in computer science research.

Australia's government does provide an easy way to find, access and reuse some public datasets, but most public and private databases are siloed away from experimenters. The Open Data Handbook offers some guidelines for defining openness in data, but says little about how to get organisations to make their datasets available.

So do we need a GPL for data, and if so, what would it look like?

 
  • (Score: 0) by Anonymous Coward on Wednesday March 18 2015, @08:06PM (#159571)

    Good point. Many data sets at various institutions could be useful if made public, but they often contain various forms of personal information that need to be anonymized, which may be trickier than it seems. Either way, both this cleaning and the exporting and packaging of the data in a format suitable for publishing take time and effort, which translates to labor costs.

    Why spend a significant fraction of your operating budget to give outsiders free stuff when you could spend it on your mission statement or a better coffee machine? Other than some good PR, the incentives are just not there.

  • (Score: 2, Informative) by Anonymous Coward on Wednesday March 18 2015, @08:18PM (#159579)

    > Other than some good PR, the incentives are just not there.

    It should be made part of the grant requirements, just as it is starting to be for open access publication. [wikipedia.org]

    However, I too am super wary of any 'anonymization' of personal data. Anything that is not personal, like climate data or the number of animals per square meter, is no problem. Personal data, though, is so easy to fuck up when anonymizing, because much of the problem comes from cross-referencing it with other databases: what seems anonymous to someone who looks only at that one data set turns out not to be anonymous in the full context of the world at large... [techdirt.com] Anything close to that may need a different policy.
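
The cross-referencing problem described above can be sketched in a few lines. This is only an illustration under loud assumptions: every record, name, field, and the `reidentify` helper are invented for the example, and it shows just the simplest exact-match linkage, not any particular real-world attack.

```python
# Illustrative linkage (re-identification) sketch. All records are invented.

# "Anonymized" release: names removed, but quasi-identifiers kept.
released = [
    {"zip": "02138", "birth_year": 1965, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1971, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "02139", "birth_year": 1971, "sex": "M", "diagnosis": "flu"},
]

# A separate, hypothetical public dataset that does carry names.
public = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1965, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1971, "sex": "M"},
]

# Assumed set of fields shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Build the join key from the shared quasi-identifier fields."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(released_rows, public_rows):
    """Map each name to a diagnosis when exactly one released row matches."""
    by_key = {}
    for row in released_rows:
        by_key.setdefault(key(row), []).append(row)

    matches = {}
    for person in public_rows:
        candidates = by_key.get(key(person), [])
        if len(candidates) == 1:  # a unique match means re-identified
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


print(reidentify(released, public))  # → {'Alice Example': 'asthma'}
```

In this toy data, Alice is re-identified because her combination of fields is unique in the release, while Bob stays ambiguous because two released rows share his combination. That second case is the intuition behind requiring every combination of quasi-identifiers to appear several times before publishing.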

    • (Score: 2) by kaszz (4211) on Wednesday March 18 2015, @11:08PM (#159643)

      The next question is whether the person doing the anonymization is competent enough. How does one measure that? And can the workplace hire and retain such people?