
SoylentNews is people

posted by janrinok on Wednesday March 18 2015, @06:51PM
from the data-is-power dept.

Large datasets and predictive analytics software are a fertile field for innovation, but while excellent open source tools like SciPy, R, etc. are freely available, the datasets are not. A Computerworld article notes that large publicly available data collections are so scarce that a database Netflix released for a competition half a decade ago is still constantly used in computer science research.

Australia's government does provide an easy way to find, access and reuse some public datasets, but most public and private databases are siloed away from experimenters. The Open Data Handbook offers some guidelines for defining openness in data, but says little about how to drive organisations to make their datasets available.

So do we need a GPL for data, and if so, what would it look like?

 
  • (Score: 5, Informative) by Phoenix666 on Wednesday March 18 2015, @07:30PM

    by Phoenix666 (552) on Wednesday March 18 2015, @07:30PM (#159549) Journal

    Open data is essential. Private company data, sure, there are reasons to lock that away. But data generated by government, that we have already paid them to collect and publish via our tax dollars, absolutely should be available to the public in electronic form, free of charge. It's ridiculous when they try to charge you fees for the stuff. I've been participating with Code for America for the past several years and the developer community has been able to do interesting things with the public data sets they have been able to get their hands on, like heat maps for traffic accidents in NYC that have helped inform public policy (for example, thanks to the better data visualization the Mayor's office recently announced $250M to implement traffic safety measures on the most dangerous roads). Those are actual uses of Open Data to help save actual lives.

    A lot more can be done, too, just in the realm of publicly-funded research. Every research outfit that takes government funding from, say, the NSF or NIH should be required to make all their stuff available to the public, free of charge. I realize some of it is, and some of it isn't, but it should be a universal standard, universally applied. I would even say that we should have an amendment to the Constitution to guarantee it, but, heh, who in government pays any attention to that hackneyed old thing anymore?

    --
    Washington DC delenda est.
  • (Score: 2) by wantkitteh on Friday March 20 2015, @04:36AM

    by wantkitteh (3362) on Friday March 20 2015, @04:36AM (#160268) Homepage Journal

    When you say "available to the public", do you mean that everyone should be able to read everyone else's government data, or everyone should be able to read their own government data?

    • (Score: 2) by Phoenix666 on Friday March 20 2015, @12:44PM

      by Phoenix666 (552) on Friday March 20 2015, @12:44PM (#160358) Journal

      I'm talking about data sets you can get, but which the government currently charges you for. For example, say I want the data set of who voted in the last election from the New York Board of Elections. That's a common thing to do if you're a candidate or organization that wants to find out which voters vote the most (they're called "1's," people who vote in every election, even judicial ones and special elections) so you can focus on winning them over. You have to spend $50-100 for a *CD* with that info. But that is data we already paid them to gather, in the form of their salaries and benefits. So why should we have to pay them for it again? It should be freely available online without even so much as having to register on their website.

      Consider, too, that it's not only the voter data you need, but also maps with neighborhood shapefiles and demographics (income, household size, etc.) gathered by the US Census. That stuff, I can tell you from experience, can very quickly run into the thousands of dollars. So you see, it doesn't take very long at all before the upfront cost for "public" data becomes prohibitive for the independent developer who wants to build apps or new ways to see, understand, and act on that information.

      It's incredibly irritating. One of my favorite dodges is when the government agency in question claims the data itself is free, but you can only get it packaged from their favorite company X for $500. In essence, it's a corrupt deal between the head of the agency and the CEO of the company, who were roomies at Yale or something.

      So when asked the question, "Does Open Data Need to be the Next Open Source?" I say, "Hell yes!"

      --
      Washington DC delenda est.
      • (Score: 2) by wantkitteh on Friday March 20 2015, @02:51PM

        by wantkitteh (3362) on Friday March 20 2015, @02:51PM (#160412) Homepage Journal

        I don't think that's quite what the original article is referring to - you want data that's already available to be free as in beer, rather than free as in speech, which is more the issue at hand. What I'm trying to ask is whether the data you think should be available for free includes the personal, private data about other people that your government has assembled. The argument "I contributed towards my neighbourhood's government-subsidised drug rehab and mental health facilities, I should be entitled to all their data!" is indicative of a pretty sick attitude, so I hope that's not what you mean.

        • (Score: 2) by Phoenix666 on Friday March 20 2015, @06:22PM

          by Phoenix666 (552) on Friday March 20 2015, @06:22PM (#160515) Journal

          No, that's not what I mean. Privacy is big with me. I think there would be value in an anonymized db of everyone's DNA, for example, because it would do so much for archaeology, epidemiology, etc., but I have no trust in the government whatsoever so scratch that idea.

          --
          Washington DC delenda est.
          • (Score: 2) by wantkitteh on Friday March 20 2015, @08:43PM

            by wantkitteh (3362) on Friday March 20 2015, @08:43PM (#160567) Homepage Journal

            Ok, misunderstanding cleared up ;) See comments/links elsewhere in this comment section for details on how hard anonymising data really is.