SoylentNews is people

posted by Fnord666 on Tuesday October 24 2017, @09:02AM
from the one-license-to-rule-them-all dept.

The Linux Foundation has created one open-data licence framework to rule them all, allowing users to collaborate on data-driven projects.

Today at the Open Source Summit in Prague, executive director Jim Zemlin announced the Community Data License Agreement, which is designed for non-proprietary data.

The org says data producers can now share the goods "with greater clarity about what recipients may do with it".

The agreement has two branches. One "puts terms in place to ensure that downstream recipients can use and modify that data, and are also required to share their changes", while the other does not oblige users to share those changes.

The idea is to accelerate machine learning in open source.

  • (Score: 1, Interesting) by Anonymous Coward on Tuesday October 24 2017, @09:26AM (#586790)

    Would any legal expert around here care to explain how these differ from Creative Commons licenses, please?

  • (Score: 4, Insightful) by shrewdsheep (5215) on Tuesday October 24 2017, @11:52AM (#586806)

    Just putting out licenses feels like a cheap and uninformed shot on the part of the Linux Foundation. While there might be a subset of small datasets for which this could be useful, other licenses already exist for those. Serious data, such as scientific data (except for the astronomical/physical variety), web traffic, or social-networking data, is governed by privacy laws that override some points in the licenses. Anyway, the difficult part about data is hosting it, making metadata available, and providing some quality assurance (or at least a proper description of the measurement itself). In the biomedical sciences this has turned out to be an extremely hard problem, the solution to which, so far, has been publicly funded data repositories. Their licenses are not standardized, but licensing seems irrelevant compared to the other challenges anyway.
     

  • (Score: 0) by Anonymous Coward on Tuesday October 24 2017, @02:11PM (#586856)

    I thought they wanted to take my personal data and share it freely with anyone who wants it.

    The difference between a slut and a whore is that Facebook and Google and AT&T and Verizon and T-Mobile at least beat you and charge for your services. You can't even be a slut in the sharing economy, because you can't choose who to share your personal data with. You're passed around like a disease.

    A slut might have a friends list, but it's mostly other people profiting off your behavior.
     

    • (Score: 0) by Anonymous Coward on Tuesday October 24 2017, @05:21PM (#586952)

      You need to adjust your lenses, your focus is a bit out of whack.

  • (Score: 2) by Rich (945) on Tuesday October 24 2017, @02:13PM (#586858)

    What does the Linux Foundation have to do with improving the situation for big-data applications in free software? Or at least, what should it have to do with it? From the name alone, I'd expect them to promote the Linux kernel, secure its legal situation, and pave the way for improvements. I recently posted a massive rant here about the tie-in of graphics drivers to the X ecosystem (and implied how this de facto inhibits exploring new paths for all kinds of lean things that would require graphics acceleration). That's my main (and pretty much only) beef with the situation around kernel land. You can happily set up an embedded system without all the Red Hat-originated junk (e.g. systemd, PulseAudio, ...), but the one thing you can't get for it is proper graphics. I'd expect the LF to sort that out. They should know it needs sorting, and if they don't, we don't need them.

    I'd expect this big data stuff to be dealt with by the FSF. They have the philosophical background and legal heavyweights to work it out.

    Of course, coming back down to reality, the last thing I remember from that side was a completely delusional call by Stallman for the community to somehow improve Emacs so it becomes more modern and attractive.

    *sigh*

    • (Score: 2) by DannyB (5839) on Tuesday October 24 2017, @02:28PM (#586871)

      This is just a guess. But the Linux kernel is used to host important applications that are all about data. It makes sense to help lubricate the wheels of the ecosystem(s) that produce, consume, use and manipulate that data.

      Linux is used not only in data centers, where such data may reside, but also in a few billion smartphones. And tablets. And GPS navigators. And set-top boxes of various kinds (Roku and TiVo, just to name two), and also so-called, but poorly named, "smart" TVs. Some apps in your living room might rely on open data. Maybe eventually open speech-recognition devices, just to speculate. Alexa, destruct sequence 1, 1A, 2B. Etc.

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 2) by Rich (945) on Tuesday October 24 2017, @02:55PM (#586886)

        It makes sense to help lubricate the wheels of the ecosystem(s) that produce, consume, use and manipulate that data.

        I don't quite see why this would be helpful to Linux as such. It's not as if people would suddenly start using Windows for such tasks; as of June, 99.6% of the Top500 supercomputers were running Linux. And there are always layers of software between the kernel and the data processing.

        What I might understand is that these folks are looking at the current developments (Alexa et al., as you said) and thinking "me too!". But I doubt that the free ecosystem is able to compete with these expensive-to-curate things that are, in the end, paid for with the loss of one's privacy and the free market.

        And sorry for nagging so repeatedly about the graphics issue, but again, the biggest threat to Linux's market share is Google moving to their own OS for Android. Had they nailed that stuff down firmly within their inner ecosystem, it would be a much more powerful lock-in (or, more kindly put, an incentive to stay with the platform...). Even more so for other "customers", if Google decides to go BSD with their stuff.

  • (Score: 2) by looorg (578) on Tuesday October 24 2017, @08:58PM (#587094)

    I'm skeptical, but perhaps this will be nice for machine-gathered data: machines that just automatically create large amounts of sensor data all day, every day. But that might be one of the few areas where this is actually viable, mostly because the data gathered will be so useless and boring.

    For anything gathered by hand (surveys, observations and such) there will just be too much work, and too many obstacles and risks, involved in simply giving it away to anyone. Certainly so when it's data about people. I have quickly gone over in my mind all the projects I remember working on in the last couple of years, and there would be so much work involved in distributing the data afterwards that it just wouldn't be viable from a time or economic standpoint. Most concerning, though, is that anonymity might go completely out the window. Things would just have to be purged and stripped down to the bare essentials. It's already hard to get people to answer surveys, and if we just gave the answers away afterwards I strongly suspect the response rate would plummet. In some sense, they answer our surveys because they trust us not to fuck them over in the end.

    For anything that gathers data on people, the ethics board will probably have a fit; just getting past them would probably be a complete nightmare. It's already starting to be a pain in the arse, where you have to adjust questions and answers to fit various criteria. If they were told we would just hand out all the raw data afterwards to anyone with an internet connection, I think they might have a stroke right there on the spot, and a big red NO stamp all over the application would be sure to follow.

    Various companies and commercial entities wouldn't want to give it away, since it reveals what they are doing to their competition. So for them there might be other ethical or security concerns. But that is probably beside the point, since the article states that the license is designed for "non-proprietary data".

    But even for non-commercial data it will probably boil down to economics, as gathering data costs money, a lot of money. Just giving it away doesn't make sense in that regard. Not that your data would necessarily sell, but just making it ready for the public is a full-time effort. So, as previously noted, I suspect that the free data will be generic auto-created data, or things that are so old they're borderline worthless. Even if Google starts scanning data as they do with books, it might not all make sense without human intervention.

    The org says data producers can now share the goods "with greater clarity about what recipients may do with it".

    This sentence doesn't make any kind of sense to me. If I produce data and then give it away, I lose all control, and I have no idea what the recipients might or might not do with it. They might use the data all wrong, since they might not take into account what the data was gathered for and how, and just interpret the results to fit some agenda of theirs. I can't take it back at that point. But they can use me for their purpose. They might produce shit with it and drag my name through the mud. I don't have time to sit and monitor the world for people who grab my data from the internet, and then write retractions if people do fucked-up things with it.
