posted by janrinok on Thursday November 02 2023, @05:26AM

Meta's AI research head wants open source licensing to change:

In July, Meta's Fundamental AI Research (FAIR) center released its large language model Llama 2 relatively openly and for free, a stark contrast to its biggest competitors. But in the world of open-source software, some still see the company's openness with an asterisk.

While Meta's license makes Llama 2 free for many, it's still a limited license that doesn't meet all the requirements of the Open Source Initiative (OSI). As outlined in the OSI's Open Source Definition, open source is more than just sharing some code or research. To be truly open source, software must offer free redistribution, provide access to the source code, allow modifications, and not be tied to a specific product. Meta's limits include requiring a license fee from any developers with more than 700 million monthly active users and disallowing other models from training on Llama. IEEE Spectrum wrote that researchers from Radboud University in the Netherlands claimed that Meta calling Llama 2 open source "is misleading," and social media posts questioned how Meta could claim it as open source.

FAIR lead and Meta vice president for AI research Joelle Pineau is aware of the limits of Meta's openness. But, she argues that it's a necessary balance between the benefits of information-sharing and the potential costs to Meta's business. In an interview with The Verge, Pineau says that even Meta's limited approach to openness has helped its researchers take a more focused approach to its AI projects.

"Being open has internally changed how we approach research, and it drives us not to release anything that isn't very safe and be responsible at the onset," Pineau says.

Meta's AI division has worked on more open projects before

One of Meta's biggest open-source initiatives is PyTorch, a machine learning framework used to develop generative AI models. The company released PyTorch to the open source community in 2016, and outside developers have been iterating on it ever since. Pineau hopes to foster the same excitement around its generative AI models, particularly since PyTorch "has improved so much" since being open-sourced.

She says that choosing how much to release depends on a few factors, including how safe the code will be in the hands of outside developers.

"How we choose to release our research or the code depends on the maturity of the work," Pineau says. "When we don't know what the harm could be or what the safety of it is, we're careful about releasing the research to a smaller group."

It is important to FAIR that "a diverse set of researchers" gets to see its research for better feedback. It's this same ethos Meta invoked when it announced Llama 2's release, creating the narrative that the company believes innovation in generative AI has to be collaborative.

[...] Pineau says current licensing schemes were not built to work with software that takes in vast amounts of outside data, as many generative AI services do. Most licenses, both open-source and proprietary, give limited liability to users and developers and very limited indemnity against copyright infringement. But Pineau says AI models like Llama 2 contain more training data and expose users to potentially more liability if they produce something considered infringing. The current crop of software licenses does not cover that inevitability.

"AI models are different from software because there are more risks involved, so I think we should evolve the current user licenses we have to fit AI models better," she says. "But I'm not a lawyer, so I defer to them on this point."

People in the industry have begun looking at the limitations of some open-source licenses for LLMs in the commercial space, while some are arguing that pure and true open source is a philosophical debate at best and something developers don't care about as much.

Stefano Maffulli, executive director of OSI, tells The Verge that the group understands that current OSI-approved licenses may fall short of certain needs of AI models. He says OSI is reviewing how to work with AI developers to provide transparent, permissionless, yet safe access to models.


Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Touché) by Samantha Wright (4062) on Thursday November 02 2023, @08:10AM (#1331201) (2 children)

    Some cherrypicked, cynical quotations to gross you out (emphasis added):

    But OpenAI co-founder and chief scientist Ilya Sutskever told The Verge it was a mistake to share their research, citing competitive[ness] and safety concerns.

    The rest of TFA seems to be about hand-wringing Facebook goons yearning for an arcane legal incantation that somehow protects them from being sued (by authors and artists) over their ill-gotten web-scraped training corpus. It ain't gonna happen. There is no magic wand that can be waved which will miraculously grant businesses free rein to commit mass, for-profit copyright infringement without consequences. The OSI certainly can't help with it.

    There's also that utterly incestuous Medium essay, which just lists a bunch of non-GNU open source licenses and then moans about them, like Smithers explaining to Mr. Burns why he can't simply shoot union organizers:

    Despite the permissiveness of these licenses, some impose requirements that may affect the utilization of LLMs in a commercial context. For instance, Apache-2.0 and BSD-3-Clause licenses stipulate that any modifications to the model must be documented, a condition that might not suit certain businesses.

    Moreover, Creative Commons licenses (CC-BY-2.0, 3.0, 4.0) mandate giving appropriate credit, providing a link to the license, and indicating if changes were made. These requirements might be burdensome in large-scale commercial applications of LLMs.

    It's almost as if the main goal of half these licenses is to force proprietary vendors to contribute to the commons instead of just extracting value from it! What a crazy concept!

  • (Score: 5, Insightful) by Ox0000 (5111) on Thursday November 02 2023, @11:22AM (#1331206) (2 children)

    Let me paraphrase the article:
    They want to have their cake and eat it too.
    They want to license things under a non-open-source license and be able to call it open source, because the latter gets them credit. That is the fundamental issue.

    The issue is not licensing, because they can craft whatever license they want, with whatever terms they want, to address their needs. They have plenty of lawyers to do that for them. It's just that that license wouldn't be an actual, real, meaningful open source license.
    They want the ability to say "and it's open source" even though it isn't; that is the crux of it all.

    Do not let these ... "people" corrupt what open source means!

    • (Score: 4, Touché) by Runaway1956 (2926) Subscriber Badge on Thursday November 02 2023, @01:18PM (#1331218) Journal

      So, the tech version of virtue signaling. I'm sure they buy carbon credits too.

    • (Score: 2, Funny) by deganee (3187) Subscriber Badge on Thursday November 02 2023, @08:02PM (#1331273)

      Dude, it's free up to 700 million monthly users; it's like an anyone-but-FAANG license.
