posted by hubie on Friday November 01, @01:40AM

Arthur T Knackerbracket has processed the following story:

If you believe Mark Zuckerberg, Meta's AI large language model (LLM) Llama 3 is open source.

It's not, despite what he says. The Open Source Initiative (OSI) spells it out in the Open Source Definition, and Llama 3's license – with clauses on litigation and branding – flunks it on several grounds.

Meta, unfortunately, is far from unique in wanting to claim that some of its software and models are open source. Indeed, the concept has its own name: open washing. 

This is a deceptive practice in which companies or organizations present their products, services, or processes as "open" when they are not truly open in the spirit of transparency, access to information, participation, and knowledge sharing. This term is modeled after "greenwashing" and was coined by Michelle Thorne, an internet and climate policy scholar, in 2009.

With the rise of AI, open washing has become commonplace, as shown in a recent study. Andreas Liesenfeld and Mark Dingemanse of Radboud University's Center for Language Studies surveyed 45 text and text-to-image models that claim to be open. The pair found that while a handful of lesser-known LLMs, such as AllenAI's OLMo and BigScience Workshop + HuggingFace with BloomZ, could be considered open, most are not. Would it surprise you to know that according to the study, the big-name ones from Google, Meta, and Microsoft aren't? I didn't think so.

But why do companies do this? Once upon a time, companies avoided open source like the plague. Steve Ballmer famously proclaimed in 2001 that "Linux is a cancer," because: "The way the license is written, if you use any open source software, you have to make the rest of your software open source." But that was a long time ago. Today, open source is seen as a good thing. Open washing enables companies to capitalize on the positive perception of open source and open practices without actually committing to them. This can help improve their public image and appeal to consumers who value transparency and openness.

[...] That's not to say all the big-name AI companies are lying about their open source street cred. For example, IBM's Granite 3.0 LLMs really are open source under the Apache 2 license.

Why is this important? Why do people like me insist that we properly use the term open source? It's not like, after all, the OSI is a government or regulatory organization. It's not. It's just a nonprofit that has created some very useful guidelines.

[...] If we need to check every license for every bit of code, "developers are going to go to legal reviews every time you want to use a new library. Companies are going to be scared to publish things on the internet if they're not clear about the liabilities they're encountering when that source code becomes public."

Lorenc continued: "You might think this is only a big company problem, but it's not. It's a shared problem. Everybody who uses open source is going to be affected by this. It could cause entire projects to stop working. Security bugs aren't going to get fixed. Maintenance is going to get a lot harder. We must act together to preserve and defend the definition of open source. Otherwise, the lawyers are going to have to come back. No one wants the lawyers to come back."

I must add that I know a lot of IP lawyers. They do not need or want these headaches. Real open source licenses make life easier for everyone: businesses, programmers, and lawyers. Introducing "open except for someone who might compete with us" or "open except for someone who might deploy the code on a cloud" is just asking for trouble.

In the end, open washing will dirty the legal, business, and development work for everyone. Including, ironically, the shortsighted companies now supporting this approach. After all, almost all their work, especially in AI, is ultimately based on open source.



Related Stories

The Drunken Plagiarists: Working with Co-pilots 21 comments

The Association for Computing Machinery has a post by George Neville-Neil of FreeBSD fame comparing LLMs to drunken plagiarists:

Before trying to use these tools, you need to understand what they do, at least on the surface, since even their creators freely admit they do not understand how they work deep down in the bowels of all the statistics and text that have been scraped from the current Internet. The trick of an LLM is to use a little randomness and a lot of text to Gauss the next word in a sentence. Seems kind of trivial, really, and certainly not a measure of intelligence that anyone who understands the term might use. But it's a clever trick and does have some applications.

[...] While help with proper code syntax is a boon to productivity (consider IDEs that highlight syntactical errors before you find them via a compilation), it is a far cry from SEMANTIC knowledge of a piece of code. Note that it is semantic knowledge that allows you to create correct programs, where correctness means the code actually does what the developer originally intended. KV can show many examples of programs that are syntactically, but not semantically, correct. In fact, this is the root of nearly every security problem in deployed software. Semantics remains far beyond the abilities of the current AI fad, as is evidenced by the number of developers who are now turning down these technologies for their own work.

He continues by pointing out that LLMs are not only based on plagiarism, they are also unable to provide useful annotation in the comments or otherwise address the semantics of the code they swipe.

Previously:
(2024) Make Illegally Trained LLMs Public Domain as Punishment
(2024) The Open Secret Of Open Washing
(2023) A Jargon-Free Explanation of How AI Large Language Models Work
(2019) AI Training is *Very* Expensive
... and many more.



This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Insightful) by Rosco P. Coltrane on Friday November 01, @03:13AM (5 children)

    by Rosco P. Coltrane (4757) on Friday November 01, @03:13AM (#1379738)

    AI is a tiny program doing simple repetitive calculation on a huge set of numbers (the training dataset). Even if both were fully open source, you can't fork and "patch" the training dataset because you need Facebook's training data for that, and they're not gonna give it to you. Not to mention, you need millions of dollars and Big Data-levels of computing resources, and that's typically not available to your average open-source enthusiast.

    It's like if I gave you a tiny bytecode interpreter and a huge executable I compiled myself and I told you "Here, it's all open source." Fat good it's gonna do you if you can't recompile the binary because you don't have the source code and the giant-ass computer I used to generate it.

    Zuckerberg open-sourced Llama because it doesn't matter: Llama is squarely under Facebook's control.

    • (Score: 3, Touché) by darkfeline on Friday November 01, @07:52AM (1 child)

      by darkfeline (1030) on Friday November 01, @07:52AM (#1379764) Homepage

      That's not necessarily important. That's kind of the point of LLMs, after all. You don't need to retrain the whole model, but you can train the base model to specialize in various tasks. Retraining the whole model wouldn't be practical anyway, which is again the point of LLMs.

      It's a little bit like having an open source processor design and saying that it's not open source because it doesn't come with a consumer friendly chip fab or open source instructions for setting up, hiring for, and managing/funding a full chip fab.

      --
      Join the SDF Public Access UNIX System today!
      • (Score: 4, Insightful) by stormwyrm on Friday November 01, @09:45AM

        by stormwyrm (717) on Friday November 01, @09:45AM (#1379769) Journal
        The whole point of Free/Open Source Software is that you have the freedom to share and change the program. You cannot easily make meaningful alterations to the model without the training data used to create the model. If you do not have the ability to change the program (i.e. large language model in this case) in its entirety then you do not have open source. Training the base model to specialise in various tasks is like linking your open program to a closed-source library. That you are allowed to do this does not make the library open source. The question of resources is another matter entirely.
        --
        Numquam ponenda est pluralitas sine necessitate.
    • (Score: 3, Interesting) by stormwyrm on Friday November 01, @09:34AM (2 children)

      by stormwyrm (717) on Friday November 01, @09:34AM (#1379768) Journal
      The GNU General Public License has this basic definition in section 1: 'The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work.' Thus, in the context of an LLM the only thing that really qualifies as source code under that definition is the training data set that was used to create the model. The model they openly provide is thus object code, so it is no different from providing you with a closed-source executable because there is no easy way to determine what happens when you alter the weights in the model. That makes it even more opaque than machine language binary object code, which can be disassembled and some inkling of the logic used can be obtained: there is apparently no way to go from model back to anything approximating the training data.
      --
      Numquam ponenda est pluralitas sine necessitate.
      • (Score: 1, Interesting) by Anonymous Coward on Friday November 01, @02:41PM (1 child)

        by Anonymous Coward on Friday November 01, @02:41PM (#1379816)

        The GNU General Public License has this basic definition in section 1: 'The “source code” for a work means the preferred form of the work for making modifications to it. [...] Thus, in the context of an LLM the only thing that really qualifies as source code under that definition is the training data set that was used to create the model.

        I'm not really into LLMs but is this really true? Do people who make these things really retrain the weights from scratch, using all their original (possibly modified) datasets, whenever they want to make modifications to the model?

        I'm not sure they do. It seems to me that in the usual cases modifying the model is done by taking the existing weights as they are and just doing more training with new data until everything is working according to whatever criteria the designer decided.

        From that perspective, the training data is not really source code, it's more akin to design documentation which while certainly nice to have, is not an essential part of the source code and the GPL would not require its distribution.

        • (Score: 1, Informative) by Anonymous Coward on Friday November 01, @08:15PM

          by Anonymous Coward on Friday November 01, @08:15PM (#1379858)

          You can finetune the model with your own data. As long as you have inference and training code, that's all you need, besides compute.

  • (Score: 2) by mrpg on Friday November 01, @04:25AM (14 children)

    by mrpg (5708) <mrpgNO@SPAMsoylentnews.org> on Friday November 01, @04:25AM (#1379751) Homepage

    I'm still trying to understand the difference betwixt free software and open source.

    • (Score: 5, Interesting) by janrinok on Friday November 01, @06:31AM (8 children)

      by janrinok (52) Subscriber Badge on Friday November 01, @06:31AM (#1379760) Journal

      Free software is just that - free. It costs you nothing. It has no conditions attached to it.

      Open Source is also free, as in it costs nothing, but it usually comes with specific conditions which protect your rights to change it, use it elsewhere, and copy it in part or in whole. Often, one of those conditions is that the rights that have been given to you must continue to be applied to whatever you do with it if you distribute it. Not all have these conditions however.

      One of the things that some businesses wish to be able to do is use your work for free in something that they wish to sell or benefit from, yet hold you responsible for maintenance, upkeep and continued security of that software. Additionally, they have no wish to reveal how or where they use it when they distribute it, which is often in direct contradiction of the terms of the licence. Essentially, they want to benefit from Open Source but take none of the risks of using it. The licence can often be your protection against this.

      There are many additional nuances to this simple comparison, and not all licences are the same. Some specifically say that others can use it however they wish. But those licences usually also say that anyone who wishes to use it does so at their own risk.

      --
      I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
      • (Score: 3, Informative) by stormwyrm on Friday November 01, @07:32AM (7 children)

        by stormwyrm (717) on Friday November 01, @07:32AM (#1379763) Journal
        No. In general practice, free software and open source are equivalent. They both make use of licenses to provide conditions to give you the rights to share and change the software, and some licenses (e.g. the GNU General Public License) will also impose conditions on you redistributing the work to others if you have made such changes. The real difference between Free Software and Open Source is the emphasis. The Free Software movement, typified by the GNU Project and the Free Software Foundation, emphasises the freedom to share and change software as being fundamental rights like the rights to life, liberty, and the pursuit of happiness. Free as in speech, not free as in beer as they say. This gives it a political colouring that has had businesses looking at this askance, so the Open Source movement, typified by the Open Source Initiative and corporations like Red Hat, while endorsing the same practices, chooses instead to emphasise the pragmatic benefits of the open model of development and de-emphasises the issue of rights. Nowhere in either of these movements is there any mention of free as in costing nothing. That is incidental, and there is nothing that either the Free or Open movements say that software has to be provided or supported at no cost. In actual practice it seems to have resulted in the Open Source people being more willing to compromise with big businesses than the Free Software people, who are seen by the Open Source people and their allies as angry iconoclasts tilting at windmills.
        --
        Numquam ponenda est pluralitas sine necessitate.
        • (Score: 3, Interesting) by janrinok on Friday November 01, @09:00AM (2 children)

          by janrinok (52) Subscriber Badge on Friday November 01, @09:00AM (#1379767) Journal

          This is why there is so much confusion.

          Some of what you have said seems to contradict the OSI's [opensource.org] own definition as stated in the link in TFS. It still requires businesses to comply with the terms of the license with regard to source code ("The program must include source code,"), derivative works ("The license must allow modifications and derived works, and must allow them to be distributed under the same terms as the license of the original software."), the transfer of rights ("The rights attached to the program must apply to all to whom the program is redistributed without the need for execution of an additional license by those parties.") etc.

          I do not see this as being more compatible with businesses who, nevertheless, do NOT always comply with these restrictions.

          Your interpretation might be one that you and your employers choose to adopt but I do not agree that it gives it a "political colouring". They may have originated in different ways and for different reasons but they are very similar in what they both say.

          It is still incumbent upon all users, private or business, to verify that the code does what they want and to take responsibility for how it is used.

          I will concur with your statements regarding 'free as in cost' - neither license says that it cannot be paid for. But it is free as in "The license shall not require a royalty or other fee for such sale.". There is still much genuine 'free' software given away in magazines, or online as part of a subscription to a magazine or just offered in various repositories.

          I happen to be one of those who thinks that, if I have bought a product, I can do with it as I wish and be entirely responsible for any outcomes. I do not accept that the seller can, at a later date, withdraw my right to use something without giving me a full refund for whatever it cost. I suppose that puts me in the category of having rights more in accordance with the FSF and not so much in the way that some businesses are currently using it.

          --
          I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
        • (Score: 1, Interesting) by pTamok on Friday November 01, @12:15PM (3 children)

          by pTamok (3042) on Friday November 01, @12:15PM (#1379785)

          In general practice, free software and open source are equivalent.

          It depends who you ask, and what the precise definition of each is.

          First of all, in English, free has two meanings:

          1) Free: available without payment or other transfer of value. A little used synonym is gratis.
          2) Free: Without restrictions, at liberty, as in, set free from prison. Often, a French word is used to set apart from the other meaning: libre.

          So free software can be available without payment - there used to be a thing called 'Freeware'.

          Secondly, the term 'Open Source' has been interpreted in different ways.
          1) The source code is available to people other than the owner for perusal. It can be read. In this interpretation, other rights are not granted: for example, to copy, republish, modify, or publish modifications.
          2) The source code is available for re-use: to copy, republish, modify, or publish modifications.

          The first interpretation is considerably more restrictive than the second.

          In the case of FLOSS (Free (as in gratis), Libre (as in unrestricted), Open Source Software), the second interpretation is used, and common licences include the GNU GPL and the BSD licences. There are others. Unfortunately, some people interpret 'open' source in the first way: you are open to read the source, it is not hidden, but you can't do anything else with it. Some might say that this restrictive interpretation was generated precisely to confuse people wanting a simple label 'open source' that gives the most useful capabilities.

          The end result is that you need to read precisely what any licence associated with software that you use actually says. Simple labels can be misleading. Which is to be regretted.

          • (Score: 2) by VLM on Friday November 01, @02:59PM (1 child)

            by VLM (445) on Friday November 01, @02:59PM (#1379819)

            I don't disagree with your post but I would add to it the factor of discrimination.

            For example, some CC-NC licensed "stuff" happens to be software and my LLC cannot use it because it's "-NC-" non-commercial; however, if I roll my chair across my home office to my personal desk then as a private citizen the "-NC-" no longer applies and I can do what I want with it. There are other licenses and other forms of discrimination of course but CC-NC came immediately to mind as "non-commercial" is literally in the name.

            Then there's the strange gray area of "Fair use" where my LLC could indeed commercially use it regardless of "CC-NC" for some limited values of the concept of "use". Write an editorial/academic product review, etc.

            • (Score: 1) by pTamok on Friday November 01, @04:59PM

              by pTamok (3042) on Friday November 01, @04:59PM (#1379834)

              Very good points.

              It emphasises the (unfortunate) need to be familiar with the precise terms of the licence of the software one wishes to use. Relying on a simple label, such as 'free' or 'open source', is simply insufficient.

              From my point of view, I try to stick to GPL (2 or 3), BSD licences, and a few others (like MIT, Artistic licences after version 1, Apache). It is a problem for people wanting to use FLOSS software, because it is confusing.

              As the Wikipedia article points out, there are two organisations that curate lists of software with various degrees of freedom/openness: The Free Software Foundation (FSF) and the Open Source Initiative (OSI).

              https://en.wikipedia.org/wiki/Free_software#Licensing [wikipedia.org]

              So there is a relatively simple algorithm: if both the FSF and OSI approve a licence, it is very likely both free (libre) and open source in the flexible interpretation.

              Of course, you need to check that this is true in your jurisdiction if it is not the same as that which the FSF and OSI operate under (laws differ between jurisdictions).

              For many people, the certainties of commercial licensing of non-free software are worth it. It's pretty clear what you can or cannot do.

              Things are complicated by copyright, and product liability as well. Discussions for another time/place.

          • (Score: 1) by pTamok on Sunday November 03, @03:18PM

            by pTamok (3042) on Sunday November 03, @03:18PM (#1380128)

            I am intrigued as to why the parent post to this got moderated 'Flamebait'.

            I was trying to be factual: what did I get wrong?

    • (Score: 2) by Thexalon on Friday November 01, @12:38PM (4 children)

      by Thexalon (636) on Friday November 01, @12:38PM (#1379792)

      One way of looking at it:

      Free Software comes from an academic viewpoint of how software should work. The logic is that (a) software is basically an R&D effort with nearly zero cost to make more copies, (b) information should be spread and shared freely to anybody who wants it, ergo software should be spread and shared freely. And among the information that should be freely available is how to make the software, so that means source code in what amounts to a commons of software anybody who wants to can use and improve. Free Software takes a lot of steps (e.g. the GPL) aimed at guarding against businesses taking that common software and taking it for themselves to effectively enclose the commons behind paywalls and other barriers.

      Open Source comes from a business viewpoint of how software should work, with a goal of making software cheaper and higher quality. The logic is that (a) anyone who uses software should be able to fix problems, (b) stuff everybody needs should be available as cheaply as possible, and (c) duplication of development effort is wasteful. That leads to a more flexible idea about what counts as being "open" enough to still be good enough. All Free Software generally counts as Open Source, but Open Source also allows situations where (a) a company basically treats the open-source portion as a free sample and has added and sold proprietary extensions to make the open-source portion actually useful, (b) a company controls what happens to all the fixes that users or outside developers contribute, or (c) a company takes a project the community has developed and turns it into a fully-owned proprietary product.

      Guess which one businesses prefer.

      --
      "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
      • (Score: 2) by Freeman on Friday November 01, @01:51PM (1 child)

        by Freeman (732) on Friday November 01, @01:51PM (#1379800) Journal

        Most businesses prefer neither, they prefer closed source. Look at the likes of Microsoft Office vs LibreOffice.

        --
        Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
        • (Score: 3, Touché) by Thexalon on Friday November 01, @08:21PM

          by Thexalon (636) on Friday November 01, @08:21PM (#1379860)

          Some businesses like the open-source model, though, because it gives them free labor from suckers^H^Hvolunteer developers. They especially like to prey on the young and naive for this.

          --
          "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
      • (Score: 2) by mrpg on Friday November 01, @02:21PM (1 child)

        by mrpg (5708) <mrpgNO@SPAMsoylentnews.org> on Friday November 01, @02:21PM (#1379810) Homepage

        It seems to me that OS doesn't always give me the four freedoms.

        "While open source software typically provides access to the source code, it may not always adhere strictly to the four freedoms. For example, some open source licenses may restrict the ability to modify and redistribute the software."

        That's the thing, sometimes it seems FS.

        • (Score: 2) by Thexalon on Friday November 01, @08:23PM

          by Thexalon (636) on Friday November 01, @08:23PM (#1379861)

          You are correct: Open Source (or often faked open source) regularly skips over the things that Free Software demands, because it is trying to be business friendly.

          --
          "Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
  • (Score: 2) by VLM on Friday November 01, @02:50PM

    by VLM (445) on Friday November 01, @02:50PM (#1379818)

    I expect a new wave of the BSD and GPL folks BOTH simultaneously using the new term to try to politically declare their opponent to no longer be open source because they're merely "openwashing".

    Unfortunately, this will probably be the only significant long-term outcome of the newly invented term.

  • (Score: 3, Disagree) by ElizabethGreene on Friday November 01, @04:41PM (3 children)

    by ElizabethGreene (6748) on Friday November 01, @04:41PM (#1379829) Journal

    This is a space where we need the terms to be accurate.

    To be open source, i.e. for a person to be able to view the source and independently recreate it from that source, they would need access to the full training set, the software tools used to do the training, and the adjustments made in that training process. This would still not exactly reproduce the model because there is a large random element in AI training.

    A better term would be, "Open Weight", where the weights used in the model are publicly available for download and use.

    Neither of these would match the OSI's definition of Open Source, because they have chosen to overload that term to include a bunch of other things.

    • (Score: 2, Interesting) by Anonymous Coward on Friday November 01, @08:54PM (2 children)

      by Anonymous Coward on Friday November 01, @08:54PM (#1379866)

      To be open source, i.e. for a person to be able to view the source and independently recreate it from that source, they would need access to the full training set, the software tools used to do the training, and the adjustments made in that training process. This would still not exactly reproduce the model because there is a large random element in AI training.

      I will note that the purpose of access to source code, as codified in both the Open Source Definition [opensource.org] and the Free Software Definition [gnu.org], is to facilitate creating and using modified versions. It is not about "independently recreating it from source". It is unclear to me that the original training data is necessary or even typically used in order to make modifications to an LLM model.

      It seems to me that the training weights are a bit like writing other kinds of magic numbers in programs without explanation. There's just a lot more of them. For some simpler hand written examples maybe I have a C program that contains the following function:

      uint32_t divide_u32_by_10000(uint32_t x)
      {
         return (x * UINT64_C(3518437209)) >> 45;
      }

      Or here is another that I think is quite a bit less obvious what's going on:

      int count_u64_trailing_zeroes(uint64_t x)
      {
         static const unsigned char bitpos[64] = {
             0,  1,  2,  7,  3, 13,  8, 19,  4, 25, 14, 28,  9, 34, 20, 40,
             5, 17, 26, 38, 15, 46, 29, 48, 10, 31, 35, 54, 21, 50, 41, 57,
            63,  6, 12, 18, 24, 27, 33, 39, 16, 37, 45, 47, 30, 53, 49, 56,
            62, 11, 23, 32, 36, 44, 52, 55, 61, 22, 43, 51, 60, 42, 59, 58
         };

         return bitpos[(((x & -x)*151050438420815295)>>58) & 63];
      }

      Fun fact: x86_64 GCC can actually compile the above count_u64_trailing_zeroes function to a single instruction [godbolt.org] if it knows that the argument is never zero.

      Now as I care about my craft, I would document how I came up with the numbers 3518437209, 45, 151050438420815295 and 58 if I actually used code like this in a program. And if I saw code like this somewhere like in a soylentnews comment I wouldn't just start using it in my programs without first understanding why they work. But I think nobody would reject programs with functions like these as being non-free or closed source. Understanding the reasons why these magic numbers were chosen, or how they were generated, or proofs of correctness for these functions -- these are all nice things to know but are not prerequisites for a program to be free software.

      • (Score: 2) by sonamchauhan on Sunday November 03, @05:17AM

        by sonamchauhan (6546) on Sunday November 03, @05:17AM (#1380082)

        The training weights seem to encode the behaviour of the 'LLM program'.

        I think 'open weight' LLMs are more akin to encoding a multi-GB zipped executable in that data structure example you offered, with the program instructed to unzip and run it.

      • (Score: 2) by ElizabethGreene on Monday November 04, @03:39PM

        by ElizabethGreene (6748) on Monday November 04, @03:39PM (#1380265) Journal

        For me, the problem with not including the training data and pipeline is it makes the origin story for the resulting magic numbers opaque. It is trivially easy to encode biases, malicious or benign, in the training process.

        Imagine the NSA published a "Best encryption in the universe" algorithm that included a large block of seemingly random magic numbers they pinky swore didn't contain a backdoor. Would you trust that?

        (I'm not sure how to phrase this next bit without it turning into a political thing, so please indulge me a moment. I don't want to have a political conversation.) My limited experience is that college students and recent graduates have strong biases in their training sets and their work output reflects those biases. As they accrue life experiences outside academia, the sharper edges of those biases get rounded off as they recognize the mismatch between the training set and reality. Today's LLMs don't have the ability to accrue experiences "beyond academia". It's always graduation day for an AI model, and within a decade they'll be making substantial decisions impacting billions of people. The opacity of the biases in the training sets and pipelines is scary to me. I want to understand "where their head is at".

  • (Score: -1, Troll) by Anonymous Coward on Friday November 01, @08:13PM

    by Anonymous Coward on Friday November 01, @08:13PM (#1379857)

    They can't be expected to open the dataset because there are copyrighted things inside. Training techniques and all that, sure.

    So who is the "open source initiative"? A licensing and copyright body.

    They're totally picking at million dollar freely released AI to our benefit, guys.

  • (Score: 2) by rpnx on Sunday November 03, @05:37PM (1 child)

    by rpnx (13892) on Sunday November 03, @05:37PM (#1380142) Journal

    I think large language models aren't actually eligible for copyright protection, so maybe you can just ignore the license anyway?

    It would be interesting to see that tested in court.

    • (Score: 0) by Anonymous Coward on Tuesday November 05, @09:55AM

      by Anonymous Coward on Tuesday November 05, @09:55AM (#1380376)

      It would be interesting to see that tested in court.

      Just say you found it publicly accessible on the Internet and used it for training your AI, so it's not infringement of any sort.

      But I guess what works for them won't work for you if you're not a big fish but one of the small fishes?
