SoylentNews is people

posted by janrinok on Friday January 06 2023, @03:51PM
from the 17-USC-§§-1201-1205 dept.

As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.

The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.

"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
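
The arithmetic behind both figures can be checked with a short sketch. It uses only the numbers given above (1.2M users, three violations per offending output, $2,500 per violation); the function name `dmca_estimate` and the constant names are made up for illustration and do not come from the filing.

```python
# Figures taken from the summary and the quoted filing.
USERS = 1_200_000              # Copilot users cited in the summary
VIOLATIONS_PER_OUTPUT = 3      # DMCA violations counted per offending output
MIN_STATUTORY_DAMAGES = 2_500  # dollars per violation, per the filing


def dmca_estimate(share_of_users: float) -> tuple[int, int]:
    """Return (violations, damages in dollars), assuming the given share of
    users each receives exactly one offending output."""
    affected_users = round(USERS * share_of_users)
    violations = affected_users * VIOLATIONS_PER_OUTPUT
    return violations, violations * MIN_STATUTORY_DAMAGES


low_violations, low_damages = dmca_estimate(0.01)   # the 1% low-ball case
all_violations, all_damages = dmca_estimate(1.0)    # one output per user

print(f"1% of users: {low_violations:,} violations, ${low_damages:,}")
print(f"all users:   {all_violations:,} violations, ${all_damages:,}")
```

Under these assumptions the 1% case gives 36,000 violations ($90 million at the minimum rate), while the $9 billion headline figure corresponds to every one of the 1.2M users receiving one such output.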

Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.

The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.

Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.

Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground


Original Submission

 
This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Interesting) by sjames on Friday January 06 2023, @04:59PM (14 children)

    by sjames (2882) on Friday January 06 2023, @04:59PM (#1285497) Journal

    Just what we need, another lawyer more than happy to support a greedy or butthurt individual who doesn't even understand what they're suing over.

    This is just another take on the patent trend years ago of claiming that 'on a computer' makes an old thing entirely novel. AI learns from existing things, much like human students. If I write a book, I have done so based on learning from the many books I have read, but I have not violated copyright. Neither have the various art generators or AI text tools, or CoPilot.

    It's one thing for a layman to misunderstand and believe these things store everything they learned from and do a cut-and-paste job. It's another to take legal action based on that woeful misunderstanding.

    This is right up there with idiot corporations claiming #include "stdio.h" is a copyright violation.

    I say this as anything but a fan of MS.

    Note that in practice, CoPilot doesn't seem to be all that good (even marketing only claims it to produce the right code 50% of the time) and it sounds like even an improved version is a great way to introduce a nearly ubiquitous exploitable security flaw.

  • (Score: 3, Interesting) by shrewdsheep on Friday January 06 2023, @05:24PM (6 children)

    by shrewdsheep (5215) Subscriber Badge on Friday January 06 2023, @05:24PM (#1285505)

    Your point is well taken and I tend to agree. Just to point out the counter-argument: Language models tend to be so good because they are in part nearest-neighbor learners. They literally copy portions from the training data over to the output, which can be deduced from the mistakes they make. Certainly this argument is not water-tight as there is no investigation (to my knowledge) showing how widespread this behavior is for any response, but it can certainly be posited that this behavior does happen. I am actually surprised RMS has not yet jumped up to claim all auto-pilot code is GPL.

    • (Score: 3, Interesting) by sjames on Friday January 06 2023, @05:43PM (5 children)

      by sjames (2882) on Friday January 06 2023, @05:43PM (#1285510) Journal

      I have heard the counter-argument, but both programming and natural language are filled with pat phrases. For example, I'll bet you read "Your point is well taken" at some point before the first time you said or wrote it. That's not an accusation, it's just how language works. It's also to be expected of a synthetic neural network.

      The person who wrote the code that CoPilot learned from probably did so because their own naturally occurring neural net distilled down many variants that it was exposed to and identified that particular variation as a canonical phrase.

      I'm almost 100% certain that I have written a line of code at some point in my career that was identical to a line someone else wrote and that neither of us is aware of it, simply because it was a good concise way to express the thought.

      • (Score: 2, Interesting) by shrewdsheep on Friday January 06 2023, @05:52PM (4 children)

        by shrewdsheep (5215) Subscriber Badge on Friday January 06 2023, @05:52PM (#1285512)

        Indeed one line wouldn't qualify and it would be impossible to find the primordial line anyhow. This is precisely what will be tested in court: how many lines qualify as plagiarism. For me, it would be about 10 lines of code but it is really statistics: how many lines make a unique snippet of code.
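
        One way to make the "how many lines" question concrete is to measure the longest run of consecutive identical lines shared by two pieces of code. The following is only a minimal sketch: the function name `longest_shared_run` is hypothetical, lines are compared after stripping whitespace, and nothing here reflects how a court or any real plagiarism tool would measure copying.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest run of consecutive lines common to both texts.

    Lines are compared after stripping surrounding whitespace, and blank
    lines are ignored. This is a longest-common-substring table over lines.
    """
    lines_a = [ln.strip() for ln in a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in b.splitlines() if ln.strip()]
    best = 0
    prev = [0] * (len(lines_b) + 1)  # run lengths ending at the previous line of a
    for la in lines_a:
        cur = [0] * (len(lines_b) + 1)
        for j, lb in enumerate(lines_b, start=1):
            if la == lb:
                cur[j] = prev[j - 1] + 1  # extend the run ending one line earlier
                best = max(best, cur[j])
        prev = cur
    return best


original = "int i;\nfor (i = 0; i < n; i++)\n    sum += a[i];\nreturn sum;"
suggestion = "int k;\nfor (i = 0; i < n; i++)\n    sum += a[i];\nreturn total;"
print(longest_shared_run(original, suggestion))  # 2 shared consecutive lines
```

        A threshold such as the 10 lines suggested above would then be a cutoff on this number, though a raw line count says nothing about whether the shared lines are original to anyone.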

        • (Score: 4, Interesting) by HiThere on Friday January 06 2023, @06:13PM

          by HiThere (866) on Friday January 06 2023, @06:13PM (#1285518) Journal

          It probably will end up being something that stupid, but that's a really stupid measure of whether anything significant was copied. And even more of whether anything significant that was original to the author was copied.

          And if it's successful, Knuth can sue every programmer in existence for their entire worth.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 1, Informative) by Anonymous Coward on Friday January 06 2023, @06:44PM

          by Anonymous Coward on Friday January 06 2023, @06:44PM (#1285524)

          > about 10 lines of code

          Or about a half-line of APL...

        • (Score: 3, Interesting) by turgid on Friday January 06 2023, @09:30PM

          by turgid (4318) Subscriber Badge on Friday January 06 2023, @09:30PM (#1285564) Journal

          Something that troubles me is the concept of accessors in OOP languages, getters and setters. They're boilerplate, yet they consume many lines of code. Does copying one of those constitute plagiarism? My next question is "What have we been doing for the last 40 years?" Why don't OOP languages provide operators for this purpose? Why are we writing this code by hand?
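
          To illustrate the boilerplate in question, here is a small Python sketch contrasting hand-written accessors with a standard-library `dataclass`, which generates the constructor and comparison methods from the field list. The class names `PointVerbose` and `Point` are made up for the example, and Python is only one language that happens to offer this kind of shortcut.

```python
from dataclasses import dataclass


class PointVerbose:
    """Hand-written accessors: the kind of boilerplate described above."""

    def __init__(self, x: float, y: float) -> None:
        self._x = x
        self._y = y

    def get_x(self) -> float:
        return self._x

    def set_x(self, value: float) -> None:
        self._x = value

    def get_y(self) -> float:
        return self._y

    def set_y(self, value: float) -> None:
        self._y = value


@dataclass
class Point:
    """Same data; __init__, __repr__ and __eq__ are generated."""
    x: float
    y: float


p = PointVerbose(1.0, 2.0)
p.set_x(3.0)

q = Point(1.0, 2.0)
q.x = 3.0  # plain attribute access, no accessor methods written by hand

print(p.get_x() == q.x)
```

          Any two programmers writing the verbose version will produce nearly identical lines, which is the point about boilerplate and plagiarism made above.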

        • (Score: 3, Interesting) by RS3 on Sunday January 08 2023, @05:27AM

          by RS3 (6367) on Sunday January 08 2023, @05:27AM (#1285790)

          All this is making me wonder: are they looking at source code? Or object / binary executable? Cause you could copy binary, toss in some nops here and there, recalculate checksum, and I think it'd be difficult to detect? But make the source look very different. I dunno, it's a mess; there are no easy or simple answers.

  • (Score: 2) by SomeRandomGeek on Friday January 06 2023, @06:01PM (1 child)

    by SomeRandomGeek (856) on Friday January 06 2023, @06:01PM (#1285514)

    You need to think about it legally rather than technically. I would like to hear Microsoft's legal theory of the case. Do they think that an AI trained from data is not a derived work of that data? Do they think that their AI is based on so many different pieces of stolen code that attribution can't be traced to any one source and therefore they don't owe anyone anything? Did they hide a license to do this in the GitHub ToS? Do they think they have deeper pockets and they can wait out any challenge regardless of its validity? The details matter. Legal stuff is funny that way.

    • (Score: 4, Insightful) by sjames on Friday January 06 2023, @06:27PM

      by sjames (2882) on Friday January 06 2023, @06:27PM (#1285520) Journal

      How about if an AI trained on publicly available source code is infringing copyright then so is literally every single human programmer who has ever lived except Lady Ada. Extending to other copyrightable works, it's copyright infringement all the way down.

      And since that is literally the only known way of learning to write or code, we can either return to the caves (but NO DRAWING!) or decide perhaps this is not actually copyright violation.

      As an amusing note, just imagine how many pat legal phrases have been ruthlessly copied without attribution in the court filing!

  • (Score: 2) by mcgrew on Friday January 06 2023, @08:40PM (4 children)

    by mcgrew (701) <publish@mcgrewbooks.com> on Friday January 06 2023, @08:40PM (#1285549) Homepage Journal

    Yet another reason I don't believe that computer code should be covered by copyright at all. That despite (because of?) the fact that I registered two copyrights for computer programs four decades ago, neither of which can be run on any existing computer today. I'll still hold those copyrights ninety five years after I'm buried.

    You can't copyright a food recipe, and a computer program is a recipe of sorts, unlike a book or a painting. You can't copyright a dance. I've never seen any reason, let alone a good one, why computer programs should be covered by copyright.

    Why not patents? I will agree that they're too expensive and have way too much bureaucracy, but that's a completely different set of problems.

    --
    Impeach Donald Saruman and his sidekick Elon Sauron
    • (Score: 4, Insightful) by maxwell demon on Saturday January 07 2023, @06:44AM (3 children)

      by maxwell demon (1608) on Saturday January 07 2023, @06:44AM (#1285628) Journal

      Why not patents?

      Patents are much worse than copyright for software. With patents, you are in violation even if you provably never saw the patent and came up with the same solution independently.

      The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time; that would not only benefit software.

      --
      The Tao of math: The numbers you can count are not the real numbers.
      • (Score: 2) by mcgrew on Saturday January 07 2023, @02:39PM (2 children)

        by mcgrew (701) <publish@mcgrewbooks.com> on Saturday January 07 2023, @02:39PM (#1285679) Homepage Journal

        The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time

        About as easy as flying to the moon by flapping your arms in our plutocratic society.

        --
        Impeach Donald Saruman and his sidekick Elon Sauron
        • (Score: 2) by maxwell demon on Saturday January 07 2023, @05:00PM (1 child)

          by maxwell demon (1608) on Saturday January 07 2023, @05:00PM (#1285705) Journal

          Well, still easier than getting completely rid of copyright, don't you think?

          --
          The Tao of math: The numbers you can count are not the real numbers.
          • (Score: 3, Interesting) by mcgrew on Sunday January 08 2023, @01:24AM

            by mcgrew (701) <publish@mcgrewbooks.com> on Sunday January 08 2023, @01:24AM (#1285755) Homepage Journal

            Actually, the law now hardly qualifies as copyright, which was never about copying, it was about publishing. In America, for example, foreign works' copyrights were legally ignored, so American authors simply couldn't get published. Congress changed copyright to cover foreign works just to get Americans published.

            Real copyright isn't evil, present copyright is simply ridiculous. 25 years ago it had a 20 year monopoly as opposed to the author's lifetime plus 95 years like now.

            --
            Impeach Donald Saruman and his sidekick Elon Sauron