
posted by janrinok on Friday October 21 2022, @07:04PM   Printer-friendly
from the unintended-consequences dept.

GitHub Copilot may steer Microsoft into a copyright lawsuit:

GitHub Copilot – a programming auto-suggestion tool trained from public source code on the internet – has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim.

On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work – pulled from the training data – to suggest code snippets to users?

Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that "any code generated by Copilot may contain lurking license or IP violations," and thus should be avoided.

That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.

Copilot's capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code.

Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns.

In an email to The Register, Butterick indicated there's been a strong response to news of his investigation.

"Clearly, many developers have been worried about what Copilot means for open source," he wrote. "We're hearing lots of stories. Our experience with Copilot has been similar to what others have found – that it's not difficult to induce Copilot to emit verbatim code from identifiable open source repositories. As we expand our investigation, we expect to see more examples.

"But keep in mind that verbatim copying is just one of many issues presented by Copilot. For instance, a software author's copyright in their code can be violated without verbatim copying. Also, most open-source code is covered by a license, which imposes additional legal requirements. Has Copilot met these requirements? We're looking at all these issues."

Spokespeople for Microsoft and GitHub were unable to comment for this article. However, GitHub's documentation for Copilot warns that the output may contain "undesirable patterns" and puts the onus of intellectual property infringement on the user of Copilot. That is to say, if you use Copilot to auto-complete code for you and you get sued, you were warned. That warning implies that the potential for Copilot to produce copyrighted code was not unanticipated.

[...] "Obviously, it's ironic that GitHub, a company that built its reputation and market value on its deep ties to the open source community, would release a product that monetizes open source in a way that damages the community. On the other hand, considering Microsoft's long history of antagonism toward open source, maybe it's not so surprising. When Microsoft bought GitHub in 2018, a lot of open source developers – me included – hoped for the best. Apparently that hope was misplaced."

Original Submission

Related Stories

Microsoft, GitHub, and OpenAI Sued for $9B in Damages Over Piracy 51 comments

As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.

The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.

"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
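The complaint's arithmetic can be checked directly. A minimal sketch (the user count, per-output violation count, and statutory minimum are taken from the figures quoted above; the variable names are illustrative):

```python
USERS = 1_200_000               # Copilot users cited in the suit
VIOLATIONS_PER_OUTPUT = 3       # the complaint counts three DMCA violations per stripped output
MIN_STATUTORY_DAMAGES = 2_500   # minimum statutory damages per violation, in USD

# Low-ball scenario from the summary: only 1% of users ever receive such output.
affected = int(USERS * 0.01)                           # 12,000 users
violations_lowball = affected * VIOLATIONS_PER_OUTPUT  # 36,000 violations

# The complaint's headline scenario: every user receives at least one infringing output.
violations_full = USERS * VIOLATIONS_PER_OUTPUT        # 3,600,000 violations
damages = violations_full * MIN_STATUTORY_DAMAGES      # $9,000,000,000

print(violations_lowball, violations_full, f"${damages:,}")
```

Running this reproduces both figures in the article: 36k violations under the 1% assumption, and the $9 billion headline number when every user is counted.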

Besides open-source licenses and the DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.

The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.

Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.

(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground

Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Touché) by MostCynical on Friday October 21 2022, @08:22PM

    by MostCynical (2589) on Friday October 21 2022, @08:22PM (#1277785) Journal

    a lot of open source developers – me included – hoped for the best

    there is no "hope", there is only delusion. This is Microsoft.

    "I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
  • (Score: 2, Insightful) by MonkeypoxBugChaser on Saturday October 22 2022, @01:10AM (2 children)

    by MonkeypoxBugChaser (17904) on Saturday October 22 2022, @01:10AM (#1277807) Homepage Journal

    I hate MS and github, but the premise sounds cool and this reads like a bunch of butt hurt from tightwad people. This is the ultimate open source if it can actually be open and learn from everyone's code not just scrape public repos.

    • (Score: 5, Insightful) by MostCynical on Saturday October 22 2022, @05:01AM (1 child)

      by MostCynical (2589) on Saturday October 22 2022, @05:01AM (#1277826) Journal

      tightwad.. open source...huh?

      people who *might* make a small amount of money from a program (very small), who maintain code as a labor of love, are tightwads?

      the issues are in the article, including the broken licensing - you get code that is GPL and put it in proprietary software...

      open is fine - only if it remains open.

      "I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
      • (Score: 5, Interesting) by tekk on Saturday October 22 2022, @04:16PM

        by tekk (5704) Subscriber Badge on Saturday October 22 2022, @04:16PM (#1277867)

        It's not even just the GPL license it violates. Practically every permissive license includes an attribution requirement which would be broken. Inclusion of apache v2 code would also be funny despite being permissive because that includes a patent grant, oops.

  • (Score: 3, Funny) by krishnoid on Saturday October 22 2022, @02:03AM

    by krishnoid (1156) on Saturday October 22 2022, @02:03AM (#1277812)

    What if you use one of its transformative [] features (warning, loud explosion at end).

  • (Score: 4, Touché) by tekk on Saturday October 22 2022, @04:14PM

    by tekk (5704) Subscriber Badge on Saturday October 22 2022, @04:14PM (#1277866)

    Come on, it's pretty obvious isn't it?

    All Copilot has to do is only scrape repos with approved licenses then automatically generate and check-in a multi-megabyte file containing attribution information for every single github repo it used in its training data :^)

  • (Score: 2) by VLM on Saturday October 22 2022, @05:36PM

    by VLM (445) on Saturday October 22 2022, @05:36PM (#1277873)

    would reproduce his copyrighted sparse matrix transposition code

    I could get sued just as badly as Copilot if I were writing linear algebra stuff.

    Just because you "can" copyright "hello_world.c" doesn't mean nobody else, including an AI, can re-implement it if it's trivial enough.

    It is an interesting legal case simply to figure out who's responsible for the I.P. infringement if an AI does the infringement. The AI author? The idiot who trusted it? Whoever pays the AI's power bill? Whoever "owns" the non-human intelligence? Maybe nobody, if we define something trivial enough for an AI to implement as uncopyrightable by nature?
    It is an interesting legal case simply to figure you who's responsible for the I.P. infringement if an AI does the infringement. The AI author? The idiot who trusted it? Whomever pays the AI's power bill? Whomever "owns" the non-human intelligence? Maybe nobody if we define something trivial enough for an AI to implement it as uncopyrightable by nature?