GitHub Copilot may steer Microsoft into a copyright lawsuit:
GitHub Copilot – a programming auto-suggestion tool trained on public source code from the internet – has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim.
On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work – pulled from the training data – to suggest code snippets to users?
Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that "any code generated by Copilot may contain lurking license or IP violations," and thus should be avoided.
That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.
Copilot's capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code.
Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns.
In an email to The Register, Butterick indicated there's been a strong response to news of his investigation.
"Clearly, many developers have been worried about what Copilot means for open source," he wrote. "We're hearing lots of stories. Our experience with Copilot has been similar to what others have found – that it's not difficult to induce Copilot to emit verbatim code from identifiable open source repositories. As we expand our investigation, we expect to see more examples.
"But keep in mind that verbatim copying is just one of many issues presented by Copilot. For instance, a software author's copyright in their code can be violated without verbatim copying. Also, most open-source code is covered by a license, which imposes additional legal requirements. Has Copilot met these requirements? We're looking at all these issues."
Spokespeople for Microsoft and GitHub were unable to comment for this article. However, GitHub's documentation for Copilot warns that the output may contain "undesirable patterns" and puts the onus of intellectual property infringement on the user of Copilot. That is to say, if you use Copilot to auto-complete code for you and you get sued, you were warned. That warning implies that the potential for Copilot to produce copyrighted code was not unanticipated.
[...] "Obviously, it's ironic that GitHub, a company that built its reputation and market value on its deep ties to the open source community, would release a product that monetizes open source in a way that damages the community. On the other hand, considering Microsoft's long history of antagonism toward open source, maybe it's not so surprising. When Microsoft bought GitHub in 2018, a lot of open source developers – me included – hoped for the best. Apparently that hope was misplaced."
(Score: 3, Touché) by MostCynical on Friday October 21 2022, @08:22PM
there is no "hope", there is only delusion. This is Microsoft.
"I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
(Score: 2, Insightful) by MonkeypoxBugChaser on Saturday October 22 2022, @01:10AM (2 children)
I hate MS and github, but the premise sounds cool and this reads like a bunch of butt hurt from tightwad people. This is the ultimate open source if it can actually be open and learn from everyone's code not just scrape public repos.
(Score: 5, Insightful) by MostCynical on Saturday October 22 2022, @05:01AM (1 child)
tightwad.. open source...huh?
people who *might* make a small amount of money from a program (very small), who maintain code as a labor of love, are tightwads?
the issues are in the article, including the broken licensing - you get code that is GPL and put it in proprietary software...
open is fine - only if it remains open.
"I guess once you start doubting, there's no end to it." -Batou, Ghost in the Shell: Stand Alone Complex
(Score: 5, Interesting) by tekk on Saturday October 22 2022, @04:16PM
It's not even just the GPL license it violates. Practically every permissive license includes an attribution requirement which would be broken. Inclusion of Apache v2 code would also be funny despite being permissive, because that license includes a patent grant, oops.
(Score: 3, Funny) by krishnoid on Saturday October 22 2022, @02:03AM
What if you use one of its transformative [youtu.be] features (warning, loud explosion at end).
(Score: 4, Touché) by tekk on Saturday October 22 2022, @04:14PM
Come on, it's pretty obvious isn't it?
All Copilot has to do is only scrape repos with approved licenses then automatically generate and check-in a multi-megabyte file containing attribution information for every single github repo it used in its training data :^)
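For what it's worth, the mechanics of that joke are easy to sketch. Below is a minimal, hypothetical illustration (the repo names, license IDs, and copyright lines are all invented) of filtering a training set down to "approved" permissive licenses and emitting a single combined attribution file — which, at GitHub scale, is exactly the multi-megabyte monster the comment describes:

```python
# Hypothetical sketch of the commenter's tongue-in-cheek proposal:
# keep only repos under an "approved" license, then emit one combined
# attribution file covering everything retained in the training set.
# All repo names, license IDs, and notices below are invented.

APPROVED = {"MIT", "BSD-3-Clause", "Apache-2.0"}

def build_attribution(repos):
    """repos: list of (name, license_id, copyright_notice) tuples.

    Returns (attribution_text, number_of_repos_kept).
    """
    kept = [r for r in repos if r[1] in APPROVED]
    lines = ["THIRD-PARTY ATTRIBUTIONS", "=" * 24]
    for name, license_id, notice in sorted(kept):
        lines.append(f"{name} ({license_id}): {notice}")
    return "\n".join(lines), len(kept)

if __name__ == "__main__":
    repos = [
        ("alice/matrixlib", "MIT", "Copyright (c) 2020 Alice"),
        ("bob/kernelmod", "GPL-2.0-only", "Copyright (c) 2019 Bob"),
        ("carol/httpd", "Apache-2.0", "Copyright (c) 2021 Carol"),
    ]
    text, count = build_attribution(repos)
    print(text)  # GPL repo dropped; two attribution lines emitted
```

Of course, even this toy version dodges the hard parts: detecting the actual license of a repo (many have none, or several), and Apache-2.0's patent-grant terms, which a flat attribution list doesn't satisfy.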
(Score: 2) by VLM on Saturday October 22 2022, @05:36PM
I could get sued just as badly as Copilot if I were writing linear algebra stuff.
Just because you "can" copyright "hello_world.c" doesn't mean nobody else, including an AI, can re-implement it if it's trivial enough.
It is an interesting legal case simply to figure out who's responsible for the IP infringement if an AI does the infringing. The AI author? The idiot who trusted it? Whoever pays the AI's power bill? Whoever "owns" the non-human intelligence? Maybe nobody, if we define anything trivial enough for an AI to implement as uncopyrightable by nature?