from the 17-USC-§§-1201-1205 dept.
As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violations through its use of the M$ GitHub Copilot tool. Copilot strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit argues that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law, and neither are the wielders of such tools.
The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2 million users receive such output, the licenses were breached 12,000 times, which translates to 36,000 DMCA violations, at a very low-ball estimate.
"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
The suit is brought on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract (×2).
– Tortious interference.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of the California Consumer Privacy Act.
– Civil conspiracy.
– Declaratory relief.
Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and mark yet another step in M$'s ongoing attempts to undermine and sabotage Free and Open Source Software and its supporting communities.
The Copilot tool has been trained on mountains of publicly available code
[...] When GitHub announced Copilot on June 29, the company said that the algorithm had been trained on publicly available code posted to GitHub. Nat Friedman, GitHub’s CEO, has written on forums like Hacker News and Twitter that the company is legally in the clear. “Training machine learning models on publicly available data is considered fair use across the machine learning community,” the Copilot page says.
But the legal question isn’t as settled as Friedman makes it sound — and the confusion reaches far beyond just GitHub. Artificial intelligence algorithms only function due to massive amounts of data they analyze, and much of that data comes from the open internet. An easy example would be ImageNet, perhaps the most influential AI training dataset, which is entirely made up of publicly available images that ImageNet creators do not own. If a court were to say that using this easily accessible data isn’t legal, it could make training AI systems vastly more expensive and less transparent.
Despite GitHub’s assertion, there is no direct legal precedent in the US that upholds publicly available training data as fair use, according to Mark Lemley and Bryan Casey of Stanford Law School, who published a paper last year about AI datasets and fair use in the Texas Law Review.
[...] And there are past cases to support that opinion, they say. They consider the Google Books case, in which Google downloaded and indexed more than 20 million books to create a literary search database, to be similar to training an algorithm. The Second Circuit upheld Google's fair use claim (and the Supreme Court declined to review the ruling), on the grounds that the new tool was transformative of the original work and broadly beneficial to readers and authors.
Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site.
SourceForge slowly collapsed after the DotCom crash, and today, SourceForge is more advertising link-bait than it is code hosting. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.
GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities).
Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.
GitHub Copilot – a programming auto-suggestion tool trained from public source code on the internet – has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim.
On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work – pulled from the training data – to suggest code snippets to users?
Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that "any code generated by Copilot may contain lurking license or IP violations," and thus should be avoided.
That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.
Copilot's capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code.
Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns.
Over the past year, generative AI has kicked off a wave of existential dread over potential machine-fueled job loss not seen since the advent of the industrial revolution. On Tuesday, Netflix reinvigorated that fear when it debuted a short film called The Dog & The Boy that utilizes AI image synthesis to help generate its background artwork.
Directed by Ryotaro Makihara, the three-minute animated short follows the story of a boy and his robotic dog through cheerful times, although the story soon takes a dramatic turn toward the post-apocalyptic. Along the way, it includes lush backgrounds apparently created as a collaboration between man and machine, credited to "AI (+Human)" in the end credit sequence.
[...] Netflix and the production company WIT Studio tapped Japanese AI firm Rinna for assistance with generating the images. They did not announce exactly what type of technology Rinna used to generate the artwork, but the process looks similar to a Stable Diffusion-powered "img2img" process that can take an image and transform it based on a written prompt.
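Since Rinna has not disclosed its pipeline, the following is only a generic illustration of the img2img technique the article compares it to, sketched with the open-source diffusers library; the checkpoint name, prompt, and parameters are illustrative assumptions, not anything Rinna or WIT Studio confirmed:

```python
# Generic img2img sketch with Hugging Face diffusers -- illustrative only;
# this is NOT the (undisclosed) pipeline Rinna used for the Netflix short.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Start from a rough human-drawn layout and let the model repaint it.
layout = Image.open("background_layout.png").convert("RGB").resize((768, 512))

result = pipe(
    prompt="lush post-apocalyptic cityscape, anime background art",
    image=layout,
    strength=0.6,        # how far the output may depart from the input image
    guidance_scale=7.5,  # how strongly the output follows the text prompt
).images[0]
result.save("background_final.png")
```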
ChatGPT Can't be Credited as an Author, Says World's Largest Academic Publisher
90% of Online Content Could be 'Generated by AI by 2025,' Expert Says
Getty Images Targets AI Firm For 'Copying' Photos
Controversy Erupts Over Non-consensual AI Mental Health Experiment
Microsoft's New AI Can Simulate Anyone's Voice With Three Seconds of Audio
AI Everything, Everywhere
Microsoft, GitHub, and OpenAI Sued for $9B in Damages Over Piracy
Adobe Stock Begins Selling AI-Generated Artwork
AI Systems Can't Patent Inventions, US Federal Circuit Court Confirms
Last week, Microsoft researchers announced an experimental framework to control robots and drones using the language abilities of ChatGPT, a popular AI language model created by OpenAI. Using natural language commands, ChatGPT can write special code that controls robot movements. A human then views the results and adjusts as necessary until the task gets completed successfully.
The research arrived in a paper titled "ChatGPT for Robotics: Design Principles and Model Abilities," authored by Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor of the Microsoft Autonomous Systems and Robotics Group.
In a demonstration video, Microsoft shows robots—apparently controlled by code written by ChatGPT while following human instructions—using a robot arm to arrange blocks into a Microsoft logo, flying a drone to inspect the contents of a shelf, or finding objects using a robot with vision capabilities.
To get ChatGPT to interface with robotics, the researchers taught ChatGPT a custom robotics API. When given instructions like "pick up the ball," ChatGPT can generate robotics control code just as it would write a poem or complete an essay. After a human inspects and edits the code for accuracy and safety, the human operator can execute the task and evaluate its performance.
In this way, ChatGPT accelerates robotic control programming, but it's not an autonomous system. "We emphasize that the use of ChatGPT for robotics is not a fully automated process," reads the paper, "but rather acts as a tool to augment human capacity."
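To make that workflow concrete, here is a minimal sketch of the human-in-the-loop pattern the paper describes: a small robot API is explained to the model in the system prompt, a natural-language command is translated into code, and a human reviews the result before anything runs. The API names (move_to, grab, release) and prompt wording are hypothetical, invented here for illustration rather than taken from Microsoft's framework, and the call uses the OpenAI Python client as it existed at the time:

```python
# Sketch of the human-in-the-loop pattern from the paper. The robot API
# below is hypothetical, NOT Microsoft's actual robotics framework.
import openai  # legacy (pre-1.0) client; reads OPENAI_API_KEY from the environment

ROBOT_API_DESCRIPTION = """
You control a robot arm through this Python API:
  move_to(x, y, z)  -- move the gripper to a position, in meters
  grab()            -- close the gripper
  release()         -- open the gripper
Respond with Python code only.
"""

def generate_robot_code(instruction: str) -> str:
    """Ask the model to translate a natural-language command into API calls."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": ROBOT_API_DESCRIPTION},
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content

code = generate_robot_code("pick up the ball at (0.3, 0.1) on the table")
print(code)  # a human reviews this for accuracy and safety BEFORE executing it
```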