The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"
Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.
The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."
[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.
"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.
"Settled copyright law protects our journalism and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so."
[...] OpenAI has tried to allay news publishers' concerns. In December, the company announced a partnership with Axel Springer (the parent company of Business Insider, Politico, and European outlets Bild and Welt), which would license its content to OpenAI in return for a fee.
Also at CNBC and The Guardian.
Previously:
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Microsoft, GitHub, and OpenAI Sued for $9B in Damages Over Piracy
In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, roughly four months after the company was reportedly considering suing, the suit has now been filed.
The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses it to power its Copilot service and helped provide the infrastructure for training the GPT Large Language Model. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times' paywall and ascribe hallucinated misinformation to the Times.
Journalism is expensive
The suit notes that The Times maintains a large staff that allows it to do things like dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.
All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times' terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. In addition to driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.
The suit alleges that OpenAI-developed tools undermine all of that. "By providing Times content without The Times's permission or authorization, Defendants' tools undermine and damage The Times's relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue," the suit alleges.
Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called "Common Crawl," which the suit alleges contains information from 16 million unique records from sites published by The Times. That places the Times as the third most referenced source, behind Wikipedia and a database of US patents.
OpenAI no longer discloses as many details of the data used for training of recent GPT versions, but all indications are that full-text NY Times articles are still part of that process. [...] Expect access to training information to be a major issue during discovery if this case moves forward.
Not just training
A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants' GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.
As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.
The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.
"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
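The damages arithmetic quoted above can be checked with a short Python sketch. It uses only the figures given in the text (1.2M users, three alleged § 1202 violations per infringing output, $2,500 minimum statutory damages) and assumes, as the litigants do, one infringing output per affected user. The function and constant names are hypothetical, chosen for illustration only.

```python
# Figures taken from the summary and the quoted complaint above.
USERS = 1_200_000               # Copilot users cited in the summary
VIOLATIONS_PER_OUTPUT = 3       # alleged DMCA violations per infringing output
MIN_STATUTORY_DAMAGES = 2_500   # dollars per violation (minimum cited in the quote)


def dmca_exposure(share_of_users: float) -> tuple[int, int]:
    """Return (violations, dollars), assuming one infringing output
    for each user in the given share of the user base."""
    affected_users = round(USERS * share_of_users)
    violations = affected_users * VIOLATIONS_PER_OUTPUT
    return violations, violations * MIN_STATUTORY_DAMAGES


if __name__ == "__main__":
    # 1% is the low-ball scenario; 100% is the scenario behind the $9B figure.
    for share in (0.01, 1.0):
        violations, dollars = dmca_exposure(share)
        print(f"{share:.0%} of users: {violations:,} violations, ${dollars:,}")
```

At 1% of users this gives 36,000 violations and $90 million at the statutory minimum; at 100% it reproduces the 3,600,000 violations and $9,000,000,000 quoted from the complaint.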
The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.
Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.
Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground
OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
(Score: 2) by Rosco P. Coltrane on Friday December 29, @03:24PM
There. FTFY.
The cloud isn't about providing online services: it's about collecting as much private data as possible and monetizing it. It's always been about that.
The New York Times is absolutely right: it is a business model based on massive copyright infringement. But it's wrong on two things: it's not new, and it's not just Microsoft and OpenAI. It's been going on for decades, and all Big Data players essentially owe their very existence to the business of exploiting data they have no right to exploit.
The difference between then and now is that the data they had no right to exploit wasn't exploited directly: it was digested and used indirectly for the purpose of advertisement. For example, when Google's surveillance collective has your medical file because your healthcare provider put your medical data in their cloud, and it knows you have some disease, and you keep getting advertisement more or less closely related to that disease, you have a hunch that Google is using data it shouldn't be using but you can't prove it.
With AI, you can prove the infringement clear as day: chat with the stupid bot long enough and it will regurgitate your own data back to you verbatim. That's the difference.
(Score: 3, Interesting) by Runaway1956 on Friday December 29, @03:37PM (2 children)
MS is pretty damned big. AI is bigger than Microsoft - almost everyone is investing in it. NYT has a lot of legal firepower to bring to bear. Time to sit back, and watch the fireworks. On the one hand, we want to see current copyright law seriously crippled. On the other hand, we'd like to see AI wither on the vine and die. Maybe we'll get lucky, and the entire publishing world joins with NYT against all of AI, and they mutually extinguish each other.
In the aftermath, just maybe some reasonable legislation regarding copyright as well as AI is passed? Not much chance, but maybe. I can dream, can't I?
How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.
Through a Glass, Darkly -George Patton
(Score: 2) by looorg on Friday December 29, @03:59PM
> How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.
What the heck are "the copyrightable expression"? Is that slogans or what? I would gather the same thing tho, IF this works for the NYT then anything print, audio, visual will go after OpenAI and the others that have harvested data for their AI for a free payday. This will include advertisers. Certainly so if these "copyrightable expressions" are a thing, that sounds like something from advertisement. That said that will just be another payday for them and not actually a loss.
(Score: 0) by Anonymous Coward on Friday December 29, @03:59PM
There is zero chance with today's political climate that anything good will come out of the latest tech money grab.
(Score: 2) by DannyB on Friday December 29, @04:24PM
If you don't want people or machines to read your content, or view your images . . .
DON'T PUT THEM ONLINE !!!
My new year's resolution for 2024 is to not make a new year's resolution for 2024 so I can't unintentionally break it.