OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
(Score: 2) by bloodnok on Wednesday August 23 2023, @06:15PM (1 child)
Actually that's not quite right, and it's a conflation of copyright and licensing.
First off, the GPL says nothing about charging for GPL'd code. You can do it if someone wants to pay you and you keep to its terms.
Secondly, the GPL is a license. It *allows* you to use the copyrighted code under a number of (reasonable, IMHO) terms. If you breach those terms, your license is invalidated and you can then be sued for copyright and license infringement (if I understand it correctly; I am not a lawyer, etc.).
What I find confusing about this notion is that ChatGPT does not seem to have breached copyright by "reading" the articles, since the NYT allows reading. As far as I can tell, it has also not breached copyright by retaining a non-verbatim recollection of those articles (as far as I know, the original source articles are not stored in a cache, but form a kind of weighted networked dataset from which it would be impossible to separate the NYT articles from everything else it has read).
Where it could breach copyright is when it is asked a question and quite reasonably makes a response that contains identical segments of text to those NYT articles. But that doesn't seem to be what the NYT is concerned about. It seems to be concerned that ChatGPT's learning is itself a breach of copyright.
The reason I find this hard to grok is that a human could also regurgitate such text based on similar learning. I have no problem with the generated text being subject to copyright, but I do have a problem with the learning being copyrightable. I have learned much (or too little, depending on who you ask) over the years and I like to think of this knowledge as mine.
Or to put it more succinctly, the NYT can bite my ass.
__
The Major
(Score: 2) by Mykl on Wednesday August 23 2023, @11:11PM
I think the NYT might have a case where they could claim that all ChatGPT output is a derivative work of their original material. Given that the source data has been put through a neural blender, it would be possible to argue that NYT material is present (in a derivative form) in every single ChatGPT output.
At least, that's what I'd argue if I were a lawyer.
I agree with others though - this is only an issue for them now that ChatGPT is a potential money machine.