OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
(Score: 4, Insightful) by Mykl on Wednesday August 23 2023, @08:04AM (7 children)
You can be sued under the GPL for using a free product, then charging for it yourself without providing the source code.
This is what it all really hinges on - what conditions did the NYT / Sarah Silverman and others place upon their consumers? Both groups own copyright of their work (for better or worse), so ChatGPT really needs some form of agreement with them if they are going to sell a product that 'contains' that material.
I am wary of ChatGPT behaving like a Chinese manufacturing firm - using the intellectual property of their partners/clients (provided to them for the sole purpose of that partnership) to build a competing product in parallel is shady at best.
(Score: 2) by Username on Wednesday August 23 2023, @09:06AM
Hum. So someone who paid for ChatGPT is using it to access the site? I thought the programmer just used the material to train the bot linguistically. Or is this bot just passing off the material as its own? Can I repeat a joke at work?
(Score: 0) by Anonymous Coward on Wednesday August 23 2023, @09:18AM (3 children)
You misunderstand copyright. It controls the right to make copies. It's right there in the name.
If OpenAI are not making copies they are not infringing on copyright. Using something as training material does not infringe copyright. Showing someone (or someAI) a work you bought a copy of does not infringe copyright.
Unless you make a copy, you have not infringed copyright.
(Score: 5, Insightful) by r_a_trip on Wednesday August 23 2023, @10:23AM (2 children)
Except anything to do with computing inherently has to make a copy to do something. Loading material into RAM is already copying (inevitable) and under copyright you need permission for that. That is why software comes with a license or a public domain statement. Otherwise you can't load the program/data.
Even if OpenAI doesn't store anything concrete after the training, the fact they loaded other people's material into RAM constitutes a copy which is controlled by the rightsholder.
I think both The Times and Silverman are getting their panties in a twist over nothing. ChatGPT outputs convincing garbage, but it can't write books, news articles, comedy, recipes or anything else that requires human skill for that matter. It can only output unoriginal boilerplate and it has definite, discernible patterns in what it puts out when prompted to write on a certain topic. Anyone can see that after they poke ChatGPT around for a bit.
So this unfounded fear of being replaced by a bot is idiotic for the foreseeable future. ChatGPT has no agency. It is a parrot. Bluntly put, you need to say "Polly want a cracker?" for it to spring into action and regurgitate "Yaah, Yaah!"
Or, and this might equally be the case, they smell easy money and want in on the action. Paying several million to copyright holders for non-exclusive, perpetual, worldwide licenses to be able to continue using their training data probably sounds pretty good to OpenAI. At least they get to keep their first-to-market advantage with such a setup.
(Score: 2, Touché) by r_a_trip on Wednesday August 23 2023, @10:35AM
Upon further review, Sarah Silverman might be in massive trouble. ChatGPT puts out better and funnier garbage than she does. So I consider her replaced.
(Score: 0) by Anonymous Coward on Wednesday August 23 2023, @02:15PM
Under that interpretation, looking at it in a mirror is a violation of copyright. You are looking at an image that you did not buy.
Conversely, the Sony decision argues against a local copy (necessary to use) being a copyright violation.
(Score: 2) by bloodnok on Wednesday August 23 2023, @06:15PM (1 child)
Actually that's not quite right, and it's a conflation of copyright and licensing.
First off, the GPL says nothing about charging for GPL'd code. You can do it if someone wants to pay you and you keep to its terms.
Secondly, the GPL is a license. It *allows* you to use the copyrighted code under a number of (reasonable IMHO) terms. If you breach those terms, your license is invalidated and you can now be sued for copyright and license infringement (if I understand it correctly, I am not a lawyer, etc).
What I find confusing about this notion is that ChatGPT does not seem to have breached copyright by "reading" the articles, since the NYT allows reading. As far as I can tell, it has also not breached copyright by retaining a non-verbatim recollection of those articles (as far as I know the original source articles are not stored in a cache, but form a kind of weighted networked dataset from which it would be impossible to separate the NYT articles from everything else it has read).
Where it could breach copyright is when it is asked a question and quite reasonably makes a response that contains identical segments of text to those NYT articles. But that doesn't seem to be what the NYT is concerned about. It seems to be concerned that ChatGPT's learning is itself a breach of copyright.
The reason I find this hard to grok is that a human could also regurgitate such text based on similar learning. I have no problem with the generated text being subject to copyright, but I do have a problem with the learning being copyrightable. I have learned much (or too little, depending on who you ask) over the years and I like to think of this knowledge as mine.
Or to put it more succinctly, the NYT can bite my ass.
__
The Major
(Score: 2) by Mykl on Wednesday August 23 2023, @11:11PM
I think the NYT might have a case where they could claim that all ChatGPT output is a derivative work of their original material. Given that the source data has been put through a neural blender, it would be possible to argue that NYT material is present (in a derivative form) in every single ChatGPT output.
At least, that's what I'd argue if I was a lawyer.
I agree with others though - this is only an issue for them now that ChatGPT is a potential money machine.