On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.
Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.
[...] Authors claim that by utilizing "flagrantly illegal" data sets, OpenAI allegedly infringed copyrights of Silverman's book The Bedwetter, Golden's Ararat, and Kadrey's Sandman Slim. And Meta allegedly infringed copyrights of the same three books, as well as "several" other titles from Golden and Kadrey.
[...] Authors are already upset that companies seem to be unfairly profiting off their copyrighted materials, and the Meta lawsuit noted that any unfair profits currently gained could further balloon, as "Meta plans to make the next version of LLaMA commercially available." In addition to other damages, the authors are asking for restitution of alleged profits lost.
"Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation," Saveri and Butterick wrote in their press release.
Related Stories
OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
(Score: 3, Interesting) by sigterm on Wednesday July 12 2023, @10:01AM (16 children)
I have some doubts about it being illegal to scrape material that's been published on the open Internet.
But I say Google and Meta should just remove this from their data sets. Sure, I will no longer be able to ask LLaMA or ChatGPT to do stuff like "rewrite the Declaration of Independence in the style of a painfully unfunny comedian," but I can live with that.
(Score: 4, Insightful) by canopic jug on Wednesday July 12 2023, @12:30PM (9 children)
It goes far beyond just scraping. The LLMs make a local copy, strip attribution and copyright information, and then regurgitate the result as original work. It's plagiarism as a service, but set up so that one can blame "the algorithm" long enough to confuse technologically inept legislators, lawyers, jurors, and judges.
Money is not free speech. Elections should not be auctions.
(Score: 5, Interesting) by looorg on Wednesday July 12 2023, @02:03PM (2 children)
That is in part what is so baffling about it all. Most, if not all, of the people involved in this LLM AI whatever come from the academic world, where referencing is everything. Is there anyone really working on this who doesn't have at least a master's degree or a PhD? They might be used to trying to do things on the cheap and free. But you never skimp on referencing. After all, being caught out as a plagiarist is basically a career-ending offense in some parts.
(Score: 3, Touché) by Freeman on Wednesday July 12 2023, @04:14PM (1 child)
Nothing to be baffled about $.$ vision.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 3, Insightful) by DeathMonkey on Wednesday July 12 2023, @07:16PM
Also, the people building the engine and the people scraping the web into an engine are different people.
(Score: 0) by Anonymous Coward on Wednesday July 12 2023, @02:25PM
Eh, just slap a "for personal use only: not for reselling or relicensing, or other professional use" at the bottom of the ChatGPT output page.
(Score: 4, Interesting) by ElizabethGreene on Wednesday July 12 2023, @03:30PM (3 children)
I don't think this is how it works.
Could you suggest a prompt that will reproduce a substantial portion of Silverman's work in the answer, or point to where in the open LLaMA model their work is present without attribution or copyright information? The model is a series of weights in hidden layers on a neural net, not terribly dissimilar to the varied strength of connections between neurons in our own noggins.
Following the idea of copying biology, an LLM learning a book by reading it, aka training on it, is very similar to when you or I read a book. I don't consider reading a book to be infringement IFF Meta, Google, or whoever trained the model had a legal copy of the source book. If I memorized large portions of the content word-for-word and then reproduced them, that would be an infringement, but the current LLMs struggle to do that.
A valid counterpoint here would be that the LLM is creating derivative work, but the copyright office says a machine can't create a work. Only the human operating it can.
Another valid counterpoint would be that the LLM *is* a derivative work, falling in the definition "such as a translation, [...], abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted." That's a reasonable assertion, and I think is the one the court will decide.
(Score: 3, Interesting) by Anonymous Coward on Wednesday July 12 2023, @04:40PM
I know that is how neural nets have been colloquially described since at least the 90s, but I don't think one can make a compelling case that that's how our brains work in general. These LLMs have only shown recent success because of the enormous number of connections and the unimaginable amount of data needed to train them, with perfect recall. Our brains seem to function much better using much less, so much so that there are those who feel these massive weighted "neuron" brute force approaches are not the right path and that if/when AGI is achieved, it will be using much more modest hardware and software.
(Score: 3, Interesting) by Mykl on Wednesday July 12 2023, @11:22PM
I think this is the main thrust of the lawsuit - it's doubtful that the trainers can provide a receipt for Silverman's book, or for the other million plus books that have been fed into the machine. If they could show the receipts then there wouldn't be a lawsuit.
Assuming that the trainers have legal access to the source material, I would be fine with spitting out small excerpts (fair use), summaries or even getting the AI to write "in the style of".
(Score: 1) by shrewdsheep on Thursday July 13 2023, @12:53PM
All the weights represent a transformation of the data. It is just a mapping. All references to biology are but a mere vague analogy at this point (ReLU is not remotely what a neuron does). These models (transformations) have been shown to exhibit nearest-neighbor characteristics (i.e. outputting the nearest neighbor in the training data) in several cases, and we discussed here the Google paper showing that some input images are almost completely stored in the network weights (sorry, couldn't find the story). While I agree that literal reproduction is infrequent, the unpredictable nature of such behavior lays the burden of proof at the feet of the model IMO.
(Score: 2, Interesting) by Anonymous Coward on Thursday July 13 2023, @02:11AM
Yeah, I'd be more convinced stuff like this should be legal if, for example, Microsoft didn't train Copilot on GitHub stuff but on Microsoft's internal source code, e.g. Windows, Office etc.
Then Microsoft is the one taking the risk of others accessing the Windows source code without Microsoft being able to claim copyright infringement...
In contrast now the GPL stuff is the copyright that could be infringed.
See also: https://www.theregister.com/2023/06/09/github_copilot_lawsuit/ [theregister.com]
(Score: 2) by Thexalon on Wednesday July 12 2023, @05:05PM (2 children)
It's not a crime to scrape data. It is potentially a copyright violation and thus a civil tort, depending in part on what you do with it afterwards. Especially since a lot of websites have a copyright notice somewhere on the page, which almost definitely got ignored by the scraping bots.
I'm no lawyer, but this sure seems like a kind of case that was guaranteed to happen eventually. And I could also imagine such a case being settled if the so-called-AI companies set up some sort of system of giving the creators of their source material a portion of whatever proceeds they're getting from what they're creating based on that source material (which could well be a "derivative work" under copyright law).
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Informative) by DeathMonkey on Wednesday July 12 2023, @07:22PM
In this case they're getting sued so it's civil already.
However, criminal copyright statutes exist as well so civil lawsuits are definitely not the only remedy. (for better or worse...)
(Score: 0) by Anonymous Coward on Thursday July 13 2023, @03:11AM
Yeah, Microsoft seems fine with scraping GPL code and using it for Copilot, but are they using the Windows, MS Office etc. source code for Copilot?
(Score: 4, Interesting) by mcgrew on Thursday July 13 2023, @01:04AM (2 children)
I have some doubts
Do you? Are you lost, little one? Google is your friend, evil as it is. Publishing on the open internet does NOT invalidate a copyright. Where did you come up with such a ridiculous idea?
And FYI, the Declaration of Independence and the Constitution are NOT covered under copyright. Nor are recipes, dance, or clothing patterns. Educate yourself before you attempt to educate others.
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 2) by sigterm on Sunday July 16 2023, @06:49PM (1 child)
Where did you get the idea that I was arguing against copyright law? I'm not.
Leaving aside the obvious copyright infringement that is wholesale reproduction of unaltered content, which is not being argued here: There is such a thing as "fair use." Unless ChatGPT is using the material in a non-novel way, and/or is creating a (derivative) product that takes market share from the original content, the plaintiffs will have a hard time arguing that their copyright is being violated.
Plaintiff: "Your Honor, defendant read/saw the content we distributed on the Internet, and is now creating derivative works that only vaguely resemble the original!" Defendant: "Yes, we are." Judge: "That's perfectly allowed. Next case!"
(Score: 2) by mcgrew on Saturday July 22 2023, @09:04PM
How much and to what purpose? Even if it's a single sentence, if the author isn't credited, it's plagiarism. Fair use credits the original author. If he copies five paragraphs and puts "a passage from [name of work]:" before them, or indents them with a footnote after, it is fair use. A sentence without credit is plagiarism, period. If the computer credits all those it copies with what it has copied, it may just be kosher. But I wouldn't bet on it.
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 2) by looorg on Wednesday July 12 2023, @11:51AM (11 children)
While a bit over the top perhaps, I do think that they have a point. A vital point of collecting data has always been to be able to reference it, as in telling where things came from and to then be able to give credit. That said, if she, or the others, have their books or writings publicly available then they might be barking up the wrong tree. But otherwise you would have to have some kind of licensing deal or just, I guess, buy a copy of their book(s).
One would think it would also be somewhat vital to data health, as in knowing where things came from so you can actually remove bad data. Still, you don't need to know where it came from to delete it, but it would help, as you can then remove all data from said source as it was clearly bad.
But I assume she doesn't just want credit. She also wants to get paid. Which I guess is hard if you are already sharing it all for free. But if she isn't, and their works are somehow still in there, then I guess they've got some explaining to do. As they have indeed stolen it.
For it to be plagiarism they would have to somehow claim credit for it. I don't think they are claiming credit for it. Not that I know of. They are just not giving any credit to anyone. It's just from the big old datablob of content that they claim to not know where it came from. Which as noted previously is bad for so many reasons.
(Score: 4, Insightful) by shrewdsheep on Wednesday July 12 2023, @12:09PM
To be fair, it is incredibly hard to control availability. If I come across something paywalled (paper/news/book) more often than not, a simple DDG will cough it up. I think the onus has to be on the scraper to prove legitimacy.
(Score: 4, Insightful) by HiThere on Wednesday July 12 2023, @01:50PM (9 children)
Given our idiotic copyright laws I think they should win the case. The real problem is with the copyright laws.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 3, Insightful) by ElizabethGreene on Wednesday July 12 2023, @03:35PM (1 child)
I couldn't agree more. It's time for a forklift upgrade here. Copyrights need to be reasonably time limited in a non-Disneyfied way, with clear guidance on what is and is not fair use, and some level of foresight about how new technologies should be treated until such time as Congress updates the law.
(Score: 4, Insightful) by DeathMonkey on Wednesday July 12 2023, @07:27PM
Time limited being the key feature here I think.
If they could just scrape everything older than 20 years and be in clear and objective compliance with copyright law they would be doing it already!
(Score: 3, Informative) by Thexalon on Wednesday July 12 2023, @05:08PM (6 children)
The real problem is that artists and authors and musicians and filmmakers need to eat and have a roof over their head, and the only way they can do so is if somebody has to pay them for their work. That's what copyright was supposed to do for them.
Of course, Disney et al have worked hard to turn copyright into something that protects Disney and not writers who work for Disney (for example), but there was at least some reasonableness behind the concept once.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Interesting) by pTamok on Wednesday July 12 2023, @06:14PM (1 child)
Simple.
Copyrights can only be owned by natural humans, not corporations. Licences can only be non-exclusive.
(Score: 3, Insightful) by Joe Desertrat on Thursday July 13 2023, @01:03AM
To expand a bit on this, copyrights should only ever be able to be owned by the actual creators, with perhaps a brief period where death or incapacity might allow it to be transferred to an heir. Otherwise, if transferred or sold any copyright is voided. At any rate a copyright should not last for more than 12 years or so. Limited licensing for distribution should be allowed for a short period (3 years? 5 years? 7 years?) with maybe one 3 year renewal being allowed. In every case the original creator should be able to make full personal use of their own copyrighted work.
(Score: 3, Funny) by legont on Thursday July 13 2023, @12:52AM (3 children)
All those artists can make their living by open air live performances where tickets are sold for physical beings to attend.
As per their creations, they are free as birds for anybody to take.
Yes, they are not supposed to be rich. Rich makes bad art. Only poor - preferably near to death poor - makes good art.
"Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
(Score: 3, Touché) by mcgrew on Thursday July 13 2023, @01:10AM (2 children)
Only poor - preferably near to death poor - makes good art.
Says the man who knows absolutely nothing about any art form whatever. Yes, I was an art student, kid, half a century ago. Those who think they know everything are annoying to those who know nobody does.
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 2) by legont on Wednesday July 19 2023, @02:06AM (1 child)
Let me guess - you didn't make any significant art.
"Wealth is the relentless enemy of understanding" - John Kenneth Galbraith.
(Score: 2) by mcgrew on Saturday July 22 2023, @09:06PM
Those who have seen it would disagree.
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 5, Interesting) by Mojibake Tengu on Wednesday July 12 2023, @02:26PM (10 children)
But one day this will happen to movies.
Illegal or not, there will be one big pile of compressed data into which all the movies have been stuffed and learned, and you'll ask the AI to synthesize 120 episodes of any stupidity you can imagine and describe to it.
The End for all Hollywood.
Rust programming language offends both my Intelligence and my Spirit.
(Score: 4, Touché) by looorg on Wednesday July 12 2023, @02:51PM
In some part I look forward to that day so that I can get more episodes of Firefly etc. All the good shows that were cancelled in season 1 or early due to them not having mass-market appeal with the dummies.
(Score: 4, Funny) by pTamok on Wednesday July 12 2023, @03:31PM (5 children)
Hey! Extra episodes of Fawlty Towers - what's not to like!
(No, I'm not serious.)
I'm sure I'll get to appreciate Beethoven's 23rd Symphony, Michelangelo's Putin, da Vinci's Eiffel Tower, and Welles' sequel to Citizen Kane, and I'm looking forward to Shakespeare's next play. The outpouring of inspired cultural works is going to be incredible.
(Score: 2) by Freeman on Wednesday July 12 2023, @04:17PM
One might say that part of the inspirational aspect of those is the fact that a person had reasons, motivations, and thought go into the creation of those works. AI output is regurgitated content.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 4, Funny) by istartedi on Wednesday July 12 2023, @08:19PM (2 children)
Sequel to Citizen Kane
Trailer: Rosebud's back, and this time it's personal.
Frightened man on bus: "How... how is this possible? It's destroying the city!"
Woman with haunted look in her eyes: It's an advanced mega-sled. Liquid metal. Extra flexible flyer. It came back... for me.
Appended to the end of comments you post. Max: 120 chars.
(Score: 2) by The Vocal Minority on Thursday July 13 2023, @04:05AM (1 child)
Woman with haunted look in her eyes, three arms, and six fingers on each hand: It's an advanced mega-sled. Liquid metal. Extra flexible flyer. It came back... for me.
(Score: 3, Funny) by istartedi on Thursday July 13 2023, @04:05PM
I can just picture her mouth warping as she tries to eat pizza.
Appended to the end of comments you post. Max: 120 chars.
(Score: 2) by KritonK on Thursday July 13 2023, @07:38AM
You may not be, but other people are [variety.com].
(Score: 0) by Anonymous Coward on Wednesday July 12 2023, @08:01PM
It seems like it's already happening with the writers' strike and it being summer when a lot of stuff gets re-run anyway. Reality TV is just game shows, and the old networks are not even hiding it any more--they're even running traditional game shows in prime time now which is the next best thing to just giving up. I think an AI could easily generate even Jeopardy! questions, and Wheel of Fortune no problem. If the talent (hosts) don't join the writers, then those shows march on; but reality TV is an even better strike-breaking tool because you don't need real talent, just contestants and a camera crew. The contestants usually have no hope of getting a SAG card anyway, and scab camera crews aren't that hard to scare up and/or they'll just never put it on their resume. The sad thing is, people actually watch that shit so it makes money, lots of money because you don't have to negotiate with real talent that people like--no Lenos or Lettermans in that space, just nobodies who think they're somebody.
(Score: 2) by mcgrew on Thursday July 13 2023, @01:13AM
There's a writers' strike about that, and it looks like the actors will strike, too.
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 2, Insightful) by MonkeypoxBugChaser on Friday July 14 2023, @09:17PM
I haven't watched a single piece of Hollywood media in over a year. I don't even want it for free. Waste of space on my hard drive.
The actors can go ahead and strike. I won't miss them. AI generated text is better than their professionally written "stories".
(Score: 0, Offtopic) by dwilson98052 on Wednesday July 12 2023, @03:23PM
....freakasaurus
(Score: 5, Insightful) by Anonymous Coward on Wednesday July 12 2023, @03:53PM (2 children)
There's been a trend over the last couple decades of adding Techno Sauce (TM) to avoid laws and/or torts.
Uber -- "ride sharing" == Taxi licensing violation.
AirBnB -- "home sharing" == Short term occupancy code violation
YouTube -- "user generated videos" == massive opportunity for copyright violations, rights holders play whack-a-mole
I've referred to Uber in particular as an "Exchange-Traded Criminal Syndicate", but ETCS doesn't have much of a ring to it, and a lot of these companies are private anyway.
The message is clear though. There's a model here.
1. Pick an illegal or tort act.
2. Sprinkle "app" on it.
3. Profit! (enough to defend against suits and/or buy legislators)
(Score: 1) by dwilson98052 on Wednesday July 12 2023, @04:07PM
1. Collect underpants
2. ?
3. Profit
(Score: 2) by Thexalon on Wednesday July 12 2023, @05:09PM
No, that's called "disrupting the market" you see.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Funny) by VLM on Wednesday July 12 2023, @05:44PM
She's got a point, if the bot tells a joke that's not funny, it probably is one of hers, because she only tells jokes that aren't funny.
(Score: 3, Interesting) by bzipitidoo on Wednesday July 12 2023, @06:26PM (2 children)
Many authors refuse to get it because, in the words of the famous author Upton Sinclair: “It is difficult to get a man to understand something, when his salary depends on his not understanding it.” Not only has intellectual property law become one of the biggest obstacles to progress, it feeds people's possessiveness and fears of loss. Many authors feel they're entitled to not just money, but considerable control over the uses others may make of their efforts. They had to be smacked down, hard, to make it very clear that parody is allowed. Currently, parody is pretty well accepted by all.
The worst damage has been to the stories. So many stories have plot elements of dramatic loss that are based upon the false premise that knowledge is very very hard to preserve. It is particularly ironic to encounter this element in SF or the story lines of computer games. But it's all over the place. Fantasy, maybe, could be excused when it is based upon medieval technology when copying was indeed laborious, and yet fantasies often have magic that most certainly should be able to handle that. One must not think too hard about the contradictions typical of the magic in such stories. However, many share common features of somehow placing people above magic, far more often the users of such power rather than the victims of it. For instance, in Harry Potter it is asserted that it's impossible to use magic to make a person fall in love; magic can only at most create an infatuation. It also is impossible to create food. Why?? All the other crazy things magic can do, but it can't do those things? I have no doubt that centuries from now, scholars of 20th century fiction will be well aware of this predilection to make melodrama out of the loss of knowledge that is patently ridiculously easy to avoid. One of the worst cases is the Two Trees in the Silmarillion. Seeds basically are the knowledge of how to grow a plant. But those Two Trees are each one of a kind that can't reproduce like every other freaking form of life almost ever, no. They can produce offspring but those offspring are hugely lessened in all ways, and why? Because the author says so, and that because that makes their loss more dramatic. Ugh. Even today, we're still somewhat hung over about "lost secrets of the ancients" when much knowledge was indeed lost with the fall of the Roman Empire. The reason the Middle Ages are called "Middle" is in recognition of their being (mostly for Europe, of course) a period of backwardness, a technological low between the ancient world and the modern world.
Melodrama over easily avoided loss of knowledge might even be the top feature related to students on the first day of their first class on the subject of 20th century fiction.
I have not heard that anyone has been paid in appreciation of and to encourage further participation in online forums such as this one. Perhaps we should be? Instead, some of us donate to help out with the expenses incurred by running the forum. Anyway, payment for participation has always had a stench of corruption, with the recipients of pay too often being employed for nefarious purposes, such as to spread propaganda. I've often seen posts full of FUD that intimates that libre software is unreliable, unmaintained, unpolished, lacking in functionality, etc. Money warps people perhaps worse than copyright warps stories.
(Score: 3, Insightful) by Freeman on Wednesday July 12 2023, @07:22PM
Given enough time and magnitude of disaster, it's possible to lose vast amounts of knowledge. Whether in a Sci-Fi or Fantasy setting, I find it plausible enough to not trash all over my suspension of disbelief.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by DeathMonkey on Wednesday July 12 2023, @07:31PM
On the plus side her salary is going to depend on a deeper understanding of the issues moving forward!
I think that's a trite quote, though. Just because an author has a different opinion of how copyright should work doesn't mean they misunderstand the concept.