https://mashable.com/article/penguin-random-house-ai-protections-copyright-page
https://www.theverge.com/2024/10/18/24273895/penguin-random-house-books-copyright-ai
PRH's changing of its copyright wording to combat AI training makes it the first of the Big Five publishers to take such an action against AI, at least publicly.
The clause also notes that Penguin Random House "expressly reserves this work from the text and data mining exception" in line with the European Union's laws.
In August, Penguin Random House published a statement saying that the publisher will "vigorously defend the intellectual property that belongs to our authors and artists."
Penguin Random House will amend its copyright notice with "no part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems."
Will it work? Have they just created more work for themselves trying to litigate against all the LLM trainers? How much is too much, or enough, for the output to count as distinct from their books rather than just words other people have expressed too?
(Score: 4, Funny) by Anonymous Coward on Tuesday October 22, @04:34PM (1 child)
I was hoping this was about Linux.
(Score: 0) by Anonymous Coward on Tuesday October 22, @10:11PM
That would have been great. Hoping Linus comes out and blocks LLMs in the kernel. Never going to happen, but one can dream ...
(Score: 4, Interesting) by VLM on Tuesday October 22, @04:58PM (2 children)
Here's some background I believe to be factual, or at least I don't think any of it is false, and it's useful background to the topic:
Lots of people believe in the "no such thing as bad PR" theory. So even if this is legally untenable it's still getting their name in the news so they already won. Even if it's never enforceable, assuming it IS unenforceable.
A lot of computer people think of law as machine code that will execute 100% reliably and reproducibly. IRL it's more of a guideline of what you'll get away with.
Under fair use, a lawyer could make a pretty good argument that someone bought a book for training so that the AI can make review-based commentary on it, or that the use is "not intended to exploit" the work. You could train the AI in an abstract sense of "hey AI, write a review of some book" and under fair use there's not a heck of a lot they could do about writing a review without banning reviews of books, I think.
There is/was a three-year statute of limitations on copyright infringement on the books. The Supreme Court reinterpreted various things to not have a time limit. I'm not saying this proves anything but it's going to be weird. So buy a book, let it sit on a shelf in a warehouse for three years, THEN scan it, then train on it, and plus or minus future re-interpretations there might be nothing they can do about it if they don't find out for several years. I'm not saying this is a solution now but you might see something like this in the future or as some kind of mediated solution to a future court case. "We won't train on anything newer than X years old" might be the settlement terms, or might not. So ask an AI for fictional modern Star Wars and it'll respond "who? what?" but ask for Aesop's fables reinterpreted and it'll do fine.
One problem with AI training is GIGO, and training on modern AI-ghostwritten content might be a VERY bad idea in the long run, so training exclusively on the contents of copyright-free Project Gutenberg type sources might be a very good idea both for the AI and as a weapon against multinational publishers. If the multinationals are not careful they might get marginalized out of relevance and then existence. See newspapers, scientific journals, etc. Sure they might scrape out an existence, ignored by almost everyone and culturally irrelevant ... of course isn't that the position of book publishers already in 2024? Hmm. If the AI is trained on material that fell out of copyright last century, there's not much a modern publisher can do about it by manipulating current copyright terms.
All you need is one judge to declare AI a life form that's mentally disabled and/or requires educational material as a sentient being, and you'd have to rewrite all the fair use laws if you want to implement "separate but equal" for AIs. Currently, for example, it's big time illegal to try to prevent a blind person from using a screenreader. It's quite hopeless to enforce a copyright specifically designed to exclude academic review and quotation. All they need is one judge one time and the AI is on the loose.
(Score: 4, Insightful) by turgid on Tuesday October 22, @07:02PM
When someone finally realises that for any AI to be any good it needs to understand and to employ the Scientific Method, not to just regurgitate random sophistry, that's when we'll be in real trouble.
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 2) by ledow on Wednesday October 23, @07:58AM
"Lots of people believe in the "no such thing as bad PR" theory. So even if this is legally untenable it's still getting their name in the news so they already won. Even if it's never enforceable, assuming it IS unenforceable."
Ask Mr RyanAir.
Every 6 months, regular as clockwork, he makes some barmy assertion (like having passengers stand through their flight, or their bags being sent separately, or charging them for using the toilet on the plane, etc.).
Bam, RyanAir is all over the news, associated with being "cheap". Sounds like awful business practice. Actually works, because many people are prepared to accept "cheap" and it looks like they're trying to help you be even cheaper. And none of it actually comes to fruition, so they never actually catch any flak for all those assertions (most of which would be illegal anyway).
(Score: 2, Funny) by Billy the Mountain on Tuesday October 22, @05:01PM (1 child)
I think it's all well and good for Penguin to say this but I think they or someone would need to develop a way to detect that their intellectual property is being used and being redistributed. The two articles are short and don't seem to address this issue.
(Score: 2) by Reziac on Wednesday October 23, @03:00AM
That was kinda my question... if the AI is actually any good, how would they know what was used to train it??
And there is no Alkibiades to come back and save us from ourselves.
(Score: 0, Funny) by Anonymous Coward on Tuesday October 22, @05:58PM
Lisa, I would like to buy your penguin
(Score: 3, Insightful) by bzipitidoo on Wednesday October 23, @01:53AM (9 children)
Publishers are trying to turn back the clock, again. These organizations are not spreaders of knowledge, not educators. They merely operate machinery that copies knowledge. They aspire to be gatekeepers who charge high tolls. This anti-AI clause smacks of opportunism, using the panic over AI to further strengthen copyright.
I have grown quite tired of another plot element often found in SF and other genres: too easily created AI of genuine intelligence, the worst sort being accidental AI. For example, Skynet in the Terminator stories. Such stories suggest that creating intelligence is much easier than appears to be the case. Right now, not only can we not create intelligence, we are still pretty fuzzy about defining exactly what intelligence is! I sometimes wonder if the stunningly low complexity required to achieve Turing completeness has misled us. I guess people think Conway's Game of Life might be enough to support intelligence, if only the universe can be made large and fast enough. I have doubts, and think other ingredients may be required. What ingredients, I don't know, but I suspect a cellular automaton alone is not enough. Also, scientists thought it would be much easier for computers to get good at chess than it turned out to be, and that once this was accomplished, we'd have general AI. The latter has turned out to be very wrong. Chess computers appear to be the ultimate idiot savants, better than any human at chess, utterly clueless about anything whatsoever outside the universe of chess.
(Score: 2) by Reziac on Wednesday October 23, @03:04AM (2 children)
It has occurred to me that Stargate's replicators are Conway's Game of Life writ large.
And publishers are not in the publishing business to spread knowledge; they're in it to collect money in exchange for distributing words.
(Score: 2) by mcgrew on Wednesday October 23, @07:56PM (1 child)
"And publishers are not in the publishing business to spread knowledge; they're in it to collect money in exchange for distributing words," he publishes on a free web site that contains no ads, and does so without remuneration. Do you guys even THINK about your words?
Posting on S/N is PUBLISHING. S/N is a publication. YOU are a publisher!
Our nation is in deep shit, but it's illegal to say that on TV.
(Score: 2) by Reziac on Friday October 25, @11:38PM
LOL. I should have been more specific. :D
(Score: 2, Disagree) by ledow on Wednesday October 23, @08:09AM (2 children)
Copyright exists because of musicians protecting their work, hundreds of years ago, not mega-conglomerates protecting IP they stole from others.
Authors, artists and others all followed suit.
It's not turning back the clock, their clients are demanding it. Otherwise, they'd just publish their book online for free with a CC0 licence. You'll notice that professional authors don't, and yet budding authors will fall over themselves to sign up with a publisher.
Doing something solely for "the art of it" is unprofitable, and people don't tend to do it unless they're a) already rich, b) have another job elsewhere. Copyright is the sole protector of that industry and it's being turned into the bad guy. Even while, with the same breath, we talk about how open-source software works so well and the licences stop them being abused... the exact same laws facilitate the GPL etc. as that novel sitting on your bookshelf.
And the increasing modern trend, as well as the entire basis of modern AI (somehow!), is to just disregard copyright entirely because it's "new" and copyright is "old".
The only logical outcome is AI slop in everything, artists being cut out of the loop, and AI feeding on its own output only (because why would you write something only for an AI to copy it and pretend it made it?). It's literally already happening.
But that doesn't excuse the fact that copyright exists, is still law, and is the primary legal protection for all kinds of artists - from programmers to writers, artists to musicians, Hollywood to VR.
AGI doesn't exist. It's not even on the horizon. What we have is the same kinds of AI as we had in the 60's - statistical boxes - that we trained on the entire world's data. And it turns out, they still never "learn" and aren't actually that smart.
The only good thing - we will soon finally be able to put to rest the assertion that's followed AI since the 60's. "If only we had a little more processing/memory/training data/time/money/scale/nodes/neurons/transformers... this idea of ours will magically turn into actual intelligence".
It's never happened, and it's still not happened. When this current AI fad dies, the next one is going to have to actually THINK about the problem, rather than just hoping AI will magic out of the ether if you hit enough neurons.
(Score: 3, Insightful) by bzipitidoo on Wednesday October 23, @04:21PM
Hundreds of years ago? Mozart made his living from patronage, not copyright. Shakespeare's earnings came from performances, not copyright.
Protection from copying isn't what artists need. That protection is only a means, and a poor one that often backfires, to what artists really need: a living. Protection works against publicity. Another very bad thing about protection is that it triggers fears of loss that are unreasonable.
> Copyright is the sole protector
Sole? No, it's not. Protector? Again, protection is not the ultimate goal. Art is.
> the exact same laws facilitate the GPL
The brilliance of copyleft is that the more these special interests manage to strengthen copyright, the more they strengthen copyleft, and that, they don't want to do.
> why would you write something only for an AI to copy it and pretend it made it?
Copying is one issue, and pretending to have made it is an entirely separate issue, called plagiarism. Copying is good. Plagiarism is bad.
Why write, you say? There you are, getting all triggered about an imaginary loss. Artists shouldn't have to be copy police, shouldn't have to fret about that. What we need is to improve the business models that do not rely on copyright, to better compensate artists fairly.
> is the primary legal protection for all kinds of artists - from programmers ...
Very few programmers use copyright protection. Most of our work is done on a "work for hire" basis.
Further, it is utter absurdity that software was ever allowed to be patented. No one can write a 100 line program without infringing on patents by the dozens. In practice, this infringement is ignored, but there is always the possibility that someone will try to make a big stink over it, to do a shake down.
> AGI doesn't exist. It's not even on the horizon. What we have is the same kinds of AI as we had in the 60's - statistical boxes - that we trained on the entire world's data. And it turns out, they still never "learn" and aren't actually that smart.
With this, I agree. LLMs are brainless. LLMs are not much more than the famous ELIZA program from the 60's, with a much, much bigger library of phrases.
(Score: 4, Informative) by mcgrew on Wednesday October 23, @08:08PM
> Copyright exists because of musicians protecting their work, hundreds of years ago
My GOD! The ignorance in that statement is appalling, more so because it's presented as factual knowledge.
"Hundreds of years ago" the only thing a musician had to do with copyright was to buy sheet music from a publisher who had rights to publish that sheet music; recorded sound is only a little more than a century old.
Copyright was started in Britain to protect publishers from other publishers. It was started in the US to protect authors from publishers; at first, only American works could be copyrighted in the US, resulting in no Americans being published.
The copyright clause was proposed by James Madison in 1787 and unanimously agreed to by the Convention. Congress quickly implemented the clause by passing the Copyright Act of 1790, the first federal copyright law. This law protected books, maps, and charts for 14 years, with the option to renew for another 14 years.
Please educate yourself [lessig.org] before spouting such ignorance.
(Score: 0) by Anonymous Coward on Wednesday October 23, @10:30AM (1 child)
> ... too easily created AI of genuine intelligence, the worst sort being accidental AI.
It's time to expose the emperor's new clothes -- stop calling this new pattern matching thing "AI" and start using RMS's name - bull shit generators or (in my mind) now shortened to BSGs.
Save the "AI" name for the future, if/when generalized artificial intelligence is developed.
(Score: 2) by bzipitidoo on Wednesday October 23, @01:25PM
"Word bandiers" is a perfectly acceptable term. I agree that calling this "intelligence" is incorrect. It is propaganda.
(Score: 2) by mcgrew on Wednesday October 23, @07:49PM
> Publishers are trying to turn back the clock, again.
Bullshit. Read Free Culture [lessig.org] (PDF at the link). They've gone way past turning back the clock.
But I'll be damned if I'll agree to let my work, free in electronic form, be used to train AI. That's why I deleted my Farsebook account. If you look at my satire site you'll see at the bottom "©2024, all commercial rights reserved. Use for any renumeration, or to train AI, is strictly forbidden without explicit written permission on paper, and we will sue the shit out of you if you infringe!"
In Journey to Madness there are no human authors, painters, poets, or any of the art; it is all produced by AI. Some things are more important than money.