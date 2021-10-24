from the house.random.penguin.news dept.
https://mashable.com/article/penguin-random-house-ai-protections-copyright-page
https://www.theverge.com/2024/10/18/24273895/penguin-random-house-books-copyright-ai
PRH's changing of its copyright wording to combat AI training makes it the first of the Big Five publishers to take such an action against AI, at least publicly.
The clause also notes that Penguin Random House "expressly reserves this work from the text and data mining exception" in line with the European Union's laws.
In August, Penguin Random House published a statement saying that the publisher will "vigorously defend the intellectual property that belongs to our authors and artists."
Penguin Random House will amend their copyright notice with "no part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems."
Will it work? Have they just created more work for themselves trying to litigate against all the LLM trainers? How much is too much, and how much is enough for output to be distinct from their books rather than just words other people have expressed too?
(Score: 1, Funny) by Anonymous Coward on Tuesday October 22, @04:34PM
I was hoping this was about Linux.
(Score: 2) by VLM on Tuesday October 22, @04:58PM
Here's some background I believe to be factual, or at least I don't think any of it is false, and it's useful background to the topic:
Lots of people believe in the "no such thing as bad PR" theory. So even if this is legally untenable, it's still getting their name in the news, so they've already won, even if it's never enforceable (assuming it IS unenforceable).
A lot of computer people think of law as machine code that will execute 100% reliably and reproducibly. IRL it's more of a guideline of what you'll get away with.
Under fair use a lawyer could make a pretty good argument that someone bought a book for training so that the AI can make review-based commentary on it, or that the use is "not intended to exploit" the work. You could train the AI in an abstract sense of "hey AI, write a review of some book," and under fair use there's not a heck of a lot they could do about an AI writing a review without banning book reviews altogether, I think.
There is/was a three-year statute of limitations on copyright infringement on the books. The Supreme Court reinterpreted various things to not have a time limit. I'm not saying this proves anything, but it's going to be weird. So buy a book, let it sit on a shelf in a warehouse for three years, THEN scan it, then train on it, and plus or minus future reinterpretations there might be nothing they can do about it if they don't find out for several years. I'm not saying this is a solution now, but you might see something like this in the future, or as some kind of mediated solution to a future court case. "We won't train on anything newer than X years old" might be the settlement terms, or might not. So ask an AI for fictional modern Star Wars and it'll respond "who? what?" but ask for Aesop's fables reinterpreted and it'll do fine.
One problem with AI training is GIGO, and training on modern AI-ghostwritten content might be a VERY bad idea in the long run, so training exclusively on the contents of copyright-free Project Gutenberg type sources might be a very good idea both for the AI and as a weapon against multinational publishers. If the multinationals are not careful they might get marginalized out of relevance and then existence. See newspapers, scientific journals, etc. Sure, they might scrape along an existence, ignored by almost everyone and culturally irrelevant ... of course, isn't that the position of book publishers already in 2024? Hmm. If the AI is trained on material that fell out of copyright last century, there's not much a modern publisher can do about it by manipulating current copyright terms.
All you need is one judge to declare AI a life form that's mentally disabled and/or requires educational material as a sentient being, and you'd have to rewrite all the fair use laws if you want to implement "separate but equal" for AIs. Currently, for example, it's big-time illegal to try to prevent a blind person from using a screen reader. It's quite hopeless to enforce a copyright specifically designed to exclude academic review and quotation. All they need is one judge one time and the AI is on the loose.
(Score: 1) by Billy the Mountain on Tuesday October 22, @05:01PM
I think it's all well and good for Penguin to say this, but I think they or someone else would need to develop a way to detect that their intellectual property is being used and redistributed. The two articles are short and don't seem to address this issue.