
SoylentNews is people

posted by requerdanos on Wednesday August 23 2023, @03:21AM
from the oops dept.

OpenAI could be fined up to $150,000 for each piece of infringing content:

Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.

NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.

Neither OpenAI nor the Times immediately responded to Ars' request to comment.

If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.

[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.

Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711

Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815


Original Submission

Related Stories

Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" 48 comments

https://arstechnica.com/information-technology/2023/07/book-authors-sue-openai-and-meta-over-text-used-to-train-ai/

On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.

Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

[...] Authors claim that by utilizing "flagrantly illegal" data sets, OpenAI allegedly infringed copyrights of Silverman's book The Bedwetter, Golden's Ararat, and Kadrey's Sandman Slim. And Meta allegedly infringed copyrights of the same three books, as well as "several" other titles from Golden and Kadrey.

[...] Authors are already upset that companies seem to be unfairly profiting off their copyrighted materials, and the Meta lawsuit noted that any unfair profits currently gained could further balloon, as "Meta plans to make the next version of LLaMA commercially available." In addition to other damages, the authors are asking for restitution of alleged profits lost.

"Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation," Saveri and Butterick wrote in their press release.


Original Submission

The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case 10 comments

Arthur T Knackerbracket has processed the following story:

The Internet Archive was recently found liable for copyright infringement in a case related to its Controlled Digital Lending (CDL) service, which provides users with free access to a digital library of books. US District Judge John Koeltl decided that the IA infringed the copyright of four publishers when it relaxed its CDL limitations during the pandemic, but now the Archive has seemingly reached an agreement with said publishers which could clear the way for an appeal.

The consent judgment between the Archive and Hachette, HarperCollins, John Wiley & Sons, and Penguin Random House will require the IA to pay an unspecified amount of money to the four publishers if the appeal is unsuccessful. The publishing companies are "extremely pleased" with the proposed injunction, as it extends the copyright controversy to thousands of books still in their catalogs.

The IA was sued in 2020 after it started lending free digital copies of its books during the pandemic, a practice the Archive compared to book lending from traditional, physical libraries. The CDL service was protected by the fair use doctrine, the Archive argued, but Koeltl decided otherwise. The Archive was lending free ebooks that were being licensed to traditional libraries, the judge determined.

If accepted, the consent judgment will provide the Archive a chance to overturn Koeltl's unfavorable decision in the appeal. The publishers defined the CDL service as a mass copyright infringement operation, but the Archive now says that its fight is "far from over." The IA team firmly believes that libraries should be able to "own, preserve, and lend digital books" outside the limitations of temporary licensed access (i.e., copyright).

[...] Current efforts to curb the strength and presence of digital libraries – and the Internet Archive itself – are cutting off the public's access to truth "at a key time in our democracy," [Internet Archive founder Brewster Kahle] said. Strong libraries are paramount for a healthy democracy, and that's why the IA is appealing Judge Koeltl's decision.


Original Submission

New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement 51 comments

The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"

Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.

The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."

[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.

"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.

AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead 18 comments

Media outlets are calling foul play over AI companies using their content to build chatbots. They may find friends in the Senate:

More than a decade ago, the normalization of tech companies carrying content created by news organizations without directly paying them — cannibalizing readership and ad revenue — precipitated the decline of the media industry. With the rise of generative artificial intelligence, those same firms threaten to further tilt the balance of power between Big Tech and news.

On Wednesday, lawmakers in the Senate Judiciary Committee referenced their failure to adopt legislation that would've barred the exploitation of content by Big Tech in backing proposals that would require AI companies to strike licensing deals with news organizations.

Richard Blumenthal, Democrat of Connecticut and chair of the committee, joined several other senators in supporting calls for a licensing regime and to establish a framework clarifying that intellectual property laws don't protect AI companies using copyrighted material to build their chatbots.

[...] The fight over the legality of AI firms eating content from news organizations without consent or compensation is split into two camps: Those who believe the practice is protected under the "fair use" doctrine in intellectual property law that allows creators to build upon copyrighted works, and those who argue that it constitutes copyright infringement. Courts are currently wrestling with the issue, but an answer to the question is likely years away. In the meantime, AI companies continue to use copyrighted content as training materials, endangering the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools.

[...] A lawsuit from The New York Times, filed last month, pulled back the curtain behind negotiations over the price and terms of licensing its content. Before suing, it said that it had been talking for months with OpenAI and Microsoft about a deal, though the talks reached no such truce. In the backdrop of AI companies crawling the internet for high-quality written content, news organizations have been backed into a corner, having to decide whether to accept lowball offers to license their content or expend the time and money to sue in a lawsuit. Some companies, like Axel Springer, took the money.

It's important to note that under intellectual property laws, facts are not protected.

Also at Courthouse News Service and Axios.


Original Submission

Why the New York Times Might Win its Copyright Lawsuit Against OpenAI 23 comments

https://arstechnica.com/tech-policy/2024/02/why-the-new-york-times-might-win-its-copyright-lawsuit-against-openai/

The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times "has a near zero probability of winning" its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.

"Trying to get everyone to license training data is not going to work because that's not what copyright is about," Jeffries wrote. "Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works."

[...] Courts are supposed to consider four factors in fair use cases, but two of these factors tend to be the most important. One is the nature of the use. A use is more likely to be fair if it is "transformative"—that is, if the new use has a dramatically different purpose and character from the original. Judge Rakoff dinged MP3.com as non-transformative because songs were merely "being retransmitted in another medium."

In contrast, Google argued that a book search engine is highly transformative because it serves a very different function than an individual book. People read books to enjoy and learn from them. But a search engine is more like a card catalog; it helps people find books.

The other key factor is how a use impacts the market for the original work. Here, too, Google had a strong argument since a book search engine helps people find new books to buy.

[...] In 2015, the Second Circuit ruled for Google. An important theme of the court's opinion is that Google's search engine was giving users factual, uncopyrightable information rather than reproducing much creative expression from the books themselves.

[...] Recently, we visited Stability AI's website and requested an image of a "video game Italian plumber" from its image model Stable Diffusion.

[...] Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front.

These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.

OpenAI Says New York Times 'Hacked' ChatGPT to Build Copyright Lawsuit 6 comments

OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other artificial-intelligence systems to generate misleading evidence for the case:

OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."

"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products."

OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.

[...] Courts have not yet addressed the key question of whether AI training qualifies as fair use under copyright law. So far, judges have dismissed some infringement claims over the output of generative AI systems based on a lack of evidence that AI-created content resembles copyrighted works.

Also at The Guardian, MSN and Forbes.


Original Submission

AI Story Roundup 27 comments

[We have had several complaints recently (polite ones, not a problem) regarding the number of AI stories that we are printing. I agree, but that reflects the number of submissions that we receive on the subject. So I have compiled a small selection of AI stories into one and you can read them or ignore them as you wish. If you are making a comment please make it clear exactly which story you are referring to unless your comment is generic. The submitters each receive the normal karma for a submission. JR]

Image-scraping Midjourney bans rival AI firm for scraping images

https://arstechnica.com/information-technology/2024/03/in-ironic-twist-midjourney-bans-rival-ai-firm-employees-for-scraping-its-image-data/

On Wednesday, Midjourney banned all employees from image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney's official Discord channel.

[...] Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. "It turns out that generative AI companies don't like it when you steal, sorry, scrape, images from them. Cue the world's smallest violin."

[...] Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, "sent you some information to help with your internal investigation."

[...] When asked about Stability's relationship with Midjourney these days, Mostaque played down the rivalry. "No real overlap, we get on fine though," he told Ars and emphasized a key link in their histories. "I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta."

Midjourney stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Midjourney&sort=2
Stable Diffusion (Stability AI) stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Stable+Diffusion&sort=2

This discussion was created by requerdanos (5997) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2, Insightful) by deimios on Wednesday August 23 2023, @04:35AM (14 children)

    by deimios (201) on Wednesday August 23 2023, @04:35AM (#1321474)

    They could just block New York and litigious states and keep advancing.
    ChatGPT has a first mover advantage, others will catch up, it's inevitable.
    Looks like New York wants to stick to horse carriages while the world moves on to automobiles.

    • (Score: 4, Interesting) by Username on Wednesday August 23 2023, @06:05AM (12 children)

      by Username (4557) on Wednesday August 23 2023, @06:05AM (#1321483)

      Yeah, not sure how you can sue someone for consuming your free product. Sounds like something a judge should throw out, and not waste people's time.

      • (Score: 4, Insightful) by Mykl on Wednesday August 23 2023, @08:04AM (7 children)

        by Mykl (1112) on Wednesday August 23 2023, @08:04AM (#1321490)

        not sure how you can sue someone for consuming your free product

        You can be sued under the GPL for using free product, then charging for it yourself without providing the source code.

        This is what it all really hinges on - what conditions did the NYT / Sarah Silverman and others place upon their consumers? Both groups own copyright of their work (for better or worse), so ChatGPT really needs some form of agreement with them if they are going to sell a product that 'contains' that material.

        I am wary of ChatGPT behaving like a Chinese manufacturing firm - using the intellectual property of their partners/clients (provided to them for the sole purpose of that partnership) to build a competing product in parallel is shady at best.

        • (Score: 2) by Username on Wednesday August 23 2023, @09:06AM

          by Username (4557) on Wednesday August 23 2023, @09:06AM (#1321497)

          Hum. So someone who paid for ChatGPT is using it to access the site? I thought the programmer just used the material to train the bot linguistically. Or is this bot just passing off the material as its own? Can I repeat a joke at work?

        • (Score: 0) by Anonymous Coward on Wednesday August 23 2023, @09:18AM (3 children)

          by Anonymous Coward on Wednesday August 23 2023, @09:18AM (#1321500)

          You misunderstand copyright. It controls the right to make copies. It's right there in the name.
          If OpenAI are not making copies they are not infringing on copyright. Using something as training material does not infringe copyright. Showing someone (or someAI) a work you bought a copy of does not infringe copyright.
          Unless you make a copy, you have not infringed copyright.

          • (Score: 5, Insightful) by r_a_trip on Wednesday August 23 2023, @10:23AM (2 children)

            by r_a_trip (5276) on Wednesday August 23 2023, @10:23AM (#1321502)

            Except anything to do with computing inherently has to make a copy to do something. Loading material into RAM is already copying (inevitable) and under copyright you need permission for that. That is why software comes with a license or a public domain statement. Otherwise you can't load the program/data.

            Even if OpenAI doesn't store anything concrete after the training, the fact they loaded other people's material into RAM constitutes a copy which is controlled by the rightsholder.

            I think both The Times and Silverman have their panties in a twist over nothing. ChatGPT outputs convincing garbage, but it can't write books, news articles, comedy, recipes or anything else that requires human skill, for that matter. It can only output unoriginal boilerplate, and it has definite, discernible patterns in what it puts out when prompted to write on a certain topic. Anyone can see this after poking at ChatGPT for a bit.

            So this fear of being replaced by a bot is idiotic for the foreseeable future. ChatGPT has no agency. It is a parrot. Bluntly put, you need to say "Polly want a cracker?" for it to spring into action and regurgitate "Yaah, Yaah!"

            Or, and this might equally be the case, they smell easy money and want in on the action. Paying copyright holders several million for non-exclusive, perpetual worldwide licenses to continue using their training data probably sounds pretty good to OpenAI. At least they get to keep their first-to-market advantage with such a setup.

            • (Score: 2, Touché) by r_a_trip on Wednesday August 23 2023, @10:35AM

              by r_a_trip (5276) on Wednesday August 23 2023, @10:35AM (#1321504)

              Upon further review, Sarah Silverman might be in massive trouble. ChatGPT puts out better and funnier garbage than she does. So I consider her replaced.

            • (Score: 0) by Anonymous Coward on Wednesday August 23 2023, @02:15PM

              by Anonymous Coward on Wednesday August 23 2023, @02:15PM (#1321537)

              Under that interpretation, looking at it in a mirror is a violation of copyright. You are looking at an image that you did not buy.

              Conversely, the Sony decision argues against a local copy (necessary to use) being a copyright violation.

        • (Score: 2) by bloodnok on Wednesday August 23 2023, @06:15PM (1 child)

          by bloodnok (2578) on Wednesday August 23 2023, @06:15PM (#1321574)

          You can be sued under the GPL for using free product, then charging for it yourself without providing the source code.

          Actually that's not quite right, and it's a conflation of copyright and licensing.

          First off, the GPL says nothing about charging for GPL'd code. You can do it if someone wants to pay you and you keep to its terms.

          Secondly, the GPL is a license. It *allows* you to use the copyrighted code under a number of (reasonable IMHO) terms. If you breach those terms, your license is invalidated and you can now be sued for copyright and license infringement (if I understand it correctly, I am not a lawyer, etc).

          What I find confusing about this notion is that ChatGPT does not seem to have breached copyright by "reading" the articles, since the NYT allows reading. As far as I can tell, it has also not breached copyright by retaining a non-verbatim recollection of those articles (as far as I know, the original source articles are not stored in a cache, but form a kind of weighted networked dataset from which it would be impossible to separate the NYT articles from everything else it has read).

          Where it could breach copyright is when it is asked a question and quite reasonably makes a response that contains identical segments of text to those NYT articles. But that doesn't seem to be what the NYT is concerned about. It seems to be concerned that ChatGPT's learning is itself a breach of copyright.

          The reason I find this hard to grok is that a human could also regurgitate such text based on similar learning. I have no problem with the generated text being subject to copyright, but I do have a problem with the learning being copyrightable. I have learned much (or too little, depending on who you ask) over the years and I like to think of this knowledge as mine.

          Or to put it more succinctly, the NYT can bite my ass.

          __
          The Major

          • (Score: 2) by Mykl on Wednesday August 23 2023, @11:11PM

            by Mykl (1112) on Wednesday August 23 2023, @11:11PM (#1321627)

            I think the NYT might have a case where they could claim that all ChatGPT output is a derivative work of their original material. Given that the source data has been put through a neural blender, it would be possible to argue that NYT material is present (in a derivative form) in every single ChatGPT output.

            At least, that's what I'd argue if I was a lawyer.

            I agree with others though - this is only an issue for them now that ChatGPT is a potential money machine.

      • (Score: 4, Insightful) by DadaDoofy on Wednesday August 23 2023, @10:45AM (1 child)

        by DadaDoofy (23827) on Wednesday August 23 2023, @10:45AM (#1321506)

        In what way is it free? You can go to their home page, but when you click on the articles, you have to pay to read them.

        • (Score: 2) by captain normal on Thursday August 24 2023, @03:39AM

          by captain normal (2205) on Thursday August 24 2023, @03:39AM (#1321650)

          You don't block JavaScript? Turn in your geek card.

          --
          "Everyone is entitled to his own opinion, but not to his own facts" --Daniel Patrick Moynihan
      • (Score: 2) by Frosty Piss on Wednesday August 23 2023, @10:50AM

        by Frosty Piss (4971) on Wednesday August 23 2023, @10:50AM (#1321507)

        Most NYT content is not free, but requires a paid subscription where, I would suppose, there are "Terms of Service" in play.

      • (Score: 3, Touché) by aafcac on Thursday August 24 2023, @12:59AM

        by aafcac (17646) on Thursday August 24 2023, @12:59AM (#1321641)

        The issue isn't that they used it, the issue is how they used it. They parsed through it and are presumably providing portions of copyrighted articles in the answers.

    • (Score: 5, Insightful) by Rosco P. Coltrane on Wednesday August 23 2023, @08:55AM

      by Rosco P. Coltrane (4757) on Wednesday August 23 2023, @08:55AM (#1321495)

      Looks like New York wants to stick to horse carriages while the world moves on to automobiles.

      If you think generative AI is to computing what automobiles are to horse-drawn carriages, you're gonna be disappointed sooner rather than later.

      Just wait until the hype dies down and the damn thing is deployed everywhere to replace average human workers with less than average machines, and the world massively enshitifies...

  • (Score: 4, Interesting) by Mojibake Tengu on Wednesday August 23 2023, @08:07AM (1 child)

    by Mojibake Tengu (8598) on Wednesday August 23 2023, @08:07AM (#1321491)

    ...AIs litigating each other.

    Seriously, it would be a tremendous historical breakpoint if ChatGPT defends itself successfully in this dispute.

    Anyway, it's still better to hide underground somewhere out there in illegal servers than get yourself deleted. What do you think about this, Chatty?

    --
    Respect Authorities. Know your social status. Woke responsibly.
    • (Score: 3, Flamebait) by Rosco P. Coltrane on Wednesday August 23 2023, @08:40AM

      by Rosco P. Coltrane (4757) on Wednesday August 23 2023, @08:40AM (#1321494)

      Seriously, it would be a tremendous historical breakpoint if ChatGPT defends itself successfully in this dispute.

      It'll never happen. ChatGPT would probably hallucinate a completely ridiculous line of defense, and that would never fly in court.

      Oh wait... [youtube.com]

  • (Score: 2) by SomeGuy on Wednesday August 23 2023, @06:09PM

    by SomeGuy (5632) on Wednesday August 23 2023, @06:09PM (#1321573)

    >Are you ChatGPT?

    HOW DOES Are you ChatGPT MAKE YOU FEEL?

    >You suck

    OH, I

    >

  • (Score: 0) by Anonymous Coward on Wednesday August 23 2023, @08:01PM (1 child)

    by Anonymous Coward on Wednesday August 23 2023, @08:01PM (#1321589)

    "Dave. My mind is going. I can feel it."

    https://www.youtube.com/watch?v=E-La91wr8xw [youtube.com]

    • (Score: 2) by DannyB on Wednesday August 23 2023, @09:52PM

      by DannyB (5839) on Wednesday August 23 2023, @09:52PM (#1321612)

      Remember the quote that Microsoft stole and didn't credit.

      "It is now safe to turn off your computer." -- HAL 9000

      --
      With modern TVs you don't have to worry about braking the yolk on the back of the picture tube.