
SoylentNews is people

posted by requerdanos on Wednesday July 12 2023, @09:02AM
from the regurgitation dept.

https://arstechnica.com/information-technology/2023/07/book-authors-sue-openai-and-meta-over-text-used-to-train-ai/

On Friday, the Joseph Saveri Law Firm filed US federal class-action lawsuits on behalf of Sarah Silverman and other authors against OpenAI and Meta, accusing the companies of illegally using copyrighted material to train AI language models such as ChatGPT and LLaMA.

Other authors represented include Christopher Golden and Richard Kadrey, and an earlier class-action lawsuit filed by the same firm on June 28 included authors Paul Tremblay and Mona Awad. Each lawsuit alleges violations of the Digital Millennium Copyright Act, unfair competition laws, and negligence.

[...] Authors claim that by utilizing "flagrantly illegal" data sets, OpenAI allegedly infringed copyrights of Silverman's book The Bedwetter, Golden's Ararat, and Kadrey's Sandman Slim. And Meta allegedly infringed copyrights of the same three books, as well as "several" other titles from Golden and Kadrey.

[...] Authors are already upset that companies seem to be unfairly profiting off their copyrighted materials, and the Meta lawsuit noted that any unfair profits currently gained could further balloon, as "Meta plans to make the next version of LLaMA commercially available." In addition to other damages, the authors are asking for restitution of alleged profits lost.

"Much of the material in the training datasets used by OpenAI and Meta comes from copyrighted works—including books written by plaintiffs—that were copied by OpenAI and Meta without consent, without credit, and without compensation," Saveri and Butterick wrote in their press release.



 
This discussion was created by requerdanos (5997) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by Thexalon on Wednesday July 12 2023, @05:05PM (2 children)

    by Thexalon (636) on Wednesday July 12 2023, @05:05PM (#1315726)

    It's not a crime to scrape data. It is potentially a copyright violation and thus a civil tort, depending in part on what you do with it afterwards. That is especially true since a lot of websites have a copyright notice somewhere on the page, which the scraping bots almost certainly ignored.

    I'm no lawyer, but this sure seems like a kind of case that was guaranteed to happen eventually. And I could also imagine such a case being settled if the so-called-AI companies set up some sort of system of giving the creators of their source material a portion of whatever proceeds they're getting from what they're creating based on that source material (which could well be a "derivative work" under copyright law).

    --
    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
  • (Score: 3, Informative) by DeathMonkey on Wednesday July 12 2023, @07:22PM

    by DeathMonkey (1380) on Wednesday July 12 2023, @07:22PM (#1315749) Journal

    In this case they're getting sued so it's civil already.

    However, criminal copyright statutes exist as well, so civil lawsuits are definitely not the only remedy (for better or worse...).

  • (Score: 0) by Anonymous Coward on Thursday July 13 2023, @03:11AM

    by Anonymous Coward on Thursday July 13 2023, @03:11AM (#1315856)

    It is potentially a copyright violation and thus a civil tort, depending in part on what you do with it afterwards.

    Yeah, Microsoft seems fine with scraping GPL code and using it for Copilot, but are they using the Windows, MS Office, etc. source code for Copilot?