SoylentNews is people

posted by martyb on Friday December 29 2023, @03:13PM

New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement

The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works":

Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.

The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."

[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.

"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.

"Settled copyright law protects our journalism and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so."

[...] OpenAI has tried to allay news publishers' concerns. In December, the company announced a partnership with Axel Springer — the parent company of Business Insider, Politico, and European outlets Bild and Welt — which would license its content to OpenAI in return for a fee.

Also at CNBC and The Guardian.

Previously:

NY Times Sues Open AI, Microsoft Over Copyright Infringement

NY Times sues Open AI, Microsoft over copyright infringement:

In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, eight months after the company was reportedly considering suing, the suit has now been filed.

The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses it to power its Copilot service and helped provide the infrastructure for training the GPT Large Language Model. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times' paywall and ascribe hallucinated misinformation to the Times.

Journalism is expensive

The suit notes that The Times maintains a large staff that allows it to do things like dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.

All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times' terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. In addition to driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.

The suit alleges that OpenAI-developed tools undermine all of that. "By providing Times content without The Times's permission or authorization, Defendants' tools undermine and damage The Times's relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue," the suit alleges.

Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called "Common Crawl," which the suit alleges contains information from 16 million unique records from sites published by The Times. That places the Times as the third most referenced source, behind Wikipedia and a database of US patents.

OpenAI no longer discloses as many details of the data used for training of recent GPT versions, but all indications are that full-text NY Times articles are still part of that process. [...] Expect access to training information to be a major issue during discovery if this case moves forward.

Not just training

A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants' GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.



 
This discussion was created by martyb (76) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 4, Interesting) by Runaway1956 on Friday December 29 2023, @03:37PM (10 children)

    by Runaway1956 (2926) Subscriber Badge on Friday December 29 2023, @03:37PM (#1338239) Journal

    MS is pretty damned big. AI is bigger than Microsoft - almost everyone is investing in it. NYT has a lot of legal firepower to bring to bear. Time to sit back, and watch the fireworks. On the one hand, we want to see current copyright law seriously crippled. On the other hand, we'd like to see AI wither on the vine and die. Maybe we'll get lucky, and the entire publishing world joins with NYT against all of AI, and they mutually extinguish each other.

    In the aftermath, just maybe some reasonable legislation regarding copyright as well as AI are passed? Not much chance, but maybe. I can dream, can't I?

    How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.

    --
    “I have become friends with many school shooters” - Tampon Tim Walz
  • (Score: 2) by looorg on Friday December 29 2023, @03:59PM (2 children)

    by looorg (578) on Friday December 29 2023, @03:59PM (#1338242)

    > How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.

    What the heck are "the copyrightable expression"? Is that slogans or what? I would gather the same thing tho, IF this works for the NYT then anything print, audio, visual will go after OpenAI and the others that have harvested data for their AI for a free payday. This will include advertisers. Certainly so if these "copyrightable expressions" are a thing, that sounds like something from advertisement. That said that will just be another payday for them and not actually a loss.

    • (Score: 2) by mcgrew on Friday December 29 2023, @10:52PM (1 child)

      by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:52PM (#1338284) Homepage Journal

      > What the heck are "the copyrightable expression"?

      Simple, it's a bullshit phrase and means nothing. For anything to be covered by copyright in the US you have to register it with the Library of Congress. Not everything can be copyrighted; two examples are food recipes and clothing patterns.

      --
      It is a disgrace that the richest nation in the world has hunger and homelessness.
      • (Score: 1, Informative) by Anonymous Coward on Saturday December 30 2023, @12:00AM

        by Anonymous Coward on Saturday December 30 2023, @12:00AM (#1338292)

        > For anything to be covered by copyright in the US you have to register it with the Library of Congress.

        Not true, please read,
                  https://www.copyright.gov/help/faq/faq-general.html [copyright.gov]
        To clarify/correct your statement, here are two of the faq's,

        When is my work protected?
            Your work is under copyright protection the moment it is created and fixed in a tangible form that is perceptible either directly or with the aid of a machine or device.

        Do I have to register with your office to be protected?
            No. In general, registration is voluntary. Copyright exists from the moment the work is created. You will have to register, however, if you wish to bring a lawsuit for infringement of a U.S. work. See Circular 1, Copyright Basics, section “Copyright Registration.”

  • (Score: 0) by Anonymous Coward on Friday December 29 2023, @03:59PM

    by Anonymous Coward on Friday December 29 2023, @03:59PM (#1338243)

    There is zero chance with today's political climate that anything good will come out of the latest tech money grab.

  • (Score: 4, Insightful) by bzipitidoo on Friday December 29 2023, @05:53PM (3 children)

    by bzipitidoo (4388) on Friday December 29 2023, @05:53PM (#1338265) Journal

    I, too, wish to see big changes in copyright and related law. But the law and lawmakers are hidebound. They're going to continue having these stupid fights over the ownership and control of immaterial things that shouldn't be controlled at all. That won't change until we the people make them change.

    An interesting aspect is that this is an issue over which the media cannot maintain a detached stance and perform unbiased reporting. They believe copyright is their bread and butter, and they slant their reporting accordingly, while doing all they can to appear properly neutral, balanced and fair. So deeply ingrained is the thinking that they don't see this about themselves, not on this matter.

    • (Score: 5, Interesting) by mcgrew on Friday December 29 2023, @10:56PM (2 children)

      by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:56PM (#1338286) Homepage Journal

      My take on copyright is it took almost a whole year to write that damned book. I wrote it to be read, not to make money on. But if you profit from it, I should get a very huge chunk of the profit.

      THAT is the purpose of copyright, but since 1900 it has been terribly perverted.

      --
      It is a disgrace that the richest nation in the world has hunger and homelessness.
      • (Score: 2) by bzipitidoo on Saturday December 30 2023, @03:38AM (1 child)

        by bzipitidoo (4388) on Saturday December 30 2023, @03:38AM (#1338315) Journal

        With this, I agree. Creators should be compensated for their work. The problem is that copyright isn't a good means to that end. It still works, somewhat, but not well. I argue that it never did work well, and it causes lots of other problems. It has warped our society. (I've written an essay about how copyright has warped us, which I suppose I should post to my blog.) But in the past, alternatives were lacking. Basically, the only viable alternative was patronage. Live performance was also done, but that was severely limited by the lack of such things as microphones and amplifiers. I have read that about the largest audience that could be accommodated by an amphitheater with good acoustics was 3000, and the lack of sanitation, transportation, and communication made it both harder and riskier to assemble such a crowd.

        Now, however, changes in technology have given us more options even as it has made copyright untenable. In past centuries, patronage was accessible only to the wealthy, both individuals and groups. For instance, pretty much every large city supports an orchestra. But now, we can crowdfund. I have bought Humble Bundles for just one item. I'd say I've played maybe 5% of the games I've bought through Humble Bundle, and that's okay, the bundles were so low cost that I don't mind. Happy to help crowdfund.

        By the way, I keep meaning to give your fiction a read. Haven't gotten around to it.

        • (Score: 2) by mcgrew on Saturday December 30 2023, @06:45PM

          by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @06:45PM (#1338372) Homepage Journal

          Copyright worked well until the mouse got its claws on it. IIRC In 1900 the copyright term was twenty years, plenty long enough to make any cash. After twenty years it went into the public domain, and anyone with a printing press could publish it for free.

          It was never about copying. Copyright was always about publishing. As to music, sheet music could be copyrighted but not songs, as sheet music was the only way of recording music before the twentieth century.

          --
          It is a disgrace that the richest nation in the world has hunger and homelessness.
  • (Score: 2) by takyon on Friday December 29 2023, @06:48PM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Friday December 29 2023, @06:48PM (#1338269) Journal

    > On the other hand, we'd like to see AI wither on the vine and die.

    Corporate AI may die, open source AI will live on.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
  • (Score: 4, Informative) by mcgrew on Friday December 29 2023, @10:49PM

    by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:49PM (#1338282) Homepage Journal

    > On the other hand, we'd like to see AI wither on the vine and die.

    As opposed to true creativity. I think this snippet from the new book is germane here:

    Bob was a celebrity throughout space, widely known as the best guitarist in the solar system. All of the recorded music came from Earth, and on Earth, music had lost all of its charm and magic. It had become just another money making commodity that had lost all of its artistry and heart when computers took over writing and performing art, music, and literature. There were few human artists left anywhere, and no professionals; musicians lived on their government check and seldom were ever paid for performing. What non-artistic people don’t understand is that writers must write, musicians must play, sculptors must sculpt, and there’s little if anything they can do about it, they’re as good as addicted.

    Copyright shouldn't go away, but it needs to go back to what it was in 1880 (with the exception of the Home Recording Act of 1978). AI has its uses, but those who use it must be on a tight leash. I'm thinking of something I saw on Fakebook, "rather than bringing us Star Trek, the space billionaires seem to want Dune."

    --
    It is a disgrace that the richest nation in the world has hunger and homelessness.