posted by martyb on Friday December 29 2023, @03:13PM

New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement

The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"

Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.

The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."

[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.

"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.

"Settled copyright law protects our journalism and content. If Microsoft and OpenAI want to use our work for commercial purposes, the law requires that they first obtain our permission. They have not done so."

[...] OpenAI has tried to allay news publishers' concerns. In December, the company announced a partnership with Axel Springer — the parent company of Business Insider, Politico, and European outlets Bild and Welt — which would license its content to OpenAI in return for a fee.

Also at CNBC and The Guardian.

Previously:

NY Times Sues Open AI, Microsoft Over Copyright Infringement

NY Times sues Open AI, Microsoft over copyright infringement:

In August, word leaked out that The New York Times was considering joining the growing legion of creators that are suing AI companies for misappropriating their content. The Times had reportedly been negotiating with OpenAI regarding the potential to license its material, but those talks had not gone smoothly. So, four months after the company was reportedly considering suing, the suit has now been filed.

The Times is targeting various companies under the OpenAI umbrella, as well as Microsoft, an OpenAI partner that both uses it to power its Copilot service and helped provide the infrastructure for training the GPT Large Language Model. But the suit goes well beyond the use of copyrighted material in training, alleging that OpenAI-powered software will happily circumvent the Times' paywall and ascribe hallucinated misinformation to the Times.

Journalism is expensive

The suit notes that The Times maintains a large staff that allows it to do things like dedicate reporters to a huge range of beats and engage in important investigative journalism, among other things. Because of those investments, the newspaper is often considered an authoritative source on many matters.

All of that costs money, and The Times earns that by limiting access to its reporting through a robust paywall. In addition, each print edition has a copyright notification, the Times' terms of service limit the copying and use of any published material, and it can be selective about how it licenses its stories. In addition to driving revenue, these restrictions also help it to maintain its reputation as an authoritative voice by controlling how its works appear.

The suit alleges that OpenAI-developed tools undermine all of that. "By providing Times content without The Times's permission or authorization, Defendants' tools undermine and damage The Times's relationship with its readers and deprive The Times of subscription, licensing, advertising, and affiliate revenue," the suit alleges.

Part of the unauthorized use The Times alleges came during the training of various versions of GPT. Prior to GPT-3.5, information about the training dataset was made public. One of the sources used is a large collection of online material called "Common Crawl," which the suit alleges contains information from 16 million unique records from sites published by The Times. That places The Times as the third most referenced source, behind Wikipedia and a database of US patents.

OpenAI no longer discloses as many details of the data used for training of recent GPT versions, but all indications are that full-text NY Times articles are still part of that process. [...] Expect access to training information to be a major issue during discovery if this case moves forward.
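As a rough illustration of the Common Crawl angle, the crawl's public CDX index can be queried per domain to see how many nytimes.com captures a given crawl holds. A minimal Python sketch (the collection name is an assumption; any index listed at index.commoncrawl.org works the same way):

```python
import json
import urllib.parse
import urllib.request

# Query Common Crawl's public CDX index for captures under nytimes.com.
# The collection name below is an assumption; current collections are listed
# at https://index.commoncrawl.org/collinfo.json
INDEX = "https://index.commoncrawl.org/CC-MAIN-2023-50-index"
query = urllib.parse.urlencode({
    "url": "nytimes.com/*",  # every capture under the domain
    "output": "json",        # newline-delimited JSON records
    "limit": "50",           # keep the sample small for illustration
})

with urllib.request.urlopen(f"{INDEX}?{query}") as resp:
    records = [json.loads(line) for line in resp.read().splitlines()]

# Each record names the WARC file, offset, and length where the archived
# page body sits -- the kind of stored copy a training pipeline can read.
for rec in records[:5]:
    print(rec["timestamp"], rec["status"], rec["url"])
```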

Not just training

A number of suits have been filed regarding the use of copyrighted material during training of AI systems. But the Times' suit goes well beyond that to show how the material ingested during training can come back out during use. "Defendants' GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples," the suit alleges.


Original Submission #1 | Original Submission #2 | Original Submission #3

Related Stories

Microsoft, GitHub, and OpenAI Sued for $9B in Damages Over Piracy 51 comments

As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.

The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.

"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.

Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.

The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.

Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.

Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground


Original Submission

Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over 20 comments

OpenAI could be fined up to $150,000 for each piece of infringing content:

Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.

NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.

Neither OpenAI nor the Times immediately responded to Ars' request to comment.

If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.

[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.

Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711

Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815


Original Submission

AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead 18 comments

Media outlets are calling foul play over AI companies using their content to build chatbots. They may find friends in the Senate:

More than a decade ago, the normalization of tech companies carrying content created by news organizations without directly paying them — cannibalizing readership and ad revenue — precipitated the decline of the media industry. With the rise of generative artificial intelligence, those same firms threaten to further tilt the balance of power between Big Tech and news.

On Wednesday, lawmakers in the Senate Judiciary Committee referenced their failure to adopt legislation that would've barred the exploitation of content by Big Tech in backing proposals that would require AI companies to strike licensing deals with news organizations.

Richard Blumenthal, Democrat of Connecticut and chair of the committee, joined several other senators in supporting calls for a licensing regime and to establish a framework clarifying that intellectual property laws don't protect AI companies using copyrighted material to build their chatbots.

[...] The fight over the legality of AI firms eating content from news organizations without consent or compensation is split into two camps: Those who believe the practice is protected under the "fair use" doctrine in intellectual property law that allows creators to build upon copyrighted works, and those who argue that it constitutes copyright infringement. Courts are currently wrestling with the issue, but an answer to the question is likely years away. In the meantime, AI companies continue to use copyrighted content as training materials, endangering the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools.

[...] A lawsuit from The New York Times, filed last month, pulled back the curtain behind negotiations over the price and terms of licensing its content. Before suing, it said that it had been talking for months with OpenAI and Microsoft about a deal, though the talks reached no such truce. In the backdrop of AI companies crawling the internet for high-quality written content, news organizations have been backed into a corner, having to decide whether to accept lowball offers to license their content or expend the time and money to sue in a lawsuit. Some companies, like Axel Springer, took the money.

It's important to note that under intellectual property laws, facts are not protected.

Also at Courthouse News Service and Axios.

Related:


Original Submission

Microsoft in Deal With Semafor to Create News Stories With Aid of AI Chatbot 18 comments

https://arstechnica.com/information-technology/2024/02/microsoft-in-deal-with-semafor-to-create-news-stories-with-aid-of-ai-chatbot/

Microsoft is working with media startup Semafor to use its artificial intelligence chatbot to help develop news stories—part of a journalistic outreach that comes as the tech giant faces a multibillion-dollar lawsuit from the New York Times.

As part of the agreement, Microsoft is paying an undisclosed sum of money to Semafor to sponsor a breaking news feed called "Signals." The companies would not share financial details, but the amount of money is "substantial" to Semafor's business, said a person familiar with the matter.

[...] The partnerships come as media companies have become increasingly concerned over generative AI and its potential threat to their businesses. News publishers are grappling with how to use AI to improve their work and stay ahead of technology, while also fearing that they could lose traffic, and therefore revenue, to AI chatbots—which can churn out humanlike text and information in seconds.

The New York Times in December filed a lawsuit against Microsoft and OpenAI, alleging the tech companies have taken a "free ride" on millions of its articles to build their artificial intelligence chatbots, and seeking billions of dollars in damages.

[...] Semafor, which is free to read, is funded by wealthy individuals, including 3G capital founder Jorge Paulo Lemann and KKR co-founder Henry Kravis. The company made more than $10 million in revenue in 2023 and has more than 500,000 subscriptions to its free newsletters. Justin Smith said Semafor was "very close to a profit" in the fourth quarter of 2023.

Related stories on SoylentNews:
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Microsoft Shamelessly Pumping Internet Full of Garbage AI-Generated "News" Articles - 20231104
Google, DOJ Still Blocking Public Access to Monopoly Trial Docs, NYT Says - 20231020
After ChatGPT Disruption, Stack Overflow Lays Off 28 Percent of Staff - 20231017
Security Risks Of Windows Copilot Are Unknowable - 20231011
Microsoft AI Team Accidentally Leaks 38TB of Private Company Data - 20230923
Microsoft Pulls AI-Generated Article Recommending Ottawa Food Bank to Tourists - 20230820
A Jargon-Free Explanation of How AI Large Language Models Work - 20230805
The Godfather of AI Leaves Google Amid Ethical Concerns - 20230502
The AI Doomers' Playbook - 20230418
Ads Are Coming for the Bing AI Chatbot, as They Come for All Microsoft Products - 20230404
Deepfakes, Synthetic Media: How Digital Propaganda Undermines Trust - 20230319


Original Submission

Why the New York Times Might Win its Copyright Lawsuit Against OpenAI 23 comments

https://arstechnica.com/tech-policy/2024/02/why-the-new-york-times-might-win-its-copyright-lawsuit-against-openai/

The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times "has a near zero probability of winning" its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.

"Trying to get everyone to license training data is not going to work because that's not what copyright is about," Jeffries wrote. "Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works."

[...] Courts are supposed to consider four factors in fair use cases, but two of these factors tend to be the most important. One is the nature of the use. A use is more likely to be fair if it is "transformative"—that is, if the new use has a dramatically different purpose and character from the original. Judge Rakoff dinged MP3.com as non-transformative because songs were merely "being retransmitted in another medium."

In contrast, Google argued that a book search engine is highly transformative because it serves a very different function than an individual book. People read books to enjoy and learn from them. But a search engine is more like a card catalog; it helps people find books.

The other key factor is how a use impacts the market for the original work. Here, too, Google had a strong argument since a book search engine helps people find new books to buy.

[...] In 2015, the Second Circuit ruled for Google. An important theme of the court's opinion is that Google's search engine was giving users factual, uncopyrightable information rather than reproducing much creative expression from the books themselves.

[...] Recently, we visited Stability AI's website and requested an image of a "video game Italian plumber" from its image model Stable Diffusion.

[...] Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front.

These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.

OpenAI Says New York Times 'Hacked' ChatGPT to Build Copyright Lawsuit 6 comments

OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other artificial-intelligence systems to generate misleading evidence for the case:

OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."

"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products."

OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.

[...] Courts have not yet addressed the key question of whether AI training qualifies as fair use under copyright law. So far, judges have dismissed some infringement claims over the output of generative AI systems based on a lack of evidence that AI-created content resembles copyrighted works.

Also at The Guardian, MSN and Forbes.

Previously:


Original Submission

AI Story Roundup 27 comments

[We have had several complaints recently (polite ones, not a problem) regarding the number of AI stories that we are printing. I agree, but that reflects the number of submissions that we receive on the subject. So I have compiled a small selection of AI stories into one and you can read them or ignore them as you wish. If you are making a comment please make it clear exactly which story you are referring to unless your comment is generic. The submitters each receive the normal karma for a submission. JR]

Image-scraping Midjourney bans rival AI firm for scraping images

https://arstechnica.com/information-technology/2024/03/in-ironic-twist-midjourney-bans-rival-ai-firm-employees-for-scraping-its-image-data/

On Wednesday, Midjourney banned all employees from image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney's official Discord channel.

[...] Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. "It turns out that generative AI companies don't like it when you steal, sorry, scrape, images from them. Cue the world's smallest violin."

[...] Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, "sent you some information to help with your internal investigation."

[...] When asked about Stability's relationship with Midjourney these days, Mostaque played down the rivalry. "No real overlap, we get on fine though," he told Ars and emphasized a key link in their histories. "I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta."

Midjourney stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Midjourney&sort=2
Stable Diffusion (Stability AI) stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Stable+Diffusion&sort=2

Wyoming Reporter Caught Using Artificial Intelligence to Create Fake Quotes and Stories 15 comments

A quote from Wyoming's governor and one from a local prosecutor were the first things that seemed slightly off to Powell Tribune reporter CJ Baker. Then, it was some of the phrases in the stories that struck him as nearly robotic:

The dead giveaway, though, that a reporter from a competing news outlet was using generative artificial intelligence to help write his stories came in a June 26 article about the comedian Larry the Cable Guy being chosen as the grand marshal of the Cody Stampede Parade.

[...] After doing some digging, Baker, who has been a reporter for more than 15 years, met with Aaron Pelczar, a 40-year-old who was new to journalism and who Baker says admitted that he had used AI in his stories before he resigned from the Enterprise.

[...] Journalists have derailed their careers by making up quotes or facts in stories long before AI came about. But this latest scandal illustrates the potential pitfalls and dangers that AI poses to many industries, including journalism, as chatbots can spit out spurious if somewhat plausible articles with only a few prompts.

[...] "In one case, (Pelczar) wrote a story about a new OSHA rule that included a quote from the Governor that was entirely fabricated," Michael Pearlman, a spokesperson for the governor, said in an email. "In a second case, he appeared to fabricate a portion of a quote, and then combined it with a portion of a quote that was included in a news release announcing the new director of our Wyoming Game and Fish Department."

Related:


Original Submission

This discussion was created by martyb (76) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Interesting) by Rosco P. Coltrane on Friday December 29 2023, @03:24PM (1 child)

    by Rosco P. Coltrane (4757) on Friday December 29 2023, @03:24PM (#1338238)

    Azure cloud computing honeypot.

    There. FTFY.

    The cloud isn't about providing online services: it's about collecting as much private data as possible and monetizing it. It's always been about that.

    The New York Times is absolutely right: it is a business model based on massive copyright infringement. But it's wrong on two things: it's not new, and it's not just Microsoft and OpenAI. It's been going on for decades, and all Big Data players essentially owe their very existence to the business of exploiting data they have no right to exploit.

    The difference between then and now is that the data they had no right to exploit wasn't exploited directly: it was digested and used indirectly for the purpose of advertisement. For example, when Google's surveillance collective has your medical file because your healthcare provider put your medical data in their cloud, and it knows you have some disease, and you keep getting advertisement more or less closely related to that disease, you have a hunch that Google is using data it shouldn't be using but you can't prove it.

    With AI, you can prove the infringement clear as day: chat with the stupid bot long enough and it will regurgitate your own data back to you verbatim. That's the difference.

    • (Score: 2) by The Vocal Minority on Sunday December 31 2023, @05:58AM

      by The Vocal Minority (2765) on Sunday December 31 2023, @05:58AM (#1338446) Journal

If you have any actual proof this is happening then please provide it. The contractual arrangements around the use of Azure, and other similar cloud platforms, provide guarantees of privacy for customer data, otherwise they wouldn't use them. I also believe the data is actually encrypted at rest with the private key in the customer's possession. This is not gmail where google explicitly tells you they are going to look through your e-mails.

Yes, there are no guarantees and trust is required that the cloud infrastructure is actually doing what you are told it is doing. But that is no different from running closed-source software in general and/or using a third party data center.

Personally, I am very suspicious of these cloud platforms and I think they give big American tech companies way too much power, but if I am to convince people not to use them then I need proof that these privacy abuses are happening. Otherwise it is all just a bunch of paranoid ranting.

  • (Score: 4, Interesting) by Runaway1956 on Friday December 29 2023, @03:37PM (10 children)

    by Runaway1956 (2926) Subscriber Badge on Friday December 29 2023, @03:37PM (#1338239) Journal

    MS is pretty damned big. AI is bigger than Microsoft - almost everyone is investing in it. NYT has a lot of legal firepower to bring to bear. Time to sit back, and watch the fireworks. On the one hand, we want to see current copyright law seriously crippled. On the other hand, we'd like to see AI wither on the vine and die. Maybe we'll get lucky, and the entire publishing world joins with NYT against all of AI, and they mutually extinguish each other.

    In the aftermath, just maybe some reasonable legislation regarding copyright as well as AI are passed? Not much chance, but maybe. I can dream, can't I?

    How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.

    --
    “I have become friends with many school shooters” - Tampon Tim Walz
    • (Score: 2) by looorg on Friday December 29 2023, @03:59PM (2 children)

      by looorg (578) on Friday December 29 2023, @03:59PM (#1338242)

      > How do we get the advertising industry involved with all of this? They need to take some serious damage from it, somehow.

      What the heck are "the copyrightable expression"? Is that slogans or what? I would gather the same thing tho, IF this works for the NYT then anything print, audio, visual will go after OpenAI and the others that have harvested data for their AI for a free payday. This will include advertisers. Certainly so if these "copyrightable expressions" are a thing, that sounds like something from advertisement. That said that will just be another payday for them and not actually a loss.

      • (Score: 2) by mcgrew on Friday December 29 2023, @10:52PM (1 child)

        by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:52PM (#1338284) Homepage Journal

        What the heck are "the copyrightable expression"?

        Simple, it's a bullshit phrase and means nothing. For anything to be covered by copyright in the US you have to register it with the Library of Congress. Not everything can be copyrighted; two examples are food recipes and clothing patterns.

        --
        Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
        • (Score: 1, Informative) by Anonymous Coward on Saturday December 30 2023, @12:00AM

          by Anonymous Coward on Saturday December 30 2023, @12:00AM (#1338292)

          > For anything to be covered by copyright in the US you have to register it with the Library of Congress.

          Not true, please read,
                    https://www.copyright.gov/help/faq/faq-general.html [copyright.gov]
          To clarify/correct your statement, here are two of the faq's,

          When is my work protected?
              Your work is under copyright protection the moment it is created and fixed in a tangible form that is perceptible either directly or with the aid of a machine or device.

          Do I have to register with your office to be protected?
              No. In general, registration is voluntary. Copyright exists from the moment the work is created. You will have to register, however, if you wish to bring a lawsuit for infringement of a U.S. work. See Circular 1, Copyright Basics, section “Copyright Registration.”

    • (Score: 0) by Anonymous Coward on Friday December 29 2023, @03:59PM

      by Anonymous Coward on Friday December 29 2023, @03:59PM (#1338243)

      There is zero chance with today's political climate that anything good will come out of
      the latest tech money grab.

    • (Score: 4, Insightful) by bzipitidoo on Friday December 29 2023, @05:53PM (3 children)

      by bzipitidoo (4388) on Friday December 29 2023, @05:53PM (#1338265) Journal

      I, too, wish to see big changes in copyright and related law. But the law and lawmakers are hidebound. They're going to continue having these stupid fights over the ownership and control of immaterial things that shouldn't be controlled at all. That won't change until we the people make them change.

      An interesting aspect is that this is an issue over which the media cannot maintain a detached stance and perform unbiased reporting. They believe copyright is their bread and butter, and they slant their reporting accordingly, while doing all they can to appear properly neutral, balanced and fair. So deeply ingrained is the thinking that they don't see this about themselves, not on this matter.

      • (Score: 5, Interesting) by mcgrew on Friday December 29 2023, @10:56PM (2 children)

        by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:56PM (#1338286) Homepage Journal

        My take on copyright is it took almost a whole year to write that damned book. I wrote it to be read, not to make money on. But if you profit from it, I should get a very huge chunk of the profit.

        THAT is the purpose of copyright, but since 1900 it has been terribly perverted.

        --
        Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
        • (Score: 2) by bzipitidoo on Saturday December 30 2023, @03:38AM (1 child)

          by bzipitidoo (4388) on Saturday December 30 2023, @03:38AM (#1338315) Journal

          With this, I agree. Creators should be compensated for their work. The problem is that copyright isn't a good means to that end. It still works, somewhat, but not well. I argue that it never did work well, and it causes lots of other problems. It has warped our society. (I've written an essay about how copyright has warped us, which I suppose I should post to my blog.) But in the past, alternatives were lacking. Basically, the only viable alternative was patronage. Live performance was also done, but that was severely limited by the lack of such things as microphones and amplifiers. I have read that about the largest audience that could be accommodated by an amphitheater with good acoustics was 3000, and the lack of sanitation, transportation, and communication made it both harder and riskier to assemble such a crowd.

          Now, however, changes in technology have given us more options even as it has made copyright untenable. In past centuries, patronage was accessible only to the wealthy, both individuals and groups. For instance, pretty much every large city supports an orchestra. But now, we can crowdfund. I have bought Humble Bundles for just one item. I'd say I've played maybe 5% of the games I've bought through Humble Bundle, and that's okay, the bundles were so low cost that I don't mind. Happy to help crowdfund.

          By the way, I keep meaning to give your fiction a read. Haven't gotten around to it.

          • (Score: 2) by mcgrew on Saturday December 30 2023, @06:45PM

            by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @06:45PM (#1338372) Homepage Journal

            Copyright worked well until the mouse got its claws on it. IIRC In 1900 the copyright term was twenty years, plenty long enough to make any cash. After twenty years it went into the public domain, and anyone with a printing press could publish it for free.

            It was never about copying. Copyright was always about publishing. As to music, sheet music could be copyrighted but not songs, as sheet music was the only way of recording music before the twentieth century.

            --
            Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
    • (Score: 2) by takyon on Friday December 29 2023, @06:48PM

      by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Friday December 29 2023, @06:48PM (#1338269) Journal

      On the other hand, we'd like to see AI wither on the vine and die.

      Corporate AI may die, open source AI will live on.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 4, Informative) by mcgrew on Friday December 29 2023, @10:49PM

      by mcgrew (701) <publish@mcgrewbooks.com> on Friday December 29 2023, @10:49PM (#1338282) Homepage Journal

      On the other hand, we'd like to see AI wither on the vine and die.

As opposed to true creativity. I think this snippet from the new book is germane here:

      Bob was a celebrity throughout space, widely known as the best guitarist in the solar system. All of the recorded music came from Earth, and on Earth, music had lost all of its charm and magic. It had become just another money making commodity that had lost all of its artistry and heart when computers took over writing and performing art, music, and literature. There were few human artists left anywhere, and no professionals; musicians lived on their government check and seldom were ever paid for performing. What non-artistic people don’t understand is that writers must write, musicians must play, sculptors must sculpt, and there’s little if anything they can do about it, they’re as good as addicted.

      Copyright shouldn't go away, but it needs to go back to what it was in 1880 (with the exception of the Home Recording Act of 1978). AI has its uses, but those who use it must be on a tight leash. I'm thinking of something I saw on Fakebook,"rather than bringing us Star Trek, the space billionaires seem to want Dune."

      --
      Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
  • (Score: 4, Troll) by DannyB on Friday December 29 2023, @04:24PM (4 children)

    by DannyB (5839) Subscriber Badge on Friday December 29 2023, @04:24PM (#1338244) Journal

    If you don't want people or machines to read your content, or view your images . . .

    DON'T PUT THEM ONLINE !!!

    --
    Don't put a mindless tool of corporations in the white house; vote ChatGPT for 2024!
    • (Score: 1, Funny) by Anonymous Coward on Friday December 29 2023, @05:35PM

      by Anonymous Coward on Friday December 29 2023, @05:35PM (#1338264)

      How's that cave working for you? Drafty at all?

      I get what you're saying and all, but that ship has sailed. That other boat waiting for you is the B Ark.

    • (Score: 4, Insightful) by stormreaver on Friday December 29 2023, @07:34PM

      by stormreaver (5101) on Friday December 29 2023, @07:34PM (#1338274)

      If you don't want people or machines to read your content....

      They want people (not so much machines) to read their content, but they want to be paid for it (creating it is costly). That is totally reasonable. That OpenAI was caught red-handed holding the smoking gun (despite persistently lying about possessing the capability to hold one) is about as strong as a case can get.

      That said, the entire court experience is a case study in entropy at work, so I will not make a prediction. However, I hope OpenAI and Microsoft get taken to the cleaners.

    • (Score: 4, Interesting) by mcgrew on Saturday December 30 2023, @12:05AM (1 child)

      by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @12:05AM (#1338294) Homepage Journal

      Copyright isn't about keeping you from reading the content, despite what the Music And Film Association of America (MAFIAA) would have you believe. It protects the author from the publisher, not from the reader. In the case of music, congress specifically legalized home recording in the US in the 1978 Home Recording Act. This despite the fact that copyright was always about publishing, not copying.

      Do you want to read my book? You might not have a chance if anybody can make money publishing it except me. It might not have even been written.

      That said, I give my stuff away. I'm trying to sell copies of the new one to join the SFWA, it will be free in September or sooner and I abhor Digital Restrictions Management and don't use it. But there aren't many like me, Harry Potter might have never existed had it not been for England's welfare laws.

      --
      Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
      • (Score: 0) by Anonymous Coward on Saturday December 30 2023, @01:44PM

        by Anonymous Coward on Saturday December 30 2023, @01:44PM (#1338347)

        > Copyright isn't about keeping you from reading the content, despite what the Music And Film Association of America (MAFIAA) would have you believe. It protects the author from the publisher, not from the reader.

        Wouldn't it be closer to correct to say that:
        a. (C) protects from unscrupulous publishers who would publish a work without entering into a contract with the creator.
        b. The barriers of entry to becoming a publisher have changed greatly with improved technology. With consumer audio/video tape, photocopy machines and now digital copies, just about anyone can be a publisher. It's now so easy that most people who copy works may not even recognize that they are publishing.

  • (Score: 5, Informative) by ElizabethGreene on Friday December 29 2023, @04:45PM (21 children)

    by ElizabethGreene (6748) Subscriber Badge on Friday December 29 2023, @04:45PM (#1338249) Journal

    There are some interesting things in the complaint here. The NYT claims to have convinced ChatGpt to return substantial portions of the original full text of multiple articles.

    Complaint PDF [nytimes.com] Pages 30-40.

    • (Score: 2) by Ox0000 on Friday December 29 2023, @05:30PM (2 children)

      by Ox0000 (5111) on Friday December 29 2023, @05:30PM (#1338262)

      +1 Informative.

      That's uncanny...

      • (Score: 0) by Anonymous Coward on Friday December 29 2023, @06:07PM

        by Anonymous Coward on Friday December 29 2023, @06:07PM (#1338267)

        To the GP, thanks for digging into the text.

        > That's uncanny...

        That's not very surprising...others have reported similar in recent weeks.

        ftfy

      • (Score: 2) by mcgrew on Saturday December 30 2023, @12:11AM

        by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @12:11AM (#1338295) Homepage Journal

        That's uncanny...

        I see you're not a database guy, it's not surprising at all.

        Here's [soylentnews.org] how AI works. A journal from March explaining the magic.

        --
        Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
    • (Score: 4, Interesting) by krishnoid on Friday December 29 2023, @06:06PM (13 children)

      by krishnoid (1156) on Friday December 29 2023, @06:06PM (#1338266)

      There are also some interesting things in the complaint here. The NYT claims to have convinced ChatGPT to return substantial portions of the original full text of multiple comments.

      • (Score: 1) by MonkeypoxBugChaser on Friday December 29 2023, @07:06PM (11 children)

        by MonkeypoxBugChaser (17904) on Friday December 29 2023, @07:06PM (#1338271) Homepage Journal

        Yea, likely through an exploit. That whole repeat "poem" forever type thing.

        • (Score: 2) by HiThere on Friday December 29 2023, @07:21PM (10 children)

          by HiThere (866) Subscriber Badge on Friday December 29 2023, @07:21PM (#1338272) Journal

          OK, but it still implies that ChatGPT memorized the articles in some sense. That's definitely making a copy, just like you do when you reread a poem several times, or a favorite author. (A friend knew someone who could essentially recite the Lord of the Rings. I can do pieces of it, largely poems.)

          So the problem is that if NYT wins the case, the next target may be remembering stuff. I don't really see a clear demarcation.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
          • (Score: 2) by mcgrew on Saturday December 30 2023, @12:27AM (5 children)

            by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @12:27AM (#1338298) Homepage Journal

Computers don't "think" any more than a printed book thinks. Stored information is not thought, except that it's the thought of the original thinker who wrote it down or typed it into a computer.

            I should write an article about copyright, since so few seem to have any idea about it, thanks to the MAFIAA. Copyright is NOT about copying or storing data, it's about publishing it. And it's not automatic in the US thanks to (I think, information is hard for me to find) a lawsuit, as Bowker (the ISBN people) informed me.

            If I put my book on the internet, I have published it. Copyright gives me a "limited time*" monopoly on publication. It has to be registered and costs sixty bucks to register in the US. Recording that Metallica album and giving a copy to your friend is perfectly legal, no matter what that greedy asshole Lars Ulrich thinks.

* The Bono Act gives me a "limited time" monopoly of ninety-five years longer than my life. I don't see how I'm going to be enticed to write any more books after I'm dead. SCOTUS ruled against common sense and logic, ruling that "limited" means whatever congress says it means.

            --
            Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
            • (Score: 2) by HiThere on Saturday December 30 2023, @01:40AM (4 children)

              by HiThere (866) Subscriber Badge on Saturday December 30 2023, @01:40AM (#1338305) Journal

              That's an assertion I've heard before, but I've never seen any good proof of it.
Actually, proof is slightly the wrong word. What's missing is a definition of "thought" that includes what people do, doesn't include what computers do, and doesn't depend on handling them as a special case. The first version of that assertion that I heard was that computers will never play good chess because they can't think. The assertion that they couldn't play good chess was already false at the time, though they weren't up to expert level.

So give me your explicit definition and perhaps I'll accept that, by your usage, computers can't think. Otherwise I'll just remember the old saying in AI that "intelligence is whatever we haven't managed to do yet".

              --
              Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
              • (Score: 3, Informative) by mcgrew on Saturday December 30 2023, @02:27AM (3 children)

                by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @02:27AM (#1338310) Homepage Journal

                I put it succinctly in the story Sentience [mcgrewbooks.com]. It's written in the first person, the narrator is a sentient computer.

My view that there will never be a Turing architecture sentient computer comes mostly from the fact that I've studied the schematics of computer components like the ALU (Arithmetic Logic Unit) and written a two player battle tanks game in Z-80 machine code. A computer is no smarter than a printed book.

                Now, replicants, like in RUR [mcgrewbooks.com], with history's first use of the word "robot", or Do Androids Dream of Electric Sheep? ("Blade Runner") may and probably will be sentient.

                --
                Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
                • (Score: 2) by deimtee on Sunday December 31 2023, @02:48AM (2 children)

                  by deimtee (3272) on Sunday December 31 2023, @02:48AM (#1338422) Journal

                  there will never be a Turing architecture sentient computer

                  You are showing an organic bias. There is nothing the cells in the brain do that can't be done on a computer. We just haven't written a program that complicated yet. (Well, publicly at least, I don't know what the TLA's have.)

                  Reductio ad absurdum:
                  We can write a program to simulate a neuron. We can write a program to simulate an axon. We can design a message passing algorithm that simulates the interconnections. We can design self-modifying programs that mimic the changes in neurons and axons as they are used.
                  We can freeze a brain and examine it neuron by neuron and reproduce the neurons and interconnections in it in silicon and programming. It would take a massive effort and a huge amount of computer power but when you turned that program/machine on it would produce the same output as the brain that was scanned.

                  A computer is no smarter than a printed book.

                  The main difference is that a book is static. A computer can execute code and change the stored information. The glider gun in the Life program demonstrates that even a very simple system can have unlimited growth. It's not intelligent, but neither is a bacterium. You have to build up to intelligence. As far as I know, the simulationists have got as far as a small worm with a few neurons. I think there is a group working on simulating a fly's brain.

                  --
                  If you cough while drinking cheap red wine it really cleans out your sinuses.
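For readers wondering what "a program to simulate a neuron" looks like at its simplest, here is a minimal sketch of a leaky integrate-and-fire unit. It is an illustrative toy with made-up parameters, nowhere near the biophysical detail a real brain emulation would need:

```python
# Minimal leaky integrate-and-fire neuron: the kind of single-unit simulation
# the comment gestures at. Parameters are illustrative, not biophysical fits.
dt = 0.1          # time step, ms
tau = 10.0        # membrane time constant, ms
v_rest = -65.0    # resting potential, mV
v_thresh = -50.0  # spike threshold, mV
v_reset = -70.0   # post-spike reset, mV

v = v_rest
spikes = []
for step in range(1000):
    i_in = 20.0 if 200 <= step < 800 else 0.0   # injected current (arbitrary units)
    dv = (-(v - v_rest) + i_in) / tau * dt      # leak toward rest plus input drive
    v += dv
    if v >= v_thresh:                           # threshold crossing emits a spike
        spikes.append(step * dt)
        v = v_reset

print(f"{len(spikes)} spikes in 100 ms of simulated time")
```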
                  • (Score: 1, Insightful) by Anonymous Coward on Monday January 01 2024, @02:09AM

                    by Anonymous Coward on Monday January 01 2024, @02:09AM (#1338536)

                    late to the party, but in case anyone is still reading..

                    > but when you turned that program/machine on it would produce the same output as the brain that was scanned.

                    Um, yes. But don't forget gigo. The inputs to the human brain are also somehow encoded, and not much of this is understood yet either. How many processing layers are in the eye, before any signals are sent down the optic nerve? Same applies to I/O with all the other organs both internal and near the skin. Without all the I/O a synthetic brain isn't going to be useful.

                    Back to the drawing board.

                  • (Score: 2) by mcgrew on Sunday January 07 2024, @06:51PM

                    by mcgrew (701) <publish@mcgrewbooks.com> on Sunday January 07 2024, @06:51PM (#1339489) Homepage Journal

                    There is nothing the cells in the brain do that can't be done on a computer.

                    Fractions. Divide one by three on a computer. Making anything actually original.

                    --
                    Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
          • (Score: 0) by Anonymous Coward on Saturday December 30 2023, @04:13PM

            by Anonymous Coward on Saturday December 30 2023, @04:13PM (#1338353)
            There's no problem with humans remembering stuff verbatim. It's when they produce copies of that stuff that they may infringe on copyright (depending on the law).

            And humans are responsible for any copyright infringement they make. So some people won't be producing infringing copies of that stuff even if they have good enough memory to do so.

            If Microsoft provided proof that they trained their AI on their own source code ( Windows, Microsoft Office etc) AND then publicly guaranteed that the output of their AI can be used without any copyright issues, especially guaranteeing that any output won't be infringing on Microsoft's copyright. Then sure I might start having a bit more confidence that they're not infringing. And if it happens to output useful Win32 stuff that WINE and ReactOS can now use legally well too bad for Microsoft.

            But instead they train their AI on OTHER people's copyrighted stuff and say that they are not infringing "because AI". To me that's laundering copyright infringement (e.g. GPLed stuff): https://www.theverge.com/2022/11/8/23446821/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data

            As the "poem" exploit confirms, these type of AIs have/produce infringing copies of stuff.

            Some idiots argue it's not infringement because the actual stored data doesn't look like the copyrighted stuff and is a lot smaller. If that's a good enough excuse then if I convert a copyrighted Blu-ray to HEVC, I won't be infringing since the data stored and distributed is now very different and a lot smaller. And yes it's provably lossy too - in many cases the output is not 100% the same. But nope, it's still considered infringement.
          • (Score: 2) by maxwell demon on Sunday December 31 2023, @11:47AM (2 children)

            by maxwell demon (1608) on Sunday December 31 2023, @11:47AM (#1338460) Journal

            If I reproduce large chunks of an article from memory and give them to whoever wants them, I'm already violating copyright. It doesn't matter that I first memorized the text and then wrote it down on request instead of writing it down as I read it.

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 2) by HiThere on Sunday December 31 2023, @05:31PM (1 child)

              by HiThere (866) Subscriber Badge on Sunday December 31 2023, @05:31PM (#1338496) Journal

              So singing a song is violation of copyright. Somehow I didn't think copyright law was quite that stupid.

              --
              Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
              • (Score: 0) by Anonymous Coward on Monday January 01 2024, @02:11AM

                by Anonymous Coward on Monday January 01 2024, @02:11AM (#1338539)

                old news, see https://support.easysong.com/hc/en-us/articles/360047682433-What-is-a-Public-Performance-License- [easysong.com]

                A public performance license is an agreement between a music user and the owner of a copyrighted composition (song), that grants permission to play the song in public, online, or on radio. This permission is also called public performance rights, performance rights, and performing rights.

                How Do I Get a Public Performance License?

                In most cases, public performance rights should be handled by the institutions, businesses, venues, and radio stations that present the music. Small indie artists, educators, and DJs often don't need to secure public performance rights for private events and most rights organizations do not license to individuals. Also, most web and terrestrial radio stations handle their own public performance licensing, so playing live public radio at your venue is usually fine. If you are unsure about your specific scenario, you should ask the venue or contact a performing rights organization such as ASCAP, BMI, or SESAC for details.

      • (Score: 2, Funny) by cereal_burpist on Saturday December 30 2023, @05:28AM

        by cereal_burpist (35552) on Saturday December 30 2023, @05:28AM (#1338322)

        Whoever purchases Hyperloop's deserted test track in the state of Nevada will have an exceptionally large children's water toy, if they so desire.

    • (Score: 5, Funny) by stormreaver on Friday December 29 2023, @07:57PM

      by stormreaver (5101) on Friday December 29 2023, @07:57PM (#1338276)

      The NYT claims to have convinced ChatGpt to return substantial portions of the original full text of multiple articles.

      The Times must be lying, as those good, honest, God-fearing people at Microsoft and OpenAI would never mislead and abuse the public.

    • (Score: 3, Funny) by deimtee on Friday December 29 2023, @09:26PM

      by deimtee (3272) on Friday December 29 2023, @09:26PM (#1338278) Journal

      Wow. It's like the New York Times got ChatGPT to write their articles.

      --
      If you cough while drinking cheap red wine it really cleans out your sinuses.
    • (Score: 2) by hendrikboom on Saturday December 30 2023, @12:44AM (1 child)

      by hendrikboom (1125) on Saturday December 30 2023, @12:44AM (#1338301) Homepage Journal

      Returning those documents? Does it mean anything, with the AI's already trained?

      • (Score: 2) by cereal_burpist on Saturday December 30 2023, @04:35AM

        by cereal_burpist (35552) on Saturday December 30 2023, @04:35AM (#1338321)
        I believe (perhaps) you're thinking of "returning" as in library books, as opposed to: A function returning a value or a string of text.
  • (Score: 5, Interesting) by Rich on Friday December 29 2023, @04:45PM (3 children)

    by Rich (945) on Friday December 29 2023, @04:45PM (#1338250) Journal

    Search engines copy for profit (e.g. Google Cache), too, and they copy even more verbatim than any AI that merely absorbs context of what it sees. To be able to update a particular web page, the old text must be purged from the word index and the new text must be added. This makes it unavoidable to keep a copy of the old text. The "robots.txt" "gentlemen's agreement" is a de facto legal practice that makes it possible to rip anything that's not explicitly excluded. The "payment" in exchange for your data was that (in the olden days) you could be found, or (today) receive monetizable clicks. With AI models trained on your data, you get nothing.

    With AI, the "amount of copying" is in theory less than what any search engine does for profit. Although, in practice, the AI operators certainly keep their "corpus" to train on stored as well.

With a technology so important, letting a few random (or carefully picked?) judges create case law that entrenches what the players with the biggest legal budgets want leaves out the people, who according to the narrative are supposed to impose their will through laws. The implications of the technology should be discussed in the parliaments, and if laws need to be introduced to guide it, the lawmakers should provide clear written law on behalf of the people they represent. New proper written laws only seem to get enacted when those players feel what they can buy in courtrooms isn't enough.

    Note that I didn't write what should be the outcome of this legal process, that is up to discussion and should be decided by majority vote. I'm sticking to the narrative here and leave out the issue of backroom lobbying and transparency, or lawmakers sitting idle to let case law be established by their puppeteers - but as is, the whole process is a failure of democracy. The ruleset to be established will be fallout from multinational plutocracy.
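As an aside on the word-index point above: re-indexing a changed page requires knowing which words the old version contained, which is why a copy of the old text (or at least its token list) has to be retained. A toy sketch of that update step:

```python
from collections import defaultdict

# Toy inverted index illustrating why re-indexing a page needs its old text:
# removing stale postings requires knowing which words the old version contained.
index = defaultdict(set)      # word -> set of page ids
stored = {}                   # page id -> last indexed text (the retained copy)

def update_page(page_id, new_text):
    # Purge postings from the previous version, if any.
    for word in set(stored.get(page_id, "").split()):
        index[word].discard(page_id)
    # Add postings for the new version and retain a copy for the next update.
    for word in set(new_text.split()):
        index[word].add(page_id)
    stored[page_id] = new_text

update_page("nyt/1", "openai sued over copyright")
update_page("nyt/1", "openai and microsoft sued over copyright")
print(sorted(index["microsoft"]))   # ['nyt/1']; the stale postings were refreshed
```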

    • (Score: 2) by mcgrew on Saturday December 30 2023, @12:31AM (2 children)

      by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @12:31AM (#1338299) Homepage Journal

      True, but all you have to do to keep Google or any other search engine from copying any of your page is to include a robots.txt file. But if Google can't find it, why publish it on the internet to begin with?

      --
      Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
      • (Score: 2) by Rich on Saturday December 30 2023, @12:15PM (1 child)

        by Rich (945) on Saturday December 30 2023, @12:15PM (#1338339) Journal

        Entirely legit question. Technically, when the search engines started in the late 90s, they committed the biggest copyright infringement ever, after there was a legal shift from "copyright must be registered" to "everything is copyrighted". But, no plaintiff, no judge. The whole "abandonware" thing is similar. The stuff is absolutely necessary for conserving digital history. Technically, it's completely illegal, but as of now, it seems to be tolerated.

        Basic copyright law and the Berne Convention predate any automated information processing, and there has never been any major public consideration of how to deal with the information age in law. As technology progresses (photocopier, magnetic tape, computers, internet, machine learning), the public gets a few backroom deals shoved down their throats that fortify corporate power (TRIPS, DMCA, CTEA, and their international equivalents). But at no time has anything the public would consider sensible even been discussed at the lawmaking level. Like "when a vendor drops supply or support of something, it's fair game", which might look like a mandatory principle for sustainable development.

        What I say is that lawmakers should deal with such things and codify them on behalf of those they represent, rather than living in a case-law world where a single judge gets to decide (another example is the Oracle vs. Google API case, btw).

        • (Score: 2) by mcgrew on Saturday December 30 2023, @06:50PM

          by mcgrew (701) <publish@mcgrewbooks.com> on Saturday December 30 2023, @06:50PM (#1338373) Homepage Journal

          The only problem is that in the US, democracy is dead. The 1% with the highest incomes basically write the laws for your "representatives". It's a racket. "Nice campaign ya got there, Senator, shame if I was to give your opponent's campaign five hundred million and you five million, instead of each of you getting two hundred fifty million."

          --
          Poe's Law [nooze.org] has nothing to do with Edgar Allen Poetry
  • (Score: 1) by MonkeypoxBugChaser on Friday December 29 2023, @07:02PM (6 children)

    by MonkeypoxBugChaser (17904) on Friday December 29 2023, @07:02PM (#1338270) Homepage Journal

    On the one hand I dislike OpenAI, its alignment, bias, and confident lying. On the other hand I hate the NYT, its spreading of propaganda and propping up of dictators.

    Think I have to go with Altman on this one though. I'd rather there be large language models, especially open source ones. Those can't be trained so easily at home, and a ruling like this would lead to a whole chilling effect.

    Even if all the NYT data is purged (the model will be better), everyone else will demand the same; then the model will be worse....

    • (Score: 3, Insightful) by HiThere on Friday December 29 2023, @07:23PM (4 children)

      by HiThere (866) Subscriber Badge on Friday December 29 2023, @07:23PM (#1338273) Journal

      Perhaps it would be better if the LLMs were only trained on stuff that was out of copyright, or dedicated to the public domain. If Harry Potter is a good choice, why not Tom Swift or the Oz series?

      --
      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
      • (Score: 2) by takyon on Friday December 29 2023, @07:41PM (3 children)

        by takyon (881) <reversethis-{gro ... s} {ta} {noykat}> on Friday December 29 2023, @07:41PM (#1338275) Journal

        *IF* full articles and books are popping out of highly compressed LLMs, then there was probably a lot of duplication of the text. Same with copyrighted photos and other unique images popping out of Stable Diffusion. Manage the data better and there's no problem with using Harry Potter as one of 100 billion things in the training.
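
        (A minimal sketch of the kind of data management that implies - dropping exact duplicates from a training corpus by hashing normalized text; real pipelines also do fuzzy near-duplicate detection, which this doesn't attempt:)

            import hashlib

            def dedup(documents):
                """Keep only the first copy of each exactly-duplicated document."""
                seen = set()
                unique = []
                for doc in documents:
                    # Normalize whitespace and case so trivial reformatting doesn't hide a duplicate.
                    key = hashlib.sha256(" ".join(doc.lower().split()).encode("utf-8")).hexdigest()
                    if key not in seen:
                        seen.add(key)
                        unique.append(doc)
                return unique

            corpus = ["Harry Potter, chapter one ...",
                      "harry  potter, chapter one ...",   # same text, different spacing and case
                      "an unrelated news article ..."]
            print(len(dedup(corpus)))   # 2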

        Alternatively, the chatbots are being allowed to find paywalled and copyrighted content where it resides on the live Web (for example, archive sites for NYT articles) and they are reproducing that. Lawsuits against Google News are similar.

        I think we're just going to have to wait for the Supreme Court to pick a winner. These companies might be making public domain models in parallel to prepare for a doomsday ruling.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 3, Insightful) by Anonymous Coward on Saturday December 30 2023, @12:25AM

      by Anonymous Coward on Saturday December 30 2023, @12:25AM (#1338296)

      > On the other hand I hate the NYT ...

      They do have their good points, for example this recent exposure of very lax workplace auditing. If true, the audits have routinely been missing serious child labor abuses in the USA:
          https://www.nytimes.com/2023/12/28/us/migrant-child-labor-audits.html [nytimes.com]

      It's behind a paywall, but given the lax behavior of many archive sites, you shouldn't have too much trouble finding a copy (grin). I read it on paper, syndicated in my local newspaper, but was able to grab a bit with a quick ctrl-a, ctrl-c copy before the sign-up page covered it.

      (a photo caption) Miguel Sanchez, 17, came alone to the United States and has been working at an industrial dairy for about two years. Credit: Ruth Fremson/The New York Times

      They’re Paid Billions to Root Out Child Labor in the U.S. Why Do They Fail?

      Private auditors have failed to detect migrant children working for U.S. suppliers of Oreos, Gerber baby snacks, McDonald’s milk and many other products.
      By Hannah Dreier Dec. 28, 2023

      One morning in 2019, an auditor arrived at a meatpacking plant in rural Minnesota. He was there on behalf of the national drugstore chain Walgreens to ensure that the factory, which made the company’s house brand of beef jerky, was safe and free of labor abuses.

      He ran through a checklist of hundreds of possible problems, like locked emergency exits, sexual harassment and child labor. By the afternoon, he had concluded that the factory had no major violations. It could keep making jerky, and Walgreens customers could shop with a clear conscience.

      When night fell, another 150 workers showed up at the plant. Among them were migrant children who had come to the United States by themselves looking for work. Children as young as 15 were operating heavy machinery capable of amputating fingers and crushing bones.

      Migrant children would work at the Monogram Meat Snacks plant in Chandler, Minn., for almost four more years, until the Department of Labor visited this spring and found such severe child labor violations that it temporarily banned the shipment of any more jerky.

      There aren't many news organizations left that are willing to go out and research things like this. You won't find Google News or ChatGPT doing it, that's for sure. You may have heard the press called the "fourth estate"? https://en.wikipedia.org/wiki/Fourth_Estate [wikipedia.org]
