Media outlets are crying foul over AI companies using their content to build chatbots. They may find friends in the Senate:
More than a decade ago, the normalization of tech companies carrying content created by news organizations without directly paying them — cannibalizing readership and ad revenue — precipitated the decline of the media industry. With the rise of generative artificial intelligence, those same firms threaten to further tilt the balance of power between Big Tech and news.
On Wednesday, lawmakers on the Senate Judiciary Committee pointed to their failure to pass legislation that would have barred Big Tech's exploitation of news content as they backed proposals requiring AI companies to strike licensing deals with news organizations.
Richard Blumenthal, Democrat of Connecticut and chair of the committee, joined several other senators in supporting calls for a licensing regime and for a framework clarifying that intellectual property laws don't protect AI companies using copyrighted material to build their chatbots.
[...] The fight over the legality of AI firms ingesting content from news organizations without consent or compensation is split into two camps: those who believe the practice is protected under the "fair use" doctrine in intellectual property law that allows creators to build upon copyrighted works, and those who argue that it constitutes copyright infringement. Courts are currently wrestling with the issue, but an answer is likely years away. In the meantime, AI companies continue to use copyrighted content as training material, endangering the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools.
[...] A lawsuit from The New York Times, filed last month, pulled back the curtain on negotiations over the price and terms of licensing its content. Before suing, the paper said that it had been talking for months with OpenAI and Microsoft about a deal, though the talks produced no agreement. With AI companies crawling the internet for high-quality written content, news organizations have been backed into a corner, having to decide whether to accept lowball offers to license their content or expend the time and money to litigate. Some companies, like Axel Springer, took the money.
It's important to note that under intellectual property laws, facts are not protected.
Also at Courthouse News Service and Axios.
Related:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Writers and Publishers Face an Existential Threat From AI: Time to Embrace the True Fans Model
Related Stories
Writers and publishers face an existential threat from AI: time to embrace the true fans model:
Walled Culture has written several times about the major impact that generative AI will have on the copyright landscape. More specifically, these systems, which can quickly and cheaply create written material on any topic and in any style, are likely to threaten the publishing industry in profound ways. Exactly how is spelled out in this great post by Suw Charman-Anderson on her Word Count blog. The key point is that large language models (LLMs) are able to generate huge quantities of material. The fact that much of it is poorly written makes things worse, because it becomes harder to find the good stuff[.]
[...] One obvious approach is to try to use AI against AI. That is, to employ automated vetting systems to weed out the obvious rubbish. That will lead to an expensive arms race between competing AI software, with unsatisfactory results for publishers and creators. If anything, it will only cause LLMs to become better and to produce material even faster in an attempt to fool or simply overwhelm the vetting AIs.
The real solution is to move to an entirely different business model, one based on the unique connection between human creators and their fans. The true fans approach has been discussed here many times in other contexts, and once more reveals itself as resilient in the face of change brought about by rapidly advancing digital technologies.
OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request to comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"
Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.
The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."
[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should only be used for commercial gain with permission from the original source.
"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.
There's an old saying about the news business: If you want to make a small fortune, start with a large one:
As the prospects for news publishers waned in the last decade, billionaires swooped in to buy some of the country's most fabled brands. Jeff Bezos, the founder of Amazon, bought The Washington Post in 2013 for about $250 million. Dr. Patrick Soon-Shiong, a biotechnology and start-up billionaire, purchased The Los Angeles Times in 2018 for $500 million. Marc Benioff, the founder of the software giant Salesforce, purchased Time magazine with his wife, Lynne, for $190 million in 2018.
[...] But it increasingly looks like the billionaires are struggling just like nearly everyone else. Time, The Washington Post and The Los Angeles Times all lost millions of dollars last year, people with knowledge of the companies' finances have said, after considerable investment from their owners and intensive efforts to drum up new revenue streams.
[...] In the middle of last year, The [Los Angeles] Times was on track to lose $30 million to $40 million in 2023, according to three people with knowledge of the projections. Last year, the company cut about 74 jobs, and executives have met in recent days to discuss the possibility of deep job cuts, according to two other people familiar with the conversations. Members of The Los Angeles Times union have called an emergency meeting for Thursday to discuss the possibility of another "major" round of layoffs: "This is the big one," read the email to employees.
[...] Mr. Bezos hasn't fared much better at The Washington Post. Like many news organizations, The Post has struggled to hold onto the momentum it gained in the wake of the 2020 election. Sagging subscriptions and advertising revenue led to losses of about $100 million last year. At the end of the year, the company eliminated 240 of its 2,500 jobs through buyouts, including some of its well-regarded journalists.
[...] Time is facing similar headwinds. The publication lost around $20 million in 2023, according to two people with knowledge of the publication's financial picture. Time has weighed cutting costs in the first quarter of the year to help offset some of the losses, one of the people said.
Related:
- AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead
- Co-Founder of Salesforce Buys Time Magazine for $190 Million
- Publisher Drops Tronc Name, Reverts to Tribune Publishing
Microsoft is working with media startup Semafor to use its artificial intelligence chatbot to help develop news stories—part of a journalistic outreach that comes as the tech giant faces a multibillion-dollar lawsuit from the New York Times.
As part of the agreement, Microsoft is paying an undisclosed sum of money to Semafor to sponsor a breaking news feed called "Signals." The companies would not share financial details, but the amount of money is "substantial" to Semafor's business, said a person familiar with the matter.
[...] The partnerships come as media companies have become increasingly concerned over generative AI and its potential threat to their businesses. News publishers are grappling with how to use AI to improve their work and stay ahead of technology, while also fearing that they could lose traffic, and therefore revenue, to AI chatbots—which can churn out humanlike text and information in seconds.
The New York Times in December filed a lawsuit against Microsoft and OpenAI, alleging the tech companies have taken a "free ride" on millions of its articles to build their artificial intelligence chatbots, and seeking billions of dollars in damages.
[...] Semafor, which is free to read, is funded by wealthy individuals, including 3G Capital founder Jorge Paulo Lemann and KKR co-founder Henry Kravis. The company made more than $10 million in revenue in 2023 and has more than 500,000 subscriptions to its free newsletters. Semafor CEO Justin Smith said Semafor was "very close to a profit" in the fourth quarter of 2023.
Related stories on SoylentNews:
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Microsoft Shamelessly Pumping Internet Full of Garbage AI-Generated "News" Articles - 20231104
Google, DOJ Still Blocking Public Access to Monopoly Trial Docs, NYT Says - 20231020
After ChatGPT Disruption, Stack Overflow Lays Off 28 Percent of Staff - 20231017
Security Risks Of Windows Copilot Are Unknowable - 20231011
Microsoft AI Team Accidentally Leaks 38TB of Private Company Data - 20230923
Microsoft Pulls AI-Generated Article Recommending Ottawa Food Bank to Tourists - 20230820
A Jargon-Free Explanation of How AI Large Language Models Work - 20230805
The Godfather of AI Leaves Google Amid Ethical Concerns - 20230502
The AI Doomers' Playbook - 20230418
Ads Are Coming for the Bing AI Chatbot, as They Come for All Microsoft Products - 20230404
Deepfakes, Synthetic Media: How Digital Propaganda Undermines Trust - 20230319
The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times "has a near zero probability of winning" its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.
"Trying to get everyone to license training data is not going to work because that's not what copyright is about," Jeffries wrote. "Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works."
[...] Courts are supposed to consider four factors in fair use cases, but two of these factors tend to be the most important. One is the nature of the use. A use is more likely to be fair if it is "transformative"—that is, if the new use has a dramatically different purpose and character from the original. Judge Rakoff dinged MP3.com as non-transformative because songs were merely "being retransmitted in another medium."
In contrast, Google argued that a book search engine is highly transformative because it serves a very different function than an individual book. People read books to enjoy and learn from them. But a search engine is more like a card catalog; it helps people find books.
The other key factor is how a use impacts the market for the original work. Here, too, Google had a strong argument since a book search engine helps people find new books to buy.
[...] In 2015, the Second Circuit ruled for Google. An important theme of the court's opinion is that Google's search engine was giving users factual, uncopyrightable information rather than reproducing much creative expression from the books themselves.
[...] Recently, we visited Stability AI's website and requested an image of a "video game Italian plumber" from its image model Stable Diffusion.
[...] Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front.
These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.
OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other artificial-intelligence systems to generate misleading evidence for the case:
OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."
"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products."
OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.
[...] Courts have not yet addressed the key question of whether AI training qualifies as fair use under copyright law. So far, judges have dismissed some infringement claims over the output of generative AI systems based on a lack of evidence that AI-created content resembles copyrighted works.
Also at The Guardian, MSN and Forbes.
Previously:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Why the New York Times Might Win its Copyright Lawsuit Against OpenAI
- AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead
[We have had several complaints recently (polite ones, not a problem) regarding the number of AI stories that we are printing. I agree, but that reflects the number of submissions that we receive on the subject. So I have compiled a small selection of AI stories into one and you can read them or ignore them as you wish. If you are making a comment please make it clear exactly which story you are referring to unless your comment is generic. The submitters each receive the normal karma for a submission. JR]
Image-scraping Midjourney bans rival AI firm for scraping images
On Wednesday, Midjourney banned all employees of image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney's official Discord channel.
[...] Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. "It turns out that generative AI companies don't like it when you steal, sorry, scrape, images from them. Cue the world's smallest violin."
[...] Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, "sent you some information to help with your internal investigation."
[...] When asked about Stability's relationship with Midjourney these days, Mostaque played down the rivalry. "No real overlap, we get on fine though," he told Ars and emphasized a key link in their histories. "I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta."
Midjourney stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Midjourney&sort=2
Stable Diffusion (Stability AI) stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Stable+Diffusion&sort=2
Quotes from Wyoming's governor and a local prosecutor were the first things that seemed slightly off to Powell Tribune reporter CJ Baker. Then it was some of the phrases in the stories that struck him as nearly robotic:
The dead giveaway, though, that a reporter from a competing news outlet was using generative artificial intelligence to help write his stories came in a June 26 article about the comedian Larry the Cable Guy being chosen as the grand marshal of the Cody Stampede Parade.
[...] After doing some digging, Baker, who has been a reporter for more than 15 years, met with Aaron Pelczar, a 40-year-old who was new to journalism and who Baker says admitted that he had used AI in his stories before he resigned from the Enterprise.
[...] Journalists have derailed their careers by making up quotes or facts in stories long before AI came about. But this latest scandal illustrates the potential pitfalls and dangers that AI poses to many industries, including journalism, as chatbots can spit out spurious if somewhat plausible articles with only a few prompts.
[...] "In one case, (Pelczar) wrote a story about a new OSHA rule that included a quote from the Governor that was entirely fabricated," Michael Pearlman, a spokesperson for the governor, said in an email. "In a second case, he appeared to fabricate a portion of a quote, and then combined it with a portion of a quote that was included in a news release announcing the new director of our Wyoming Game and Fish Department."
Related:
- AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- A Financial News Site Uses AI to Copy Competitors — Wholesale
- Sports Illustrated Published Articles by Fake, AI-Generated Writers
- OpenAI Has Released the Largest Version Yet of its Fake-News-Spewing AI
(Score: 2) by looorg on Friday January 12 2024, @05:30PM (11 children)
Starting to think that the next big thing is tech that prevents AI from harvesting the content. Time to get the grey matter working on that idea ...
(Score: 4, Interesting) by RS3 on Friday January 12 2024, @07:25PM (6 children)
I hate to be cynical or pessimistic, but my money is on the AI figuring it out much faster than our bio brains can try to block it. It's going to be up to the AI admins to know what the AI is doing, and the programmers and admins to put limits on it. I'm sure most reading this can see the morass this is heading toward.
(Score: 4, Informative) by ikanreed on Friday January 12 2024, @07:54PM (4 children)
AI(as this current crop of AI companies pitch it) doesn't "figure" anything out. It scrapes a shit-ton of shit, and then finds patterns in it.
"Putting limits on it" at this point has amounted to a second layer of training it to not say anything too offensive. That's it.
The problem we face now is a pissload of absolutely useless content produced to grab pennies of advertising dollars. And any attempt to "limit" that will face inherent problems with the inoffensiveness of the "desired" content. The only problem is quantity and there's no way to police that.
(Score: 2) by RS3 on Friday January 12 2024, @08:16PM (3 children)
I understand you, but it might be a very fine-line definition. And maybe some AI have much more capability than we're being told. I'm reasonably certain there's much more research into much higher levels of reasoning, "figuring out", etc.
I'm more interested in what constitutes moral and ethical values in AI as AI development ensues.
(Score: 5, Interesting) by ikanreed on Friday January 12 2024, @08:26PM (1 child)
No. It really doesn't. When not processing or generating text, the transformer model doesn't "think" on its own.
It has two modes in which the matrix of weights is read or written in memory (outside of debugging tools, of course), and that's
1: it's being trained. If it's reading in new data, the weights are changed based on the difference between what it sees and what it expected to see.
2: it's being asked to generate content, then it processes the input data through its matrix of weights and spits out output.
Nowhere in that process is it abstractly considering trying to solve a problem "out of scope" of anticipating the outputs for the inputs. The code simply does not work that way.
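[Ed. note: for readers who want the two modes made concrete, here is a minimal, purely illustrative sketch in Python/NumPy of a toy next-token model. All names are ours, not from any real AI library. Weights change only inside the training function; generation is a fixed, read-only pass, and nothing at all runs between the two.]

    import numpy as np

    rng = np.random.default_rng(0)
    VOCAB = 16
    W = rng.normal(size=(VOCAB, VOCAB))  # the "matrix of weights"

    def forward(token):
        """Generation mode: a read-only pass; the weights are untouched."""
        logits = W[token]
        p = np.exp(logits - logits.max())
        return p / p.sum()

    def train_step(token, next_token, lr=0.1):
        """Training mode: nudge weights by the gap between seen and expected."""
        p = forward(token)
        target = np.zeros(VOCAB)
        target[next_token] = 1.0
        W[token] += lr * (target - p)

    seq = [1, 2, 3, 1, 2, 3, 1, 2]  # toy training data: 1 is always followed by 2
    for _ in range(200):
        for a, b in zip(seq, seq[1:]):
            train_step(a, b)
    print(np.argmax(forward(1)))  # -> 2: a learned pattern, not reasoning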
(Score: 3, Insightful) by RS3 on Friday January 12 2024, @10:46PM
Not an arguer; don't mean to argue. I'm sure you're right, for some given code, meaning some very specific (and limiting) definition of "AI".
But, you can't be sure that nobody is working on much higher levels of "thinking", even if it's mostly iterative in guessing outcomes and pattern-matching them against known conclusions. Our brains pretty much work that way, hopefully we learn things like touching the hot stove is not one of the better possible paths. Even sci-fi authors have envisioned "thinking" computers that have a huge database, including iterative and multi-step events / processes and outcomes, some degree of random generator that conjures possibilities, and tests them against known outcomes, and databanks (caches) them too. I dunno, doesn't seem all that far-fetched, but I'm not deep in that world.
(Score: 3, Insightful) by hendrikboom on Friday January 12 2024, @11:45PM
Yes, there is research into "figuring out".
One team is having an AI generate some computer code and its proof of correctness. Then they feed that into another system that checks the proof using traditional formal logical systems, feeding the errors back to the generator.
Apparently they claim to reach about 60% success. I do not know how big the programs are. My guess is small.
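[Ed. note: a minimal sketch in Python of the generate/check/feedback loop described above. The real systems pair an LLM with a formal proof checker such as Lean or Coq; here both sides are toy stand-ins (a canned "repair" and a test-based check), purely to show the shape of the loop.]

    def generate_candidate(feedback):
        """Stand-in for the LLM: propose code for abs(x), 'repairing' on feedback."""
        if feedback is None:
            return "def f(x): return x"                # first, flawed attempt
        return "def f(x): return x if x >= 0 else -x"  # repaired attempt

    def check_candidate(source):
        """Stand-in for the proof checker: verify f really computes abs."""
        ns = {}
        exec(source, ns)
        for x in (-3, 0, 7):
            if ns["f"](x) != abs(x):
                return f"counterexample: f({x}) = {ns['f'](x)}, expected {abs(x)}"
        return None  # no errors: candidate accepted

    feedback = None
    for attempt in range(1, 6):
        code = generate_candidate(feedback)
        feedback = check_candidate(code)   # errors flow back to the generator
        if feedback is None:
            print(f"verified on attempt {attempt}: {code}")
            break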
(Score: 3, Insightful) by looorg on Friday January 12 2024, @08:01PM
I'm not saying the solution won't be utter snake oil, but I wouldn't be at all surprised if I soon get calls from people trying to sell the latest and greatest in AI-blocking technology.
(Score: 2) by crafoo on Friday January 12 2024, @08:29PM (1 child)
there might be money in that if you can come up with something good. worth thinking about.
on the other side of the battle line, "big tech" could hire 10,000 Indians to busily paraphrase all news articles in their own words and feed that gobbledygook into the AI feeder maw. I don't believe reading an article and paraphrasing it in your own words is illegal. yet.
(Score: 1, Interesting) by Anonymous Coward on Friday January 12 2024, @09:34PM
Legally you can only do that to facts. Anyone who sues that AI company will be claiming their content is fiction.
(Score: 2) by mhajicek on Friday January 12 2024, @10:42PM (1 child)
Some sites are using alternate character sets that look like normal letters.
The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
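[Ed. note: a minimal sketch of the trick mhajicek describes: swapping Latin letters for Unicode lookalikes (here Cyrillic) so the text renders identically to human eyes but no longer matches the original byte-for-byte when scraped. The mapping below is illustrative, not any site's actual scheme.]

    HOMOGLYPHS = {
        "a": "\u0430",  # Cyrillic a
        "e": "\u0435",  # Cyrillic e
        "o": "\u043e",  # Cyrillic o
        "p": "\u0440",  # Cyrillic er (looks like Latin p)
        "c": "\u0441",  # Cyrillic es (looks like Latin c)
        "x": "\u0445",  # Cyrillic kha (looks like Latin x)
    }

    def obfuscate(text):
        return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

    s = obfuscate("scrape this, please")
    print(s)                             # looks the same on screen...
    print(s == "scrape this, please")    # ...but False: different code points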
(Score: 2) by looorg on Friday January 12 2024, @10:58PM
While that could work, it would seem to be a simple thing for the AI to process. I guess that is the crux of the matter. It should not inconvenience the human and their eyeballs (and advert$), but it should at the same time be annoying -- very annoying -- for scraping. Not that the AI couldn't solve the problem if it were set to the task; if you could prevent that, it would be genius. But just annoying enough so that if you scrape, you get crap.
(Score: 2, Interesting) by Anonymous Coward on Friday January 12 2024, @06:58PM
Down the slippery slope we go.
(Score: 4, Interesting) by Rosco P. Coltrane on Friday January 12 2024, @08:21PM (4 children)
Yeah, but fiction is, and news organizations these days serve more of that than facts. So they do have a leg to stand on.
(Score: 4, Insightful) by ikanreed on Friday January 12 2024, @08:29PM
And presentation of facts is too. If I make a map, that is copyrightable. I can't claim copyright to the idea of there being a street through downtown, but I can sure as hell claim copyright to a particular representation of that fact. (Map companies include fake streets in their data for the sake of showing that it was their presentation being ripped off.)
(Score: 1) by granaiogeek on Saturday January 13 2024, @02:16AM (1 child)
Limit AI to scientific subjects.
Leave the arts to humans.
(Score: 4, Interesting) by PiMuNu on Saturday January 13 2024, @11:19AM
Uh, no thanks. I don't want their algorithms dumping a shedload of made up papers into the refereeing process that look convincing but are utter nonsense.
(Score: 2) by Thexalon on Saturday January 13 2024, @03:19AM
Facts aren't protected, but an article paragraph or sentence that records a fact is. So, for example, "The US and UK launched air strikes against the Houthis in Yemen today." can be copyrighted. But if somebody else writes "In Yemen today, the Houthis were struck by missiles from US fighters.", the writer of the first sentence can't sue the writer of the second sentence or vice versa.
What news organizations serve up more than facts or outright fiction is commentary. Commentary bears some basis in fact, but then adds the opinion of the talking heads and ideally gets them arguing with each other. If you pay attention to a TV news broadcast, you'll often see a format that amounts to 1-2 sentences worth of facts about an issue (20 seconds), and then they introduce the commentary section with things like "And now to discuss this issue, we have ____, ____, ____, and ____" and then they shout at each other until the next commercial break (3-4 minutes). Even local news likes that format, because finding out facts about an issue is hard and expensive, writing those facts into a coherent story is only a bit easier, but commentators yelling at each other is pretty cheap and easy to do.
And to quote a more wise talking head: "Same as it ever was".
The only thing that stops a bad guy with a compiler is a good guy with a compiler.