If you've posted on Reddit, you're likely feeding the future of AI:
On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site's content, according to people familiar with the matter. The move comes as the social media platform nears its initial public offering (IPO), which could happen as soon as next month.
Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors of an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.
After an era where AI companies utilized AI training data without expressly seeking any rightsholder permission, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.
In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that the company planned to charge AI companies for access to its almost two decades' worth of human-generated content.
If the reported $60 million/year deal goes through, anything you have ever posted on Reddit may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have previously found that Reddit has been a key source of training data for large language models and AI image generators.
Related Stories
Reddit CEO Steve Huffman is standing by Reddit's decision to block companies from scraping the site without an AI agreement.
Last week, 404 Media noticed that search engines other than Google were no longer listing recent Reddit posts in results. This was because Reddit updated its Robots Exclusion Protocol file (robots.txt) to block bots from scraping the site. The file reads: "Reddit believes in an open Internet, but not the misuse of public content." Since the news broke, OpenAI announced SearchGPT, which can show recent Reddit results.
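For readers unfamiliar with how a robots.txt file blocks crawlers, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents, the bot names (`LicensedBot`, `OtherBot`), and the example URL are all hypothetical illustrations; this is not Reddit's actual robots.txt, of which only the quoted comment line is known from the story.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt, NOT Reddit's real file. Only the comment line
# is taken from the story; the rules and bot names are assumptions.
HYPOTHETICAL_ROBOTS_TXT = """\
# Reddit believes in an open Internet, but not the misuse of public content.
User-agent: LicensedBot
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/r/some_subreddit/"  # placeholder URL
    for bot in ("LicensedBot", "OtherBot"):
        verdict = "allowed" if is_allowed(HYPOTHETICAL_ROBOTS_TXT, bot, url) else "blocked"
        print(f"{bot}: {verdict}")
```

Note that robots.txt is advisory: it only works when a crawler chooses to check it and honor the result, which is why the dispute described here centers on which companies respect it.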
[...]
In an interview with The Verge today, Huffman stood by the changes that led to Google temporarily being the only search engine able to show recent discussions from Reddit. Reddit and Google signed an AI training deal in February said to be worth $60 million a year. It's unclear how much Reddit's OpenAI deal is worth.
[...]
Per The Verge, Huffman claimed that Microsoft, Anthropic, and Perplexity haven't been negotiating. The three companies haven't commented on Huffman's interview.
"[It's been] a real pain in the ass to block these companies," Huffman told The Verge.
[...]
A Microsoft spokesperson told me last week that "Microsoft respects the robots.txt standard and we honor the directions provided by websites that do not want content on their pages to be used with our generative AI models."
[...]
Huffman also reportedly made reference to a June CNBC interview where Mustafa Suleyman, CEO of Microsoft AI, said: "I think that with respect to content that is already on the open web, the social contract of that content since the '90s has been that it is fair use. Anyone can copy it, re-create with it, reproduce with it. That has been freeware, if you like. That's been the understanding." Suleyman added that his comment didn't refer to certain types of web content, like news organizations.
"We've had Microsoft, Anthropic, and Perplexity act as though all of the content on the internet is free for them to use. That's their real position," Huffman said.
Related stories on SoylentNews:
Reddit Faces New Reality After Cashing in on its IPO - 20240328
Reddit Aims for $6.4bn Valuation Ahead of Initial Public Offering - 20240313
Reddit Sells Training Data to Unnamed AI Company Ahead of IPO - 20240223
Reddit is Removing Ability to Opt Out of Ad Personalization Based on Your Activity on the Platform - 20231004
Reddit Beats Film Industry, Won't Have to Identify Users Who Admitted Torrenting - 20230803
No Apologies as Reddit Halfheartedly Tries to Repair Ties With Moderators - 20230722
Ongoing Reddit Woes: Blackout Explained, Threatened Hacker Leak, Creative Continuing Protests - 20230620
Reddit Rollup: IPO Dreams and Developer Discontent - 20230612
(Score: 5, Funny) by mrpg on Saturday February 24 2024, @05:03PM (2 children)
I suppose some users will revolt and begin posting crap.
(Score: 5, Insightful) by VLM on Saturday February 24 2024, @05:14PM
Begin?
This is Reddit we're talking about here.
(Score: 3, Touché) by corey on Monday February 26 2024, @11:13AM
Yeah I was wondering why anyone would want to train from Reddit. Of course there are informative and well written posts in some subreddits but mostly it’s a pool of opinions, vitriol and misinformed echo chambers.
(Score: 4, Funny) by Rosco P. Coltrane on Saturday February 24 2024, @05:05PM
Anybody heard of it?
(Score: 5, Touché) by VLM on Saturday February 24 2024, @05:12PM
I've been to Reddit, and overall the content from Reddit would be primarily useful for training a mental health counseling bot.
They've gone through multiple pogroms in the past having the primary purpose of forcing out mentally healthy people.
On the other hand, the resulting pr0n will be interesting to read, and the idea of an "image generator" being trained off "/r/gonewild" would be interesting.
(Score: 5, Insightful) by VLM on Saturday February 24 2024, @05:52PM
Worth pointing out that Reddit goes through periodic digital book-burning episodes where they wipe entire genres of discussion.
I wonder what happens when you feed the image-generating AI bot the contents of the old banned subreddit that was mostly nonconsenting pictures of young ladies under-dressed in public.
Or subreddits that consisted of textual double-plus-ungood-think before Reddit's political pogroms.
I wonder if they're selling a full archive, or censored to 2024 far-left-wing political standards, or possibly even more censored than that.
(Score: 5, Informative) by Rich on Saturday February 24 2024, @07:08PM
2 days ago, Reuters Exclusive: Reddit in AI content licensing deal with Google
https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/ [reuters.com]