
SoylentNews is people

posted by janrinok on Saturday February 24, @03:51PM

If you've posted on Reddit, you're likely feeding the future of AI:

On Friday, Bloomberg reported that Reddit has signed a contract allowing an unnamed AI company to train its models on the site's content, according to people familiar with the matter. The move comes as the social media platform nears the introduction of its initial public offering (IPO), which could happen as soon as next month.

Reddit initially revealed the deal, which is reported to be worth $60 million a year, earlier in 2024 to potential investors in an anticipated IPO, Bloomberg said. The Bloomberg source speculates that the contract could serve as a model for future agreements with other AI companies.

After an era in which AI companies used training data without expressly seeking permission from rightsholders, some tech firms have more recently begun entering deals where some content used for training AI models similar to GPT-4 (which runs the paid version of ChatGPT) comes under license. In December, for example, OpenAI signed an agreement with German publisher Axel Springer (publisher of Politico and Business Insider) for access to its articles. Previously, OpenAI has struck deals with other organizations, including the Associated Press. Reportedly, OpenAI is also in licensing talks with CNN, Fox, and Time, among others.

In April 2023, Reddit founder and CEO Steve Huffman told The New York Times that the company planned to charge AI companies for access to its almost two decades' worth of human-generated content.

If the reported $60 million/year deal goes through, it's quite possible that if you've ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video. Even without the deal, experts have discovered in the past that Reddit has been a key source of training data for large language models and AI image generators.



This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Funny) by mrpg on Saturday February 24, @05:03PM (2 children)

    by mrpg (5708) <{mrpg} {at} {soylentnews.org}> on Saturday February 24, @05:03PM (#1346088)

    I suppose some users will revolt and begin posting crap.

    • (Score: 5, Insightful) by VLM on Saturday February 24, @05:14PM

      by VLM (445) on Saturday February 24, @05:14PM (#1346092)

      begin posting crap

      Begin?

      This is Reddit we're talking about here.

    • (Score: 3, Touché) by corey on Monday February 26, @11:13AM

      by corey (2202) on Monday February 26, @11:13AM (#1346293)

      Yeah I was wondering why anyone would want to train from Reddit. Of course there are informative and well written posts in some subreddits but mostly it’s a pool of opinions, vitriol and misinformed echo chambers.

  • (Score: 4, Funny) by Rosco P. Coltrane on Saturday February 24, @05:05PM

    by Rosco P. Coltrane (4757) on Saturday February 24, @05:05PM (#1346089)

    Anybody heard of it?

  • (Score: 5, Touché) by VLM on Saturday February 24, @05:12PM

    by VLM (445) on Saturday February 24, @05:12PM (#1346091)

    it's quite possible that if you've ever posted on Reddit, some of that material may be used to train the next generation of AI models that create text, still pictures, and video.

    I've been to Reddit, and overall the content from Reddit would be primarily useful for training a mental health counseling bot.

    They've gone through multiple pogroms in the past having the primary purpose of forcing out mentally healthy people.

    On the other hand, the resulting pr0n will be interesting to read, and the idea of an "image generator" being trained off "/r/gonewild" would be interesting.

  • (Score: 5, Insightful) by VLM on Saturday February 24, @05:52PM

    by VLM (445) on Saturday February 24, @05:52PM (#1346095)

    some of that material

    Worth pointing out that Reddit goes through periodic digital book-burning episodes where they wipe entire genres of discussion.

    I wonder what happens when you feed the image-generating AI bot the contents of the old banned subreddit that was mostly nonconsenting pictures of young ladies under-dressed in public.

    Or subreddits that consisted of textual double-plus-ungood-think before Reddit's political pogroms.

    I wonder if they're selling a full archive, or censored to 2024 far-left-wing political standards, or possibly even more censored than that.

  • (Score: 5, Informative) by Rich on Saturday February 24, @07:08PM

    by Rich (945) on Saturday February 24, @07:08PM (#1346103)

    2 days ago, Reuters Exclusive: Reddit in AI content licensing deal with Google

    https://www.reuters.com/technology/reddit-ai-content-licensing-deal-with-google-sources-say-2024-02-22/
