SoylentNews is people

Is AI Running Out of Training Data?

Accepted submission by fliptop at 2025-10-16 23:42:51 from the a-copy-of-a-copy-of-a-copy dept.
Software

The meteoric rise of artificial intelligence [businessinsider.com] may appear unstoppable — but it's facing a shortage of training data [businessinsider.com]:

"We've already run out of data," Neema Raphael, Goldman Sachs' chief data officer and head of data engineering, said on the bank's "Exchanges" podcast published on Tuesday.

Raphael said that this shortage may already be influencing how new AI systems [businessinsider.com] are built.

He pointed to China's DeepSeek [businessinsider.com] as an example, saying one hypothesis for its purported development costs is that it was trained on the outputs of existing models rather than on entirely new data.

[...] With the web tapped out, developers are turning to synthetic data — machine-generated text, images, and code. That approach offers limitless supply, but also risks overwhelming models with low-quality output or AI slop.

However, Raphael said he doesn't think the lack of fresh data will be a massive constraint, in part because companies are sitting on untapped reserves of information.

Rick Beato talked about [youtube.com] how he broke ChatGPT with a simple question and exposed the gaps in AI's "knowledge" that are filled with synthetic data.

Related: The Real (Economic) AI Apocalypse is Nigh [soylentnews.org]

