SoylentNews is people

posted by hubie on Tuesday January 21 2025, @09:39AM
from the avoiding-the-ouroboros-of-LLM-slop dept.

Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, in light of the problems arising from LLMs and other AI models starting to consume and re-process the slop which they themselves have produced.

The future needs trusted, uncontaminated, complete training data.

From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.

Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.

But actually I think this is a job for librarians and archivists.

What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).

(Look and if this is an excuse for governments to funnel money to the cultural sector then so much the better.)

It should start today.

Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.

Previously
(2025) Meta's AI Profiles Are Already Polluting Instagram and Facebook With Slop
(2024) Thousands Turned Out For Nonexistent Halloween Parade Promoted By AI Listing
(2024) Annoyed Redditors Tanking Google Search Results Illustrates Perils of AI Scrapers



 
This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Informative) by Freeman (732) on Tuesday January 21 2025, @06:09PM (#1389708)

    I've worked in an academic library for nearly 20 years. Perhaps the biggest issue we've faced, essentially the entire time, has been the perception that people can just Google stuff. Why does the Library even exist? The running gag is "Do you think our budget will be the same as it has been for the last 20 years?" The budget for our Library has essentially not changed in over 20 years. Yet database vendors want an extra 3% or more, every year. Looking at the state of universities in general in the USA, we're not doing so great.

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"