Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, now that LLMs and other AI models are starting to consume and re-process the slop they themselves have produced.
The future needs trusted, uncontaminated, complete training data.
From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.
Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.
But actually I think this is a job for librarians and archivists.
What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).
(Look, and if this is an excuse for governments to funnel money to the cultural sector then so much the better.)
It should start today.
Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.
Previously
(2025) Meta's AI Profiles Are Already Polluting Instagram and Facebook With Slop
(2024) Thousands Turned Out For Nonexistent Halloween Parade Promoted By AI Listing
(2024) Annoyed Redditors Tanking Google Search Results Illustrates Perils of AI Scrapers
(Score: 1, Flamebait) by VLM on Tuesday January 21 2025, @06:15PM (1 child)
Eventually kids will be taught that they can't use anything post 2020 copyright as a primary source and people will seek out pre-2020 sources.
Life is going to be REALLY difficult for technical stuff like post 2025 engineering data sheets or technical manuals.
Really, non-fiction is not going to be much harder than what we're already doing with boycotting post 2010 woke-era fiction. There might be some parallels in that post-AI non-fiction sales might collapse just like post-woke fiction sales and viewership collapsed. Looking at what happened when fiction became enshittified, we may very well be in the last days of technical books (think Manning, O'Reilly, and friends... not so much Packt which is already, um...)
(Score: 2) by Azuma Hazuki on Tuesday January 21 2025, @10:23PM
Define "woke."
I am "that girl" your mother warned you about...