Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, in light of the problem arising from LLMs and other AI models starting to consume and re-process the slop which they themselves have produced.
The future needs trusted, uncontaminated, complete training data.
From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.
Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.
But actually I think this is a job for librarians and archivists.
What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).
(Look, and if this is an excuse for governments to funnel money to the cultural sector then so much the better.)
It should start today.
Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.
Previously
(2025) Meta's AI Profiles Are Already Polluting Instagram and Facebook With Slop
(2024) Thousands Turned Out For Nonexistent Halloween Parade Promoted By AI Listing
(2024) Annoyed Redditors Tanking Google Search Results Illustrates Perils of AI Scrapers
(Score: 5, Insightful) by SomeGuy on Tuesday January 21 2025, @12:31PM (5 children)
What we need are more books. Written by actual people who know a topic.
It used to be that if you wanted to know about something, you bought a book or went to the library. Not so much any more.
These days you're lucky if you can find a web site that contains some incomplete half assed one-page description of a topic.
(Score: 3, Funny) by c0lo on Tuesday January 21 2025, @01:24PM
Easy-peasy... self-publishing on Amazon and with an audiobook format too - AI can help with the latter :large-grin:
https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
(Score: 2) by Freeman on Tuesday January 21 2025, @03:00PM (2 children)
Libraries still exist. Now, most US Public libraries may be glorified Internet Cafes with popular novels. However, there are still quite a number of legitimately good Public libraries and for better or worse Academic libraries have an entirely different agenda.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 3, Informative) by Thexalon on Tuesday January 21 2025, @05:37PM (1 child)
At least the academic libraries I've perused have had lots of really excellent bits of writing just sitting in the stacks not bothering anybody, often from a wide variety of eras. For the most part, the librarians are content to let those books stay right where they are.
The idea that librarians are rubbing their hands together in glee having successfully pushed some kind of agenda is just plain silly conspiracy theorizing, mostly by people who are pursuing a political agenda by trying to ban books that express ideas they don't like. I will also mention that the people trying to ban books often haven't even read them, they've just seen some list go by on the Internet and decided that nobody should be able to read them.
If your ideology is capable of being completely destroyed by its adherents reading or hearing about competing ideologies, then there's a good chance that your ideology is incorrect or immoral and deserves to go away.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 3, Informative) by Freeman on Tuesday January 21 2025, @06:09PM
I've worked in an academic library for nearly 20 years. Perhaps the biggest issue we've faced, essentially the entire time, has been the perception that people can just Google stuff. Why does the Library even exist? The running gag is "Do you think our budget will be the same as it has been for the last 20 years?" The budget for our Library has essentially not changed in over 20 years. Yet database vendors want an extra 3% or more, every year. Looking at the state of universities in general in the USA, we're not doing so great.
(Score: 2) by VLM on Tuesday January 21 2025, @06:31PM
I'm not entirely disagreeing with you, but it's worth considering that perhaps books USED to be sold as 'the curriculum' for a topic whereas IRL the curriculum was usually instructor or self-learner driven but books tried to pretend to be a curriculum, and sometimes were successful-ish.
I think the "new online curriculum" is the online video series. Ranging from free uni open-courseware videos to subsidized free-ish videos (udemy is free at my public library if you pay property taxes in my community, more or less) to outright pay to watch and online credit and non-credit classes. Not books anymore.
Probably, the days of "buy a book" to blink a LED on an arduino or replace a power window motor on a van are simply gone; that's going to be some mix of website, blog, AI slop, and youtube video.
I've noticed I won't buy shovelware physical books anymore. There's a publisher well known to basically shovel lightly edited manpages and howto docs that is pretty much dead to me. On the other hand I'm keeping classic "curriculum" type books. Will I ever get rid of my Strang's Linear Algebra or my copies of Knuth or even Feynman's lecture series? Probably not.
I am keeping the REALLY long form howtos. "Modern C" or some of Holm's books are kind of like an entire fan website collection of well-edited howtos that flow into each other, and I'm keeping those kinds of books but not really buying more. What if you had a LISP howto doc that was 300+ pages long and really well written, or at least interesting as heck; well that's some of Holm's books, good stuff. Nobody is releasing that for free online; not yet anyway, maybe someday.
I think you might see a rise in almost vanity press type ebooks assisted by AI. I don't have time IRL to write a book titled "CoAP and LwM2M on ESP32" although in theory I could write a tolerably decent one; I could ask a suitably advanced LLM to shovel out some slop that I could insert real working code into and then thoroughly edit it and someone into that protocol on that platform might actually enjoy reading it.