SoylentNews
Scientists once hoarded pre-nuclear steel; now we’re hoarding pre-AI content

Accepted submission by Freeman at 2025-06-20 16:50:46 from the resistance is futile dept.
News

https://arstechnica.com/ai/2025/06/why-one-man-is-archiving-human-made-content-from-before-the-ai-explosion/ [arstechnica.com]

Former Cloudflare executive John Graham-Cumming [wikipedia.org] recently announced that he has launched lowbackgroundsteel.ai [lowbackgroundsteel.ai], a website that treats pre-AI, human-created content like a precious commodity—a time capsule of organic creative expression from a time before machines joined the conversation. "The idea is to point to sources of text, images and video that were created prior to the explosion of AI-generated content," Graham-Cumming wrote [jgc.org] on his blog last week. The reason? To preserve what made non-AI media uniquely human.
[...]
ChatGPT in particular triggered an avalanche of AI-generated text across the web, forcing at least one research project to shut down entirely.

That casualty was wordfreq [wiktionary.org], a Python library created by researcher Robyn Speer that tracked word frequencies across more than 40 languages by analyzing millions of sources, including Wikipedia, movie subtitles, news articles, and social media. The tool was widely used by academics and developers to study how language evolves and to build natural language processing applications. The project announced [github.com] in September 2024 that it would no longer be updated because "the Web at large is full of slop generated by large language models, written by no one to communicate nothing."
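The kind of corpus analysis wordfreq automated can be sketched in a few lines of Python. This toy counter is purely illustrative (the function name and regex are assumptions, not wordfreq's actual API), but it shows the basic idea: tally words across many texts and report each word's share of the total.

```python
from collections import Counter
import re

def word_frequencies(texts):
    """Return each word's share of the total word count across texts.

    Illustrative sketch only; real tools like wordfreq draw on
    millions of sources and handle tokenization per language.
    """
    counts = Counter()
    for text in texts:
        # Crude English-only tokenizer: lowercase letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

corpus = [
    "The Web at large is full of slop",
    "written by no one to communicate nothing",
]
freqs = word_frequencies(corpus)
```

A corpus polluted by machine-generated text skews exactly these ratios, which is why the project's methodology became untenable once LLM output flooded its sources.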
[...]
The website points [lowbackgroundsteel.ai] to several major archives of pre-AI content, including a Wikipedia dump from August 2022 (before ChatGPT's November 2022 release), Project Gutenberg's collection of public domain books, the Library of Congress photo archive, and GitHub's Arctic Code Vault [github.com]—a snapshot of open source code buried in a former coal mine near the North Pole in February 2020. The wordfreq project appears on the list as well, flash-frozen from a time before AI contamination made its methodology untenable.
[...]
As atmospheric nuclear testing ended and background radiation returned to natural levels, low-background steel eventually became unnecessary for most uses. Whether pre-AI content will follow a similar trajectory remains an open question.

Still, it feels reasonable to protect sources of human creativity now, including archival ones, because these repositories may become useful in ways that few appreciate at the moment. [arstechnica.com]
[...]
For now, lowbackgroundsteel.ai stands as a modest catalog of human expression from what may someday be seen as the last pre-AI era.


Original Submission