posted by hubie on Tuesday January 21 2025, @09:39AM   Printer-friendly
from the avoiding-the-ouroboros-of-LLM-slop dept.

Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, given that LLMs and other AI models are starting to consume and re-process the slop they themselves have produced.

The future needs trusted, uncontaminated, complete training data.

From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.

Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.

But actually I think this is a job for librarians and archivists.

What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).
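The acquisition-and-accessioning idea above can be sketched in code. This is a minimal illustrative Python sketch, not any real archival system: the manifest format, function names, and read-only convention are all assumptions made up for the example. The key ideas it demonstrates are recording a checksum and acquisition timestamp at accession time, and later re-hashing the archive to detect contamination.

```python
# Illustrative sketch of "accessioning" files into a read-only archive
# with a checksum manifest. All names and the manifest layout are
# hypothetical; real archives use standards like BagIt or OAIS workflows.
import hashlib
import json
import os
import time
from pathlib import Path

def accession(src: Path, archive_dir: Path) -> dict:
    """Copy a file into the archive, record provenance, and mark it read-only."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    data = src.read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    dest = archive_dir / src.name
    dest.write_bytes(data)
    os.chmod(dest, 0o444)  # read-only: archived copies are never modified
    record = {
        "file": src.name,
        "sha256": digest,
        "accessioned_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    with open(archive_dir / "manifest.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

def verify(archive_dir: Path) -> bool:
    """Re-hash every archived file against the manifest to detect tampering."""
    for line in (archive_dir / "manifest.jsonl").read_text().splitlines():
        rec = json.loads(line)
        data = (archive_dir / rec["file"]).read_bytes()
        if hashlib.sha256(data).hexdigest() != rec["sha256"]:
            return False
    return True
```

The point of the sketch is the one-way flow: material is hashed and timestamped on the way in, the archive itself is never written to again, and any later divergence from the manifest is detectable.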

(Look, and if this is an excuse for governments to funnel money to the cultural sector, then so much the better.)

It should start today.

Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.

Previously
(2025) Meta's AI Profiles Are Already Polluting Instagram and Facebook With Slop
(2024) Thousands Turned Out For Nonexistent Halloween Parade Promoted By AI Listing
(2024) Annoyed Redditors Tanking Google Search Results Illustrates Perils of AI Scrapers


This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 4, Insightful) by DadaDoofy on Tuesday January 21 2025, @02:26PM (3 children)

    by DadaDoofy (23827) on Tuesday January 21 2025, @02:26PM (#1389662)

    "trusted, uncontaminated, complete training data."

    And just how would this be accomplished? Anything done by humans is inherently biased. Why would the curation of AI training data be any different?

    AI is being sold as some kind of impartial arbiter of knowledge, but that couldn't be further from the truth. The sooner people catch on, the sooner this bubble will pop and we can move on.

    https://www.discovermagazine.com/technology/ai-systems-reflect-the-ideology-of-their-creators-say-scientists [discovermagazine.com]

  • (Score: 0, Flamebait) by Freeman on Tuesday January 21 2025, @03:02PM

    by Freeman (732) on Tuesday January 21 2025, @03:02PM (#1389670) Journal

    That's okay, they'll just train the AI that holds the keys to the Nuclear Launch Codes on the likes of "A Modest Proposal" and "Mein Kampf".

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
  • (Score: 3, Disagree) by VLM on Tuesday January 21 2025, @06:43PM (1 child)

    by VLM (445) Subscriber Badge on Tuesday January 21 2025, @06:43PM (#1389712)

    It's the classic echo-chamber subculture problem, where a weird small group (perhaps an LLM) feeds on its own output until it's space-alien time, unable to interact with the rest of reality.

    For a human-ish example, look how weird Reddit posters are, or at least can be, compared to functioning humans IRL. The TDS salt there has been hilarious in recent days LOL.

    Real human generated data costing nothing, and AI slop output costing near nothing, the usual people will try to cut corners by feeding the output back into the input and it's not going to turn out very well.

    At some point a disconnected groupthink subculture becomes useless to the larger culture that's theoretically funding it based on the idea that it'll be useful. For example, could a LLM trained on itself until it's indistinguishable from 4chan complete with n-word stacks and rule 34 memeposting write useful text for ... anything else? Probably not.

    Then too there are other philosophical arguments, like social communities always go right wing without massive centralized censorship and control, so an LLM feeding back on itself will inevitably go full Mein Kampf. Now whether that's actually bad is open for debate, if it's the natural, inevitable evolution of thought, but some folks, mostly from the left, don't like that factual interpretation of reality.

    • (Score: 2) by VLM on Tuesday January 21 2025, @06:47PM

      by VLM (445) Subscriber Badge on Tuesday January 21 2025, @06:47PM (#1389713)

      Real human generated data costing nothing, and AI slop output costing near nothing

      ... Real human generated data costing much more than nothing, and AI slop output costing near nothing...