date 2021-01-25
from the avoiding-the-ouroboros-of-LLM-slop dept.
Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, in light of the problem that arises when LLMs and other AI models start to consume and re-process the slop they themselves have produced.
The future needs trusted, uncontaminated, complete training data.
From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.
Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.
But actually I think this is a job for librarians and archivists.
What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).
(Look and if this is an excuse for governments to funnel money to the cultural sector then so much the better.)
It should start today.
Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.
Related Stories
"Spreading misinformation suddenly becomes a noble goal," Redditor says:
A trend on Reddit that sees Londoners giving false restaurant recommendations in order to keep their favorites clear of tourists and social media influencers highlights the inherent flaws of Google Search's reliance on Reddit and Google's AI Overview.
In May, Google launched AI Overviews in the US, an experimental feature that populates the top of Google Search results with a summarized answer based on an AI model built into Google's web rankings. When Google first debuted AI Overview, it quickly became apparent that the feature needed work with accuracy and its ability to properly summarize information from online sources. AI Overviews are "built to only show information that is backed up by top web results," Liz Reid, VP and head of Google Search, wrote in a May blog post. But as my colleague Benj Edwards pointed out at the time, that setup could contribute to inaccurate, misleading, or even dangerous results: "The design is based on the false assumption that Google's page-ranking algorithm favors accurate results and not SEO-gamed garbage."
As Edwards alluded to, many have complained about Google Search results' quality declining in recent years, as SEO spam and, more recently, AI slop float to the top of searches. As a result, people often turn to the Reddit hack to make Google results more helpful. By adding "site:reddit.com" to search results, users can hone their search to more easily find answers from real people. Google seems to understand the value of Reddit and signed an AI training deal with the company that's reportedly worth $60 million per year.
But disgruntled foodies in London are reminding us of the inherent dangers of relying on the scraping of user-generated content to provide what's supposed to be factual, helpful information.
Apparently, some London residents are getting fed up with social media influencers whose reviews make long lines of tourists at their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the boom of Redditors referring people to Angus Steakhouse, a chain restaurant, to combat it.
Thousands Turn Out For Nonexistent Halloween Parade Promoted By AI Listing:
Thousands of Dubliners showed up for the city's much-anticipated Halloween parade on Thursday evening. They lined the streets from Parnell Street to Christchurch Cathedral, waiting for the promised three-hour parade that would "[transform] Dublin into a lively tapestry of costumes, artistic performances, and cultural festivities." A likely story. There was no parade, and never was one.
Would-be revelers started getting suspicious about an hour after the parade was supposed to begin, according to one attendee. The Gardaí, Ireland's national police service, tried to disperse the crowds and put out the message on social media that "contrary to information being circulated online, no Halloween parade is scheduled to take place in Dublin city centre this evening or tonight."
Over the remainder of the night, sleuths gradually teased out the culprit: a website based in Pakistan that consists solely of listings for Halloween events, some real and some totally made-up. Possibly the first clue that the Dublin parade was in the latter category was the listing's implication that Cristiano Ronaldo and MrBeast might appear. But in the days before the non-event, hype started trickling down via social media posts from actual people, which makes it harder to claim Dubliners should have known—if you see a friend posting about a Halloween parade, why wouldn't you believe there was going to be a Halloween parade?
The patient zero of this farce, however, appears to be a combination of classic SEO bait tactics and newfangled AI slop content. Every autumn, lots of people search for Halloween events nearby, and a site entirely devoted to cataloguing them will naturally rise in the Google rankings, which incentivize lots of things that are not necessarily "quality" or "accuracy." You click on the site, which looks professional enough, and they get some money for the ads you're served.
[...] That a fake listing for a Halloween parade would even be a thing anyone would want to create and promote is a product of all sorts of fucked-up incentives baked into our various tech platforms to produce authoritative-seeming garbage at scale. This is only a problem if you are a human who would like to attend a Halloween parade. But don't worry, our tech barons have promised that the slop faucets will not stop running until we've all drowned.
Hollow-eyed insincere robot posters are already flooding Meta's sites and they're everything we dreaded:
Update 1/03/24: After the publication of this article, Meta told 404 Media that it had begun to delete the AI-generated accounts and that many had been managed by humans. Since then, Meta has deleted the accounts. Our original story follows below.
As I stared into the dead-eyed visage of "Carter," one of Meta's new AI posters, I remembered a line from Dawn of the Dead. "When there's no more room in hell, the dead will walk the Earth."
Something about George Romero's 1978 film about doomed survivors riding out the zombie apocalypse in a shopping mall feels resonant today as I look across Meta's suite of AI-created profiles. The movie's blue-skinned corpses don't know they're dead. They just wander through the shopping center on autopilot, looking for something new to consume.
That's how many of our social media spaces feel now. Digital town squares populated by undead posters, zombies spouting lines they learned from an LLM, the digested material from decades of the internet spewed back at the audience. That's what Meta is selling now.
Meta's various sites have over 3 billion users, an incredible percentage of the world's population. But businesses demand constant growth and, not content with almost half of the living people on the planet, Meta has decided to cut out the middle-man. It is flooding Facebook and Instagram with AI-generated posters of its own creation.
A December 27, 2024 article in Financial Times laid out the vision. "We expect these AIs to actually, over time, exist on our platforms, kind of in the same way that accounts do," Connor Hayes, vice president of generative AI at Meta, told the outlet. "They'll have bios and profile pictures and be able to generate and share content powered by AI on the platform . . . that's where we see all of this going."
[...] The AIs don't seem to be faring well on Instagram. They have low engagement numbers and people are calling them out as AI slop. It's different on Facebook, where the norm has been AI-powered slop for a year now. The post has 13 likes and 2 comments on Instagram and 192 likes, 112 comments, and 33 shares on Facebook. Many of the comments are spam, links to other profiles, or phishing bait of one kind or another.
But it's all interaction and, on a spreadsheet, that's all that matters.
[...] The AI apocalypse is here and it's far stupider and more depressing than we were promised. Instead of being hunted down by a gleaming metal skeleton in a post-apocalyptic wasteland, we are surrounded by zombies endlessly repeating our own posts back to us.
And the worst is yet to come. Remember that to power these nightmares Big Tech is going to revive the nuclear power industry. That's our future. A barren mall kept alight with nuclear power, filled with the dead and the never-born.
(Score: 2) by canopic jug on Tuesday January 21, @09:45AM
One of the problems with AI Slop is that by obscuring authoritative sources, people begin to distrust everything because they have to.
A better link for the last paragraph is Radical disbelief and its causes [cybershow.uk] from the same site. That one speaks to the need for a strategic fact reserve in much more detail. AI slop will only get worse as it becomes even more common, and LLMs will increasingly feed on the growing pool of AI slop as it displaces legitimate material. Having one or more strategic fact reserves will at least allow future models to train on actual information.
