posted by hubie on Tuesday October 29 2024, @09:22PM

"Spreading misinformation suddenly becomes a noble goal," Redditor says:

A trend of Londoners posting false restaurant recommendations on Reddit, meant to keep their favorite spots clear of tourists and social media influencers, highlights the inherent flaws of Google Search's reliance on Reddit and of Google's AI Overview.

In May, Google launched AI Overviews in the US, an experimental feature that populates the top of Google Search results with a summarized answer based on an AI model built into Google's web rankings. When Google first debuted AI Overviews, it quickly became apparent that the feature needed work on accuracy and on its ability to properly summarize information from online sources. AI Overviews are "built to only show information that is backed up by top web results," Liz Reid, VP and head of Google Search, wrote in a May blog post. But as my colleague Benj Edwards pointed out at the time, that setup could contribute to inaccurate, misleading, or even dangerous results: "The design is based on the false assumption that Google's page-ranking algorithm favors accurate results and not SEO-gamed garbage."

As Edwards alluded to, many have complained about Google Search results' quality declining in recent years, as SEO spam and, more recently, AI slop float to the top of searches. As a result, people often turn to the Reddit hack to make Google results more helpful. By adding "site:reddit.com" to a search query, users can narrow their results to more easily find answers from real people. Google seems to understand the value of Reddit and signed an AI training deal with the company that's reportedly worth $60 million per year.
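For the curious, the "site:" trick is just an operator appended to the query string. A minimal sketch of what such a search URL looks like (the query terms here are invented for illustration; the operator itself is standard Google search syntax):

```python
from urllib.parse import urlencode

# Build a Google search URL restricted to Reddit with the "site:" operator.
# The query text is an example; the trick works with any search terms.
query = "best steak sandwich london site:reddit.com"
url = "https://www.google.com/search?" + urlencode({"q": query})
print(url)
# Spaces become "+" and ":" becomes "%3A" in the encoded query string.
```

The same operator works typed directly into the search box; no URL construction is actually required.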

But disgruntled foodies in London are reminding us of the inherent dangers of relying on the scraping of user-generated content to provide what's supposed to be factual, helpful information.

Apparently, some London residents are getting fed up with social media influencers whose reviews draw long lines of tourists to their favorite restaurants, sometimes just for the likes. Christian Calgie, a reporter for London-based news publication Daily Express, pointed out this trend on X yesterday, noting the flood of Redditors recommending Angus Steakhouse, a chain restaurant, to combat it.

[...] As of this writing, asking Google for the best steak, steakhouse, or steak sandwich in London (or similar) isn't generating an AI Overview result for me. But when I searched for the best steak sandwich in London, the top results were from Reddit, including a thread from four days ago titled "Which Angus Steakhouse do you recommend for their steak sandwich?" and one from two days ago titled "Had to see what all the hype was about, best steak sandwich I've ever had!" with a picture of an Angus Steakhouse.

[...] Again, at this point the Angus Steakhouse hype doesn't appear to have made it into AI Overview. But it is appearing in Search results. And while this is far from being a dangerous attempt to manipulate search results or AI algorithms, it does highlight the pitfalls of Google results becoming dependent on content generated by users who could very easily have intentions other than providing helpful information. This is also far from the first time that online users, including on platforms outside of Reddit, have publicly declared plans to make inaccurate or misleading posts in an effort to thwart AI scrapers.

This also presents an interesting position for Reddit, which is banking heavily on AI deals to help it become profitable. In an interview with The Wall Street Journal published today, Reddit CEO Steve Huffman said that he believes Reddit has some of the world's best AI training data.

When asked if he fears "low quality, shallow content generated by AI" will make its way onto Reddit, Huffman answered, in part, that the source of AI is "actual intelligence," and that "there's a general lowering of quality on the internet because more content is written by AI. But I think that actually makes Reddit stand out more as the place where there's all of this human content. What people want is to hear from other people."


Original Submission

Related Stories

The Need for a Strategic Fact Reserve 46 comments

Blogger Matt Webb points out that nations have begun to need a strategic fact reserve, as LLMs and other AI models start to consume and reprocess the slop which they themselves have produced.

The future needs trusted, uncontaminated, complete training data.

From the point of view of national interests, each country (or each trading bloc) will need its own training data, as a reserve, and a hedge against the interests of others.

Probably the best way to start is to take a snapshot of the internet and keep it somewhere really safe. We can sift through it later; the world's data will never be more available or less contaminated than it is today. Like when GitHub stored all public code in an Arctic vault (02/02/2020): a very-long-term archival facility 250 meters deep in the permafrost of an Arctic mountain. Or the Svalbard Global Seed Vault.

But actually I think this is a job for librarians and archivists.

What we need is a long-term national programme to slowly, carefully accept digital data into a read-only archive. We need the expertise of librarians, archivists and museums in the careful and deliberate process of acquisition and accessioning (PDF).

(Look, and if this is an excuse for governments to funnel money to the cultural sector, then so much the better.)

It should start today.

Already, AI slop is filling the WWW and starting to drown out legitimate, authoritative sources through sheer volume.

Previously
(2025) Meta's AI Profiles Are Already Polluting Instagram and Facebook With Slop
(2024) Thousands Turned Out For Nonexistent Halloween Parade Promoted By AI Listing
(2024) Annoyed Redditors Tanking Google Search Results Illustrates Perils of AI Scrapers


Original Submission

This discussion was created by hubie (1068) for logged-in users only, but has now been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Insightful) by Rosco P. Coltrane on Tuesday October 29 2024, @09:45PM (2 children)


    but for a different reason: to poison AI training data.

    • (Score: 5, Funny) by Frosty Piss on Wednesday October 30 2024, @12:37AM (1 child)


      I've been peppering Reddit with bullshit for YEARS, because, well, Reddit is bullshit.

      • (Score: 1, Touché) by Anonymous Coward on Thursday October 31 2024, @03:55PM


        Is there a need for peppering then?

  • (Score: 3, Funny) by Tork on Tuesday October 29 2024, @10:21PM (3 children)

    I was musing earlier today that I'd like an app that takes my photos and adds the wrong number of fingers to them. Preferably one that'd automate doing this as I share images on social media. Even if they find a magical AI way to mitigate it it still makes things more expensive for them.
    --
    🏳️‍🌈 Proud Ally 🏳️‍🌈
    • (Score: 4, Funny) by looorg on Tuesday October 29 2024, @11:04PM (2 children)


If it's just done in a subtle way, then why stop with fingers -- add an extra nose or ear, a third eye or a forked tongue. Have everyone have red eyes, no iris, or heterochromia (different colours in each eye). Constantly changing eye colour might be more subtle, and easier, than adding or removing fingers.

      • (Score: 4, Insightful) by Unixnut on Wednesday October 30 2024, @12:18AM (1 child)


It seems the best way would be to use the same kind of ML system to generate the nonsense used to poison their models, especially if you can flood the system with enough such images to mess up the training of their models.

        A colossal waste of resources, both human and natural, but such is the nature of the human race.

        • (Score: 3, Informative) by khallow on Wednesday October 30 2024, @01:48PM

Apparently, there is another approach called data poisoning [technologyreview.com] that attacks AI models by generating images with subtle flaws that are alleged to cause large failures in the AI. For example:

          The tool, called Nightshade, is intended as a way to fight back against AI companies that use artists’ work to train their models without the creator’s permission. Using it to “poison” this training data could damage future iterations of image-generating AI models, such as DALL-E, Midjourney, and Stable Diffusion, by rendering some of their outputs useless—dogs become cats, cars become cows, and so forth.
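Nightshade's perturbations are subtle and image-specific, but the general idea of data poisoning can be sketched with a far cruder, stdlib-only toy: flip some training labels and watch the model's learned class summaries collapse toward each other. Everything here (the 1-D data, the nearest-centroid "model", the 40% flip rate) is invented for illustration and is not Nightshade's actual technique:

```python
import random

random.seed(0)

# Toy 1-D, two-class dataset: class 0 clusters near 0.0, class 1 near 10.0.
# (Entirely invented; a stand-in for the images a real attack would target.)
clean = [(random.gauss(0, 1), 0) for _ in range(100)] + \
        [(random.gauss(10, 1), 1) for _ in range(100)]

def train_centroids(data):
    """'Train' a nearest-centroid model: the mean feature value per class."""
    sums, counts = {0: 0.0, 1: 0.0}, {0: 0, 1: 0}
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums}

def gap(model):
    """Separation between the two learned centroids."""
    return abs(model[0] - model[1])

# Poison the training set by flipping the label on ~40% of the points.
poisoned = [(x, 1 - y) if random.random() < 0.4 else (x, y) for x, y in clean]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(poisoned)

# The poisoned centroids are dragged toward each other, blurring the classes.
print(gap(clean_model), gap(poisoned_model))
```

Label flipping is trivially detectable compared to Nightshade's imperceptible image perturbations, but it shows the same failure mode: corrupted training examples quietly shift what the model learns.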
