
SoylentNews is people

posted by chromas on Thursday July 04 2019, @01:28AM   Printer-friendly
from the This-is-important-information-if-you-or-your-loved-one-is-different-from-pondering-your-boat-engine dept.

Endless AI-generated spam risks clogging up Google's search results

Over the past year, AI systems have made huge strides in their ability to generate convincing text, churning out everything from song lyrics to short stories. Experts have warned that these tools could be used to spread political disinformation, but there's another target that's equally plausible and potentially more lucrative: gaming Google.

Instead of being used to create fake news, AI could churn out infinite blogs, websites, and marketing spam. The content would be cheap to produce and stuffed full of relevant keywords. But like most AI-generated text, it would only have surface meaning, with little correspondence to the real world. It would be the information equivalent of empty calories, but still potentially difficult for a search engine to distinguish from the real thing.

Just take a look at this blog post answering the question: "What Photo Filters are Best for Instagram Marketing?" At first glance it seems legitimate, with a bland introduction followed by quotes from various marketing types. But read a little more closely and you realize it references magazines, people, and — crucially — Instagram filters that don't exist:

You might not think that a mumford brush would be a good filter for an Insta story. Not so, said Amy Freeborn, the director of communications at National Recording Technician magazine. Freeborn's picks include Finder (a blue stripe that makes her account look like an older block of pixels), Plus and Cartwheel (which she says makes your picture look like a topographical map of a town.

The rest of the site is full of similar posts, covering topics like "How to Write Clickbait Headlines" and "Why is Content Strategy Important?" But every post is AI-generated, right down to the authors' profile pictures. It's all the creation of content marketing agency Fractl, which says it's a demonstration of the "massive implications" AI text generation has for the business of search engine optimization, or SEO.

"Because [AI systems] enable content creation at essentially unlimited scale, and content that humans and search engines alike will have difficulty discerning [...] we feel it is an incredibly important topic with far too little discussion currently," Fractl partner Kristin Tynski tells The Verge.

[...] The key question, then, is: can we reliably detect AI-generated text? Rowan Zellers of the Allen Institute for AI says the answer is a firm "yes," at least for now. Zellers and his colleagues were responsible for creating Grover, the tool Fractl used for its fake blog posts, and were also able to engineer a system that can spot Grover-generated text with 92 percent accuracy.

"We're a pretty long way away from AI being able to generate whole news articles that are undetectable," Zellers tells The Verge. "So right now, in my mind, is the perfect opportunity for researchers to study this problem, because it's not totally dangerous."

Spotting fake AI text isn't too hard, says Zellers, because it has a number of linguistic and grammatical tells. He gives the example of AI's tendency to reuse certain phrases and nouns. "They repeat things ... because it's safer to do that rather than inventing a new entity," says Zellers. It's like a child learning to speak, trotting out the same words and phrases over and over, never mind the diminishing returns.
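The repetition tell Zellers describes can be approximated with a crude statistic: the fraction of word n-grams in a passage that are distinct. This is only an illustrative sketch, not Grover's actual detector; the function name and the threshold intuition (lower ratio = more repetitive, more suspicious) are my own assumptions.

```python
def distinct_ngram_ratio(text, n=3):
    """Fraction of word n-grams that are unique; lower means more repetition."""
    words = text.lower().split()
    if len(words) < n:
        return 1.0  # too short to measure; treat as fully distinct
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return len(set(ngrams)) / len(ngrams)

repetitive = "the quick brown fox the quick brown fox the quick brown fox"
varied = "the quick brown fox jumps over a lazy dog near the riverbank today"

print(distinct_ngram_ratio(repetitive))  # 0.4: heavy trigram reuse
print(distinct_ngram_ratio(varied))      # 1.0: every trigram is distinct
```

A real classifier like the one Zellers describes looks at far richer signals (token probabilities under the generating model, not surface n-grams), but this shows the flavor of a "repetition tell."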


Original Submission

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by pipedwho on Thursday July 04 2019, @04:45AM (3 children)

    by pipedwho (2032) on Thursday July 04 2019, @04:45AM (#863033)

    The problem with all these systems is that they are effectively anonymous. There is nothing securely tying the actual identity of the poster to the posts/blogs/articles/news/etc.

In the early days of the internet there was at least a human (with a few tiny exceptions) spewing the garbage. But most people didn't, and it was pretty easy to know what and who to ignore.

These days, if you go to a reputable site, you can generally trust that the information is not gibberish. But that doesn't mean Google knows how to tell the difference. I imagine that general blogging sites or 'fly by night pop up sites' are unlikely to be trusted until they have developed a reputation for solid content.

Google at some point will have to stop arbitrarily indexing bot/spam networks and treat anything they link to as useless. They'll have to start (maybe they already have) building a reputation/trust system for sites and content generators. The concept is that a site that is normally trusted can be indexed and/or used for ranking, but untrusted sites cannot.
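The commenter's concept can be sketched as a toy model: sites start at zero trust and must accumulate sustained quality signals before their pages are indexed at all. Everything here is hypothetical for illustration; the threshold, the blending weights, and the domain names are made up, and a real ranking system would be vastly more complex.

```python
class SiteReputation:
    """Toy trust model: unknown sites start untrusted and must earn indexing."""
    INDEX_THRESHOLD = 0.5  # hypothetical cutoff for inclusion in the index

    def __init__(self):
        self.scores = {}  # domain -> trust score in [0, 1]

    def record_signal(self, domain, quality):
        """Blend a new quality signal (0..1) into the running trust score."""
        prev = self.scores.get(domain, 0.0)  # unknown sites start at zero trust
        self.scores[domain] = 0.9 * prev + 0.1 * quality

    def should_index(self, domain):
        return self.scores.get(domain, 0.0) >= self.INDEX_THRESHOLD

rep = SiteReputation()
for _ in range(20):                                    # a long track record
    rep.record_signal("established-news.example", 1.0)
rep.record_signal("popup-spam.example", 1.0)           # one good signal only

print(rep.should_index("established-news.example"))  # True
print(rep.should_index("popup-spam.example"))        # False
```

The exponential blend means a fly-by-night site can't buy its way in with a single burst of good-looking content, which is exactly the property the comment is asking for; of course, as the reply below notes, any such signal can itself be gamed.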

  • (Score: 1) by fustakrakich on Thursday July 04 2019, @04:59AM

    by fustakrakich (6150) on Thursday July 04 2019, @04:59AM (#863037) Journal

    reputation/trust system for sites

    Just as easy to game.

    Let's just get used to the fact that you won't know if you're talking to a real human unless you're face to face, and even that will be out the window when robots get really good.

    --
Politics and criminals are the same thing.
  • (Score: 3, Interesting) by Runaway1956 on Thursday July 04 2019, @07:56AM (1 child)

    by Runaway1956 (2926) Subscriber Badge on Thursday July 04 2019, @07:56AM (#863066) Journal

    Ouch.

    Let us give Google some credit for attempting what you describe. In days past, Google wanted to ensure that everyone using G+ and related services actually was a real person. In short, they wanted to eliminate aliases, such as my own. They began policing users, and in fact contacted me on multiple occasions about the name I use on those services. They were pretty sure I was not who I claimed to be, but they weren't sure who I actually was. I replied to all inquiries that I have a long and sordid history, complete with enemies who wanted to see me dead. Not true, but they couldn't prove or disprove it. If publishing my actual identity on the internet would result in multiple hitmen descending on my home, then Google would be complicit in any deaths that resulted.

    Google backed off - this time around. The next time they get this notion, they may not back off. At which time, I will lose the use of all Google services, because I won't back down either.

    As for a reputation/trust system, they're already working on it. Many have noticed that conservative stuff tends to be hidden away behind progressive stuff that Google wants to promote.

    So you can see why I am opposed to your idea, even though it seems a good one on the surface.

    • (Score: 0) by Anonymous Coward on Friday July 05 2019, @01:58AM

      by Anonymous Coward on Friday July 05 2019, @01:58AM (#863318)

      I replied to all inquiries that I have a long and sordid history, complete with enemies who wanted to see me dead. Not true, but they couldn't prove or disprove it.

      Just because you are paranoid, that does not mean that Google can prove it. 'Specially if you are Runaway.

      And Runaway1956 has a reputation as an inverse reputable source. If Runaway says it, it must be false. If he claims it is "progressive" or "liberal" or "democrat", it must be true. Tanks, Runsaway!