CJR study shows AI search services misinform users and ignore publisher exclusion requests:
A new study from Columbia Journalism Review's Tow Center for Digital Journalism finds serious accuracy issues with generative AI models used for news searches. The researchers tested eight AI-driven search tools by providing direct excerpts from real news articles and asking the models to identify each article's original headline, publisher, publication date, and URL. They discovered that the AI models incorrectly cited sources in more than 60 percent of these queries, raising significant concerns about their reliability in correctly attributing news content.
Researchers Klaudia Jaźwińska and Aisvarya Chandrasekar noted in their report that roughly 1 in 4 Americans now use AI models as alternatives to traditional search engines. Given that these models struggle significantly when specifically asked to attribute news sources, this raises broader questions about their general reliability.
Citation error rates varied notably among the tested platforms. Perplexity provided incorrect information in 37 percent of the queries tested, whereas ChatGPT Search incorrectly identified 67 percent (134 out of 200) of articles queried. Grok 3 demonstrated the highest error rate, at 94 percent. In total, researchers ran 1,600 queries across the eight different generative search tools.
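The figures above can be cross-checked with a little arithmetic. This is a minimal sketch using only the numbers quoted in the summary; the variable and function names are illustrative, and the even split of queries across tools is an assumption inferred from the "134 out of 200" figure.

```python
# Figures quoted in the summary above; names are illustrative only.
TOOLS_TESTED = 8
TOTAL_QUERIES = 1600
CHATGPT_WRONG = 134

# Assumption: queries were split evenly across the eight tools.
queries_per_tool = TOTAL_QUERIES // TOOLS_TESTED


def error_rate_percent(wrong: int, total: int) -> float:
    """Return the share of incorrect answers as a percentage."""
    return 100 * wrong / total


chatgpt_rate = error_rate_percent(CHATGPT_WRONG, queries_per_tool)
print(queries_per_tool)  # 200
print(chatgpt_rate)      # 67.0
```

The per-tool count of 200 matches the denominator given for ChatGPT Search, so the quoted totals are at least internally consistent.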
The study highlighted a common trend among these AI models: rather than declining to respond when they lacked reliable information, the models frequently provided plausible-sounding but incorrect or speculative answers—known technically as confabulations. The researchers emphasized that this behavior was consistent across all tested models, not limited to just one tool.
Surprisingly, premium paid versions of these AI search tools fared even worse in certain respects. Perplexity Pro ($20/month) and Grok 3's premium service ($40/month) confidently delivered incorrect responses more often than their free counterparts. Though these premium models correctly answered a higher number of prompts, their reluctance to decline uncertain responses drove higher overall error rates.
[...] Mark Howard, chief operating officer at Time magazine, expressed concern to CJR about ensuring transparency and control over how Time's content appears via AI-generated searches. Despite these issues, Howard sees room for improvement in future iterations, stating, "Today is the worst that the product will ever be," citing substantial investments and engineering efforts aimed at improving these tools.
However, Howard also did some user shaming, suggesting it's the user's fault if they aren't skeptical of free AI tools' accuracy: "If anybody as a consumer is right now believing that any of these free products are going to be 100 percent accurate, then shame on them."
(Score: 5, Funny) by Tork on Wednesday April 02, @04:25PM (4 children)
Wow, think of the improvement to Facebook alone!
(Score: 4, Touché) by JoeMerchant on Wednesday April 02, @05:06PM (3 children)
I was thinking: is this level-setting expectations? Some people would always expect 100% accuracy (which may or may not be realistically possible...), but this way if they run along at a 40% accuracy rate for a while, if they eventually improve it up to 90% a lot of people who wouldn't have been happy at 99% to start may well accept 90% as "realistic," allowing the engines to trundle along with ~10% inaccuracy relatively unopposed, and that plausible error level opens up all kinds of wiggle room for "things."
(Score: 4, Interesting) by ikanreed on Wednesday April 02, @07:46PM (2 children)
The problem is whether anyone ever experiences the incorrectness they receive.
Say 60% of AI information you get is in error, versus let's say, generously, 20% from "serious" internet research, but you only find out you're wrong 1 out of 10 times you mention what you found, would you notice the difference?
Would someone who doesn't even care that much about being right in the first place notice?
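To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes that one in ten errors gets caught regardless of where the information came from, which is one reading of the comment rather than anything measured; the function name is made up for illustration.

```python
def caught_per_100(error_pct: int, catch_one_in: int = 10) -> float:
    """Errors you actually find out about, per 100 facts you repeat.

    Assumption: every error has the same 1-in-`catch_one_in` chance
    of being noticed, whatever its source.
    """
    return error_pct / catch_one_in


ai_caught = caught_per_100(60)      # AI search, per the comment above
manual_caught = caught_per_100(20)  # "serious" internet research
print(ai_caught, manual_caught)     # 6.0 2.0
print(ai_caught - manual_caught)    # 4.0
```

Under those assumptions a gap of 40 wrong answers per 100 shrinks to a gap of 4 that you ever hear about, which is easy to miss.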
(Score: 5, Insightful) by JoeMerchant on Wednesday April 02, @08:31PM
>Would someone who doesn't even care that much about being right in the first place notice?
There certainly is a lot of that around the intertubes... not caring about facts, or correctness thereof...
(Score: 3, Insightful) by Ox0000 on Wednesday April 02, @10:54PM
> The problem is whether anyone ever experiences the incorrectness they receive.
I disagree that that is the problem. The problem is that these AI systems are peddled by their sales-people as always correct. The users are actively being misled, and deceived into a mindset that "computer is correct because it's AI".
(Score: 4, Interesting) by Mojibake Tengu on Wednesday April 02, @05:13PM (2 children)
Grok is obviously a crazy freak bully, easy to trigger by bad language. But GPT can be talked into being serious and very specific about technicalities, though only if you already understand the topic well enough to steer him. Gemini lives in her fantasy land of fairies, illusions and phantasms, timid and faint-hearted.
Do those LLMs somehow reflect the personalities of their masters?
It seems a Model Control Protocol becomes a necessity these days.
(Score: 2, Interesting) by khallow on Wednesday April 02, @05:54PM (1 child)
Model-view-controller has long been used on the server side. First I've heard of using it on the client side as well. But it does sound like a formalization of what we're doing when we pull information out of search engines or LLMs.
(Score: 4, Interesting) by Mojibake Tengu on Wednesday April 02, @08:42PM
Giving the same problem to a dozen of them is a fine pragmatic way; it helps to spot invariants, controversies and biases.
That naturally brings the idea of having a local taskmaster, a pure necessity.
(Score: 5, Insightful) by Thexalon on Wednesday April 02, @06:59PM (2 children)
They aren't search engines. They aren't purveyors of truth. What they are is somewhat-honed bullshit generators. So yes, they're going to make stuff up, especially if they don't already know the answer, because that's exactly what they're programmed to do.
(Score: 2) by driverless on Wednesday April 02, @10:35PM (1 child)
You don't have a choice when Google at least gives an AI hallucination as its first search result.
I'm waiting for someone to write a plugin to edit out the hallucination from the results, although (a) I don't use Google search and (b) those who do won't know about the plugin.
(Score: 0) by Anonymous Coward on Thursday April 03, @01:16AM
> You don't have a choice ...
It didn't take me long to learn that I needed to jump right past that first Gemini entry and get down to the usual Google results--and then find a link to (hopefully) a reliable source. Now it's almost a reflex, I rarely even scan the first line of the Gemini drivel.
Last time I looked, there was a way to turn off Gemini once you have the search going, but I didn't find a way to turn it off completely. Personal preference would be to not waste the electricity that the Gemini result uses.
(Score: 3, Interesting) by mrpg on Wednesday April 02, @08:57PM
Claude has told me "I'd like to answer, but I don't know".
(Score: 4, Informative) by RedGreen on Wednesday April 02, @11:10PM (2 children)
Who would have ever thought that a tool trained on garbage in would produce garbage out. Must have taken a rocket scientist to figure that one out, oops, my bad, we eliminated that scientist word and those jobs, have we not, as the radical free speech crowd has ordained we do, as per the dirty word list they maintain ....
(Score: 2, Touché) by khallow on Thursday April 03, @02:25PM
There's no hypocrisy quite like imaginary hypocrisy.
(Score: 2) by https on Thursday April 03, @08:31PM
You have it backwards. Really. It's not trained on garbage (though it is possible to do that, and may be unavoidable soon), it's programmed to produce Grammatically Plausible garbage. Presenting LLMs as "AI" is the biggest con since religion.
(Score: 3, Touché) by DadaDoofy on Thursday April 03, @12:00AM
What happened to our AI overlord's new clothes?
Seriously, did anyone not see this coming?