https://arstechnica.com/ai/2025/10/ai-powered-search-engines-rely-on-less-popular-sources-researchers-find/ [arstechnica.com]
Since last year’s disastrous rollout of Google’s AI Overviews [arstechnica.com], the world at large has been aware of how AI-powered search results can differ wildly from the traditional list of links search engines have generated for decades. Now, new research helps quantify that difference, showing that AI search engines tend to cite less popular websites and ones that wouldn’t even appear in the Top 100 links listed in an “organic” Google search.
In the pre-print paper “Characterizing Web Search in The Age of Generative AI [arxiv.org],” researchers from Ruhr University in Bochum, Germany, and the Max Planck Institute for Software Systems compared traditional link results from Google’s search engine to its AI Overviews and Gemini-2.5-Flash [search.google]. The researchers also looked at GPT-4o’s web search mode and the separate “GPT-4o with Search Tool,” which resorts to searching the web only when the LLM decides it needs information found outside its own pre-trained data.
[...]
Overall, the sources cited in results from the generative search tools tended to be from sites that were less popular than those that appeared in the top 10 of a traditional search, as measured by the domain-tracker Tranco [tranco-list.eu]. Sources cited by the AI engines were more likely than those linked in traditional Google searches to fall outside both the top 1,000 and top 1,000,000 domains tracked by Tranco. Gemini search in particular showed a tendency to cite unpopular domains, with the median source falling outside Tranco’s top 1,000 across all results.
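To make the popularity comparison concrete, here is a minimal Python sketch of how a median rank and the share of citations falling outside a rank cutoff could be computed. It is not the researchers' code: the `popularity_summary` helper, the `TOY_RANKS` table, and the `.example` domains are all hypothetical stand-ins, and treating domains absent from the list as ranked below every cutoff is an assumption made for illustration.

```python
from statistics import median

# Hypothetical ranks standing in for a Tranco-style list (domain -> rank).
# These values are invented for illustration and are not real data.
TOY_RANKS = {
    "popular.example": 12,
    "midsize.example": 850,
    "niche.example": 42_000,
    "obscure.example": 2_500_000,
}

# Assumption: a domain missing from the list counts as outside every cutoff.
UNRANKED = float("inf")


def popularity_summary(cited_domains, ranks, cutoffs=(1_000, 1_000_000)):
    """Return the median rank and the share of citations outside each cutoff."""
    if not cited_domains:
        raise ValueError("need at least one cited domain")
    cited_ranks = [ranks.get(domain, UNRANKED) for domain in cited_domains]
    outside = {
        cutoff: sum(rank > cutoff for rank in cited_ranks) / len(cited_ranks)
        for cutoff in cutoffs
    }
    return median(cited_ranks), outside


# Four hypothetical citations; the last one does not appear in the toy list.
cited = ["popular.example", "niche.example", "obscure.example", "unknown.example"]
median_rank, share_outside = popularity_summary(cited, TOY_RANKS)
print(median_rank, share_outside)
```

In this toy input, three of the four cited domains rank worse than 1,000, so the median also lands outside the top 1,000, which is the kind of pattern the paper reports for Gemini.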
[...]
For search terms pulled from Google’s list of Trending Queries for September 15, the researchers found GPT-4o with Search Tool often responded with messages along the lines of “could you please provide more information” rather than actually searching the web for up-to-date information.
While the researchers didn’t determine whether AI-based search engines were overall “better” or “worse” than traditional search engine links, they did urge future research on “new evaluation methods that jointly consider source diversity, conceptual coverage, and synthesis behavior in generative search systems.”