SoylentNews is people

LLMs Are Different and LLMs Are the Same

Accepted submission by hubie at 2026-05-10 04:08:05
Career & Education

Commercial LLMs challenged with tests of originality and creativity generate results that are more similar to one another than people's responses [duke.edu]:

There are already hundreds of thousands of large language models (LLMs) in existence, with a few dozen commercial systems dominating the market. Among options such as GPT-4, Claude and Gemini, many people have their favorite, especially when it comes to creative tasks such as writing.

Those preferences, however, are likely entirely in the eye of the beholder. According to new research from Duke University, the creative outputs of commercial LLMs are more similar to each other than users might hope. When challenged with three standard tasks assessing creativity, commercial LLMs gave answers that were much more alike than those of their human counterparts.

"People might wonder if different LLMs will take them in different directions with the same prompts for creative projects," said Emily Wenger, the Cue Family Assistant Professor of Electrical and Computer Engineering at Duke. "This paper basically says no. LLMs are less creative as a population than humans."

[...] One seminal study in this emerging field, conducted by Anil Doshi and Oliver Hauser, found that writers who used GPT-4 produced more creative stories than humans working alone. However, the same study showed that those LLM-aided stories were more similar to each other than were stories from human writers working solo.

[...] "Commercial LLMs have all been trained on the same dataset—the entirety of the internet—and they all have the same goal," Wenger said. "It seemed likely to me that this would limit the amount of diversity we'd see in their creativity, so I decided to find out."

[...] "Significant empirical research over the past few decades highlights how much human creativity depends on variability," said Yoed Kenett. "The problem, as we and others are increasingly showing, is that while LLMs appear to generate extremely original outputs, they are overly homogenized and not variable in their responses. This could have detrimental long-term impact on human creative thinking and thus must be addressed."

The study, which aimed to measure the variability and originality of responses from LLMs and people, produced clear results. While individual LLMs might outperform individual people in levels of creativity, as a whole, the algorithms' responses were much more similar to each other than the people's. Importantly, altering the LLM system prompt to encourage higher creativity only slightly increased their variability, and human responses still won out.

"This work has broad implications as people continue adopting and integrating LLMs into their daily life," Wenger said. "Overreliance on these tools will smooth the world's work toward the same underlying set of words or grammar, tending to make writing all look the same."

"If you're trying to come up with an original concept or product to stand out from the crowd," Wenger continued, "this work highly suggests you should bring together a diverse group of people to brainstorm rather than relying on AI."

Journal Reference: "Large language models are homogeneously creative." Emily Wenger and Yoed N. Kenett. PNAS Nexus, 2026, 5, pgag042. DOI: 10.1093/pnasnexus/pgag042 [doi.org]

