In late 2013, the Spike Jonze film Her imagined a future where people would form emotional connections with AI voice assistants. Nearly 12 years later, that fictional premise has veered closer to reality with the release of a new conversational voice model from AI startup Sesame that has left many users both fascinated and unnerved.
"I tried the demo, and it was genuinely startling how human it felt," wrote one Hacker News user who tested the system.
[...]
In late February, Sesame released a demo for the company's new Conversational Speech Model (CSM) that appears to cross over what many consider the "uncanny valley" of AI-generated speech.
[...]
"At Sesame, our goal is to achieve 'voice presence'—the magical quality that makes spoken interactions feel real, understood, and valued," writes the company in a blog post.
[...]
Sometimes the model tries too hard to sound like a real human. In one demo posted online by a Reddit user called MetaKnowing, the AI model talks about craving "peanut butter and pickle sandwiches."
[...]
"I've been into AI since I was a child, but this is the first time I've experienced something that made me definitively feel like we had arrived," wrote one Reddit user.
[...]
Many other Reddit threads express similar feelings of surprise, with commenters saying it's "jaw-dropping" or "mind-blowing."
[...]
Mark Hachman, a senior editor at PCWorld, wrote about being deeply unsettled by his interaction with the Sesame voice AI. "Fifteen minutes after 'hanging up' with Sesame's new 'lifelike' AI, and I'm still freaked out," Hachman reported.
[...]
Some have compared Sesame's voice model to OpenAI's Advanced Voice Mode for ChatGPT, saying that Sesame's CSM features more realistic voices; others are pleased that the demo model will roleplay angry characters, which ChatGPT refuses to do.
[...]
Under the hood, Sesame's CSM achieves its realism by using two AI models working together (a backbone and a decoder), both based on Meta's Llama architecture, which process interleaved text and audio. Sesame trained three AI model sizes, the largest of which uses 8.3 billion parameters (an 8-billion-parameter backbone plus a 300-million-parameter decoder) and was trained on approximately 1 million hours of primarily English audio.
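To make the "interleaved text and audio" idea a little more concrete, here is a toy Python sketch. It assumes each conversational turn has already been tokenized into integer text tokens and integer audio tokens, and that turns are flattened into one sequence with a speaker marker followed by that turn's text and then its audio. The `Segment` class, the `interleave` function, and this ordering are hypothetical illustrations, not Sesame's actual tokenizer or sequence format. The sketch also checks the parameter arithmetic quoted above.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Parameter counts as reported for the largest CSM size.
BACKBONE_PARAMS = 8_000_000_000
DECODER_PARAMS = 300_000_000


@dataclass
class Segment:
    """One conversational turn (hypothetical structure for illustration)."""
    speaker: int             # hypothetical speaker id
    text_tokens: List[int]   # already-tokenized text (assumed)
    audio_tokens: List[int]  # already-tokenized audio (assumed)


def interleave(segments: List[Segment]) -> List[Tuple[str, int]]:
    """Flatten turns into a single sequence of (modality, token) pairs.

    The text-then-audio ordering per turn is an assumption made for this
    toy example; a real model may order or pack tokens differently.
    """
    seq: List[Tuple[str, int]] = []
    for seg in segments:
        seq.append(("speaker", seg.speaker))
        seq.extend(("text", t) for t in seg.text_tokens)
        seq.extend(("audio", a) for a in seg.audio_tokens)
    return seq


if __name__ == "__main__":
    convo = [Segment(0, [1, 2], [10, 11]), Segment(1, [3], [12])]
    print(interleave(convo))
    print(BACKBONE_PARAMS + DECODER_PARAMS)  # 8300000000
```

Because both modalities end up in one sequence, a single backbone can condition each new audio token on the earlier text and speech of every speaker in the conversation, which is the general motivation for interleaving.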
[...] Despite CSM's technological impressiveness, advancements in conversational voice AI carry significant risks for deception and fraud. The ability to generate highly convincing human-like speech has already supercharged voice phishing scams, allowing criminals to impersonate family members, colleagues, or authority figures with unprecedented realism.
[...]
Unlike current robocalls, which often contain telltale signs of artificiality, next-generation voice AI could eliminate these red flags entirely.
[...]
It has inspired some people to share a secret word or phrase with their family for identity verification.
[...]
OpenAI itself held back its own voice technology from wider deployment over fears of misuse.
Sesame sparked a lively discussion on Hacker News about its potential uses and dangers.
[...]
In one case, a parent recounted how their 4-year-old daughter developed an emotional connection with the AI model, crying after not being allowed to talk to it again.
[...]
The company says it plans to open-source "key components" of its research under an Apache 2.0 license, enabling other developers to build upon its work.
[...]
You can try the Sesame demo on the company's website, assuming that it isn't too overloaded with people who want to simulate a rousing [argument].
[Last link in article added by submitter.]
(Score: 3, Informative) by pkrasimirov on Saturday March 08, @11:19AM (3 children)
For people like me who don't want to actually talk to a machine, here's an example of someone who did it:
https://www.youtube.com/watch?v=fgvRn86B5X0
(Score: 0) by Anonymous Coward on Saturday March 08, @01:44PM
(Score: 2) by AnonTechie on Saturday March 08, @09:11PM (1 child)
Well, I tried it out, and it seemed quite realistic, although the responses were limited. With time, this will definitely improve. However, I worry that such technology will be used to scam vulnerable people.
Albert Einstein - "Only two things are infinite, the universe and human stupidity, and I'm not sure about the former."
(Score: 0) by Anonymous Coward on Sunday March 09, @09:32AM
They'll be used to scam scammers too.
https://www.npr.org/2024/12/10/nx-s1-5220362/daisy-ai-granny-o2-fraud-spam-prevention
https://www.theguardian.com/money/2025/feb/04/ai-granny-scammers-phone-fraud
Then the AIs will be talking to each other...
Hopefully by then they won't be burning tons of coal to generate the electricity to power the AIs involved.