19/05/20/0329203 story
posted by martyb on Monday May 20, @02:03PM
from the you-dont-say dept.
Developers at business AI company Dessa have come up with a new text-to-speech system called "RealTalk". In the version they demoed, it was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. They must have been so impressed by their own creation that they discuss the implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.
Your humble submitter took the blind test and only just managed a majority of correct guesses, but was impressed enough by the quality to consider it newsworthy. How do you fare in the test?
(Score: 2) by takyon on Monday May 20, @02:25PM (1 child)
It seems to blow the tinny Lyrebird demos out of the water:
https://www.theverge.com/2017/4/24/15406882/ai-voice-synthesis-copy-human-speech-lyrebird [theverge.com]
But we can't be too sure, because:
This is the same cop-out crap [soylentnews.org] we see from OpenAI. It's easy to lecture people about ethics when releasing the research would help your competitors.
(Score: 2) by Rich on Monday May 20, @03:10PM
I had a similar feeling about that non-release, but I'm fairly certain that any party able to do a billion-dollar buyout already has an ethics panel that ensures the knowledge rests in very good hands. :)
As for how it works, the earlier papers of their mastermind Alex Krizhevsky, and the traces the other guys mentioned have left behind on the net, might be a hint. If I had to have a go at it, I'd model the vocal tract to the point where phonetic decomposition and resynthesis of the original voice is somewhat better than good enough, then have a net that translates a sliding window of letters, mixed with grammar information from further away, into phonemes. The grammar analysis would be required to get intonation and accents right and to resolve ambiguities; I'd try that with a separate net. As for which net topologies to use, and how deep, I'm completely clueless. The guys involved have enough experience to pick them so that whatever they throw at it has a good chance of sticking, and in this case it seriously did.
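To make the guess above a bit more concrete, here is a minimal Python sketch of only the input framing it describes, not of anything Dessa has published: each letter is paired with a fixed window of neighboring letters, and a per-word feature from a separate grammar model is mixed in. The function names `letter_windows` and `attach_word_features`, the `_` padding symbol, and the tag strings are all hypothetical, and the nets that would actually consume these windows are not shown.

```python
from typing import List, Optional, Sequence, Tuple

PAD = "_"  # hypothetical padding symbol for positions past either end of the text


def letter_windows(text: str, radius: int = 3) -> List[Tuple[str, str]]:
    """Pair each letter with a fixed-width window of its neighbors.

    Every window holds `radius` characters on each side of the center
    character, padded with PAD at the edges, so all windows have length
    2 * radius + 1. A letter-to-phoneme net (not shown) would consume these.
    """
    padded = PAD * radius + text + PAD * radius
    pairs = []
    for i, ch in enumerate(text):
        # text[i] sits at padded[i + radius], so this slice is centered on it.
        pairs.append((ch, padded[i : i + 2 * radius + 1]))
    return pairs


def attach_word_features(
    text: str, word_features: Sequence[str], radius: int = 3
) -> List[Tuple[str, str, Optional[str]]]:
    """Mix in one feature per space-separated word, e.g. a hypothetical
    part-of-speech tag produced by a separate grammar model.

    Spaces get None; every other character gets the feature of its word.
    """
    words = text.split(" ")
    if len(words) != len(word_features):
        raise ValueError("need exactly one feature per space-separated word")
    out = []
    word_idx = 0
    for ch, window in letter_windows(text, radius):
        if ch == " ":
            out.append((ch, window, None))
            word_idx += 1  # characters after this space belong to the next word
        else:
            out.append((ch, window, word_features[word_idx]))
    return out


# Usage: "read" is pronounced differently in present and past tense.
for row in attach_word_features("I read", ["PRON", "VERB-PAST"], radius=1):
    print(row)
```

The usage line picks "read" on purpose: the letters alone cannot tell the present-tense pronunciation from the past-tense one, which is the kind of ambiguity the word-level grammar feature is meant to carry into the letter-to-phoneme step.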
(Score: 2) by danmars on Monday May 20, @02:54PM
I got them all right first try. What is so hard about this? The AI version is terrible at determining when to pause or change pace or inflection. I imagine it would be even easier for someone who actually listens to Joe Rogan.