Developers at the business AI company Dessa have built a new text-to-speech system called "RealTalk". The version they demoed was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. They were apparently impressed enough by their own creation to discuss the implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.
Your humble submitter took the blind test and only just got a majority of guesses right, but was impressed enough by the quality to consider it newsworthy. How do you fare in the test?
(Score: 3, Insightful) by Rich on Monday May 20 2019, @03:10PM
I had a similar feeling about that non-release, but I'm fairly certain that any party able to do a billion-dollar buyout already has an ethics panel that ensures the knowledge rests in very good hands. :)
As for how it works, the earlier papers of their mastermind Alex Krizhevsky, and the traces the other people mentioned have left on the net, might offer a hint. If I had to have a go at it, I'd model the vocal tract up to the point where phonetic decomposition and resynthesis of the original voice is somewhat better than good enough, then have a net that translates a sliding window of letters, mixed with grammar information from further away, into phonemes. The grammar analysis would be required to get intonation and accents right and to resolve ambiguities; I'd try that with a separate net. As for which net topologies to use, and how deep, I'm completely clueless. The guys involved have enough experience to choose these so that something thrown at the problem has a good chance of sticking, and in this case it seriously did.
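To make the "sliding window of letters" part concrete, here is a minimal Python sketch of just the input side of such a letter-to-phoneme net. It is pure guesswork on my part and not anything Dessa has described: the function name `letter_windows`, the `radius` parameter and the `_` padding symbol are all made up for illustration, and the net itself, the grammar information and the vocal tract model are left out entirely.

```python
from typing import List

PAD = "_"  # hypothetical padding symbol for positions beyond the text edges


def letter_windows(text: str, radius: int = 2) -> List[str]:
    """Return one fixed-width window of letters per character of `text`.

    Each window holds the character itself plus `radius` neighbours on
    either side, so a downstream model (not shown) could predict one
    phoneme per window.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    normalized = text.lower()
    padded = PAD * radius + normalized + PAD * radius
    width = 2 * radius + 1
    # Window i is centred on normalized[i], which sits at padded[i + radius].
    return [padded[i:i + width] for i in range(len(normalized))]


if __name__ == "__main__":
    for window in letter_windows("Rogan", radius=2):
        print(window)
```

Each window would then be encoded (one-hot, say) and fed to the net along with whatever the separate grammar net produces. A wider `radius` gives more context for ambiguous spellings at the cost of a larger input.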