Developers at business AI company Dessa have come up with a new text-to-speech system called "RealTalk". In the version they demoed, it was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. They must have been so impressed by their own creation that they discuss the implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.
Your humble submitter did the blind test and just barely had a majority of correct guesses, but was so impressed by the quality that he considered it newsworthy - how do you fare in the test?
(Score: 4, Interesting) by danmars on Monday May 20 2019, @02:54PM
I got them all right first try. What is so hard about this? The AI version is terrible at determining when to pause or change pace or inflection. I imagine it would be even easier for someone who actually listens to Joe Rogan.
(Score: 3, Interesting) by Rich on Monday May 20 2019, @03:26PM
It was late at night when I did the test, and I was paying attention to local modulation glitches, rather than overall flow. I've been developing an electronic music instrument over the last few years and am probably conditioned to pick up the slightest flaw in transients, rather than processing the composition of longer spoken text. I re-listened, focusing on pausing, like you suggested, and got 6 out of 8 right, one more than in my first listening - it is not as easy as you suggest, at least not for me (fluent, but non-native English, mostly exposed to German, test on MacBook Pro 2009 built-in speakers, dent at 3 kHz in my right ear).
(Score: 3, Interesting) by Alfred on Monday May 20 2019, @07:29PM
I think they are getting closer to what the CIA/NSA was doing last decade. Part of that is to take a mic feed and translate it into a different voice in real time. That analysis helps with landing the right inflections and stresses too.
(Score: 3, Interesting) by KilroySmith on Monday May 20 2019, @07:54PM
7 out of 8 for me - only missed the first one. If I had listened to several before guessing, then it would have been 8 of 8. The pacing is wrong, and there are fewer intonation changes in the faux Rogan selections, making it easy for me to tell the difference.
That said, the samples were remarkably good. I'm quite impressed.