SoylentNews

SoylentNews is people

Machine-Learning Based Text-To-Speech System Clones Podcaster's Voice

posted by martyb on Monday May 20 2019, @02:03PM

from the you-dont-say dept.

Rich writes:

Developers at business AI company Dessa have come up with a new text-to-speech system called "RealTalk". In the version they demoed, it was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. They were evidently impressed enough by their own creation to discuss its implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.

Your humble submitter took the blind test and only just got a majority of guesses right, but was so impressed by the quality that he considered it newsworthy. How do you fare in the test?
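"Only just a majority" invites a quick sanity check: how likely is that score from pure guessing? Below is a minimal sketch of a one-sided binomial test, assuming each clip is an independent real-or-fake call with a 50% chance of being right by luck. The function name and the 15-clip test size are hypothetical illustrations, not the actual size of the fakejoerogan.com quiz.

```python
from math import comb


def guessing_p_value(correct: int, total: int) -> float:
    """One-sided binomial test: probability of getting at least `correct`
    answers right out of `total` independent 50/50 guesses."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    # Count every outcome with `correct` or more hits, divide by all 2**total outcomes.
    hits = sum(comb(total, k) for k in range(correct, total + 1))
    return hits / 2 ** total


if __name__ == "__main__":
    # Hypothetical scores on a hypothetical 15-clip test.
    for score in (8, 12):
        print(f"{score}/15 correct: p = {guessing_p_value(score, 15):.3f}")
```

With an odd number of clips, a bare majority (8 of 15) comes out at exactly p = 0.5, so it is what coin-flipping produces half the time. That fits the impression that the fakes are hard to tell apart. A score like 12 of 15 would be much harder to explain by luck.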

Re: release the code (Score: 3, Insightful) by Rich (945) on Monday May 20 2019, @03:10PM (#845532)

I had a similar feeling about that non-release, but I'm fairly certain that any party able to do a billion-dollar buyout already has an ethics panel that ensures the knowledge rests in very good hands. :)

As for how it works, the earlier papers of their mastermind Alex Krizhevsky and the traces the other guys mentioned have left on the net might offer a hint. If I had to have a go at it, I'd model the vocal tract up to the point where phonetic decomposition and resynthesis of the original voice is somewhat better than good enough, and then train a net that translates a sliding window of letters, mixed with grammatical information from further away in the sentence, into phonemes. The grammar analysis would be needed to get intonation and accents right and to resolve ambiguities; I'd try that with a separate net. But as for which net topologies to use, and how deep, I'm completely clueless. The guys involved have enough experience to pick these so that whatever they throw at the problem has a good chance of sticking, and in this case it seriously did.
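The input side of the pipeline guessed at above can be sketched without any neural network at all. This is a minimal illustration, assuming the letter-to-phoneme net consumes a fixed-width window of letters centred on each position plus a grammar tag for the word that letter belongs to. The tags are supplied by hand here as a hypothetical stand-in for the separate grammar net. The function name, the `_` padding symbol and the tag labels are all illustrative choices, not anything Dessa has described.

```python
from typing import List, Tuple

PAD = "_"  # padding symbol for positions before the start / after the end of the text


def letter_windows(tagged_words: List[Tuple[str, str]],
                   radius: int = 3) -> List[Tuple[str, str, str]]:
    """Return one (letter, window, tag) sample per letter of the sentence.

    `window` holds the 2*radius+1 characters centred on the letter (spaces
    included, so context crosses word boundaries); `tag` is the grammar tag
    of the word the letter belongs to.
    """
    text = " ".join(word.lower() for word, _ in tagged_words)
    # Build a per-character tag list aligned with `text`.
    char_tags: List[str] = []
    for n, (word, tag) in enumerate(tagged_words):
        char_tags.extend([tag] * len(word))
        if n < len(tagged_words) - 1:
            char_tags.append("SPACE")
    padded = PAD * radius + text + PAD * radius
    samples = []
    for i, ch in enumerate(text):
        if ch == " ":
            continue  # spaces serve as context only; they are not pronounced
        # padded[i + radius] is `ch`, so this slice is centred on it.
        samples.append((ch, padded[i:i + 2 * radius + 1], char_tags[i]))
    return samples


if __name__ == "__main__":
    # "record" is stressed differently as a verb and as a noun; the letter
    # windows for the word itself are identical, only the tag differs.
    for sentence in ([("to", "PART"), ("record", "VERB")],
                     [("the", "DET"), ("record", "NOUN")]):
        print(letter_windows(sentence, radius=2)[-6:])
```

The "record" example shows why the separate grammar signal matters: a pure letter window cannot distinguish reCORD from REcord, but the tag can. The network that would map each (window, tag) pair to a phoneme is deliberately left out, since choosing its topology and depth is exactly the open question raised above.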

Starting Score: 1 point
Moderation: +1 (Insightful=1, Total=1)
Extra 'Insightful' Modifier: 0
Karma-Bonus Modifier: +1
Total Score: 3