posted by martyb on Monday May 20 2019, @02:03PM   Printer-friendly
from the you-dont-say dept.

Developers at business AI company Dessa have come up with a new text-to-speech system called "RealTalk". In the version they demoed, it was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. Evidently impressed by their own creation, they discuss its implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.

Your humble submitter did the blind test and just barely had a majority of correct guesses, but was so impressed by the quality that he considered it newsworthy - how do you fare in the test?
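For context, "just barely a majority" is close to what pure guessing would produce. Assuming the test presents 8 real-or-fake clips (the format reported by listeners), a quick binomial calculation shows how likely each score is by luck alone:

```python
# Chance baseline for the blind test, assuming 8 clips and a 50/50 guess
# per clip (the test size is an assumption, not taken from Dessa's docs).
from math import comb

def p_at_least(k: int, n: int = 8, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct guesses."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

for score in (5, 6, 7, 8):
    print(f"{score}/8 or better by luck alone: {p_at_least(score):.3f}")
```

A 5/8 score happens by chance more than a third of the time, so only scores of 7 or 8 say much about a listener's ear.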


Original Submission

Related Stories

Meta Says its New Speech-generating AI Tool is Too Dangerous to Release 37 comments

Meta says its new speech-generating AI tool is too dangerous to release:

Meta has unveiled a new AI tool, dubbed 'Voicebox', which it claims represents a breakthrough in AI-powered speech generation. However, the company won't be unleashing it on the public just yet - because doing so could be disastrous.

Voicebox is currently able to produce audio clips of speech in six languages (all of which are European in origin), and - according to a blog post from Meta - is the first AI model of its kind capable of completing tasks beyond what it was 'specifically trained to accomplish'. Meta claims that Voicebox handily outperforms competing speech-generation AIs in virtually every area.

So what exactly is it capable of? Well, for starters, it can spew out reasonably accurate text-to-speech replications of a person's voice using a sample audio file as short as two seconds, a seemingly innocuous ability that holds a huge amount of destructive potential in the wrong hands.

[...] Meta clearly believes its new tool is good enough to fool at least the majority of people [since] it's explicitly not releasing Voicebox to the public, but instead publishing a research paper and detailing a classifier tool that can distinguish Voicebox-generated speech from real human speech. Meta describes the classifier as "highly effective" - though notably not perfectly effective.

[...] A little caution, patience, and respect for the magnitude of this technology is a welcome sight - although I doubt Meta will sit on Voicebox for too long, since the shareholders will no doubt be wondering how much money it can make them...


Original Submission

  • (Score: 2) by takyon on Monday May 20 2019, @02:25PM (2 children)

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday May 20 2019, @02:25PM (#845520) Journal

    It seems to blow the tinny Lyrebird demos out of the water:

    https://www.theverge.com/2017/4/24/15406882/ai-voice-synthesis-copy-human-speech-lyrebird [theverge.com]

    But we can't be too sure, because:

    A crucial advantage and responsibility we have as an applied AI company is knowing that there’s a huge difference between exploring AI in research and implementing it into the real world. To work on things like this responsibly, we think the public should first be made aware of the implications that speech synthesis models present before releasing anything open source.

    Because of this, at this time we will not be releasing our research, model or datasets publicly.

    This is the same copout crap [soylentnews.org] we see from OpenAI. In this case, it's easy to lecture people about ethics, when releasing the research would help your competitors.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 3, Insightful) by Rich on Monday May 20 2019, @03:10PM

      by Rich (945) on Monday May 20 2019, @03:10PM (#845532) Journal

      I had a similar feeling about that non-release, but I'm fairly certain that any party able to do a billion-dollar buyout already has an ethics panel that ensures the knowledge rests in very good hands. :)

      As for how it works, the earlier papers of their mastermind Alex Krizhevsky, and the traces the other people mentioned have left on the net, might be a hint. If I had to have a go at it, I'd model the vocal tract up to the point where phonetic decomposition and resynthesis of the original voice is somewhat better than good enough, then have a net that translates a sliding window of letters - with grammar information mixed in from further away - into phonemes. The grammar analysis would be needed to get intonation and accents right, and to resolve ambiguities; I'd try that with a separate net. But as for which net topologies to use, and how deep, I'm completely clueless. The people involved have enough experience to pick the setup with the best odds that something thrown at the problem sticks - and in this case it seriously did.

    • (Score: 4, Insightful) by RamiK on Monday May 20 2019, @04:53PM

      by RamiK (1813) on Monday May 20 2019, @04:53PM (#845556)

      Since when did turning "Power to the People" into "Power to the Corporations" become ethical? Have we learned nothing from all those DNA labs that eagerly falsified tests for the police? If you have technology that allows falsifying text / voice / video recordings and you decide not to release it, you're guaranteeing someone else will develop it and use it for evil. The only moral thing to do is release it, make it trivial to falsify those sorts of recordings so their credibility falls, and let society adapt by implementing secure encryption right off the sensors.
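One way to read "encryption right off the sensors" is authenticating recordings at capture. A minimal sketch, using message authentication rather than encryption proper (a hypothetical scheme, not any deployed standard): the capture device holds a secret key and tags every audio chunk, so later tampering is detectable by anyone who can verify against the device key.

```python
# Hypothetical sketch of sensor-side authentication: each audio chunk is
# tagged with an HMAC keyed by a secret provisioned in the device, so a
# spliced or synthesized chunk fails verification.
import hmac, hashlib, os

DEVICE_KEY = os.urandom(32)  # would be provisioned in the sensor at manufacture

def sign_chunk(chunk: bytes, index: int) -> bytes:
    # Bind the chunk to its position so chunks can't be reordered.
    msg = index.to_bytes(8, "big") + chunk
    return hmac.new(DEVICE_KEY, msg, hashlib.sha256).digest()

def verify_chunk(chunk: bytes, index: int, tag: bytes) -> bool:
    return hmac.compare_digest(sign_chunk(chunk, index), tag)

audio = [b"frame-0", b"frame-1"]
tags = [sign_chunk(c, i) for i, c in enumerate(audio)]
print(verify_chunk(audio[0], 0, tags[0]))      # untouched chunk verifies
print(verify_chunk(b"deepfake", 0, tags[0]))   # substituted chunk fails
```

A real deployment would need asymmetric signatures (so verifiers don't hold the secret) and hardware key storage, but the shape of the idea is the same.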

      --
      compiling...
  • (Score: 4, Interesting) by danmars on Monday May 20 2019, @02:54PM (3 children)

    by danmars (3662) on Monday May 20 2019, @02:54PM (#845527)

    I got them all right first try. What is so hard about this? The AI version is terrible at determining when to pause or change pace or inflection. I imagine it would be even easier for someone who actually listens to Joe Rogan.

    • (Score: 3, Interesting) by Rich on Monday May 20 2019, @03:26PM

      by Rich (945) on Monday May 20 2019, @03:26PM (#845535) Journal

      It was late at night when I did the test, and I was paying attention to local modulation glitches rather than overall flow. I've been developing an electronic music instrument over the last few years and am probably conditioned to pick up the slightest flaw in transients, rather than to process the composition of longer spoken text. I re-listened, focusing on pausing as you suggested, and got 6 out of 8 right, one more than in my first listening - it is not as easy as you suggest, at least not for me (fluent but non-native English, mostly exposed to German; test on 2009 MacBook Pro built-in speakers; dent at 3 kHz in my right ear).

    • (Score: 3, Interesting) by Alfred on Monday May 20 2019, @07:29PM

      by Alfred (4006) on Monday May 20 2019, @07:29PM (#845604) Journal
      Dang, you overachiever. I missed one. If the samples were longer, I think it would make a difference. And I bet these were the best samples they had, with many others being very obviously bad.

      I think they are getting closer to what the CIA/NSA was doing last decade. Part of that is taking a mic feed and translating it into a different voice in real time. That analysis helps with landing the right inflections and stresses, too.
    • (Score: 3, Interesting) by KilroySmith on Monday May 20 2019, @07:54PM

      by KilroySmith (2113) on Monday May 20 2019, @07:54PM (#845613)

      7 out of 8 for me - only missed the first one. If I had listened to several before guessing, then it would have been 8 of 8. The pacing is wrong, and there are fewer intonation changes in the faux Rogan selections, making it easy for me to tell the difference.
      That said, the samples were remarkably good. I'm quite impressed.

  • (Score: 2, Funny) by Anonymous Coward on Monday May 20 2019, @09:33PM (1 child)

    by Anonymous Coward on Monday May 20 2019, @09:33PM (#845647)

    There is no proof that Joe Rogan isn't already artificially intelligent.

    • (Score: 0) by Anonymous Coward on Tuesday May 21 2019, @07:00PM

      by Anonymous Coward on Tuesday May 21 2019, @07:00PM (#845894)

      joe_rogan_mind_blown.jpg
