


Google's "WaveNet" Neural Network Improves Text-to-Speech Quality

Accepted submission by takyon at 2016-09-11 12:44:57
Software

Google-owned DeepMind has presented a neural network that can generate more convincing human speech [bigthink.com], and other forms of audio such as classical music:

The company claims that WaveNet can mimic any human voice and closes the gap with human speech performance by more than 50%. In Google's blind tests with 500 participants, listeners rated WaveNet's English speech at 4.21 on a 5-point scale (5 being realistic human speech), while concatenative speech scored 3.86 and parametric speech an even worse 3.67. WaveNet also generated speech in Mandarin, with similar results.

They did this by re-imagining current text-to-speech (TTS) processes. The two most common are concatenative TTS, used by Apple's Siri, which stitches together pre-recorded fragments of speech, and parametric TTS, which sounds even less natural because the speech is generated entirely by computer algorithms. What sets WaveNet apart is that it directly models the raw waveform of the audio signal, an extremely complicated task that required a novel neural network. WaveNet learns from voice recordings, then creates speech on its own. This independence also allows the program to generate other kinds of audio, such as music.
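To make "directly modeling the raw waveform" concrete, here is a minimal sketch of the two ideas involved: WaveNet's paper describes quantizing 16-bit audio to 256 levels with mu-law companding, and then generating audio one sample at a time, with each new sample conditioned on all previous ones. The predictor below is a hypothetical stand-in (a simple sine continuation), not the real network, which uses stacks of dilated causal convolutions.

```python
import math

MU = 255  # 8-bit mu-law companding, as described in the WaveNet paper

def mu_law_encode(x):
    """Compress a sample in [-1, 1] to one of 256 discrete levels."""
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int((compressed + 1) / 2 * MU + 0.5)

def mu_law_decode(level):
    """Invert the companding back to a sample in [-1, 1]."""
    compressed = 2 * level / MU - 1
    return math.copysign((1 / MU) * ((1 + MU) ** abs(compressed) - 1), compressed)

def generate(predict_next, n_samples, seed=None):
    """Autoregressive loop: each new sample depends on every previous one."""
    samples = list(seed or [mu_law_encode(0.0)])
    for _ in range(n_samples):
        samples.append(predict_next(samples))
    return samples

# Hypothetical stand-in predictor -- the real model is a trained neural net.
def toy_predictor(history):
    t = len(history)
    return mu_law_encode(0.5 * math.sin(0.1 * t))

audio_levels = generate(toy_predictor, 100)
waveform = [mu_law_decode(v) for v in audio_levels]
```

The companding step matters because predicting one of 256 classes per sample is far more tractable than regressing a raw 16-bit value, while mu-law's logarithmic spacing keeps quiet passages from losing detail.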

To bolster their claim, DeepMind released samples comparing WaveNet's output with speech made by concatenative and parametric TTS. You be the judge.

Parametric: parametric-1.wav [googleapis.com] [and] parametric-2.wav [googleapis.com]

And now, this is what WaveNet generated: wavenet-1.wav [googleapis.com] [and] wavenet-2.wav [googleapis.com]

Daniel (UK) will live forever.

DeepMind's paper about WaveNet. [google.com]


Original Submission