Researchers at Google DeepMind have released a paper (PDF) and writeup on their new "WaveNet" neural network. WaveNet is able to generate speech that arguably sounds far better than current text-to-speech (TTS) programs, and was also used to synthesize other audio, such as piano music.
We trained WaveNet using some of Google's TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google's current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
For both Chinese and English, Google's current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.
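For anyone wondering how a claim like "reduces the gap by over 50%" is computed from Mean Opinion Scores, here is a minimal sketch. It assumes the gap is measured from the best existing system's MOS up to the human-speech MOS, which is how the quoted sentence reads. The function name `gap_reduction` and the scores passed to it are hypothetical placeholders for illustration, not the figures from the paper.

```python
def gap_reduction(baseline_mos: float, model_mos: float, human_mos: float) -> float:
    """Return the fraction of the baseline-to-human MOS gap closed by the model.

    All three arguments are Mean Opinion Scores on the same 1-5 scale.
    """
    gap = human_mos - baseline_mos
    if gap <= 0:
        # No gap to close: the baseline already matches or beats human speech.
        raise ValueError("human_mos must exceed baseline_mos")
    return (model_mos - baseline_mos) / gap


# Placeholder scores (assumed for illustration, NOT taken from the paper).
closed = gap_reduction(baseline_mos=3.8, model_mos=4.2, human_mos=4.5)
print(f"gap closed: {closed:.0%}")
```

A result above 0.5 corresponds to the "over 50%" wording. The baseline here would be whichever of the parametric or concatenative systems scored higher for that language.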
There are multiple audio samples included on the writeup page.
Google-owned DeepMind has presented a neural network that can generate more convincing human speech, and other forms of audio such as classical music:
To bolster their claim, DeepMind released some samples, comparing their WaveNets with samples made by concatenative and parametric TTS. You be the judge.
Parametric: parametric-1.wav [and] parametric-2.wav
And now, this is what WaveNet generated: wavenet-1.wav [and] wavenet-2.wav
Daniel (UK) will live forever.
(Score: -1, Troll) by Anonymous Coward on Monday September 12 2016, @10:05AM
I am interested to know if their TTS engine also brainwashes you with subliminal messages added to the speech.
These subliminal messages will be hard to detect (by experts) and predictably easy to pass off as innocent NN mistakes. The TTS will detect whether it's being tested for subliminal messages or whether a naïve person is listening to it, and behave accordingly.
No thanks, I do not want more (or any) software or hardware of any kind from google. I do not need to hear any more by persuasive TTS engines that I should give my money to the jews... because ... holocaust.