SoylentNews is people

posted by janrinok on Monday September 12 2016, @05:14AM
from the what-did-you-say? dept.

Researchers at Google DeepMind have released a paper (PDF) and writeup on their new "WaveNet" neural network. WaveNet is able to generate speech that arguably sounds far better than current text-to-speech (TTS) programs, and was also used to synthesize other audio, such as piano music.

We trained WaveNet using some of Google's TTS datasets so we could evaluate its performance. The following figure shows the quality of WaveNets on a scale from 1 to 5, compared with Google's current best TTS systems (parametric and concatenative), and with human speech using Mean Opinion Scores (MOS). MOS are a standard measure for subjective sound quality tests, and were obtained in blind tests with human subjects (from over 500 ratings on 100 test sentences). As we can see, WaveNets reduce the gap between the state of the art and human-level performance by over 50% for both US English and Mandarin Chinese.
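As a rough illustration of the "reduce the gap by over 50%" claim quoted above, the arithmetic can be sketched as follows. The function name and the scores are hypothetical placeholders on the 1-to-5 MOS scale, not figures taken from the paper.

```python
def gap_reduction(baseline_mos: float, new_mos: float, human_mos: float) -> float:
    """Return the fraction of the baseline-to-human MOS gap closed by a new system."""
    gap = human_mos - baseline_mos
    if gap <= 0:
        raise ValueError("baseline must score below human speech")
    return (new_mos - baseline_mos) / gap


# Placeholder scores (assumed for illustration only, not the published results).
baseline, wavenet, human = 3.5, 4.1, 4.5
print(f"gap closed: {gap_reduction(baseline, wavenet, human):.0%}")
```

Any result above 0.5 would correspond to "over 50%" of the gap being closed.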

For both Chinese and English, Google's current TTS systems are considered among the best worldwide, so improving on both with a single model is a major achievement.

There are multiple audio samples included on the writeup page.

Google-owned DeepMind has presented a neural network that can generate more convincing human speech, and other forms of audio such as classical music:

To bolster their claim, DeepMind released some samples, comparing their WaveNets with samples made by concatenative and parametric TTS. You be the judge.

Parametric: parametric-1.wav [and] parametric-2.wav

And now, this is what WaveNet generated: wavenet-1.wav [and] wavenet-2.wav

Daniel (UK) will live forever.


Original Submission #1   Original Submission #2

  • (Score: 0) by Anonymous Coward on Monday September 12 2016, @05:23AM (#400529)

    I don't need any "this is Google calling" robots telling me to turn off my adblocker and accept tech support for problems I don't even have.

    • (Score: 2) by jimtheowl (5929) on Monday September 12 2016, @05:39AM (#400533)
      No idea what you are ranting about. Are you running Windows by any chance?
      • (Score: 0) by Anonymous Coward on Monday September 12 2016, @05:45AM (#400534)

        OK, dumbfuck, let me spell it out for you. Right now, Indians in call centers cold-call people and claim to be "Windows calling" and try to convince idiots to install malware on their computers. Convincing voice synthesis can remove the need for the living Indian person, and make cold-call malware scams even more profitable. And who is facilitating this process by improving voice synthesis to the point where you don't even need to hire humans to run phone scams anymore? GOOGLE. Are you getting the message, yet, moron??

        • (Score: 0) by Anonymous Coward on Monday September 12 2016, @08:06AM (#400574)

          I think these samples must sound a bit too much like your ex-wife?

        • (Score: 0) by Anonymous Coward on Monday September 12 2016, @08:40AM (#400584)

          Right now, Indians in call centers cold-call people and claim to be "Windows calling" and try to convince idiots to install malware on their computers.

          Really? And there are people idiotic enough to fall for this?

          • (Score: 2) by TheRaven (270) on Monday September 12 2016, @12:51PM (#400670)
            Yes, a surprising number. Someone calls from Microsoft and says that your computer detected that you're having problems with it. What are the chances that:
            • You're using Windows (most people who use computers).
            • You're having problems with it (most people who use computers).
            • You're going to believe that someone cold calling can help (most people who don't know the tech industry well enough to know that tech support that is any use is incredibly rare).
            • You're going to install a helpful tool that the person on the support line said would help them fix your problem (most people who got this far).

            The thing that makes this scam work so well is that they often do help with whatever problem the user says they're suffering from; they just install some malware as a side effect.

            --
            sudo mod me up
  • (Score: -1, Troll) by Anonymous Coward on Monday September 12 2016, @10:05AM (#400613)

    I am interested to know if their TTS engine also brainwashes you with subliminal messages added to the speech.

    These subliminal messages will be hard to detect (even by experts) and easy to pass off as innocent NN mistakes. The TTS will detect whether it's being tested for subliminal messages or whether a naïve person is listening to it, and behave accordingly.

    No thanks, I do not want more (or any) software or hardware of any kind from google. I do not need to hear any more by persuasive TTS engines that I should give my money to the jews... because ... holocaust.

  • (Score: 0) by Anonymous Coward on Monday September 12 2016, @02:09PM (#400710)

    One more step towards making it easier to frame/fake someone.

    They can do the video/CGI already, motion capture: https://www.youtube.com/watch?v=o_7CfWlkqm8
    See the CGI demos by various small studios. Even individuals: https://www.youtube.com/watch?v=34Q0BB8-2nA

  • (Score: 2) by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Monday September 12 2016, @03:40PM (#400761)
    Personally, I think I preferred the parametric samples. The intonation was dreadful for the concatenative; it was painful to listen to (I'm no fan of many US accents generally, I'll be open about that, and most of that is because of the grating intonation). It was not particularly brilliant for WaveNet either, but the most annoying feature of WaveNet was the horrible scratchy noise.
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves