date 2016-01-01
from the this-will-be-the-voice-of-skynet dept.
A research paper published by Google this month, which has not been peer reviewed, details a text-to-speech system called Tacotron 2. Its authors claim it comes close to human quality when generating speech audio from text.
The system is Google's second official generation of the technology, which consists of two deep neural networks. The first network translates the text into a spectrogram, a visual way to represent audio frequencies over time. That spectrogram is then fed into WaveNet, a system from Alphabet's AI research lab DeepMind, which reads the chart and generates the corresponding audio.
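As a rough illustration of why a second network is needed after the spectrogram stage, the sketch below shows that a magnitude spectrogram does not pin down a single waveform. This is not Google's code: the frame length, hop size, test tones, and the helper name `magnitude_spectrogram` are all assumptions chosen for the demo, and it uses a plain rectangular-window STFT rather than whatever representation Tacotron 2 actually predicts.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=256, hop=128):
    """Return |STFT| using a rectangular window, one row per frame."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    frames = np.stack([signal[s:s + frame_len] for s in starts])
    # Taking the absolute value throws away the phase of every frequency bin.
    return np.abs(np.fft.rfft(frames, axis=1))

frame_len = 256
n = np.arange(4 * frame_len)

# Two tones that sit exactly on FFT bins 8 and 20 (hypothetical test signal).
low = np.sin(2 * np.pi * 8 * n / frame_len)
wave_a = low + np.sin(2 * np.pi * 20 * n / frame_len)
# Same tones, but the higher one is shifted by a quarter cycle.
wave_b = low + np.cos(2 * np.pi * 20 * n / frame_len)

spec_a = magnitude_spectrogram(wave_a)
spec_b = magnitude_spectrogram(wave_b)

# The spectrograms match even though the sample values do not.
same_spectrogram = np.allclose(spec_a, spec_b)
max_sample_gap = float(np.max(np.abs(wave_a - wave_b)))
print("identical spectrograms:", same_spectrogram)
print("largest sample difference:", round(max_sample_gap, 2))
```

Both waveforms produce the same chart, so whatever turns the chart back into audio has to supply the missing phase itself. In the pipeline described above, that job falls to WaveNet.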
[...] The Google researchers also demonstrate that Tacotron 2 can handle hard-to-pronounce words and names, as well as alter the way it enunciates based on punctuation and capitalization. For instance, capitalized words are stressed, as someone would do when indicating that a specific word is an important part of a sentence.
[...] Unlike some core AI research the company does, this technology is immediately useful to Google. WaveNet, first announced in 2016, is now used to generate the voice in Google Assistant. Once readied for production, Tacotron 2 could be an even more powerful addition to the service.
However, the system is trained to mimic only one female voice; to speak like a man or a different woman, Google would need to train the system again.
(Score: 1, Funny) by Anonymous Coward on Friday December 29, @06:54AM
Your phone has virus making your battery slow. Please to be signing up for more targeted advertising to win a free battery replacement.
(Score: 0, Funny) by Anonymous Coward on Friday December 29, @07:07AM (2 children)
Female voices make me want to kill every woman.
(Score: 0) by Anonymous Coward on Friday December 29, @07:52AM
Red Pillar detected! Penisectomy indicated. Proceed?
(Score: 2) by LoRdTAW on Friday December 29, @02:06PM
Found the German BMW driver! [hoaxorfact.com]
(Score: 2) by MostCynical on Friday December 29, @09:35AM (1 child)
have they tested how *annoying* the voice is?
(Score: tau, Irrational)
(Score: 4, Funny) by LoRdTAW on Friday December 29, @02:09PM
I hear Fran Drescher was its trainer.
(Score: 1, Insightful) by Anonymous Coward on Friday December 29, @10:33AM (1 child)
Once you have the spectrogram, output is defined. What is wavenet really doing?
(Score: 2) by crafoo on Friday December 29, @03:36PM
Vocalizing the audio in any voice you would like. I don't think the spectrogram is 100% actual voice data ready to send to an output audio buffer.
(Score: 0) by Anonymous Coward on Friday December 29, @02:16PM (1 child)
But still cannot pronounce my name right. Some great fake AI there.
(Score: 0) by Anonymous Coward on Friday December 29, @02:26PM
AC is a very common name (here on SN), I suspect "she" can pronounce it perfectly.
(Score: 2) by crafoo on Friday December 29, @03:38PM
Will they publish their NN datasets? Are they using TensorFlow or some modified version? How soon until I can buy a box of "voice chips"?
(Score: 2) by donkeyhotay on Friday December 29, @04:39PM (1 child)
I'll grant that the results are pretty good; however, I had no trouble distinguishing the humans from the AI voices, even though in each case there was just one sentence. I suspect if there were an entire paragraph of speech it would become even more obvious.
(Score: 2) by takyon on Friday December 29, @05:23PM
It's pretty damn good compared to voice assistants that are in use or Daniel (UK) or whatever.
Let's hope this can be used with stuff like Mycroft [mycroft.ai], Jasper [github.io], or Lucida [lucida.ai].
[SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]