Text-to-speech model can preserve speaker's emotional tone and acoustic environment:
On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.
Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.
(Score: 5, Insightful) by janrinok on Monday January 16 2023, @08:57AM (5 children)
I'll agree that it is easier to think of abusive applications than useful ones, but there are a few that spring to mind. For example, talking books/audio books, which are popular on smart devices and extremely useful for those with impaired sight, could be produced using well-known voices. Perhaps even using an unknown neutral voice, which might even reduce the cost of manufacturing - not that we will see any reduction in price!
Many cartoon-type films - which are currently voiced by well known (and expensive) actors - might also become cheaper to produce, particularly if translated into several languages or more.
However, in the time it has taken me to write this reply I have probably thought of a dozen or more ways in which it could be abused, particularly by the criminal fraternity, or by those wishing to influence an individual's political popularity or to change the outcome of elections.
This is why we can't have nice things.....
I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
(Score: 0) by Anonymous Coward on Monday January 16 2023, @02:46PM (4 children)
"Many cartoon-type films - which are currently voiced by well known (and expensive) actors - might also become cheaper to produce"
translation: Voice actors will be screwed out of a living by something imitating them. It already is an intensely competitive field, and they do maintain rights to the way their voice sounds.
I can't wait for a not-quite John DiMaggio AI...
(Score: 2) by DannyB on Monday January 16 2023, @05:14PM (1 child)
Don't worry. Soon enough they'll learn to create custom voices not made from a human sample. Use existing systems that simulate the human vocal tract. You don't need to come up with an entire voice any longer. Just a few good seconds. Tune for different desirable voices. Build a catalog of a few dozen good quality voices that can be used for legitimate porpoises. Speech to text. Car navigation systems. Audio books.
Oh, here is an application. A boss takes the Word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by gtomorrow on Tuesday January 17 2023, @12:53PM
That's ridiculous! Nobody uses tape anymore!
(Score: 0) by Anonymous Coward on Monday January 16 2023, @06:58PM
Hmm, I always thought his name was Joe
(Score: 2) by Mykl on Monday January 16 2023, @10:59PM
It might be considered a trade-off if the movie price were reduced to reflect the cheaper cost to produce the film, but we all know what the chances of that are...