Text-to-speech model can preserve speaker's emotional tone and acoustic environment:
On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.
Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.
(Score: 2) by DannyB on Monday January 16 2023, @05:14PM (1 child)
Don't worry. Soon enough they'll learn to create custom voices not made from a human sample. Use existing systems that simulate the human vocal tract. You don't need to come up with an entire voice any longer. Just a few good seconds. Tune for different desirable voices. Build a catalog of a few dozen good quality voices that can be used for legitimate porpoises. Text to speech. Car navigation systems. Audio books.
Oh, here is an application. A boss takes the Word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by gtomorrow on Tuesday January 17 2023, @12:53PM
That's ridiculous! Nobody uses tape anymore!