Text-to-speech model can preserve speaker's emotional tone and acoustic environment:
On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.
Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.
(Score: 2) by DannyB on Monday January 16 2023, @05:16PM
Hi,
This is the voice of a rich Nigerian prince who recently died. This voice is from beyond the grave. I spent my entire life trying to give away my substantial fortune by email, but nobody would return my emails!
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.