Microsoft’s new AI can simulate anyone’s voice with 3 seconds of audio: SoylentNews Submission

Accepted submission by Freeman at 2023-01-10 15:04:25 from the My Voice is no longer my password dept.

Freeman [soylentnews.org] writes:

https://arstechnica.com/information-technology/2023/01/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio/ [arstechnica.com]

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E [github.io] that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3 [arstechnica.com].
