SoylentNews is people

posted by Fnord666 on Monday January 16 2023, @07:56AM
from the my-voice-is-no-longer-my-password dept.

Text-to-speech model can preserve speaker's emotional tone and acoustic environment:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.


This discussion was created by Fnord666 (652) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Monday January 16 2023, @02:46PM (4 children)

    by Anonymous Coward on Monday January 16 2023, @02:46PM (#1287056)

    "Many cartoon-type films - which are currently voiced by well known (and expensive) actors - might also become cheaper to produce"

    translation: Voice actors will be screwed out of a living by something imitating them. It already is an intensely competitive field, and they do maintain rights to the way their voice sounds.

    I can't wait for a not-quite John DiMaggio AI...

  • (Score: 2) by DannyB on Monday January 16 2023, @05:14PM (1 child)

    by DannyB (5839) Subscriber Badge on Monday January 16 2023, @05:14PM (#1287085) Journal

    Don't worry. Soon enough they'll learn to create custom voices not made from a human sample. Use existing systems that simulate the human vocal tract. You don't need to come up with an entire voice any longer. Just a few good seconds. Tune for different desirable voices. Build a catalog of a few dozen good quality voices that can be used for legitimate porpoises. Speech to text. Car navigation systems. Audio books.

    Oh, here is an application. A boss takes the Word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat.

    --
    The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
    • (Score: 2) by gtomorrow on Tuesday January 17 2023, @12:53PM

      by gtomorrow (2230) on Tuesday January 17 2023, @12:53PM (#1287211)

      Oh, here is an application. A boss takes the Word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat.

      That's ridiculous! Nobody uses tape anymore!

  • (Score: 0) by Anonymous Coward on Monday January 16 2023, @06:58PM

    by Anonymous Coward on Monday January 16 2023, @06:58PM (#1287106)

    Hmm, I always thought his name was Joe

  • (Score: 2) by Mykl on Monday January 16 2023, @10:59PM

    by Mykl (1112) on Monday January 16 2023, @10:59PM (#1287157)

    It might be considered a trade-off if the movie price were reduced to reflect the cheaper cost to produce the film, but we all know what the chances of that are...