SoylentNews is people

posted by Fnord666 on Monday January 16 2023, @07:56AM
from the my-voice-is-no-longer-my-password dept.

Text-to-speech model can preserve speaker's emotional tone and acoustic environment:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.


This discussion was created by Fnord666 (652) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 5, Insightful) by janrinok on Monday January 16 2023, @08:57AM (5 children)

    by janrinok (52) on Monday January 16 2023, @08:57AM (#1287043)

    I'll agree that it is easier to think of abusive applications than useful ones, but a few do spring to mind. For example, talking books/audio books, which are popular on smart devices and extremely useful for those with impaired sight, could be produced using well-known voices. They could perhaps even use an unknown neutral voice, which might reduce the cost of manufacturing - not that we will see any reduction in price!

    Many cartoon-type films - which are currently voiced by well known (and expensive) actors - might also become cheaper to produce, particularly if translated into several languages or more.

    However, in the time it has taken me to write this reply I have probably thought of a dozen or more ways in which it could be abused, particularly by the criminal fraternity or by those wishing to influence an individual's political popularity or to change the outcome of elections.

    This is why we can't have nice things.....

    --
    I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
  • (Score: 0) by Anonymous Coward on Monday January 16 2023, @02:46PM (4 children)

    by Anonymous Coward on Monday January 16 2023, @02:46PM (#1287056)

    "Many cartoon-type films - which are currently voiced by well known (and expensive) actors - might also become cheaper to produce"

    translation: Voice actors will be screwed out of a living by something imitating them. It is already an intensely competitive field, and they do maintain rights to the way their voice sounds.

    I can't wait for a not-quite John DiMaggio AI...

    • (Score: 2) by DannyB on Monday January 16 2023, @05:14PM (1 child)

      by DannyB (5839) on Monday January 16 2023, @05:14PM (#1287085)

      Don't worry. Soon enough they'll learn to create custom voices not made from a human sample. Use existing systems that simulate the human vocal tract. You don't need to come up with an entire voice any longer. Just a few good seconds. Tune for different desirable voices. Build a catalog of a few dozen good quality voices that can be used for legitimate porpoises. Speech to text. Car navigation systems. Audio books.

      Oh, here is an application. A boss takes the word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat.

      --
      The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
      • (Score: 2) by gtomorrow on Tuesday January 17 2023, @12:53PM

        by gtomorrow (2230) on Tuesday January 17 2023, @12:53PM (#1287211)

        "Oh, here is an application. A boss takes the word document he typed and uses speech synthesis to prepare a dictation tape. The secretary listens to the tape and types the boss's document nice and neat."

        That's ridiculous! Nobody uses tape anymore!

    • (Score: 0) by Anonymous Coward on Monday January 16 2023, @06:58PM

      by Anonymous Coward on Monday January 16 2023, @06:58PM (#1287106)

      Hmm, I always thought his name was Joe

    • (Score: 2) by Mykl on Monday January 16 2023, @10:59PM

      by Mykl (1112) on Monday January 16 2023, @10:59PM (#1287157)

      It might be considered a trade-off if the movie price were reduced to reflect the cheaper cost to produce the film, but we all know what the chances of that are...