
SoylentNews is people

posted by Fnord666 on Monday January 16 2023, @07:56AM
from the my-voice-is-no-longer-my-password dept.

Text-to-speech model can preserve speaker's emotional tone and acoustic environment:

On Thursday, Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample. Once it learns a specific voice, VALL-E can synthesize audio of that person saying anything—and do it in a way that attempts to preserve the speaker's emotional tone.

Its creators speculate that VALL-E could be used for high-quality text-to-speech applications, speech editing where a recording of a person could be edited and changed from a text transcript (making them say something they originally didn't), and audio content creation when combined with other generative AI models like GPT-3.



 
This discussion was created by Fnord666 (652) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 2) by takyon on Monday January 16 2023, @04:45PM

    by takyon (881) <{takyon} {at} {soylentnews.org}> on Monday January 16 2023, @04:45PM (#1287077) Journal

    https://screenrant.com/skyrim-bethesda-elder-scrolls-voice-actors-cast-bad/ [screenrant.com]
    https://www.escapistmagazine.com/starfield-is-the-one-game-that-should-use-ai-voice-actors/ [escapistmagazine.com]

    I believe Skyrim uses 48 kbps audio for voice acting, and I've seen an estimate of 1215 minutes (20.25 hours) of dialog, which might cover only the original game without expansions. At that bitrate, the voice audio comes to about 437.4 megabytes. The original PC game was around 5.7 GB, small by today's standards.

    If you instead stored the dialog as text with a markup language, you would probably get at least 100:1 compression relative to the recorded audio (60 bytes per second of speech, including any markup, versus 6,000 bytes per second at 48 kbps).
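
    The arithmetic above can be checked with a quick sketch. The 48 kbps bitrate and the 1215-minute estimate are the unconfirmed figures from this comment, and 60 bytes per second is the assumed text-plus-markup rate, so treat the output as a back-of-the-envelope number only.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# All three inputs are estimates or assumptions, not measured values.
BITRATE_BPS = 48_000         # 48 kbps voice audio (believed, not confirmed)
MINUTES = 1215               # estimated length of voiced dialog
TEXT_BYTES_PER_SECOND = 60   # assumed text plus markup per second of speech

seconds = MINUTES * 60
audio_bytes = BITRATE_BPS * seconds // 8   # bits to bytes
text_bytes = TEXT_BYTES_PER_SECOND * seconds
ratio = audio_bytes / text_bytes

print(f"audio: {audio_bytes / 1e6:.1f} MB")
print(f"text:  {text_bytes / 1e6:.2f} MB")
print(f"ratio: {ratio:.0f}:1")
```

    Under these assumptions, the whole script would fit in a few megabytes of text instead of several hundred megabytes of audio.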

    If the algorithm can synthesize speech in real time, it can easily be used in video games. Now you're only limited by the number of scripts you can write... and you can use a technology like ChatGPT to write more dialog than a human possibly could. You may even be able to do it dynamically, within the game.

    You can also get any player's name inserted into voice lines.

    By the way, this markup could become very colloquial and loosely structured, as you have seen with AI prompts, as long as the AI can handle it: [beggar voice]Spare [emphasis]just one[/emphasis] coin for an old beggar?[/] or [panting after running for several minutes, hoarse voice]He- he got away![/]
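
    Here is a minimal sketch of how a game might split such loosely marked-up lines into segments before handing them to a speech model. Everything in it is hypothetical: the bracket syntax is just the one from the examples above, and `parse_line`, the `{player}` placeholder, and the default name are made up for illustration. It does not call any real text-to-speech API.

```python
import re

# Matches "[some free-form direction]", "[/name]" and the bare "[/]".
TAG = re.compile(r"\[(/?)([^\[\]]*)\]")

def parse_line(line, player="Dragonborn"):
    """Split a marked-up line into (text, active_directions) segments."""
    # Hypothetical placeholder for inserting the player's name.
    line = line.replace("{player}", player)
    segments, stack, pos = [], [], 0
    for match in TAG.finditer(line):
        text = line[pos:match.start()]
        if text:
            segments.append((text, tuple(stack)))
        closing, name = match.group(1), match.group(2).strip()
        if not closing:
            stack.append(name)   # opening tag: free-form direction
        elif stack:
            stack.pop()          # "[/]" and "[/name]" close the innermost tag
        pos = match.end()
    if line[pos:]:
        segments.append((line[pos:], tuple(stack)))
    return segments

beggar = parse_line(
    "[beggar voice]Spare [emphasis]just one[/emphasis] coin for an old beggar?[/]"
)
for text, directions in beggar:
    print(repr(text), directions)
```

    Each segment carries the stack of directions active at that point, so a synthesizer could be given "beggar voice" plus "emphasis" for just the two emphasized words. A closing tag here always closes the innermost open direction, which keeps the parser forgiving of sloppy markup.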

    These technologies will screw people over, but it's possible that voice actors can still get compensated, depending on how scrupulous the companies involved are. For example, voice actors could go into a company like Replica Studios and provide voice samples to train the AI, and for better accuracy they would provide much more than the 3 seconds Microsoft is bragging about here. The voice actors could come in again in the future to provide new samples, since their voices will change with age or other factors, although I'm sure there will be algorithms to automatically age up a voice or add the effects of 7,300 packs of cigarettes to it. Voice actors would get paid for their time and get some royalties each time their voice is licensed. I could see game companies licensing thousands or tens of thousands of voices to provide more variety than was previously possible. People don't like hearing the same 10-20 voices recycled over and over again for different characters.

    Personality rights [wikipedia.org], along with the contract you agree to, are what would protect your "voice". But there is no national personality right in the U.S., and there is nothing stopping a foreign company from ripping off someone's voice work and simply not distributing it in places where they could be sued.

    Back on the Elder Scrolls theme, people have not left the 20-year-old Morrowind alone. RTX Remix is being used to add ray tracing to the game. Someone is definitely going to take the numerous text conversations and use AI to voice act them.

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]