
SoylentNews is people

posted by martyb on Monday May 20 2019, @02:03PM
from the you-dont-say dept.

Developers at business AI company Dessa have come up with a new text-to-speech system called "RealTalk". In the version they demoed, it was trained to speak with the voice of popular podcaster Joe Rogan. The developers have put up a site with a blind test at http://fakejoerogan.com/. They must have been so impressed by their own creation that they discuss the implications at https://medium.com/@dessa_/real-talk-speech-synthesis-5dd0897eef7f.

Your humble submitter did the blind test and only just managed a majority of correct guesses, but was so impressed by the quality that he considered it newsworthy. How do you fare in the test?



  • (Score: 4, Interesting) by danmars on Monday May 20 2019, @02:54PM (3 children)

    by danmars (3662) on Monday May 20 2019, @02:54PM (#845527)

    I got them all right first try. What is so hard about this? The AI version is terrible at determining when to pause or change pace or inflection. I imagine it would be even easier for someone who actually listens to Joe Rogan.

  • (Score: 3, Interesting) by Rich on Monday May 20 2019, @03:26PM

    by Rich (945) on Monday May 20 2019, @03:26PM (#845535)

    It was late at night when I did the test, and I was paying attention to local modulation glitches rather than overall flow. I've been developing an electronic music instrument over the last few years and am probably conditioned to pick up the slightest flaw in transients, rather than to process the structure of longer spoken text. I re-listened, focusing on pausing as you suggested, and got 6 out of 8 right, one more than on my first listening. It is not as easy as you suggest, at least not for me (fluent but non-native English speaker, mostly exposed to German; tested on the built-in speakers of a 2009 MacBook Pro; a dip at 3 kHz in my right ear).

  • (Score: 3, Interesting) by Alfred on Monday May 20 2019, @07:29PM

    by Alfred (4006) on Monday May 20 2019, @07:29PM (#845604)
    Dang, you overachiever. I missed one. If the samples were longer, I think it would make a difference, and I bet these were the best samples they had, with many others being very obviously bad.

    I think they are getting closer to what the CIA/NSA was doing last decade. Part of that is taking a mic feed and translating it into a different voice in real time. That analysis also helps with landing the right inflections and stresses.
  • (Score: 3, Interesting) by KilroySmith on Monday May 20 2019, @07:54PM

    by KilroySmith (2113) on Monday May 20 2019, @07:54PM (#845613)

    7 out of 8 for me; I only missed the first one. If I had listened to several before guessing, it would have been 8 of 8. The pacing is wrong, and there are fewer intonation changes in the faux Rogan selections, which made it easy for me to tell the difference.
    That said, the samples were remarkably good. I'm quite impressed.