A new application that promises to be the "Photoshop of speech" is raising ethical and security concerns. Adobe unveiled Project Voco last week. The software makes it possible to take an audio recording and rapidly alter it to include words and phrases the original speaker never uttered, in what sounds like their voice.
One expert warned that the tech could further undermine trust in journalism. Another said it could pose a security threat. However, the US software firm says it is taking action to address such risks.
[...] "It seems that Adobe's programmers were swept along with the excitement of creating something as innovative as a voice manipulator, and ignored the ethical dilemmas brought up by its potential misuse," he told the BBC. "Inadvertently, in its quest to create software to manipulate digital media, Adobe has [already] drastically changed the way we engage with evidential material such as photographs.
"This makes it hard for lawyers, journalists, and other professionals who use digital media as evidence.
"In the same way that Adobe's Photoshop has faced legal backlash after the continued misuse of the application by advertisers, Voco, if released commercially, will follow its predecessor with similar consequences."
The risks extend beyond people being fooled into thinking others said something they did not. Banks and other businesses have started using voiceprint checks to verify customers are who they say they are when they phone in. One cybersecurity researcher said the companies involved had long anticipated something like Adobe's invention.
Related Stories
According to a story in The Economist, several companies are developing software that can, from only a relatively short sample of a person's speech, produce a "clone" of their voice saying nearly anything. CandyVoice, a phone app developed by a new Parisian company, needs only 160 or so French or English phrases to extract sufficient information to read plain text in that person's voice. Carnegie Mellon University has a similar program called Festvox. The Chinese internet giant Baidu claims its software needs only about 50 sentences. And now Vivotext, a voice-cloning firm headed by Gershon Silbert in Tel Aviv, looks to expand on that as it licenses its software to Hasbro.
More troubling, any voice—including that of a stranger—can be cloned if decent recordings are available on YouTube or elsewhere. Researchers at the University of Alabama, Birmingham, led by Nitesh Saxena, were able to use Festvox to clone voices based on only five minutes of speech retrieved online. When tested against voice-biometrics software like that used by many banks to block unauthorised access to accounts, more than 80% of the fake voices tricked the computer. Alan Black, one of Festvox's developers, reckons systems that rely on voice-ID software are now "deeply, fundamentally insecure".
And, lest people get smug about the inferiority of machines, humans have proved only a little harder to fool than software is. Dr Saxena and his colleagues asked volunteers if a voice sample belonged to a person whose real speech they had just listened to for about 90 seconds. The volunteers recognised cloned speech as such only half the time (ie, no better than chance). The upshot, according to George Papcun, an expert witness paid to detect faked recordings produced as evidence in court, is the emergence of a technology with "enormous potential value for disinformation". Dr Papcun, who previously worked as a speech-synthesis scientist at Los Alamos National Laboratory, a weapons establishment in New Mexico, ponders on things like the ability to clone an enemy leader's voice in wartime.
(Score: 1, Insightful) by Anonymous Coward on Thursday November 24 2016, @09:43PM
"Then I suggest the record tapes have been deliberately changed. A computer expert can change record tapes, duplicate voices, say anything, say nothing."
(Score: 1) by Ethanol-fueled on Thursday November 24 2016, @10:59PM
Reminds me of a joke:
A young man is romantically hopeless because he has a ridiculously wimpy-sounding high-pitched voice. One day, he encounters a genie, who offers to grant the young man any wish. The young man tells the genie, "I want to have a smooth deep voice so I can attract a date," and the genie hands the young man a donkey dick and explains, "Whenever you are sounding wimpy, just suck on this magical donkey dick and your voice will be smooth and deep."
Well, the young man arranged a date with a lady he had never met, and before that date he sucked on his magical donkey dick...and immediately afterward he said, "WHOA, THAT WAS GOOD!" Later, his date was a success, and the young man was to go on a second date with the lady. Before the second date he looked around but could not find his magical donkey dick...and in a panic, asked his mom, "Mom, have you seen my magical donkey dick? Where is it?"
"I DON'T KNOW," she replied.
(Score: 2) by ilPapa on Friday November 25 2016, @12:42AM
That's actually a true story. I know the guy it happened to. Friend of a friend.
You are still welcome on my lawn.
(Score: 0) by Anonymous Coward on Friday November 25 2016, @01:31AM
The dude was dating his own mom?
(Score: 2) by Bot on Saturday November 26 2016, @07:41PM
That genie is a fraudster: any donkey dick WILL WORK JUST AS WELL
Account abandoned.
(Score: 2) by Username on Thursday November 24 2016, @10:00PM
It’s pretty low already. You’ll need James Cameron to raise it up from its current position.
I think this software would be nice for radio when they have to dump swear words.
(Score: 4, Informative) by BananaPhone on Thursday November 24 2016, @11:05PM
https://en.wikipedia.org/wiki/Steve_Wilson_(reporter)#WTVT_Whistleblower_lawsuit [wikipedia.org]
...An appeal was filed, and a ruling in February 2003 came down in favor of WTVT, who successfully argued that the FCC policy against falsification [of news] was not a "law, rule, or regulation",...
and ever since, the 'news' has lost its Truthiness.
(Score: 0) by Anonymous Coward on Friday November 25 2016, @05:56AM
It's ironic that you are misrepresenting the case by claiming that it enabled misrepresentation.
The case, as anyone who reads the full text of the wiki page can see, is not about whether it is legal for a news report to lie (of course it's legal to lie, because of the 1st Amendment). It is about whether reporting lies to the FCC qualified as whistleblowing. The court ruled that reporting lies does not qualify as whistleblowing.
By your logic trust in journalism died on the day the bill of rights was ratified because that's when lying became an inalienable right.
(Score: 3, Interesting) by Anonymous Coward on Thursday November 24 2016, @10:03PM
If Adobe can do it, then spooks have probably had the ability to do it for years.
Give it a few more years and there will be a GNU version.
The only fix is to start doing crypto-signatures on all audio and video. And while some might argue that not having a signature is better because it gives you plausible deniability, it also makes it that much easier to indict you in the court of public opinion. As the saying often attributed to Winston Churchill goes, "A lie gets halfway around the world before the truth has a chance to get its pants on."
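A minimal sketch of the signing idea, in Python, using a stdlib HMAC purely for illustration (a deployed scheme would use public-key signatures such as Ed25519, so anyone could verify a clip without holding the secret; the key and sample bytes here are hypothetical):

```python
import hmac
import hashlib

SECRET = b"recorder-device-key"  # hypothetical per-device secret

def sign_audio(audio_bytes: bytes) -> str:
    """Return a hex tag binding the recording to the signing key."""
    return hmac.new(SECRET, audio_bytes, hashlib.sha256).hexdigest()

def verify_audio(audio_bytes: bytes, tag: str) -> bool:
    """Constant-time check that the recording was not altered."""
    return hmac.compare_digest(sign_audio(audio_bytes), tag)

clip = b"\x00\x01\x02\x03"  # stand-in for raw PCM samples
tag = sign_audio(clip)
print(verify_audio(clip, tag))            # unmodified clip verifies
print(verify_audio(clip + b"\xff", tag))  # edited clip fails
```

Any edit to the audio bytes changes the tag, so a Voco-style splice would be detectable as long as the original was signed at capture time.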
(Score: 1, Interesting) by Anonymous Coward on Thursday November 24 2016, @10:59PM
I can't remember which sci-fi movie it was, but it had a speech from a former POTUS which was altered to say something about UFOs. DC was making a stink about it because it was so accurate.
(Score: 1, Informative) by Anonymous Coward on Thursday November 24 2016, @11:37PM
Contact: http://mentalfloss.com/article/68241/why-film-contact-annoyed-bill-clinton [mentalfloss.com]
(Score: 0) by Anonymous Coward on Thursday November 24 2016, @11:06PM
Uh-oh, "signatures" will be abused to enforce DRM on every sound bite and clip. Adobe should be marched out and shot for creating this.
(Score: 2) by sjames on Thursday November 24 2016, @11:29PM
It's been doable for decades. I even managed a somewhat crude job of it with the old original soundblaster card on a '386.
The Adobe product just automates it.
(Score: 0) by Anonymous Coward on Thursday November 24 2016, @10:50PM
Thank you for your service. We won't be needing you any longer.
(Score: 0) by Anonymous Coward on Friday November 25 2016, @01:22AM
She's dead, Jim.
(Score: 0) by Anonymous Coward on Thursday November 24 2016, @11:50PM
How do you think current general-domain voice synthesizers are built? Have you heard any modern one lately?
The best ones use concatenative synthesis, chopping up samples, full words, or even phrases and using DSP to stitch the pieces together. This is not much different.
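For illustration, a toy version of that pipeline in Python: two synthetic sine "units" stand in for recorded speech snippets, and a linear crossfade plays the role of the DSP stitching (all names and parameters here are invented for the sketch):

```python
import math

RATE = 16000  # samples per second

def unit(freq_hz, dur_s):
    """Stand-in for a recorded speech unit: a sine-wave chunk."""
    return [math.sin(2 * math.pi * freq_hz * n / RATE)
            for n in range(int(dur_s * RATE))]

def crossfade_concat(a, b, overlap):
    """Concatenate two units, linearly crossfading `overlap`
    samples so there is no click at the join."""
    joined = a[:len(a) - overlap]
    for i in range(overlap):
        w = i / overlap  # fade weight runs 0 -> 1 across the join
        joined.append(a[len(a) - overlap + i] * (1 - w) + b[i] * w)
    joined.extend(b[overlap:])
    return joined

# "Speech" made of two units joined with a 10 ms crossfade.
out = crossfade_concat(unit(220, 0.1), unit(330, 0.1), overlap=160)
print(len(out))  # len(a) + len(b) - overlap = 3040
```

A real concatenative synthesizer selects units from a large recorded database and does far more careful pitch and duration matching, but the join step is the same idea.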
(Score: 0) by Anonymous Coward on Friday November 25 2016, @12:34AM
Adobe demo'd some amazing tech at their conference: https://blogs.adobe.com/conversations/2016/11/lets-get-experimental-behind-the-adobe-max-sneaks.html [adobe.com]
(Score: 2) by Rich on Friday November 25 2016, @04:23PM
And now, a message from the president of the United States, George W. Bush...
I've adopted sophisticated terrorist tactics, and I'm a dangerous, dangerous man, with dangerous, dangerous weapons.
I want to drain the coal resources in America and foreign sources of crude oil.
I'm a weapon of mass destruction, I'm a brutal dictator, and I'm evil.
https://www.youtube.com/watch?v=qmWzGVmPN5o [youtube.com]
They would've had a field day with that software :)
(Score: 0) by Anonymous Coward on Friday November 25 2016, @10:10PM
Dude, this tool is meant for voice editing, not mind reading.
(Score: 0) by Anonymous Coward on Saturday November 26 2016, @03:42AM
https://www.youtube.com/watch?v=n41bRHlr76Y [youtube.com]