from the say-what dept.
A new application that promises to be the "Photoshop of speech" is raising ethical and security concerns. Adobe unveiled Project Voco last week. The software makes it possible to take an audio recording and rapidly alter it to include words and phrases the original speaker never uttered, in what sounds like their voice.
One expert warned that the tech could further undermine trust in journalism. Another said it could pose a security threat. However, the US software firm says it is taking action to address such risks.
[...] "It seems that Adobe's programmers were swept along with the excitement of creating something as innovative as a voice manipulator, and ignored the ethical dilemmas brought up by its potential misuse," he told the BBC. "Inadvertently, in its quest to create software to manipulate digital media, Adobe has [already] drastically changed the way we engage with evidential material such as photographs.
"This makes it hard for lawyers, journalists, and other professionals who use digital media as evidence.
"In the same way that Adobe's Photoshop has faced legal backlash after the continued misuse of the application by advertisers, Voco, if released commercially, will follow its predecessor with similar consequences."
The risks extend beyond people being fooled into thinking others said something they did not. Banks and other businesses have started using voiceprint checks to verify that customers are who they say they are when they phone in. One cybersecurity researcher said the companies involved had long anticipated something like Adobe's invention.
According to a story in The Economist, several companies are developing software that can take a relatively short sample of a person's speech and produce a "clone" of their voice saying nearly anything. CandyVoice, a phone app developed by a new Parisian company, needs only 160 or so French or English phrases to extract enough information to read plain text in that person's voice. Carnegie Mellon University has a similar program called Festvox. Chinese internet giant Baidu claims it has software that needs only about 50 sentences. And now Vivotext, a voice-cloning firm in Tel Aviv headed by Gershon Silbert, looks set to widen that field by licensing its software to Hasbro.
More troubling, any voice, including that of a stranger, can be cloned if decent recordings are available on YouTube or elsewhere. Researchers at the University of Alabama at Birmingham, led by Nitesh Saxena, were able to use Festvox to clone voices based on only five minutes of speech retrieved online. When tested against voice-biometrics software like that used by many banks to block unauthorised access to accounts, more than 80% of the fake voices tricked the computer. Alan Black, one of Festvox's developers, reckons systems that rely on voice-ID software are now "deeply, fundamentally insecure".
And, lest people get smug about the inferiority of machines, humans have proved only a little harder to fool than software. Dr Saxena and his colleagues asked volunteers whether a voice sample belonged to a person whose real speech they had just listened to for about 90 seconds. The volunteers recognised cloned speech as such only half the time (ie, no better than chance). The upshot, according to George Papcun, an expert witness paid to detect faked recordings produced as evidence in court, is the emergence of a technology with "enormous potential value for disinformation". Dr Papcun, who previously worked as a speech-synthesis scientist at Los Alamos National Laboratory, a weapons establishment in New Mexico, ponders possibilities such as cloning an enemy leader's voice in wartime.