from the crisp-fakes dept.
There has been some controversy over Deepfakes, a process of substituting faces in video. Almost immediately, it was used for pornography. While celebrities were generally unamused, porn stars were alarmed by the further commodification of their rôle. The algorithm is widely available and several web sites removed objectionable examples. You know something is controversial when porn sites remove it. Reddit was central for Deepfakes/FakeApp tech support and took drastic action to remove discussion after it started to become synonymous with fictitious revenge porn and other variants of anti-social practices.
I found a good description of the deepfakes algorithm. It runs on a standard neural network library but requires considerable processing power on specific GPUs. I will refer to the video input (containing the face to be removed) as the source and to the replacement face as the target. The neural network is trained with the target face only: images of the target are distorted and the network is trained to reproduce the undistorted reference images. When the network is then given a face from the source, it treats it as a distorted target and "undistorts" it into the target.
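The training idea can be sketched in a few lines. This is only a toy under loud assumptions: a "face" is a list of four numbers, the distortion is Gaussian noise, and a small linear model stands in for the neural network. The names `distort`, `train_undistorter` and `apply_model` are made up for illustration and do not come from any deepfakes code.

```python
import random


def distort(face, rng, sigma=0.5):
    """Return a noisy copy of a face vector (stand-in for image warping)."""
    return [p + rng.gauss(0.0, sigma) for p in face]


def apply_model(weights, bias, x):
    """Linear model: one output per row of weights, plus a bias term."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def train_undistorter(target, steps=5000, lr=0.05, seed=0):
    """Train on distorted copies of the target only, aiming at the clean target."""
    rng = random.Random(seed)
    n = len(target)
    weights = [[0.0] * n for _ in range(n)]
    bias = [0.0] * n
    for _ in range(steps):
        x = distort(target, rng)
        y = apply_model(weights, bias, x)
        for i in range(n):
            err = y[i] - target[i]           # squared-error gradient term
            for j in range(n):
                weights[i][j] -= lr * err * x[j]
            bias[i] -= lr * err
    return weights, bias


def distance(a, b):
    """Euclidean distance between two face vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5


target = [0.9, 0.1, 0.8, 0.2]   # hypothetical target "face"
source = [0.1, 0.9, 0.2, 0.7]   # hypothetical source "face", never seen in training
weights, bias = train_undistorter(target)
output = apply_model(weights, bias, source)
print("source -> target distance:", round(distance(source, target), 3))
print("output -> target distance:", round(distance(output, target), 3))
```

The model never sees the source during training, yet its output for the source lands close to the target. In this toy it gets there by more or less ignoring its input, so how a real network keeps the pose and expression of the source is not captured here.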
If there are multiple faces in a frame of video, face recognition restricts input to the most likely face. Indeed, for maximum efficiency, this technique is used to crop source video in all cases. The trick that makes the process feasible is that the neural network is only trained with the target face. Furthermore, given the use of libraries, the unique code to achieve this objective is shockingly small.
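The cropping step might look roughly like this. It is a minimal sketch, assuming a face detector has already returned candidate boxes with confidence scores; the detection format (`score`, `box`) and the function names are hypothetical, and a frame is just a list of pixel rows.

```python
def most_likely_face(detections):
    """Pick the detection with the highest confidence score."""
    if not detections:
        return None
    return max(detections, key=lambda d: d["score"])


def crop(frame, box):
    """Cut the rectangle (x, y, width, height) out of a frame of pixel rows."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]


def crop_to_best_face(frame, detections):
    """Restrict the frame to the most likely face, or return None if no face."""
    best = most_likely_face(detections)
    if best is None:
        return None
    return crop(frame, best["box"])


# A 4x6 "frame" whose pixel values encode their own position (row * 10 + column).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
detections = [
    {"score": 0.41, "box": (0, 0, 2, 2)},   # hypothetical weak match
    {"score": 0.93, "box": (3, 1, 2, 2)},   # hypothetical strong match
]
print(crop_to_best_face(frame, detections))
```

Only the cropped region would then be handed to the neural network, which keeps the input small regardless of how many faces appear in the frame.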
A friend attempted to mix DeepFakes with the Internet meme of Downfall parodies. There is an infamous scene in the film Downfall (not to be confused with the film Falling Down) where Adolf Hitler rants prior to defeat. Unfaithful subtitles of the German dialog have been used to parody everything from corporate sales targets to sportsball management to the ongoing medical abuse of transsexual patients. Until now, only the words in the subtitles changed. The audio and video were otherwise unchanged. My friend hoped that it would be possible to insert the likeness of people being parodied.
Unfortunately, it doesn't work with the current algorithm. The number of faces is not a problem. Rather, clipping and occlusion of the faces prevent the neural network from working effectively. It should be possible with an extension of the current algorithm but it is currently impractical.
A further development, found by the same friend, is the automatic conversion of a one-sentence description into a very short video. The example system uses Flintstones cartoons. An example sentence would be "Fred dancing in the kitchen" and a rough but valid video is created which matches the description. Potentially, it would be possible to automatically convert a novel into a 100-minute film with no human intervention. Given that novels are frequently converted into films, there is a large amount of example data which may be used as reference. I know this would only be moderately easier than making a holodeck but experts may not be aware of the progress towards either goal.
takyon: An algorithm can also be used to manipulate facial movements to match video or audio input (see this example of Jordan Peele controlling Barack Obama's face). DARPA is holding an event that will task experts with making and catching "deepfakes".
Researchers have also created short "movies" (64x64, 32-frame animated GIFs) from text descriptions. It may be possible to synthesize scenes for a full length movie in the future without needing strong AI. After all, procedural generation could be used to create and populate a virtual city (like the one in Big Hero 6), and then it's a matter of writing some kind of coherent narrative and "shooting" it. A "director neural network" could be trained to mimic the cinematography techniques of films created by humans, and then apply the results to the virtual environment.
Back in December, the unsavory hobby of a Reddit user by the name of deepfakes became a new centerpiece of artificial intelligence debate, specifically around the newfound ability to face-swap celebrities and porn stars. Using software, deepfakes was able to take the face of famous actresses and swap them with those of porn actresses, letting him live out a fantasy of watching famous people have sex. Now, just two months later, easy-to-use applications have sprouted up with the ability to perform this real-time editing with even more ease, according to Motherboard, which also first reported about deepfakes late last year.
Thanks to AI training techniques like machine learning, scores of photographs can be fed into an algorithm that creates convincing human masks to replace the faces of anyone on video, all by using lookalike data and letting the software train itself to improve over time. In this case, users are putting famous actresses into existing adult films. According to deepfakes, this required some extensive computer science know-how. But Motherboard reports that one user in the burgeoning community of pornographic celebrity face swapping has created a user-friendly app that basically anyone can use.
The same technique can be used for non-pornographic purposes, such as inserting Nicolas Cage's face into classic movies. One user also "outperformed" the Princess Leia scene at the end of Disney's Rogue One (you be the judge, original footage is at the top of the GIF).
The machines are learning.
The messaging platform Discord has taken down a channel that was being used to share and spread AI-edited pornographic videos:
Last year, a Reddit user known as "deepfakes" used machine learning to digitally edit the faces of celebrities into pornographic videos, and a new app has made the process much easier to create and spread the videos online. On Friday, chat service Discord shut down a user-created group that was spreading the videos, citing their policy against revenge porn.
Discord is a free chat platform that caters to gamers, and has a poor track record when it comes to dealing with abuse and toxic communities. After it was contacted by Business Insider, the company took down the chat group, named "deepfakes."
One take is that there is no recourse for "victims" of AI-generated porn, at least in the U.S.:
To many vulnerable people on the internet, especially women, this looks a whole lot like the end times. "I share your sense of doom," says Mary Anne Franks, who teaches First Amendment and technology law at the University of Miami Law School and also serves as the tech and legislative policy advisor for the Cyber Civil Rights Initiative. "I think it is going to be that bad."
Pornhub will be deleting "deepfakes" — AI-generated videos that realistically edit new faces onto pornographic actors — under its rules against nonconsensual porn, following in the footsteps of platforms like Discord and Gfycat. "We do not tolerate any nonconsensual content on the site and we remove all said content as soon as we are made aware of it," the company told Motherboard, which first reported on the deepfakes porn phenomenon last year. Pornhub says that nonconsensual content includes "revenge porn, deepfakes, or anything published without a person's consent or permission."
Update: The infamous subreddit itself, /r/deepfakes, has been banned by Reddit. /r/CelebFakes and /r/CelebrityFakes have also been banned for their non-AI porn fakery (they had existed for over 7 years). Other subreddits like /r/fakeapp (technical support for the software) and /r/SFWdeepfakes remain intact. Reported at Motherboard, The Verge, and TechCrunch.
Motherboard also reported on some users (primarily on a new subreddit, /r/deepfakeservice) offering to accept commissions to create deepfakes porn. This is seen as more likely to result in a lawsuit.
A machine learning algorithm has created tiny (64×64 pixels) 32-frame videos based on text descriptions:
The researchers trained the algorithm on 10 types of scenes, including "playing golf on grass," and "kitesurfing on the sea," which it then roughly reproduced. Picture grainy VHS footage. Nevertheless, a simple classification algorithm correctly guessed the intended action among six choices about half the time. (Sailing and kitesurfing were often mistaken for each other.) What's more, the network could also generate videos for nonsensical actions, such as "sailing on snow," and "playing golf at swimming pool," the team reported this month at a meeting of the Association for the Advancement of Artificial Intelligence in New Orleans, Louisiana.
[...] Currently, the videos are only 32 frames long—lasting about 1 second—and the size of a U.S. postage stamp, 64 by 64 pixels. Anything larger reduces accuracy, says Yitong Li, a computer scientist at Duke University in Durham, North Carolina, and the paper's first author. Because people often appear as distorted figures, a next step, he says, is using human skeletal models to improve movement.
Tuytelaars also sees applications beyond Hollywood. Video generation could lead to better compression if a movie can be stored as nothing but a brief description. It could also generate training data for other machine learning algorithms. For example, realistic video clips might help autonomous cars prepare for dangerous situations they would not frequently encounter. And programs that deeply understand the visual world could spin off useful applications in everything from refereeing to surveillance. They could help a self-driving car predict where a motorbike will go, for example, or train a household robot to open a fridge, Pirsiavash says.
An AI-generated Hollywood blockbuster may still be beyond the horizon, but in the meantime, we finally know what "kitesurfing on grass" looks like.
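To put the quoted numbers in perspective, here is a quick back-of-the-envelope calculation. It assumes that picking at random among six choices is the right baseline for the classifier, and that a frame is stored as uncompressed 8-bit RGB (three bytes per pixel); neither assumption is stated in the article.

```python
FRAMES = 32            # frames per generated clip, per the article
WIDTH = HEIGHT = 64    # pixels, per the article
CHOICES = 6            # the classifier picked among six actions
BYTES_PER_PIXEL = 3    # assumption: uncompressed 8-bit RGB


def chance_accuracy(num_choices):
    """Accuracy expected from guessing uniformly at random."""
    return 1.0 / num_choices


def raw_clip_bytes(frames, width, height, bytes_per_pixel):
    """Size of a clip stored as raw, uncompressed pixels."""
    return frames * width * height * bytes_per_pixel


baseline = chance_accuracy(CHOICES)
print(f"chance baseline: {baseline:.1%}")                       # 16.7%
print(f"'about half' is {0.5 / baseline:.1f}x chance")          # 3.0x
print(raw_clip_bytes(FRAMES, WIDTH, HEIGHT, BYTES_PER_PIXEL))   # 393216
```

So "about half the time" is roughly three times better than guessing, and each one-second clip holds under 400 kB of raw pixel data.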
Currently, a realistic deepfake requires shots of the subject from multiple angles. Russian researchers have now taken this a step further, generating realistic video sequences from a single photo.
Researchers trained the algorithm to understand facial features' general shapes and how they behave relative to each other, and then to apply that information to still images. The result was a realistic video sequence of new facial expressions from a single frame.
As a demonstration, they provide details and synthesized video sequences of historical figures such as Albert Einstein and Salvador Dali, as well as sequences based on paintings such as the Mona Lisa.
The authors are aware of the potential downsides of their technology and address this:
We realize that our technology can have a negative use for the so-called "deepfake" videos. However, it is important to realize, that Hollywood has been making fake videos (aka "special effects") for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of the certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools lead to the rise of pranksters and fake audios, democratizing video recording lead to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside with the ongoing shift for privacy and data security in major IT companies.
While it works with as few as one frame to learn from, the technology benefits in accuracy and 'identity preservation' from having multiple frames available. This becomes obvious when observing the synthesized Mona Lisa sequences, which, while accurate to the original, appear to be essentially three different individuals to the human eye watching them.
Journal Reference: https://arxiv.org/abs/1905.08233v1
Most Deepfake Videos Have One Glaring Flaw: A Lack of Blinking
My Struggle With Deepfakes
Discord Takes Down "Deepfakes" Channel, Citing Policy Against "Revenge Porn"
AI-Generated Fake Celebrity Porn Craze "Blowing Up" on Reddit
As Fake Videos Become More Realistic, Seeing Shouldn't Always be Believing
GitHub is banning code from DeepNude, the app that used AI to create fake nude pictures of women. Motherboard, which first reported on DeepNude last month, confirmed that the Microsoft-owned software development platform won't allow DeepNude projects. GitHub told Motherboard that the code violated its rules against "sexually obscene content," and it's removed multiple repositories, including one that was officially run by DeepNude's creator.
DeepNude was originally a paid app that created nonconsensual nude pictures of women using technology similar to AI "deepfakes." The development team shut it down after Motherboard's report, saying that "the probability that people will misuse it is too high." However, as we noted last week, copies of the app were still accessible online — including on GitHub.
Late that week, the DeepNude team followed suit by uploading the core algorithm (but not the actual app interface) to the platform. "The reverse engineering of the app was already on GitHub. It no longer makes sense to hide the source code," wrote the team on a now-deleted page. "DeepNude uses an interesting method to solve a typical AI problem, so it could be useful for researchers and developers working in other fields such as fashion, cinema, and visual effects."
Related: Deep Fakes Advance to Only Needing a Single Two Dimensional Photograph
Here's a quick overview of "documentaries" to watch before, during and after a pandemic:
The Andromeda Strain film: An early Michael Crichton adaptation which came before the Westworld film and series and the Jurassic Park film series. Like many Michael Crichton stories, factual science is extended with credible speculation. In this case, a prion-like infection has killed almost everyone in a village and the survivors are seemingly unrelated. The film is best known for its cartoonish but very photogenic indoor set which serves as the backdrop of a Level 4 Biolab. Such elaborate sets were common in the era. (Other examples include Rollerball and disaster parody/Airplane predecessor, The Big Bus.) The film features concurrent action which was a common experimental film technique in the 1960s but, nowadays, is most commonly associated with Kiefer Sutherland in the 24 series. There is also a lesser-known mini-series.
Outbreak: A rather dull film which nevertheless provides a graphic portrayal of an uncontained pandemic and of towns being quarantined. It would be marginally improved if the antagonists were re-cast. Possible source material for DeepFaking.
The Resident Evil film series: These films considerably advanced the tropes of amoral corporation, rogue artificial intelligence as antagonist, reality within reality, experimentation without informed consent and the horror of a medicalized vampire/zombie rabies virus. The Red Queen versus White Queen subplot dovetails with Alice in Wonderland, prey versus predator evolution and Umbrella Corp's red and white logo which - in a case of life mirroring art - was copied by Wuhan's Level 4 Biolab. Many people prefer the competent and detailed Resident Evil series of computer games which are arguably better than the Half-Life series or the SCP game.