
SoylentNews is people

My Struggle With Deepfakes

Accepted submission by cafebabe at 2018-05-29 11:13:59
Hardware

There has been controversy about DeepFakes [soylentnews.org], the process of substituting faces in video. Almost immediately, it was used for pornography. While celebrities were generally unamused, porn stars were alarmed by the further commodification of their rôle. The algorithm is widely available and several web sites removed objectionable examples. You know something is controversial when porn sites remove it. Reddit was central for DeepFake tech support and took drastic action to remove discussion after it became synonymous with fictitious revenge porn and other anti-social practices.

I found a good description of the DeepFake algorithm. It runs via a standard neural network library but requires considerable processing power on specific GPUs. I will describe the video input (with the face to be removed) as the source and the face to be inserted as the target. The neural network is trained with the target face only. Training images of the target are distorted and the neural network is trained to approximate the undistorted reference images of the target. When the neural network is later given the source, it treats it as a distorted target and "undistorts" it into the target.
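To make the "train on distorted copies, learn to undistort" idea concrete, here is a toy sketch in Python. It is not the DeepFake code: the "faces" are short lists of numbers rather than images, the model is a single linear layer rather than a deep network, and every name in it (BASIS, target_face, distort, train_undistorter) is made up for illustration. It only shows the training loop described above: the network sees a distorted target and is nudged towards the clean target.

```python
import random

DIM, LATENT = 8, 2
rng = random.Random(0)

# Hypothetical stand-in for a face dataset: every "target face" is a mix of
# LATENT fixed patterns, so the faces share structure that can be learned.
BASIS = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(LATENT)]


def target_face():
    """Generate one synthetic target 'face' as a list of DIM numbers."""
    codes = [rng.gauss(0, 1) for _ in range(LATENT)]
    return [sum(c * b[j] for c, b in zip(codes, BASIS)) for j in range(DIM)]


def distort(face, scale=0.5):
    """Distort a face by adding Gaussian noise (a stand-in for image warping)."""
    return [v + rng.gauss(0, scale) for v in face]


def apply(weights, x):
    """Run the one-layer linear 'network': y[j] = sum_i x[i] * weights[i][j]."""
    return [sum(x[i] * weights[i][j] for i in range(DIM)) for j in range(DIM)]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def train_undistorter(steps=4000, lr=0.002):
    """Train on target faces only: input is distorted, label is the clean face."""
    weights = [[0.0] * DIM for _ in range(DIM)]
    for _ in range(steps):
        clean = target_face()
        noisy = distort(clean)
        pred = apply(weights, noisy)
        # Stochastic gradient step on the squared reconstruction error.
        for i in range(DIM):
            for j in range(DIM):
                weights[i][j] -= lr * noisy[i] * (pred[j] - clean[j])
    return weights


if __name__ == "__main__":
    weights = train_undistorter()
    clean = target_face()
    noisy = distort(clean)
    print("error before:", mean_squared_error(noisy, clean))
    print("error after: ", mean_squared_error(apply(weights, noisy), clean))
```

Because the model has only ever seen the target, it pulls anything it is given towards target-like output, which is the property the face swap relies on.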

If there are multiple faces in a frame of video, face recognition restricts input to the most likely face. Indeed, for maximum efficiency, this technique is used to crop the source video in all cases. The trick that makes the process feasible is that the neural network is only trained with the target face. Furthermore, because libraries do most of the work, the unique code required to achieve this objective is shockingly small.
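The "pick the most likely face, then crop" step can be sketched as follows. This assumes a face detector which returns a bounding box and a confidence score for each face; the Detection class and its fields are hypothetical and not taken from any particular library, and the frame is a plain list of pixel rows to keep the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical detector output: bounding box in pixels plus a confidence score.
    left: int
    top: int
    width: int
    height: int
    score: float


def most_likely_face(detections):
    """Return the highest-scoring detection, or None if the frame has no face."""
    return max(detections, key=lambda d: d.score, default=None)


def crop(frame, box):
    """Crop a frame (a list of pixel rows) to the box, clamped to the frame edges."""
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    top = max(box.top, 0)
    left = max(box.left, 0)
    bottom = min(box.top + box.height, rows)
    right = min(box.left + box.width, cols)
    return [row[left:right] for row in frame[top:bottom]]


if __name__ == "__main__":
    # A 4x6 dummy frame where each pixel value encodes its row and column.
    frame = [[r * 10 + c for c in range(6)] for r in range(4)]
    faces = [Detection(0, 0, 2, 2, 0.4), Detection(2, 1, 3, 2, 0.9)]
    best = most_likely_face(faces)
    if best is not None:
        print(crop(frame, best))
```

Only the cropped region would be passed to the neural network, so the rest of the frame costs nothing to process.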

A friend attempted to mix DeepFakes with the Internet meme [wikipedia.org] of Downfall parodies. There is an infamous scene in the film: Downfall [wikipedia.org] (not to be confused with the film: Falling Down [wikipedia.org]) where Adolf Hitler rants prior to defeat. Unfaithful subtitles of the German dialog have been used to parody everything from corporate sales targets to sportsball management to the ongoing medical abuse of transsexual patients [youtube.com]. Until now, only the words in the subtitles changed. The audio and video were otherwise unchanged. My friend hoped that it would be possible to insert the likeness of the people being parodied.

Unfortunately, it doesn't work with the current algorithm. The number of faces is not a problem. Rather, clipping and occlusion of the faces prevent the neural network from working effectively. It should be possible with an extension of the current algorithm but it is currently impractical.

A further development, found by the same friend, is the automatic conversion of a one-sentence description into a very short video. The example system uses Flintstones cartoons. An example sentence would be "Fred dancing in the kitchen" and a rough but valid video is created which matches the description. Potentially, it would be possible to automatically convert a novel into a 100 minute film with no human intervention. Given that novels are frequently converted into films, there is a large amount of example data which may be used as reference. I know this would only be moderately easier than making a holodeck but experts may not be aware of the progress towards either goal.

