Currently, producing a realistic deepfake requires footage of the subject from multiple angles. Russian researchers have now taken this a step further, generating realistic video sequences based on a single photo.

The researchers trained their model to learn the general shapes of facial features and how those features move relative to one another, and then to apply that knowledge to a still image. The result is a realistic video sequence of new facial expressions generated from a single frame.

As a demonstration, the authors present synthesized video sequences of historical figures such as Albert Einstein and Salvador Dalí, as well as sequences animated from paintings such as the Mona Lisa.

The authors are aware of the potential downsides of their technology and address them directly:

We realize that our technology can have a negative use for the so-called "deepfake" videos. However, it is important to realize, that Hollywood has been making fake videos (aka "special effects") for a century, and deep networks with similar capabilities have been available for the past several years (see links in the paper). Our work (and quite a few parallel works) will lead to the democratization of the certain special effects technologies. And the democratization of the technologies has always had negative effects. Democratizing sound editing tools lead to the rise of pranksters and fake audios, democratizing video recording lead to the appearance of footage taken without consent. In each of the past cases, the net effect of democratization on the World has been positive, and mechanisms for stemming the negative effects have been developed. We believe that the case of neural avatar technology will be no different. Our belief is supported by the ongoing development of tools for fake video detection and face spoof detection alongside with the ongoing shift for privacy and data security in major IT companies.

Although the system can work from as few as one frame, it achieves better accuracy and "identity preservation" when multiple frames are available. The limitation is obvious in the synthesized Mona Lisa sequences: each is accurate to the original painting, yet to a human viewer the three sequences look like essentially three different individuals.

Journal Reference: https://arxiv.org/abs/1905.08233v1

