
SoylentNews is people

posted by mrpg on Saturday February 24 2018, @08:44PM
from the picture-this dept.

A machine learning algorithm has created tiny (64×64 pixels) 32-frame videos based on text descriptions:

The researchers trained the algorithm on 10 types of scenes, including "playing golf on grass" and "kitesurfing on the sea," which it then roughly reproduced. Picture grainy VHS footage. Nevertheless, a simple classification algorithm correctly guessed the intended action among six choices about half the time. (Sailing and kitesurfing were often mistaken for each other.) What's more, the network could also generate videos for nonsensical actions, such as "sailing on snow" and "playing golf at swimming pool," the team reported this month at a meeting of the Association for the Advancement of Artificial Intelligence in New Orleans, Louisiana.

[...] Currently, the videos are only 32 frames long—lasting about 1 second—and the size of a U.S. postage stamp, 64 by 64 pixels. Anything larger reduces accuracy, says Yitong Li, a computer scientist at Duke University in Durham, North Carolina, and the paper's first author. Because people often appear as distorted figures, a next step, he says, is using human skeletal models to improve movement.

Tuytelaars also sees applications beyond Hollywood. Video generation could lead to better compression if a movie can be stored as nothing but a brief description. It could also generate training data for other machine learning algorithms. For example, realistic video clips might help autonomous cars prepare for dangerous situations they would not frequently encounter. And programs that deeply understand the visual world could spin off useful applications in everything from refereeing to surveillance. They could help a self-driving car predict where a motorbike will go, for example, or train a household robot to open a fridge, Pirsiavash says.
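The compression claim is easy to make concrete with back-of-envelope arithmetic. A minimal sketch (the clip dimensions come from the article; the prompt string is one of the article's example descriptions, and RGB24 storage is an assumption for the uncompressed baseline):

```python
# Rough arithmetic for the description-as-compression idea: compare a raw
# 64x64, 32-frame clip (the paper's output size) against the text prompt
# that describes it. Assumes uncompressed RGB, 3 bytes per pixel.
frames, height, width, channels = 32, 64, 64, 3
raw_bytes = frames * height * width * channels   # 393,216 bytes (~384 KiB)

prompt = "kitesurfing on the sea"                # example prompt from the article
prompt_bytes = len(prompt.encode("utf-8"))       # 22 bytes

print(f"raw clip:  {raw_bytes} bytes")
print(f"prompt:    {prompt_bytes} bytes")
print(f"ratio:     ~{raw_bytes // prompt_bytes}:1")
```

Even for a postage-stamp clip the nominal ratio is over 17,000:1 — the catch, of course, is that the decoder is a generative model that only reconstructs *a* plausible clip, not the original pixels.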

An AI-generated Hollywood blockbuster may still be beyond the horizon, but in the meantime, we finally know what "kitesurfing on grass" looks like.


Original Submission

  • (Score: 2) by SomeGuy (5632) on Saturday February 24 2018, @10:15PM (#643196)

    Video generation could lead to better compression if a movie can be stored as nothing but a brief description.

    We could do something similar with music too. Perhaps we could call it MIDI?

    Been there done that.

    DOOME1M1.MID. (You can hear it in your head now can't you? :P )

  • (Score: 3, Interesting) by Ethanol-fueled (2792) on Saturday February 24 2018, @10:51PM (#643207)

    MIDI was a godsend for getting files of songs and opening them in the score editor to learn how to play them. Considering that the official books were over 20 bucks an album (more for larger or rarer ones; my then-girlfriend paid 40 for an official Tori Amos book that wasn't even accurate), the extra effort to download the MIDIs was worth it. And anyway, good luck finding the sheet music for Tarkus or Trilogy back then.

    But the reason this is on-topic is that similar approaches have been used to generate music, though with comparatively much better results than what is described in the article. If anybody can create their own "deep fakes" now, just imagine the capabilities of government agencies. Some of them probably have lots of very good blackmail material on very important people, and "muh deep fakes" calls truth into question and gives falsehoods credibility.