Nvidia shows off AI model that turns a few dozen snapshots into a 3D-rendered scene
Date: 2022-03-25
Nvidia's latest AI demo is pretty impressive: a tool that quickly turns a "few dozen" 2D snapshots into a 3D-rendered scene. Nvidia's demo video shows the method in action, with a model dressed like Andy Warhol holding an old-fashioned Polaroid camera. (Don't overthink the Warhol connection: it's just a bit of PR scene dressing.)
The tool is called Instant NeRF, referring to "neural radiance fields," a technique developed by researchers from UC Berkeley, Google Research, and UC San Diego in 2020. In short, the method maps the color and light intensity of different 2D shots, then generates data to connect these images from different vantage points and render a finished 3D scene. In addition to images, the system requires data about the position of the camera.
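For readers who want to see the idea in code, here is a minimal sketch of how a radiance field can be turned into one pixel's color. It follows the volume rendering step described in the original 2020 NeRF paper, not anything specific to Nvidia's tool. The `toy_field` function is a hypothetical stand-in for a trained network, and the evenly spaced samples, sample count, and near/far bounds are assumptions chosen for illustration.

```python
import math


def toy_field(point):
    """Hypothetical stand-in for a trained network: returns (rgb, density).

    Models a dense red sphere of radius 1 at the origin and empty space elsewhere.
    """
    x, y, z = point
    inside = x * x + y * y + z * z <= 1.0
    return (1.0, 0.0, 0.0), (50.0 if inside else 0.0)


def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64, field=toy_field):
    """Composite a color along one camera ray by sampling the field."""
    step = (far - near) / n_samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    for i in range(n_samples):
        t = near + (i + 0.5) * step  # midpoint of the i-th segment
        point = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, density = field(point)
        alpha = 1.0 - math.exp(-density * step)  # opacity of this segment
        weight = transmittance * alpha
        for c in range(3):
            color[c] += weight * rgb[c]
        transmittance *= 1.0 - alpha
    return tuple(color)
```

Calling `render_ray((0.0, 0.0, -2.0), (0.0, 0.0, 1.0))` sends a ray straight through the sphere and returns a color very close to pure red, while a ray that misses the sphere returns black. The ray origin and direction are where the camera position data comes in: each pixel's ray is derived from where the camera was and which way it pointed.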
Researchers have been improving this sort of 2D-to-3D model for a couple of years now, adding more detail to finished renders and increasing rendering speed. Nvidia says its new Instant NeRF model is one of the fastest yet developed, cutting rendering time from a few minutes to a process that finishes "almost instantly."
Breakthrough AI Technique Enables Real-Time Rendering of Scenes in 3D From 2D Images:
Humans are pretty good at looking at a single two-dimensional image and understanding the full three-dimensional scene that it captures. Artificial intelligence agents are not.
Yet a machine that needs to interact with objects in the world — like a robot designed to harvest crops or assist with surgery — must be able to infer properties about a 3D scene from observations of the 2D images it's trained on.
While scientists have had success using neural networks to infer representations of 3D scenes from images, these machine learning methods aren't fast enough to be feasible for many real-world applications.
A new technique demonstrated by researchers at MIT and elsewhere is able to represent 3D scenes from images about 15,000 times faster than some existing models.
The method represents a scene as a 360-degree light field, which is a function that describes all the light rays in a 3D space, flowing through every point and in every direction. The light field is encoded into a neural network, which enables faster rendering of the underlying 3D scene from an image.
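As a rough illustration of what "a function that describes all the light rays" can look like in code, the sketch below maps a ray directly to a color with a single network evaluation. Everything here is an assumption made for illustration: the article does not say how rays are parameterized, so this sketch uses Plücker coordinates (a standard way to describe a line in 3D), and the `TinyLightField` class is a hypothetical, untrained two-layer network with random weights, not the researchers' model.

```python
import math
import random


def plucker(origin, direction):
    """Encode a ray as 6 numbers: unit direction and moment (origin x direction)."""
    norm = math.sqrt(sum(d * d for d in direction))
    d = [c / norm for c in direction]
    o = origin
    moment = [
        o[1] * d[2] - o[2] * d[1],
        o[2] * d[0] - o[0] * d[2],
        o[0] * d[1] - o[1] * d[0],
    ]
    return d + moment


class TinyLightField:
    """Hypothetical untrained network mapping a ray to an RGB color."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(3)]

    def color(self, origin, direction):
        """One forward pass per ray; no sampling of points along the ray."""
        x = plucker(origin, direction)
        # Hidden layer with ReLU activation.
        h = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w1]
        out = [sum(w * v for w, v in zip(row, h)) for row in self.w2]
        # Sigmoid written with tanh so large values cannot overflow.
        return tuple(0.5 * (1.0 + math.tanh(v / 2.0)) for v in out)
```

One property of this parameterization is that the same line of sight always gets the same encoding, no matter which point along the ray is used as its origin. For example, `field.color((1, 2, 3), (1, 0, 1))` and `field.color((3, 2, 5), (2, 0, 2))` describe the same ray and return the same color up to floating-point rounding. Because each pixel needs only one evaluation, a renderer built this way avoids stepping through many sample points per ray, which is one plausible reading of where the reported speedup comes from.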