
posted by hubie on Saturday April 22, @11:04AM

Apple's WaveOne purchase heralds a new era in smart-streaming of AR and video:

Apple's surprise purchase at the end of last month of WaveOne, a California-based startup that develops content-aware AI algorithms for video compression, showcases an important shift in how video signals are streamed to our devices. In the near term, Cupertino's purchase will likely lead to smart video-compression tools in Apple's video-creation products and in the development of its much-discussed augmented-reality headset.

However, Apple isn't alone. Startups in the AI video codec space are likely to prove acquisition targets for other companies trying to keep up.

[...] AI codecs, having been developed over the course of decades, use machine-learning algorithms to analyze and understand the visual content of a video, identify redundancies and nonfunctional data, and compress the video in a more efficient way. They use learning-based techniques instead of manually designed tools for encoding and can use different ways to measure encoding quality beyond traditional distortion measures. Recent advancements, like attention mechanisms, help them understand the data better and optimize visual quality.

During the first half of the 2010s, Netflix and a California-based company called Harmonic helped to spearhead a movement of what's called "content-aware" encoding. CAE, as Harmonic calls it, uses AI to analyze and identify the most important parts of a video scene, and to allocate more bits to those parts for better visual quality, while reducing the bit rate for less important parts of the scene.
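
To make the bit-allocation idea concrete, here is a minimal toy sketch (not Harmonic's actual CAE; the saliency measure and the numbers are invented for illustration) in which blocks with more visual detail get a larger share of a fixed bit budget:

    # Toy illustration of content-aware bit allocation, not any real CAE system.
    # "Saliency" here is just local contrast; production systems use learned models.
    import numpy as np

    def allocate_bits(frame, total_bits, block=16):
        """Split a grayscale frame into blocks and share total_bits by saliency."""
        rows, cols = frame.shape[0] // block, frame.shape[1] // block
        saliency = np.empty((rows, cols))
        for r in range(rows):
            for c in range(cols):
                tile = frame[r*block:(r+1)*block, c*block:(c+1)*block]
                saliency[r, c] = tile.std() + 1e-6    # detailed blocks score higher
        weights = saliency / saliency.sum()
        return weights * total_bits                    # per-block bit budget

    frame = np.random.default_rng(0).integers(0, 256, (240, 320)).astype(float)
    budget = allocate_bits(frame, total_bits=50_000)   # busy blocks get the most bits
    print(budget.shape, round(budget.sum()))           # (15, 20) 50000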

Content-aware video compression adjusts an encoder for different resolutions of encoding, adjusts the bit rate according to content, and adjusts the quality score—the perceived quality of a compressed video compared to the original uncompressed video. All those things can be done by neural encoders as well.
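
The quality score is a full-reference comparison of the compressed frame against the original. As a rough stand-in for the perceptual metrics production encoders use, a PSNR/SSIM check with scikit-image might look like this (the file names are placeholders):

    # Minimal sketch of a full-reference quality score; PSNR and SSIM stand in
    # for fancier perceptual metrics. File names are placeholders.
    from skimage import io
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    original = io.imread("original_frame.png", as_gray=True)     # floats in [0, 1]
    compressed = io.imread("compressed_frame.png", as_gray=True)

    psnr = peak_signal_noise_ratio(original, compressed, data_range=1.0)
    ssim = structural_similarity(original, compressed, data_range=1.0)
    print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.3f}")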

[...] WaveOne has shown success in neural-network compression of still images. In one comparison, WaveOne reconstructions of images were 5 to 10 times as likely to be chosen over conventional codecs by a group of independent users.

But the temporal correlation in video is much stronger than the spatial correlation in an image, and you must encode the temporal domain extremely efficiently to beat the state of the art.

"At the moment, the neural video encoders are not there yet," said Yiannis Andreopoulos, a professor of data and signal processing at University College London and chief technology officer at iSize Technologies.

[...] Nonetheless, the industry appears to be moving toward combining AI with conventional codecs—rather than relying on full neural-network compression.

[...] For the time being, "AI and conventional technologies will work in tandem," said Andreopoulos, in part, he said, because conventional encoders are interpretable and can be debugged. Neural networks are famously obscure "black boxes." Whether in the very long term neural encoding will beat traditional encoding, Andreopoulos added, is still an open question.


Original Submission

This discussion was created by hubie (1068) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Insightful) by inertnet on Saturday April 22, @11:14AM (3 children)

    by inertnet (4071) Subscriber Badge on Saturday April 22, @11:14AM (#1302561) Journal

    Be aware that this will lead to an "all your videos are belong to us" approach. Your videos will have to be streamed to this black box AI in order to be compressed by it.

    • (Score: 4, Interesting) by looorg on Saturday April 22, @01:22PM (2 children)

      by looorg (578) on Saturday April 22, @01:22PM (#1302572)

      How would that be different from how it is now? If you are serious about it or cranking out a lot of video content, you are more or less guaranteed to already have a dedicated card, or an entire machine (or machines), that just does compression and encoding of video content. So whether it's in a little black box, in a slot in your desktop machine(s), or in a server farm far far away doesn't, or shouldn't, really matter all that much. Image "magic" no matter the size or colour of the box.

      Still, it's a bit weird that it's getting blown up now. Is it the "AI" that is new? After all, the progress in video compression has been massive just over the last few years or decades. It's not too long ago that the files were massive and the quality was shitty, or the images were really tiny and small. If anything is weird it might be how long it has taken; after all, in some regard isn't video just a stream of still images shown in rapid succession? If anything, the compression should be, or perhaps is, better than for still images. After all, there are more of them to analyze: things that can be removed, compressed, cut, not updated, etc. The issue, I would imagine, is figuring out which parts to do what to. I gather that is what the "AI" does.

      • (Score: 3, Interesting) by gznork26 on Saturday April 22, @01:44PM (1 child)

        by gznork26 (1159) on Saturday April 22, @01:44PM (#1302573) Homepage Journal

        Speaking of tiny video, back during dial-up, a small company I was in developed a video compression suite for Microsoft that let a tech insert break points and then adjust both the frame rate and bit rate for each segment in order to optimize the tiny videos compressed for their clients. Using it then entailed a lot of experimenting and tweaking by the tech, who learned from experience what combinations worked in various circumstances.
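
        Roughly the same per-segment idea can be sketched with today's tools; the break points, rates, and file names below are made up purely for illustration:

            # Rough sketch of per-segment encoding: each hand-chosen segment gets its
            # own frame rate and bit rate. Break points, rates, filenames are made up.
            import subprocess

            segments = [                  # (start_sec, end_sec, fps, bitrate)
                (0.0, 12.5, 10, "80k"),   # talking head: a low frame rate is fine
                (12.5, 20.0, 24, "200k"), # fast motion: keep the frame rate up
            ]

            for i, (start, end, fps, rate) in enumerate(segments):
                subprocess.run([
                    "ffmpeg", "-y", "-i", "input.avi",
                    "-ss", str(start), "-to", str(end),
                    "-r", str(fps), "-b:v", rate,
                    "-c:v", "libx264", f"seg{i}.mp4",
                ], check=True)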

        An AI set on that task might include those tricks, but I'm curious what other methods it would come up with. We humans think of video as a series of frames, but the AI could just as easily segment the image and apply different solutions to each across time for even better gains. I look forward to discussions here at Soylent about researchers' analyses of how the AI's solutions worked once they get good at it.

        • (Score: 2) by bzipitidoo on Monday April 24, @09:17PM

          by bzipitidoo (4388) Subscriber Badge on Monday April 24, @09:17PM (#1302890) Journal

          I have heard this sort of thing described as "graduate student compression": basically using human intelligence, not artificial intelligence. It was particularly applied to fitting fractals to still images. The potential space savings of reducing a still image to fractal parameters are huge. I haven't heard of anything on that in years, so I should guess fractal compression didn't pan out. Possibly even a good fractal approximation of an object required so much adjustment that it took more data than established methods like JPEG.

  • (Score: 5, Informative) by Rich on Saturday April 22, @03:18PM

    by Rich (945) on Saturday April 22, @03:18PM (#1302576) Journal

    Stable Diffusion img2img works by first mapping a 512x512 image into a 64x64x4 "latent space", which is a multidimensional representation of, er, stuff. When you run it with effects set to minimum, you get your image back in good shape, even if it was condensed into 1/16 of the data points in between. Packing and unpacking works by the magic of "AutoEncoders". These pack and unpack a picture into and out of latent space. Understanding what is in that latent space is difficult, but we can assume it's some blobby representation of things and where they are.
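
    A minimal sketch of that round trip with the diffusers library, for anyone who wants to poke at the latent space themselves (the model id is assumed to be the usual public VAE; treat the details as approximate):

        # Rough sketch: round-trip an image through a Stable Diffusion VAE
        # (512x512x3 pixels -> 1x4x64x64 latents -> back). Model id is an assumption.
        import numpy as np
        import torch
        from PIL import Image
        from diffusers import AutoencoderKL

        vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

        img = Image.open("input.png").convert("RGB").resize((512, 512))
        x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0   # scale to [-1, 1]
        x = x.permute(2, 0, 1).unsqueeze(0)                         # NCHW

        with torch.no_grad():
            latents = vae.encode(x).latent_dist.sample()            # (1, 4, 64, 64)
            recon = vae.decode(latents).sample                      # (1, 3, 512, 512)

        out = ((recon[0].permute(1, 2, 0).clamp(-1, 1) + 1) * 127.5).byte().numpy()
        Image.fromarray(out).save("roundtrip.png")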

    So the "WaveOne reconstructions of images were 5 to 10 times as likely to be chosen over conventional codecs by a group of independent users." part can be basically done by everyone with a PC.

    Now, because latent space consists of "stuff" rather than pixels, an interpolation will actually transition the "stuff", rather than doing a pixel fade. There are a few videos on YT under "latent space interpolation", which give an idea on what happens during such a transition. The idea is that you don't have to have intermediate frames and can just interpolate latent space between keyframes. Whoever tames that behaviour will hit a gold mine.
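
    As a toy sketch of that keyframe idea on top of the same VAE: encode two keyframes and linearly blend their latents to fake the frames in between. Real neural codecs are far cleverer than a straight lerp, but it shows the "stuff transitions into stuff" behaviour (keyframe paths are placeholders):

        # Toy "latent space interpolation" between two keyframes.
        import numpy as np
        import torch
        from PIL import Image
        from diffusers import AutoencoderKL

        def to_tensor(path):
            img = Image.open(path).convert("RGB").resize((512, 512))
            x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
            return x.permute(2, 0, 1).unsqueeze(0)                  # NCHW in [-1, 1]

        vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

        with torch.no_grad():
            z0 = vae.encode(to_tensor("keyframe_a.png")).latent_dist.mean
            z1 = vae.encode(to_tensor("keyframe_b.png")).latent_dist.mean
            for i in range(1, 8):                                   # 7 in-between frames
                t = i / 8.0
                frame = vae.decode((1 - t) * z0 + t * z1).sample
                out = ((frame[0].permute(1, 2, 0).clamp(-1, 1) + 1) * 127.5).byte().numpy()
                Image.fromarray(out).save(f"tween_{i}.png")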

    However, the Autoencoders are pretty large. The popular 840k-VAE measures 334.6 MB. You probably would not want to ship one of those with every medium, but maybe they plan on some augmentation of these.

    Fun stuff to watch developing, many papers are brand new, and I wonder if there's "text-to-video" to be found along the way.

  • (Score: 3, Insightful) by istartedi on Saturday April 22, @03:53PM (3 children)

    by istartedi (123) on Saturday April 22, @03:53PM (#1302578) Journal

    If I read TFA correctly, the algorithm figures out where a human is likely to look and simply allocates more bandwidth to those areas.

    This isn't what I was thinking.

    I was thinking that it could actually analyze the scene, and say, for example, label something as "trees in the background". With a sufficient database of what trees look like, it could generate all those client-side and nobody would care. If the lead characters are smooching center-frame, nobody is going to be like, "hey, that 11th tree from the left is supposed to look more like a walnut".

    I think that could compress like gangbusters, and be fairly asymmetric in terms of compression/decompression, with some fairly intensive analysis to encode, but a pretty quick render to pull the specified object from a local DB and render it.

    Where it might get pushed too far is if the DB is too small, and you start watching movies and thinking, "Hey, that scene with them in the truck, it seems like I've seen that road before", and various other types of deja vu.

    What they're describing though, it doesn't sound like anything that sophisticated. I've just described people smooching in front of a forest or driving down a road. You all have different visuals in your head of what that would look like, and that literally compresses to just a few bytes vs. what a video would be--but if we compress to that point we'd also start breaking down the artist's vision. It's important in film that it be *light green* grass by the road, etc., which you may or may not have imagined (but I did when thinking about it).

    • (Score: 0) by Anonymous Coward on Sunday April 23, @07:42AM

      by Anonymous Coward on Sunday April 23, @07:42AM (#1302631)
      People have done similar stuff before, and other people have provided good reasons why it's a bad/terrible idea.

      Are you going to be legally liable when your system adds in details that the original doesn't have?

      I hope you will be, it's a lot more fun to watch that way. 😉
    • (Score: 3, Insightful) by acid andy on Sunday April 23, @03:35PM

      by acid andy (1683) Subscriber Badge on Sunday April 23, @03:35PM (#1302674) Homepage Journal

      nobody is going to be like, "hey, that 11th tree from the left is supposed to look more like a walnut".

      I probably would. And that's why I don't like this approach to video compression. Often when I'm watching TV I'll be just as interested in lots of little details in the background as I will in plots and dialog that are usually just repetitions of things that have been done many times before.

      --
      Master of the science of the art of the science of art.
    • (Score: 3, Insightful) by acid andy on Sunday April 23, @03:49PM

      by acid andy (1683) Subscriber Badge on Sunday April 23, @03:49PM (#1302675) Homepage Journal

      I wonder if it would improve the compression rate further if the AI basically deconstructed the movie into a 3D video game: determine the geometry of all the objects, derive compressed textures for them, and then record their positions and motion to replay. Haven't we come close to something like this already? I don't remember a specific example but I know techniques like photogrammetry have been around for some time.

      It would be pretty cool to just load up your favorite movie and then go and walk around the environments, wouldn't it? It would instantly make VR and 3D television more worthwhile as well.

      I think that could compress like gangbusters, and be fairly asymmetric in terms of compression/decompression, with some fairly intensive analysis to encode, but a pretty quick render to pull the specified object from a local DB and render it.

      Where it might get pushed too far is if the DB is too small, and you start watching movies and thinking, "Hey, that scene with them in the truck, it seems like I've seen that road before", and various other types of deja vu.

      I think this is the problem. I'm sure there are some gains to this approach, but you'd need a way to quickly provide probably a few terabytes of local data for the database up front. What happens if the content provider decides they want to refresh the whole database periodically? Maybe a change of resolution or format, or just too much new content. If you have a fast enough way of refreshing all that without frustrating the viewer, then you possibly have a fast enough way of delivering regular compressed video too, so maybe it all becomes moot.

      --
      Master of the science of the art of the science of art.
  • (Score: 1, Insightful) by Anonymous Coward on Saturday April 22, @07:38PM (1 child)

    by Anonymous Coward on Saturday April 22, @07:38PM (#1302589)

    Would it hurt to link to an example of compression quality? Come on, titillate me with your fancy toys.

    • (Score: 4, Touché) by Anonymous Coward on Saturday April 22, @11:13PM

      by Anonymous Coward on Saturday April 22, @11:13PM (#1302604)

      You're not familiar with Apple's reality distortion field, are you?
