Submitted via IRC for BoyceMagooglyMonkey
Compressing your files is a good way to save space on your hard drive. At Dropbox's scale, it's not just a good idea; it is essential. Even a 1% improvement in compression efficiency can make a huge difference. That's why we conduct research into lossless compression algorithms that are highly tuned for certain classes of files and storage, like Lepton for jpeg images, and Pied-Piper-esque lossless video encoding. For other file types, Dropbox currently uses the zlib compression format, which saves almost 8% of disk storage.
We introduce DivANS, our latest open-source contribution to compression, in this blog post. DivANS is a new way of structuring compression programs to make them more open to innovation in the wider community, by separating compression into multiple stages that can each be improved independently:
Source: https://blogs.dropbox.com/tech/2018/06/building-better-compression-together-with-divans/
(Score: 3, Interesting) by shortscreen on Thursday June 28 2018, @08:14AM (3 children)
If they are trying to get good compression then why are they using speculative probability tables instead of counting the actual frequencies in the actual data? Doesn't zlib already do exactly that for each 32KB block?
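For a rough feel of the trade-off in that question, here is a minimal sketch comparing the ideal coded size of an order-0 model built from counted frequencies with one that adapts as it goes. It is an illustration only, not DivANS's or zlib's actual model; the function names and the sample input are made up for the example. The static figure leaves out the cost of sending the frequency table, which the adaptive model never has to send.

```python
import math
from collections import Counter


def static_cost_bits(data: bytes) -> float:
    """Ideal code length when symbol frequencies are counted up front.

    The cost of transmitting the frequency table itself is NOT included.
    """
    counts = Counter(data)
    n = len(data)
    return -sum(c * math.log2(c / n) for c in counts.values())


def adaptive_cost_bits(data: bytes) -> float:
    """Ideal code length for an adaptive order-0 model.

    Every byte value starts with a count of 1, and the count is bumped
    after each symbol is coded, so a decoder can rebuild the same table
    on its own without it ever being stored.
    """
    counts = [1] * 256
    total = 256
    bits = 0.0
    for b in data:
        bits -= math.log2(counts[b] / total)
        counts[b] += 1
        total += 1
    return bits


# Hypothetical sample input, chosen only to have skewed byte frequencies.
sample = b"abracadabra" * 200
print(f"static:   {static_cost_bits(sample):.0f} bits (plus table)")
print(f"adaptive: {adaptive_cost_bits(sample):.0f} bits")
```

The adaptive total is never smaller than the static one for this kind of model, but the gap is the price of not storing a table, and an adaptive model can also be conditioned on context in ways a per-block table cannot.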
(Score: 1, Informative) by Anonymous Coward on Thursday June 28 2018, @12:03PM
I don't know if this is the real reason, but Dropbox received a patent [freshpatents.com] for image recompression "with an arithmetic coding that uses a sophisticated adaptive probability model."
(Score: 0) by Anonymous Coward on Thursday June 28 2018, @10:26PM
This isn't about optimisation, it's about getting you to install another vector for the NSA.
(Score: 0) by Anonymous Coward on Friday June 29 2018, @01:17AM
Yet their compression probably beats zlib by 8-10%.
zlib is actually one of the worse deflate compressors out there. It is usually used as the '0' baseline in compression tests. 7zip is one of the better ones for both speed and compression, but it is still pretty slow. There are faster algorithms out there; they give up some space but still beat zlib on size, and those are usually used for streaming. There are also ones that beat 7zip by a good 20%, but they are amazingly slow.
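Anyone who wants to check this kind of claim on their own files can do it with the Python standard library. The sketch below is an assumption-laden stand-in: it uses the `lzma` module (the LZMA family that 7zip's default format is built on) rather than 7zip itself, and `compressed_sizes` and the sample text are hypothetical names made up for the example. Results depend heavily on the input, so no particular ranking is promised.

```python
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict:
    """Return the compressed size in bytes for a few stdlib codecs."""
    return {
        "raw": len(data),
        "zlib-1": len(zlib.compress(data, 1)),   # fastest zlib level
        "zlib-9": len(zlib.compress(data, 9)),   # slowest, smallest zlib level
        "lzma-9": len(lzma.compress(data, preset=9)),
    }


# Hypothetical sample input; swap in the contents of a real file to compare.
text = b"the quick brown fox jumps over the lazy dog\n" * 500
for name, size in compressed_sizes(text).items():
    print(f"{name:>7}: {size} bytes")
```

To time the codecs as well as size them, wrapping each call with `time.perf_counter()` is enough for a rough comparison.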
DivANS is an interesting way to look at the streams that get spit out. Basically they are turning the stream into an IR, a directed-acyclic-graph-like language, then optimizing that much like an optimizing compiler does. I would say this is actually an important advancement to keep an eye on in the world of compression.