
SoylentNews is people

posted by Fnord666 on Saturday February 29 2020, @06:05PM
from the zip-it-up dept.

Hans Wennborg does a deep dive into the history and evolution of the Zip compression format and its underlying algorithms in a blog post. While this lossless compression format became popular around three decades ago, it has its roots in the 1950s and 1970s. Notably, as a result of the "Arc Wars" of the 1980s, which hit BBS users hard, the Zip format was dedicated to the public domain from the start. The main work of the Zip format is done by Lempel-Ziv compression (LZ77) and Huffman coding.

I have been curious about data compression and the Zip file format in particular for a long time. At some point I decided to address that by learning how it works and writing my own Zip program. The implementation turned into an exciting programming exercise; there is great pleasure to be had from creating a well oiled machine that takes data apart, jumbles its bits into a more efficient representation, and puts it all back together again. Hopefully it is interesting to read about too.

This article explains how the Zip file format and its compression scheme work in great detail: LZ77 compression, Huffman coding, Deflate and all. It tells some of the history, and provides a reasonably efficient example implementation written from scratch in C. The source code is available in hwzip-1.0.zip.

Previously:
Specially Crafted ZIP Files Used to Bypass Secure Email Gateways (2019)
Which Compression Format to Use for Archiving? (2019)
The Math Trick Behind MP3s, JPEGs, and Homer Simpson's Face (2019)
Ask Soylent: Internet-communication Archival System (2014)



 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by driverless on Sunday March 01 2020, @06:14AM (1 child)

    by driverless (4770) on Sunday March 01 2020, @06:14AM (#964794)

    That had nothing to do with it. Pure arithmetic coding was unencumbered, and in the 1980s no-one cared or even knew about patent trolls. Even patented stuff like LZW was used by everyone, because Sperry weren't enforcing their patent. Even ignoring all of that, no-one operates like that because then you could literally never do anything at all.

  • (Score: 2) by bzipitidoo on Tuesday March 03 2020, @06:27PM

    by bzipitidoo (4388) on Tuesday March 03 2020, @06:27PM (#966072) Journal

    The perils of patents that came with Arithmetic Coding have everything to do with why, in the mid-1990s, bzip2 was accepted and bzip was not.

    The driving reason for the creation of PNG was that GIF was patent encumbered. Maybe Sperry wasn't trying to shake down users, but Unisys most assuredly was. Website owners were encouraged to switch to PNG, with such initiatives as Burn All GIFs Day.

    Even in the 1980s, people were quite aware of intellectual property problems. After all, the FSF was founded in 1985.

    > Even ignoring all of that, no-one operates like that because then you could literally never do anything at all.

    It's true that it's not possible to write anything but the simplest of programs without inadvertently violating software patents by the hundreds. Not only were the courts crazy enough to allow the patenting of software, they even support patent holders' efforts to go after the users. You'd think they should only go after the creators of some allegedly infringing software, and leave the users out of the fight. But no, they can successfully sue the users. Before their patents expired, Unisys embarked on a massive campaign to shake down every significant website that used a GIF anywhere at all. That's what users are anxious to avoid, and if avoiding it means settling for slightly less than the best possible compression, then in their view it's worth it.