SoylentNews Comments | A Deep Dive into the History and Evolution of Zip Compression

A Deep Dive into the History and Evolution of Zip Compression

posted by Fnord666 on Saturday February 29 2020, @06:05PM

from the zip-it-up dept.

In a blog post, Hans Wennborg does a deep dive into the history and evolution of the Zip compression format and its underlying algorithms. Although this lossless compression format became popular around three decades ago, its roots go back to the 1950s and 1970s. Notably, in the wake of the "Arc Wars" of the 1980s, which hit BBS users hard, the Zip format was dedicated to the public domain from the start. Most of the work in the Zip format is done by Lempel-Ziv compression (LZ77) and Huffman coding.

I have been curious about data compression and the Zip file format in particular for a long time. At some point I decided to address that by learning how it works and writing my own Zip program. The implementation turned into an exciting programming exercise; there is great pleasure to be had from creating a well oiled machine that takes data apart, jumbles its bits into a more efficient representation, and puts it all back together again. Hopefully it is interesting to read about too.
This article explains how the Zip file format and its compression scheme work in great detail: LZ77 compression, Huffman coding, Deflate and all. It tells some of the history, and provides a reasonably efficient example implementation written from scratch in C. The source code is available in hwzip-1.0.zip.

Previously:
Specially Crafted ZIP Files Used to Bypass Secure Email Gateways (2019)
Which Compression Format to Use for Archiving? (2019)
The Math Trick Behind MP3s, JPEGs, and Homer Simpson's Face (2019)
Ask Soylent: Internet-communication Archival System (2014)

Original Submission


This discussion has been archived. No new comments can be posted.


The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Re:The use of zlib is truly ubiquitous (Score: 1, Informative)
by Anonymous Coward on Saturday February 29 2020, @09:12PM (#964672)

https://en.wikipedia.org/wiki/Unzip#Local_file_header [wikipedia.org]
https://en.wikipedia.org/wiki/Unzip#Central_directory_file_header [wikipedia.org]

The file info is stored in 2 locations: in a local file header right before each file's data in the stream, and again in the central directory header at the end of the archive.
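To see the duplication for yourself, here is a minimal Python sketch. It builds a small archive in memory with the standard zipfile module, then reads the file names back from both places by hand. The file names (a.txt, b.txt) and the helper names are made up for illustration, and it assumes a plain archive with no comment and no Zip64 extensions.

```python
import io
import struct
import zipfile


def build_archive() -> bytes:
    """Build a tiny two-file archive in memory (hypothetical sample data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("a.txt", "first file\n")
        zf.writestr("b.txt", "second file\n")
    return buf.getvalue()


def central_entries(data: bytes):
    """Return (name, local_header_offset) pairs from the central directory.

    Assumes the end-of-central-directory record is the last 22 bytes,
    which holds only when the archive has no trailing comment.
    """
    sig, _, _, _, total, _cd_size, cd_offset, _ = struct.unpack(
        "<4s4H2LH", data[-22:])
    assert sig == b"PK\x05\x06"
    entries = []
    pos = cd_offset
    for _ in range(total):
        assert data[pos:pos + 4] == b"PK\x01\x02"
        name_len, extra_len, comment_len = struct.unpack_from("<3H", data, pos + 28)
        (local_offset,) = struct.unpack_from("<L", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode()
        entries.append((name, local_offset))
        # Fixed part is 46 bytes, followed by three variable-length fields.
        pos += 46 + name_len + extra_len + comment_len
    return entries


def local_name(data: bytes, offset: int) -> str:
    """Read the file name stored in the local file header at `offset`."""
    assert data[offset:offset + 4] == b"PK\x03\x04"
    name_len, _extra_len = struct.unpack_from("<2H", data, offset + 26)
    return data[offset + 30:offset + 30 + name_len].decode()


if __name__ == "__main__":
    archive = build_archive()
    for name, offset in central_entries(archive):
        print(name, "is also in the local header:", local_name(archive, offset))
```

Each central directory entry carries the offset of its local header, so the same name can be read back from either end of the file.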

Remember, .zip came out of the BBS era, when you would sometimes not get the whole file. So they made it so you could recover some of the files even if you only had half of the archive or something was corrupt in the middle.
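A rough sketch of that kind of recovery, in Python: when an archive is cut off before the central directory, the local headers at the front still say where each file's data starts and how long it is. The function name `salvage` and the sample file names are hypothetical. It assumes the sizes are present in the local headers (no data descriptors) and that entries are either stored or Deflate-compressed.

```python
import io
import struct
import zipfile
import zlib

LOCAL_SIG = b"PK\x03\x04"


def salvage(data: bytes) -> dict:
    """Walk local file headers from the start of `data`.

    Returns {name: content} for every entry that is fully present and
    passes its CRC-32 check. Stops at the first entry it cannot handle.
    """
    recovered = {}
    pos = 0
    while pos + 30 <= len(data) and data[pos:pos + 4] == LOCAL_SIG:
        (flags, method, _time, _date, crc, csize, _usize,
         name_len, extra_len) = struct.unpack_from("<HHHHLLLHH", data, pos + 6)
        if flags & 0x08:
            break  # sizes live in a trailing data descriptor; not handled here
        start = pos + 30 + name_len + extra_len
        end = start + csize
        if end > len(data):
            break  # this entry's data was cut off
        name = data[pos + 30:pos + 30 + name_len].decode("utf-8", "replace")
        raw = data[start:end]
        try:
            if method == 8:        # Deflate, stored as a raw stream
                content = zlib.decompress(raw, -15)
            elif method == 0:      # stored without compression
                content = raw
            else:
                break
        except zlib.error:
            break
        if zlib.crc32(content) == crc:
            recovered[name] = content
        pos = end
    return recovered


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ("one.txt", "two.txt", "three.txt"):
            zf.writestr(name, (name + "\n").encode() * 50)
    half = buf.getvalue()[: len(buf.getvalue()) // 2]  # simulate a dropped transfer
    print(sorted(salvage(half)))
```

A normal unzip tool would reject the truncated file outright because the central directory is missing; walking the local headers is what makes partial recovery possible.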

tar is basically similar: header+file, concatenated. But it does not compress the data stream. tar.gz (or tgz) compresses the whole stream, headers and all, into 1 gzip stream (the same Deflate compression zip uses). So if you blow out one bit in the middle, the second half is gone. But you get better compression, because matches can be found across files instead of only within each one. In practice the gain is fairly 'meh', as the Deflate window is decently small (32 KiB). That is why 7z has better compression most of the time: it can look at more data.
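If you want to measure this yourself, here is a small Python sketch using only the standard library. The first part packs the same set of small files as a zip (each file compressed on its own) and as a tar.gz (one stream). The second part compresses a 64 KiB block of pseudo-random bytes repeated twice, once with Deflate and once with LZMA (the algorithm 7z uses by default), to show the window effect. The sample data and function names are made up, and the numbers depend heavily on what you feed it.

```python
import io
import lzma
import random
import tarfile
import zipfile
import zlib


def make_files(count: int = 50) -> dict:
    """Hypothetical sample data: many small files with similar content."""
    return {
        f"log{i:03d}.txt": (f"entry {i}: status=OK user=guest\n" * 4).encode()
        for i in range(count)
    }


def zip_size(files: dict) -> int:
    """Size of a zip archive: every file is deflated independently."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return len(buf.getvalue())


def tgz_size(files: dict) -> int:
    """Size of a tar.gz: headers and data go through one gzip stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tf:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tf.addfile(info, io.BytesIO(data))
    return len(buf.getvalue())


def window_demo(block_size: int = 64 * 1024):
    """Compress block+block with Deflate and with LZMA.

    The repeat starts 64 KiB back, which is outside Deflate's 32 KiB
    window but well inside LZMA's default dictionary.
    """
    rng = random.Random(0)  # fixed seed so runs are repeatable
    block = bytes(rng.getrandbits(8) for _ in range(block_size))
    doubled = block + block
    return len(zlib.compress(doubled, 9)), len(lzma.compress(doubled))


if __name__ == "__main__":
    files = make_files()
    print("zip bytes:   ", zip_size(files))
    print("tar.gz bytes:", tgz_size(files))
    deflate_len, lzma_len = window_demo()
    print("deflate on repeated 64 KiB block:", deflate_len)
    print("lzma on repeated 64 KiB block:   ", lzma_len)
```

Lots of tiny, near-identical files is the best case for the single-stream approach. With larger files the 32 KiB window limits how much a shared stream can help, which is the point about 7z looking at more data.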

