Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out some criteria and then evaluates how some of the more common formats measure up.
After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.
- The compression tool must be open source.
- The compression format must be open.
- The tool must be popular enough to be supported by the community.
- Ideally there would be multiple implementations.
- The format must be resilient to data loss.
Some formats I am looking at are zip, 7zip, rar, xz, bzip2, tar.
He closes by mentioning error correction. That has become more important than most acknowledge due to the large size of data files, the density of storage, and the propensity for bits to flip.
(Score: 5, Informative) by SomeGuy on Thursday September 12 2019, @07:46PM (7 children)
In practice, error correction for compressed archives is bullshit. Most error correction assumes a single bit gets flipped, but more often archives will have gaps of 512 bytes or larger from bad sectors when reading a hard disk, mysteriously missing areas in the middle from network copy errors, truncation from uploads crapping out, every CR converted to CRLF because someone did not know how to use FTP, or such.
Single bit errors usually happen in RAM, and from my own experience this messes things up either before or after the file is compressed/decompressed, so the archiver's error checking usually can't catch that.
Unfortunately, I have had both 7z and RAR change their default compression methods on me, and I think some ZIP programs have tried to add some of their own. The problem is, you compress a file and then suddenly anyone with an "old" archiver can't uncompress your archive any more. (start assholes bitching about how everyone should always be using the latest and greatest here and then slap them upside the head because in the real world this is not always possible, especially when dealing with vintage or legacy systems.)
Some might actually recommend NOT using compression AT ALL. Just put them uncompressed in a zip/7z/RAR container, where extraction in the worst case is trivial. As a bonus, if you are using a compressed/deduplicated file system, it may get much better compression.
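For what it's worth, here is a minimal sketch of the "uncompressed in a container" idea using Python's standard `zipfile` module. The helper name `store_uncompressed` and the sample file name are made up for illustration. It only shows that with the stored method the original bytes sit verbatim inside the archive, which is what makes worst-case recovery by hand plausible.

```python
import io
import zipfile


def store_uncompressed(files: dict[str, bytes]) -> bytes:
    """Pack files into a ZIP container without compressing them."""
    buf = io.BytesIO()
    # ZIP_STORED writes each member's bytes as-is, so the payload can
    # still be found in the archive even without a working ZIP tool.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical sample data, not from the thread.
payload = b"important archival data\n" * 4
archive = store_uncompressed({"notes.txt": payload})

# The raw bytes appear unchanged inside the container.
print(payload in archive)

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    # testzip() returns None when every member passes its CRC-32 check.
    print(zf.testzip())
```

The container still records a CRC-32 per member, so you get damage detection (not correction) for free.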
The only thing "resilient" to data loss is lots of backups, and constant checking and re-checking that what is supposed to be in the file(s) is what is actually there.
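The "constant checking and re-checking" part can be sketched roughly like this, assuming you keep a manifest of SHA-256 digests alongside each backup. The function names (`sha256_of`, `build_manifest`, `verify`) and the demo files are invented for the example; in practice something like `sha256sum -c` does the same job.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large archives fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose digest changed."""
    bad = []
    for name, digest in manifest.items():
        p = root / name
        if not p.is_file() or sha256_of(p) != digest:
            bad.append(name)
    return bad


# Demo on throwaway files (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"alpha" * 1000)
    (root / "b.bin").write_bytes(b"bravo" * 1000)
    manifest = build_manifest(root)
    clean = verify(root, manifest)

    # Simulate silent corruption: flip one bit in the middle of b.bin.
    damaged_bytes = bytearray((root / "b.bin").read_bytes())
    damaged_bytes[2500] ^= 0x01
    (root / "b.bin").write_bytes(damaged_bytes)
    damaged = verify(root, manifest)

print(clean)    # files flagged before the corruption
print(damaged)  # files flagged after the corruption
```

This only tells you which copy is bad. Fixing it still means pulling the file from another backup, which is the point.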
(Score: 2, Interesting) by Anonymous Coward on Thursday September 12 2019, @08:12PM
Error correction for archives, compressed or not, isn't bs at all. Distributed storage with erasure codes easily handles bitrot and large errors (sector rot?). This comes at a cost of complexity and increased space but does allow for self-correcting of errors.
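To make the idea concrete, here is a toy sketch using single XOR parity (the RAID-5 style scheme), which is the simplest erasure code and not what real distributed stores use; they typically use Reed-Solomon or similar, which can survive several lost blocks. The block size, sample data and function names are all assumptions for the example. It shows a whole lost block, not just a flipped bit, being rebuilt from the survivors.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_blocks(data: bytes, size: int) -> list[bytes]:
    """Split data into fixed-size blocks, zero-padding the last one."""
    return [data[i:i + size].ljust(size, b"\0") for i in range(0, len(data), size)]


# Hypothetical sample data and block size.
data = b"archives rot; parity lets you rebuild a lost block."
blocks = split_blocks(data, 16)
parity = xor_blocks(blocks)  # one extra block of overhead

# Simulate an entire block vanishing (think bad sector or dead node).
lost = 2
survivors = [b for i, b in enumerate(blocks) if i != lost]

# XOR of the parity with every surviving block yields the missing one.
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == blocks[lost])
```

Note the catch: this works only if you know which block is gone and only one is gone. That is the complexity and space cost mentioned above; tolerating more losses means more parity and a fancier code.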
(Score: 0) by Anonymous Coward on Thursday September 12 2019, @10:49PM (5 children)
https://github.com/lrq3000/dvdisaster [github.com]
https://cdn.rawgit.com/lrq3000/dvdisaster/stable/dvdisaster/documentation/en/howtos20.html [rawgit.com]
The program is intended for ISO images, but will happily (and quickly, which is important) create an ECC file for absolutely any file given in place of an input ISO, and just as happily and quickly fix damage in that file using the ECC file.
(Score: 3, Funny) by Reziac on Friday September 13 2019, @02:34AM (4 children)
....and with great irony, their download links are dead.
And there is no Alkibiades to come back and save us from ourselves.
(Score: 0) by Anonymous Coward on Friday September 13 2019, @05:51AM (3 children)
Works for me. Did you try the ones here: https://github.com/lrq3000/dvdisaster/releases [github.com]
(Score: 2) by Reziac on Friday September 13 2019, @07:29AM (2 children)
Thanks; for some reason that refused to come up.
(Score: 0) by Anonymous Coward on Friday September 13 2019, @02:41PM (1 child)
https://packages.debian.org/source/sid/dvdisaster [debian.org]
https://web.archive.org/web/20180428070843/http://dvdisaster.net/en/index.html [archive.org]
(Score: 0) by Anonymous Coward on Saturday September 14 2019, @01:05AM
According to the Debian package tracker, sid doesn't have the latest version either. What a mess.