
SoylentNews is people

posted by martyb on Thursday September 12 2019, @07:22PM
from the it-depends dept.

Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out some criteria and then evaluates how some of the more common methods measure up.

After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.

  • The compression tool must be open source.
  • The compression format must be open.
  • The tool must be popular enough to be supported by the community.
  • Ideally there would be multiple implementations.
  • The format must be resilient to data loss.

Some formats I am looking at are zip, 7zip, rar, xz, bzip2, tar.

He closes by mentioning error correction. That has become more important than most acknowledge due to the large size of data files, the density of storage, and the propensity for bits to flip.


This discussion has been archived. No new comments can be posted.
  • (Score: 5, Informative) by SomeGuy on Thursday September 12 2019, @07:46PM (7 children)

    by SomeGuy (5632) on Thursday September 12 2019, @07:46PM (#893291)

    In practice, error correction for compressed archives is bullshit. Most error correction assumes a single bit gets flipped, but more often an archive will have gaps of 512 bytes or larger from bad sectors when reading a hard disk, mysteriously missing areas in the middle from network copy errors, truncation from uploads crapping out, every CR converted to CRLF because someone did not know how to use FTP, or such.

    Single-bit errors usually happen in RAM, and in my own experience this messes things up either before the file is compressed or after it is decompressed, so the archiver's error checking usually can't catch it.

    Unfortunately, I have had both 7z and RAR change their default compression methods on me, and I think some ZIP programs have tried to add methods of their own. The problem is, you compress a file and then suddenly anyone with an "old" archiver can't uncompress your archive any more. (Start assholes bitching about how everyone should always be using the latest and greatest here, and then slap them upside the head, because in the real world this is not always possible, especially when dealing with vintage or legacy systems.)

    Some might actually recommend NOT using compression AT ALL. Just put the files uncompressed in a zip/7z/RAR container, where extraction in the worst case is trivial. As a bonus, if you are using a compressed/deduplicated file system, the file system may get much better compression on the uncompressed data.
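To illustrate the "uncompressed in a container" idea, here is a minimal sketch using Python's standard `zipfile` module with the `ZIP_STORED` method. The helper name `make_stored_archive` and the sample file name are made up for this example; the point is only that a stored member sits byte-for-byte inside the archive, so it can still be carved out by hand if the container is damaged.

```python
import io
import zipfile


def make_stored_archive(files: dict[str, bytes]) -> bytes:
    """Build a ZIP in memory whose members are stored, not compressed."""
    buf = io.BytesIO()
    # ZIP_STORED writes each member's bytes verbatim after its local header.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


# Hypothetical sample data for the demonstration.
payload = b"archive me verbatim, no compression applied\n" * 4
archive = make_stored_archive({"notes.txt": payload})

# Because nothing was compressed, the raw payload is a plain substring
# of the archive bytes and does not depend on any decoder to be read.
payload_is_verbatim = payload in archive
```

The trade-off is size: the container adds only headers, so any space savings have to come from the file system underneath, as the comment suggests.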

    The only thing "resilient" to data loss is lots of backups, and constant checking and re-checking that what is supposed to be in the file(s) is what is actually there.
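The "constant checking and re-checking" part can be sketched as a checksum manifest: record a SHA-256 for every file once, then re-verify later against the stored values. This is only an illustration; the function names (`sha256_of`, `build_manifest`, `verify_manifest`) are invented for the example, and a real setup would also need to store the manifest somewhere safe and keep the backups to restore from.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose hash changed."""
    root = Path(root)
    bad = []
    for rel, expected in manifest.items():
        p = root / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

A check like this only detects damage; it cannot repair anything, which is why it goes together with multiple backups.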

  • (Score: 2, Interesting) by Anonymous Coward on Thursday September 12 2019, @08:12PM

    by Anonymous Coward on Thursday September 12 2019, @08:12PM (#893307)

    Error correction for archives, compressed or not, isn't bs at all. Distributed storage with erasure codes easily handles bitrot and large errors (sector rot?). This comes at a cost in complexity and extra space, but it does allow errors to be corrected automatically.
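As a rough illustration of how an erasure code recovers a large lost region rather than a single flipped bit, here is a sketch of the simplest such code: one XOR parity block over k data blocks, which can rebuild any one block known to be missing. Real distributed storage typically uses stronger codes such as Reed-Solomon that survive several losses; the function names below are invented for the example.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_blocks(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-size blocks, zero-padding the tail."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def recover(blocks: list, parity: bytes) -> list[bytes]:
    """Rebuild at most one missing block (marked as None) from the parity."""
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild only one lost block")
    restored = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of all survivors plus the parity equals the lost block.
        restored[missing[0]] = xor_blocks(survivors + [parity])
    return restored
```

For example, splitting a file into four blocks with `split_blocks`, computing `parity = xor_blocks(blocks)`, and storing all five pieces separately lets `recover` restore the original even if one whole piece is gone, at the cost of 25% extra space.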

  • (Score: 0) by Anonymous Coward on Thursday September 12 2019, @10:49PM (5 children)

    by Anonymous Coward on Thursday September 12 2019, @10:49PM (#893401)

    https://github.com/lrq3000/dvdisaster [github.com]
    https://cdn.rawgit.com/lrq3000/dvdisaster/stable/dvdisaster/documentation/en/howtos20.html [rawgit.com]
    The program is intended for ISO images, but will happily (and quickly, which is important) create an ECC file for absolutely any file given in place of an input ISO, and just as happily and quickly fix damage in that file using the ECC file.

    • (Score: 3, Funny) by Reziac on Friday September 13 2019, @02:34AM (4 children)

      by Reziac (2489) on Friday September 13 2019, @02:34AM (#893489) Homepage

      ....and with great irony, their download links are dead.

      --
      And there is no Alkibiades to come back and save us from ourselves.
      • (Score: 0) by Anonymous Coward on Friday September 13 2019, @05:51AM (3 children)

        by Anonymous Coward on Friday September 13 2019, @05:51AM (#893544)

        Works for me. Did you try the ones here: https://github.com/lrq3000/dvdisaster/releases [github.com]

        • (Score: 2) by Reziac on Friday September 13 2019, @07:29AM (2 children)

          by Reziac (2489) on Friday September 13 2019, @07:29AM (#893557) Homepage

          Thanks; for some reason that refused to come up.
