
posted by martyb on Thursday September 12 2019, @07:22PM
from the it-depends dept.

Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out some criteria and then evaluates how some of the more common methods line up.

After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.

  • The compression tool must be open source.
  • The compression format must be open.
  • The tool must be popular enough to be supported by the community.
  • Ideally there would be multiple implementations.
  • The format must be resilient to data loss.

Some formats I am looking at are zip, 7zip, rar, xz, bzip2, tar.

He closes by mentioning error correction, which has become more important than most people acknowledge due to the large size of data files, the density of modern storage, and the propensity for bits to flip.
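
For readers unfamiliar with the tooling: PAR2, which comes up repeatedly in the comments below, is one common way to layer that kind of error correction on top of an archive. A minimal sketch, assuming par2cmdline is installed; the filenames and the 10% redundancy level are illustrative only:

    # create parity files with ~10% redundancy
    par2 create -r10 archive.tar.bz2.par2 archive.tar.bz2

    # later: check the archive for corruption
    par2 verify archive.tar.bz2.par2

    # if bits have flipped, reconstruct the damaged blocks
    par2 repair archive.tar.bz2.par2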


Original Submission

 
  • (Score: 1) by NickM (2867) on Thursday September 12 2019, @09:33PM (#893368) (4 children)

    Why bzip2 instead of LZMA2? I don't get it: if you are willing to wait for an encoder, why not wait a little longer and get even better compression? If you wanted speed, wouldn't you be using something based on DEFLATE anyway? I get the par2 part (I posted it before you), but can you please explain your choice of encoder?
    --
    I a master of typographic, grammatical and miscellaneous errors !
  • (Score: 0) by Anonymous Coward on Thursday September 12 2019, @11:05PM (#893411) (3 children)

    Sorry, I missed your par2 reference. I use bzip2 because it's built into GNU tar, which is FOSS and available everywhere I need it. I'm not saying other compressors can't be better, just that my overall scheme works best for me.

    (I got spoiled by OpenVMS BACKUP's built-in parity protection scheme as a pup - you could snip a reel-to-reel BACKUP tape in two, put new start-of-tape/end-of-tape markers on what was left and still recover virtually everything on the tapes).

    I also used DVDisaster when I was more heavily invested in optical media.
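
    A minimal sketch of the scheme described above (GNU tar's built-in bzip2 support plus par2 parity); the paths and the redundancy level are hypothetical:

        # archive and compress in a single step
        tar -cjf backup.tar.bz2 ~/documents

        # add ~10% parity data so flipped bits can be repaired later
        par2 create -r10 backup.tar.bz2.par2 backup.tar.bz2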

    • (Score: 0) by Anonymous Coward on Friday September 13 2019, @02:51AM (#893496) (2 children)

      7z is open source too. It supports pretty much all of those formats, and it's just as easy to use as tar.

      I came here to recommend par2 as well. It is showing its age, though: it has upper limits on the total archive size it can handle, so you end up playing the carve-up-your-data game to get the pars right.
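
      A minimal sketch of both points, assuming a standard p7zip install; the filenames, compression level, and volume size are illustrative:

          # create a 7z archive at maximum compression (recent versions default to LZMA2)
          7z a -mx=9 backup.7z ~/documents

          # or emit fixed-size volumes so each piece stays within par2's limits
          7z a -v1g backup-split.7z ~/documents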

      • (Score: 0) by Anonymous Coward on Friday September 13 2019, @03:37AM (#893512) (1 child)

        Agreed, 7z's native compression is significantly better -- if you want to spend the CPU time. I prefer tar -j because archiving and compression happen in a single step, which helps me stay within par2's file count limitation. As I wrote, I feel my recommendation is a good overall solution given my requirements.

        • (Score: 0) by Anonymous Coward on Friday September 13 2019, @06:05AM (#893547)

          FWIW, later versions of tar support far more compression formats. Depending on the oldest OS you need to support, you could get better compression ratios on new archives just by changing that switch. I believe "-J" for xz is supported by most Linuxes and BSDs, for example.
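
          A minimal sketch of that switch change, assuming a reasonably recent GNU tar built with xz support; the paths are illustrative:

              # the same single-step workflow, with xz (LZMA2) instead of bzip2
              tar -cJf backup.tar.xz ~/documents

              # equivalent long-option spelling
              tar --create --xz --file backup.tar.xz ~/documents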