
posted by martyb on Thursday September 12 2019, @07:22PM
from the it-depends dept.

Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out several criteria and then evaluates how some of the more common formats measure up.

After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.

  • The compression tool must be open source.
  • The compression format must be open.
  • The tool must be popular enough to be supported by the community.
  • Ideally there would be multiple implementations.
  • The format must be resilient to data loss.

Some formats I am looking at are zip, 7zip, rar, xz, bzip2, tar.

He closes by mentioning error correction. That has become more important than most acknowledge due to the large size of data files, the density of storage, and the propensity for bits to flip.
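For readers who haven't used error-correction tooling, here is a minimal sketch of the problem and one common remedy (the file names are made up, and par2 from par2cmdline is assumed to be available):

    # compress an archive, then create 5% recovery data alongside it
    xz -9 -k data.tar                                  # produces data.tar.xz
    par2 create -r5 data.tar.xz.par2 data.tar.xz

    # simulate a single corrupted byte somewhere in the compressed stream
    printf '\xff' | dd of=data.tar.xz bs=1 seek=4096 count=1 conv=notrunc

    xz -t data.tar.xz                     # the integrity test now fails
    par2 repair data.tar.xz.par2          # the recovery blocks restore the archive

Without the recovery data, a single damaged byte can make everything after it in the compressed stream unrecoverable.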


Original Submission

 
  • (Score: 0) by Anonymous Coward on Friday September 13 2019, @03:19AM (#893506)

    This is basically what I use.

    Maximally xz-compressed SquashFS image burnt to a 25 GB Blu-ray (for capacity vs. cost). Consider the sector size of Blu-ray (2048 bytes), estimate how many blocks might be lost on a disc (runs of 5,000? totals of 150,000 blocks?), and choose the par2 parameters accordingly, creating the par2 data across the SquashFS image you are going to write. When burning the disc, pass the burning program the SquashFS image instead of an ISO file. (When mounting: mount -t squashfs /dev/sr0 /mnt/cdrom.)
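    As a rough sketch of that workflow (the file names are made up, and growisofs is only one possible burning program, since none is named here):

        # build a maximally xz-compressed SquashFS image of the data to archive
        mksquashfs /data/to-archive backup.sqsh -comp xz -b 1M -Xdict-size 100%

        # burn the raw image to the BD-R directly, with no ISO9660 wrapper
        growisofs -Z /dev/sr0=backup.sqsh

        # later, mount the disc's SquashFS directly
        mount -t squashfs /dev/sr0 /mnt/cdrom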

    Write the par information along with the next backup volume you create (or the next medium destined for another site). There is an upper bound on the number of par2 blocks, so in the end either the block size grows to a couple of megabytes or you get less total recovery data than you wanted (I believe the limit is around 63,000 blocks, well short of the 150,000 desired above). Larger blocks mean that widely scattered areal damage (a rotted disc) is more likely to be unrecoverable. This isn't a parity limitation but a par2 design choice, and using a large number of blocks takes a __LOT__ of computational time.
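    For concreteness, a hedged example of setting those parameters across the image (the numbers are illustrative, and the exact block-count ceiling should be checked against your par2 version):

        # ~30,000 source blocks over a ~23 GiB image works out to blocks of roughly 800 KiB;
        # generate 10% recovery data and store the .par2 files with the next backup volume
        par2 create -b30000 -r10 backup.sqsh.par2 backup.sqsh

        # equivalently, fix the block size (a multiple of 4 bytes) instead of the count:
        # par2 create -s1048576 -r10 backup.sqsh.par2 backup.sqsh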

    I choose a maximally-compressed xz image because this is archival: recurring storage cost always outweighs the one-time compression cost, or you're not talking _archival_. (Someone mentioned not compressing: if you want _larger_ data for whatever sort of parity you have in mind, just create more parity data.) I use squashfs instead of isofs because it natively supports compression. I think there was more to it, but I can't recall what.

    I say create the par2 data across the ISO/SquashFS image because if you can read anything at all, you can read sector data. If you try to do something filesystem-based (even with the cdrom filesystem), the majority of your data could be unusable because of one broken metadata block; if you can read raw sectors, you have the best chance of recovery. DO NOT run par2 across a filesystem's worth of files: with subdirectories, the par2 data is generated as if all files, as named, were in the current directory, so if different directories contain different files with the same name, you've broken your parity blocks.
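    A recovery sketch under those assumptions (ddrescue is assumed for the raw sector read, and the .par2 files come back from the companion volume):

        # pull whatever sectors are still readable off the damaged disc
        ddrescue -b 2048 /dev/sr0 backup.sqsh backup.map

        # check the image against the recovery data, then rebuild the damaged blocks
        # (if the read picked up trailing sector padding, par2 should still locate the
        #  data blocks and rewrite the image at its original size)
        par2 verify backup.sqsh.par2
        par2 repair backup.sqsh.par2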