SoylentNews is people

posted by martyb on Thursday September 12 2019, @07:22PM
from the it-depends dept.

Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out some criteria and then evaluates how some of the more common formats measure up.

After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.

  • The compression tool must be open source.
  • The compression format must be open.
  • The tool must be popular enough to be supported by the community.
  • Ideally there would be multiple implementations.
  • The format must be resilient to data loss.

Some formats I am looking at are zip, 7zip, rar, xz, bzip2, tar.

He closes by mentioning error correction. That has become more important than most acknowledge due to the large size of data files, the density of storage, and the propensity for bits to flip.
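
To make the bit-flip concern concrete, here is a minimal Python sketch (not taken from the blog post) that compresses some sample data into the xz container using the standard-library `lzma` module, inverts a single bit in the middle of the compressed stream, and then checks whether the archive still decompresses to the original. The sample data and the helper names `flip_bit` and `survives` are made up for illustration. Note that this only shows detection of damage: the integrity check in the container can tell you the data is bad, but it cannot repair it.

```python
import lzma


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def survives(compressed: bytes, original: bytes) -> bool:
    """True only if the stream still decompresses to the original bytes."""
    try:
        return lzma.decompress(compressed) == original
    except lzma.LZMAError:
        # Raised on a corrupt stream or a failed integrity check.
        return False


# Hypothetical sample data: a few thousand short text records.
original = b"".join(f"record {i}: value {i * i}\n".encode() for i in range(5000))

# lzma.compress() writes the .xz container by default, which carries
# an integrity check over the uncompressed data.
compressed = lzma.compress(original)

# Invert one bit in the byte at the middle of the compressed stream.
damaged = flip_bit(compressed, (len(compressed) // 2) * 8)

print("intact stream survives: ", survives(compressed, original))
print("damaged stream survives:", survives(damaged, original))
```

A single flipped bit should be enough to make the whole stream fail here, which is why the resilience criterion and some form of error correction or redundancy matter for long-term archives.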

  • (Score: 2) by istartedi (123) on Thursday September 12 2019, @09:56PM (#893382)

    Use whatever you want for compression, plus RAID 1 and/or multiple locations for redundancy. This fits with the "do one thing and do it well" idea. Not my field, but I'm not aware of any compression format that really takes redundancy into consideration, since it's just going to make the compression... not so compressed. No free lunch. If you want redundancy, you're not going to get fantastic compression. Doubling the size of everything is not so bad these days. Storage is cheap, and reliable... but never reliable enough.
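
    The "no free lunch" trade-off can be sketched with the simplest possible redundancy scheme, a three-copy repetition code with a per-bit majority vote. This is an illustration only, not something the comment proposes; the function name `majority_vote` and the sample data are made up. It also shows why two copies (as in RAID 1) are weaker than three: two copies can disagree, but without a checksum nothing says which one is right.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Rebuild data from three equal-length copies by voting on every bit."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("copies must have the same length")
    # A result bit is 1 exactly when at least two of the three copies have it set.
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


data = b"Storage is cheap, but never reliable enough."

# Three independent copies: triple the storage for the same payload.
copies = [bytearray(data) for _ in range(3)]

# Simulate bit rot in two different copies at two different positions.
copies[0][5] ^= 0x08
copies[2][20] ^= 0x80

recovered = majority_vote(*(bytes(c) for c in copies))
stored = sum(len(c) for c in copies)

print("recovered matches original:", recovered == data)
print(f"{stored} bytes stored for {len(data)} bytes of payload")
```

    The vote only works while no bit position is damaged in more than one copy, and the price is a threefold size increase, which is exactly the cost the comment is pointing at.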
  • (Score: 1) by jon3k (3718) on Thursday September 12 2019, @11:01PM (#893408)

    "RAID 1 and/or multiple locations for redundancy."

    Say it with me: RAID is not a backup.

    All RAID configurations serve (up to) two purposes: performance and availability.

    That's it. RAID1 will happily replicate accidental file deletions, file system corruption, bitrot, etc. So it's not "and/or". If you care about your data, you need to have multiple independent copies. My rule of thumb is this: three copies of all of your data, one of which is offsite.

    I have a NAS with 2x12TB HDD. A weekly rsync job copies one to the other. That's my first two copies. I also have two smaller hard drives, encrypted (LUKS), in external enclosures. Once a quarter I bring them home, copy my 12TB backup disk across those two disks, and then immediately take them back offsite. That's my third, offsite, copy, in case my house burns down.
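
    Since a plain copy job can carry silent corruption from one disk to the next, one possible addition to a routine like this is to compare checksums between copies. The sketch below is an assumption layered on top of the comment, which only mentions rsync: it hashes every file under two directory trees with SHA-256 and lists the relative paths whose contents differ or that exist on only one side. The function names and the demo directories are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict:
    """Map each file's path relative to root to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def mismatches(primary: Path, backup: Path) -> list:
    """Relative paths that are missing on one side or differ in content."""
    a, b = manifest(primary), manifest(backup)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Demo with throwaway directories standing in for the two disks.
with tempfile.TemporaryDirectory() as tmp:
    primary = Path(tmp, "primary")
    backup = Path(tmp, "backup")
    primary.mkdir()
    backup.mkdir()

    (primary / "notes.txt").write_bytes(b"same on both disks")
    (backup / "notes.txt").write_bytes(b"same on both disks")

    # Same size, one byte different: the kind of damage a size check misses.
    (primary / "photo.raw").write_bytes(b"\x00" * 1024)
    (backup / "photo.raw").write_bytes(b"\x00" * 1023 + b"\x01")

    demo_result = mismatches(primary, backup)

print(demo_result)
```

    A mismatch only says that the copies disagree, not which one is correct. Keeping a manifest from the time the data was first written, stored alongside each copy, would be one way to settle that.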

    Of course everyone gets to decide how valuable their data is and how much they want to spend to protect it.