
posted by martyb on Thursday September 12 2019, @07:22PM   Printer-friendly
from the it-depends dept.

Web developer Ukiah Smith wrote a blog post about which compression format to use when archiving. Obviously the algorithm must be lossless, but beyond that he sets out some criteria and then evaluates how some of the more common formats measure up.

After some brainstorming I have arrived at a set of criteria that I believe will help ensure my data is safe while using compression.

  • The compression tool must be open source.
  • The compression format must be open.
  • The tool must be popular enough to be supported by the community.
  • Ideally there would be multiple implementations.
  • The format must be resilient to data loss.

Some formats I am looking at are zip, 7zip, rar, xz, bzip2, and tar.
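The kind of comparison the post describes is easy to sketch with Python's standard library. The snippet below is illustrative, not from the post: zip, 7zip, and rar need external tools, so the stdlib gzip (zlib/DEFLATE), bzip2, and xz (LZMA) codecs stand in, and the synthetic input is an assumption.

```python
# Compress the same data with three stdlib codecs and compare ratios.
# gzip/bzip2/xz stand in for the wider field of formats in the post.
import bz2
import gzip
import lzma

# Repetitive text, as archival corpora often are (synthetic example input).
data = b"An archival corpus is usually redundant text like this line.\n" * 2000

results = {
    "gzip (zlib)": len(gzip.compress(data)),
    "bzip2": len(bz2.compress(data)),
    "xz (lzma)": len(lzma.compress(data)),
}

for name, size in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:12} {size:8d} bytes  ratio {len(data) / size:6.1f}:1")
```

On redundant input like this, all three shrink the data by orders of magnitude; the interesting differences show up on real, mixed data, which is why the post's other criteria (openness, multiple implementations, resilience) matter as much as ratio.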

He closes by mentioning error correction, which has become more important than most acknowledge given the large size of data files, the density of modern storage, and the propensity for bits to flip.
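The idea behind recovery data can be shown in miniature. This toy sketch (not from the post) stores one XOR parity block alongside equal-sized archive chunks, which lets you rebuild any single lost chunk; real tools such as par2 use Reed-Solomon codes that survive multiple damaged blocks.

```python
# Toy erasure-coding demo: one XOR parity block recovers any one lost block.
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)


archive = [b"block-0!", b"block-1!", b"block-2!"]  # equal-sized chunks
parity = xor_blocks(archive)                       # stored alongside the archive

# Simulate losing block 1, then rebuild it from the survivors plus parity.
recovered = xor_blocks([archive[0], archive[2], parity])
print(recovered)  # b'block-1!'
```

This works because XOR-ing the parity with all surviving blocks cancels everything except the missing block, the same principle (one erasure) used by RAID 5.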


Original Submission

 
  • (Score: 2) by NateMich on Thursday September 12 2019, @07:49PM (11 children)

    by NateMich (6662) on Thursday September 12 2019, @07:49PM (#893293)

    I do not care about compression since LZ4 is now everywhere deep in the system where it belongs.

    Isn't LZ4's compression ratio pretty terrible though?
    It is very fast of course, but when your archives require twice as much space, that could be an issue.
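The speed/ratio trade-off the parent describes is easy to demonstrate. LZ4 is not in Python's standard library, so in this sketch zlib at level 1 stands in for the fast, light end and xz/lzma for the slow, dense end; the input data is synthetic.

```python
# Illustrate the compression speed vs. ratio trade-off with stdlib codecs.
# zlib level 1 stands in for "fast" (LZ4-style), lzma for "dense" (xz-style).
import time
import zlib
import lzma

data = b"the quick brown fox jumps over the lazy dog\n" * 5000  # compressible

for name, fn in [("zlib -1 (fast)", lambda d: zlib.compress(d, 1)),
                 ("xz (dense)", lzma.compress)]:
    t0 = time.perf_counter()
    out = fn(data)
    dt = time.perf_counter() - t0
    print(f"{name:15} ratio {len(data) / len(out):7.1f}:1 in {dt * 1000:7.2f} ms")
```

The fast codec finishes in a fraction of the time but leaves a noticeably larger output, which is exactly the concern for archives: a 2x larger archive is a real cost when the data sits on disk for years.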

  • (Score: 2) by richtopia on Thursday September 12 2019, @08:27PM (10 children)

    by richtopia (3160) on Thursday September 12 2019, @08:27PM (#893318) Homepage Journal

    I am of the mindset that compression should be at the filesystem level, be it LZ4 or other. I would suggest checking out the btrfs wiki page on compression for more reading:

    https://btrfs.wiki.kernel.org/index.php/Compression [kernel.org]

    Since we are talking archiving, apply the common 3-2-1 backup scheme: three copies of the data, on two different media, with one copy off-site. Each system can have filesystem compression and de-duplication, but the end user does not need to think about it.
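    For context, btrfs compression as described on that wiki page is enabled per mount. A typical fstab entry looks like the following (the UUID placeholder and zstd level are illustrative, not from the wiki page):

    ```
    # /etc/fstab — mount a btrfs volume with transparent zstd compression
    UUID=<volume-uuid>  /data  btrfs  defaults,compress=zstd:3  0  0
    ```

    Files written after mounting are compressed transparently; existing files can be recompressed with `btrfs filesystem defragment -czstd`.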

    • (Score: 2) by NateMich on Thursday September 12 2019, @08:53PM (7 children)

      by NateMich (6662) on Thursday September 12 2019, @08:53PM (#893336)

      I would suggest checking out the btrfs wiki page on compression for more reading

      I've used btrfs, and I don't think I'll be using it again. Certainly not for archiving anything.

      • (Score: 2, Informative) by Anonymous Coward on Thursday September 12 2019, @10:41PM (6 children)

        by Anonymous Coward on Thursday September 12 2019, @10:41PM (#893395)

        My experience with btrfs was Write Once, Read Never. The system went down cleanly, but the disk wouldn't mount on reboot and all the utilities claimed it was hosed. Because it was a VM, we had a direct disk image from the previous reboot. Restore that, run an fsck and everything just in case, reboot, and everything was working. Attempt to apply updates, reboot as required, and the disk refuses to mount. Try the whole process again, same result. Try again, same result.

        That alone was enough to take it out of production, but the final nail in the coffin for btrfs was that the documentation was crap for troubleshooting, and none of the developers seemed to give a shit about our reproducible case, or else blamed it on RAID, which we weren't even using in the VM. Well, it would have been the final nail, if another machine running btrfs on a different distro hadn't refused to mount the same way after a reboot during the transition.

        • (Score: 3, Interesting) by barbara hudson on Thursday September 12 2019, @11:09PM (4 children)

          by barbara hudson (6443) <barbara.Jane.hudson@icloud.com> on Thursday September 12 2019, @11:09PM (#893414) Journal
          Most backups are Write Once, Read Never. I've had 25 drives fail on my personal computers, not counting work machines. Most of the time, the data is sitting on two or more drives, so a drive failure is just an excuse to rant at Seagate or Western. My current laptop doesn't have the motherboard connectors for a second drive, so the important stuff from this century has been pared down so it fits on a 16 gig USB key - and most of THAT is mp3 files, because classic rock is more important than code I wrote 20 years ago.

          Bought the 16 gig stick for $25 on special more than a decade ago. The same archive also sits on a 32 gig USB stick I paid $35 for 5 years ago, and on a 128 gig USB 3.1 stick I paid $30 for last year. Kingston flash drives are better value than hard drives nowadays - they just keep plugging along, and can boot off any machine, so recovery, when it happens, will be really fast (I keep a bootable distro or two on sticks because it's handy to try new stuff).

          --
          SoylentNews is social media. Says so right in the slogan. Soylentnews is people, not tech.
          • (Score: -1, Spam) by Anonymous Coward on Friday September 13 2019, @12:00AM (2 children)

            by Anonymous Coward on Friday September 13 2019, @12:00AM (#893430)

            I saw you fail against apk https://soylentnews.org/comments.pl?noupdate=1&sid=33430&page=1&cid=889582#commentwrap [soylentnews.org] and none of your bs worked. Huge fail for you. The biggest being the end of it where you try blame apk for stalking you but you are quoted years before showing you do and instigate others to also. Pitiful barb. Especially your fail on hosts punycode against unicode rare entries and on C/C++ fails in speed and security vs. pascal apk used for his hosts program.

            • (Score: 0) by Anonymous Coward on Friday September 13 2019, @12:05AM (1 child)

              by Anonymous Coward on Friday September 13 2019, @12:05AM (#893433)

              You're autistic.

              • (Score: 1, Touché) by Anonymous Coward on Friday September 13 2019, @01:57AM

                by Anonymous Coward on Friday September 13 2019, @01:57AM (#893474)
                I think you spelled "batshit crazy" wrong.
          • (Score: 0) by Anonymous Coward on Friday September 13 2019, @12:42AM

            by Anonymous Coward on Friday September 13 2019, @12:42AM (#893443)

            That's the thing. Most of the time a filesystem backup will suffice. Sometimes the file isn't there or that copy is corrupt, but most of the time the restore is due to fat-fingering the file.

            You obviously still need off-site backups as well; you just rarely have to reach for them.

        • (Score: 1, Informative) by Anonymous Coward on Friday September 13 2019, @02:12AM

          by Anonymous Coward on Friday September 13 2019, @02:12AM (#893479)

          My experience with btrfs was Write Once, Read Never.

          Same experience here, only with btrfs RAID. One disk started throwing random bad sectors. It should have been a simple case of "tell it to stop using the bad disk", "swap the disk", "tell it to start using the new disk". Instead, the entire btrfs filesystem went down hard when told to stop using the bad disk, with no recovery possible.

          I'll never, ever, use BTRFS again.

    • (Score: 0) by Anonymous Coward on Thursday September 12 2019, @09:58PM (1 child)

      by Anonymous Coward on Thursday September 12 2019, @09:58PM (#893384)

      Heroic decision indeed.

      • (Score: 0) by Anonymous Coward on Friday September 13 2019, @02:19AM

        by Anonymous Coward on Friday September 13 2019, @02:19AM (#893481)

        I use btrfs as my near-line backup aggregator drive at home, and as my development filesystem at work. It has been generally good to me except when my home drive got corrupted. Not btrfs' fault, since the root cause was a burnt bit in the RAM, but flying without ECC is risky.

        Recovered the data from one of the read-only snapshots on the filesystem. Luckily those structures weren't written to.