
SoylentNews is people

posted by janrinok on Tuesday February 06, @03:51AM   Printer-friendly
from the confidentiality-integrity-and-availability dept.

Exotic Silicon has a detailed exploration of how and why to make long term backups.

The myth...

When thinking about data backup, many people have tended to fixate on the possibility of a crashed hard disk, and in modern times, a totally dead SSD. It's been the classic disaster scenario for decades, assuming that your office doesn't burn down overnight. You sit down in front of your desktop in the morning, and it won't boot. As you reach in to fiddle with SATA cables and clean connections, you realise that the disk isn't even spinning up.

Maybe you knew enough to try a couple of short, sharp, ninety-degree twists in the plane of the platters, in case it was caused by stiction. But sooner or later, reality dawns, and it becomes clear that the disk will never spin again. It, along with your data, is gone forever. So a couple of full backups at regular intervals should suffice, right?

Except that isn't how it usually happens: most likely you'll be calling on your backups for some other reason.

The reality...

Aside from the fact that when modern SSDs fail they often remain readable, i.e. they become read-only, your data is much more likely to be at risk from silent corruption over time, or from being overwritten due to operator error.

Silent corruption can happen for reasons ranging from bad SATA cables and buggy SSD firmware, to malware and more. Operator error might go genuinely unnoticed, or be covered up.

Both of these scenarios can be protected against with an adequate backup strategy, but the simple approach of a regular full backup (which also often goes untested) in many cases just won't suffice.
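One inexpensive defence against silent corruption, sketched here with standard coreutils (the directory paths are illustrative, not from the article), is to keep a checksum manifest alongside the data and verify it before each backup run, so that corrupted files are noticed before they propagate into every backup copy:

```shell
# Build a checksum manifest for everything under the data directory.
# (/srv/data and /srv/manifests are example paths; adjust to your layout.)
cd /srv/data
find . -type f -print0 | xargs -0 sha256sum > /srv/manifests/data.sha256

# Later, before taking a backup, verify that nothing has silently changed.
# --quiet prints only files that fail verification.
cd /srv/data
sha256sum --quiet -c /srv/manifests/data.sha256 || echo "corruption or unexpected change detected"
```

Legitimate edits will of course also show up as mismatches, so the manifest needs regenerating after intentional changes; the point is that any difference is made visible rather than silently inherited by the next backup.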

Aspects like the time interval between backups, how many copies to have and how long to keep them, speed of recovery, and the confidentiality and integrity of said backups are all addressed. Also covered are silent corruption, archiving unchanging data, examples of comprehensive backup plans, and how to correctly store, label, and handle the backup storage media.

Not all storage media have long life spans.


This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 0) by Anonymous Coward on Tuesday February 06, @05:20PM (1 child)

    by Anonymous Coward on Tuesday February 06, @05:20PM (#1343345)

    However, I'm surprised that ZFS, or any other copy-on-write filesystem, isn't even mentioned.
    ZFS manages snapshots, bit-rotten errors, and a plethora of RAID configurations.
    That alone covers all operator errors, and together with a good backup system, it covers pretty much every scenario worth considering.
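    For reference, the snapshot and integrity-checking workflow the comment describes looks roughly like this on ZFS (the pool name "tank" and dataset "tank/home" are made-up examples):

    ```shell
    # Take a read-only, point-in-time snapshot of a dataset:
    zfs snapshot tank/home@2024-02-06

    # After an operator error, roll the dataset back to that snapshot:
    zfs rollback tank/home@2024-02-06

    # Scrub the pool to detect (and, given redundancy, repair) silent corruption:
    zpool scrub tank
    zpool status tank
    ```

    A scrub reads every block and verifies it against its checksum, which is what catches the bit-rot the comment refers to; without redundancy it can only report damage, not repair it.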

    Automatic snapshots work very well for the "oh shit, I accidentally deleted that data I needed" case. You don't need a fancy filesystem for this; you can use LVM with a traditional filesystem to achieve the same thing.
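    A minimal sketch of that LVM approach, assuming a volume group named vg0 with a logical volume named data (both names hypothetical):

    ```shell
    # Create a copy-on-write snapshot of vg0/data with 2G of change-tracking space:
    lvcreate --snapshot --name data-snap --size 2G /dev/vg0/data

    # Mount the snapshot read-only and recover an accidentally deleted file:
    mount -o ro /dev/vg0/data-snap /mnt/snap
    cp /mnt/snap/important.file /srv/data/

    # Remove the snapshot when it is no longer needed, releasing its space:
    umount /mnt/snap
    lvremove /dev/vg0/data-snap
    ```

    Note that an LVM snapshot that fills its allocated change-tracking space becomes invalid, so sizing (and removing stale snapshots) matters, which connects to the disk-filling scenario below.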

    Automatic snapshots don't deal well with "oh shit, I accidentally ran a job that filled the entire disk": unless you notice the problem before the next snapshot, you now need administrator intervention to delete data from the snapshots to free up disk space and make the system usable again.

    Both scenarios happen with alarming regularity, but I think on balance automatic snapshots are worth the trouble.

    My personal experience with btrfs is not good. Linux 5.2 had a bug that would destroy an entire btrfs filesystem if you hit it, which I did. (Not to say that other filesystems are bug-free, but btrfs is the only one I've seen destroyed so thoroughly.) More importantly, the filesystem seems to require basically continuous manual intervention to maintain acceptable performance.

    Only on btrfs have I run into scenarios where df tells you there are gigabytes of free space available, but creating a new file fails with -ENOSPC, and then deleting a file also fails with -ENOSPC, so hopefully you have root access to manually run some arcane balance command to make the system work again. Btrfs people will tell you that this behaviour of df is eminently reasonable, and that the question "how much disk space is available?" is inherently ill-specified. That probably makes perfect sense to btrfs people, but the behaviour is totally incomprehensible to normal people who do not dedicate their lives to researching copy-on-write filesystems.
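    The "arcane balance command" in question is typically something like the following, which rewrites underfilled data block groups so their space can be returned to the allocator (the mount point /mnt is an example):

    ```shell
    # Show how space is divided between data and metadata block groups:
    btrfs filesystem df /mnt

    # Rewrite data block groups that are at most 50% used, consolidating them
    # so the freed block groups become available again (e.g. for metadata):
    btrfs balance start -dusage=50 /mnt

    # Inspect the overall allocation afterwards:
    btrfs filesystem usage /mnt
    ```

    The -ENOSPC-with-free-space situation arises because btrfs can exhaust metadata block groups while plenty of raw space remains allocated to half-empty data block groups, which is exactly what the balance reclaims.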

    I don't have enough personal experience with ZFS to know if it fares significantly better. But if I need copy-on-write snapshots on Linux I'll take LVM+XFS over btrfs any day of the week.

  • (Score: 2) by owl on Wednesday February 07, @03:51AM

    by owl (15206) on Wednesday February 07, @03:51AM (#1343453)

    After having had btrfs ruin an entire array because one disk started doing normal "disk failure" things (random read errors), I decided that the btrfs developers were not to be trusted and simply unable to produce a sensible filesystem. A Linux device driver for simulating exactly this failure mode existed before btrfs did, so it could have been tested against. Therefore I will never again run btrfs on any disk. It is on my short list of banned filesystems (hint: the list is one item long).