
posted by chromas on Sunday April 19 2020, @10:04AM

Understanding RAID: How performance scales from one disk to eight:

One of the first big challenges neophyte sysadmins and data hoarding enthusiasts face is how to store more than a single disk worth of data. The short—and traditional—answer here is RAID (a Redundant Array of Inexpensive Disks), but even then there are many different RAID topologies to choose from.

Most people who implement RAID expect to get extra performance, as well as extra storage, out of all those disks. Those expectations aren't always rooted very firmly in the real world, unfortunately. But since we're all home with time for some technical projects, we hope to shed some light on how to plan for storage performance—not just the total number of gibibytes (GiB) you can cram into an array.

A quick note here: Although readers will be interested in the raw numbers, we urge a stronger focus on how they relate to one another. All of our charts relate the performance of RAID arrays at sizes from two to eight disks to the performance of a single disk. If you change the model of disk, your raw numbers will change accordingly—but, for the most part, the relationship to a single disk's performance will not.

[...] For all tests, we're using Linux kernel RAID, as implemented in Linux kernel version 4.15, along with the ext4 filesystem. We used the --assume-clean parameter when creating our RAID arrays in order to avoid overwriting every block of the array, and we used -E lazy_itable_init=0,lazy_journal_init=0 when creating the ext4 filesystem so that our tests weren't contaminated by ongoing background writes initializing the filesystem.
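As a sketch of the setup described above, the mdadm and mkfs.ext4 invocations might look like the following (the device names and RAID level are assumptions for illustration; the article tested many topologies):

```shell
# Create a 4-disk RAID10 array. --assume-clean skips the initial
# resync, so every block of the array isn't rewritten up front.
# Device names /dev/sd[b-e] are hypothetical; adjust to your hardware.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
    --assume-clean /dev/sdb /dev/sdc /dev/sdd /dev/sde

# Format with ext4, disabling lazy initialization so later benchmark
# runs aren't polluted by background inode-table and journal writes.
mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md0
```

Note that --assume-clean is only safe for benchmarking; on a production mirror or parity array you normally want the initial sync to run.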


Original Submission

  • (Score: 4, Interesting) by TheRaven (270) on Monday April 20 2020, @01:42PM (#985082) Journal
    There are a few advantages that things like ZFS have over old-style RAID. One of the big ones is snapshots and incremental backups. For example:

    "With RAID you can't lose your data." Oh yes you can. It mirrors the deletes as well as the writes. For simplicity, I don't bother with RAID at home (I don't need enormous amounts of storage over multiple disks) but I have a sort of manual backup system where I have two identical disks partitioned exactly the same and I rsync from the live one to the spare. Obviously, it could be a cron job, but my machines are not up 24/7 these days, so there's no point. Also, the occasional rsync over the network to another machine is good in case one machine dies.
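The manual two-disk arrangement described above might be sketched like this (the mount points are assumptions):

```shell
# Assumed mount points for the live disk and its identically
# partitioned spare.
LIVE=/mnt/live
SPARE=/mnt/spare

# Mirror the live disk onto the spare. -a preserves permissions and
# timestamps, -H hard links, -A ACLs, -X extended attributes.
# --delete propagates removals too -- which is exactly why a mirror,
# RAID or rsync'd, does not protect against accidental deletion.
rsync -aHAX --delete "$LIVE/" "$SPARE/"
```

The trailing slash on the source matters to rsync: it copies the contents of $LIVE rather than the directory itself.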

    With this model, one of your drives is more valuable than the other. If your backup dies, you lose nothing. If your live disk dies, you lose everything from the last rsync until the failure. And if the main disk dies in the middle of an rsync, you can be left with a backup in an inconsistent state (one file may be only partially copied, and if anything depends on consistent state across multiple files, things can get very exciting).

    With ZFS, you could have a two-drive mirror. Now your cron job takes a snapshot periodically and deletes old ones. You may be in an inconsistent state if an application is writing in the middle of your snapshot, but the snapshot is instantaneous and so you won't have a mixture of files from today and files from yesterday. You can keep multiple snapshots around and you can have them automatically appear in the .zfs directory in the root of any ZFS filesystem (ZFS filesystems are only slightly more expensive to create than directories so you can have one per user and so on). There is a load of tooling to automatically decay snapshots over time (e.g. keep hourly ones for a day, daily ones for a week, weekly ones for a month, and so on).
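The periodic-snapshot cron job described above could be as simple as the following (the pool and dataset name tank/home is an assumption):

```shell
# Take a timestamped snapshot of a hypothetical dataset.
# Snapshots are near-instantaneous and initially consume no space.
zfs snapshot tank/home@$(date +%Y-%m-%d-%H%M)

# List existing snapshots for the dataset.
zfs list -t snapshot -r tank/home

# Destroy a snapshot once it has decayed out of the retention schedule.
zfs destroy tank/home@2020-04-01-0000
```

With the dataset's snapdir property set to visible, each snapshot also appears read-only under the .zfs/snapshot directory mentioned above, so users can recover deleted files themselves.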

    This protects a lot against accidental deletion. The RAID bit protects you against individual drive failures. The last bit is protecting you against the computer being stolen or destroyed, or suffering the kind of failure that fries multiple disks (e.g. a lightning strike on the house). That's what backups are for. Again, ZFS helps here. zfs send and zfs receive can send either an entire filesystem or all of the deltas between a pair of snapshots between machines. You can also turn a snapshot into a bookmark, which keeps only the metadata needed for incremental sends, not the old data. You can't roll back to a bookmark the way you can to a snapshot, but you can still do an incremental send from it (it tracks which blocks have changed, but doesn't keep their old contents). I use zfsbackup-go as a wrapper around these, which generates streams with zfs send, chunks them, compresses them, encrypts them, and uploads them to cloud storage. The initial backup of a terabyte or so was slow, but incrementally pushing a few GB of new stuff isn't that bad. If you don't trust a cloud provider, you can send them to a friend's house.
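The send/receive and bookmark workflow described above might look like this (dataset names, snapshot names, and the host backuphost are all assumptions):

```shell
# Full send of a snapshot to another machine over ssh.
zfs send tank/home@monday | ssh backuphost zfs receive backup/home

# Incremental send: only the blocks that changed between the two
# snapshots cross the wire.
zfs send -i tank/home@monday tank/home@tuesday | \
    ssh backuphost zfs receive backup/home

# Turn the old snapshot into a bookmark, freeing the old block
# contents while preserving the point-in-time reference...
zfs bookmark tank/home@monday tank/home#monday

# ...which can still serve as the base of a later incremental send.
zfs send -i tank/home#monday tank/home@tuesday | \
    ssh backuphost zfs receive backup/home
```

Tools like zfsbackup-go sit on top of exactly this mechanism, adding chunking, compression, and encryption before the stream leaves the machine.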

    --
    sudo mod me up