
posted by Fnord666 on Thursday May 21 2020, @11:29AM   Printer-friendly
from the let-the-competition-begin dept.

ZFS versus RAID: Eight Ironwolf disks, two filesystems, one winner:

This has been a long while in the making—it's test results time. To truly understand the fundamentals of computer storage, it's important to explore the impact of various conventional RAID (Redundant Array of Inexpensive Disks) topologies on performance. It's also important to understand what ZFS is and how it works. But at some point, people (particularly computer enthusiasts on the Internet) want numbers.

First, a quick note: This testing, naturally, builds on those fundamentals. We're going to draw heavily on lessons learned as we explore ZFS topologies here. If you aren't yet entirely solid on the difference between pools and vdevs or what ashift and recordsize mean, we strongly recommend you revisit those explainers before diving into testing and results.
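
For anyone who wants a concrete refresher before the numbers, here is a minimal sketch of those concepts in command form. The device, pool, and dataset names are illustrative, not taken from the article's test rig:

    # Build a pool from four 2-disk mirror vdevs; ashift=12 matches
    # 4K-sector disks (2^12 = 4096 bytes) and is fixed at vdev creation.
    zpool create -o ashift=12 tank \
        mirror da0 da1  mirror da2 da3  mirror da4 da5  mirror da6 da7

    # recordsize is a per-dataset property and can be changed later;
    # 1M suits large sequential files, the 128K default suits mixed use.
    zfs create -o recordsize=1M tank/media
    zpool status tank    # confirm the pool/vdev layout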

And although everybody loves to see raw numbers, we urge an additional focus on how these figures relate to one another. All of our charts relate the performance of ZFS pool topologies at sizes from two to eight disks to the performance of a single disk. If you change the model of disk, your raw numbers will change accordingly—but for the most part, their relation to a single disk's performance will not.

[It is a long — and detailed — read with quite a few examples and their performance outcomes. Read the 2nd link above to get started and then continue with this story's linked article.--martyb]

Previously:
(2018-09-11) What is ZFS? Why are People Crazy About it?
(2017-07-16) ZFS Is the Best Filesystem (For Now)
(2017-06-24) Playing with ZFS (on Linux) Encryption
(2016-02-18) ZFS is Coming to Ubuntu LTS 16.04
(2016-01-13) The 'Hidden' Cost of Using ZFS for Your Home NAS


Original Submission

 
  • (Score: 3, Interesting) by bradley13 on Thursday May 21 2020, @12:46PM (18 children)

    by bradley13 (3053) on Thursday May 21 2020, @12:46PM (#997348) Homepage Journal

    The test scenario - using 8 disks - seems aimed pretty specifically at NAS or mid-level servers. That's not enough disks for a really large storage system, yet it's a lot more than you're going to have in a regular PC or laptop. Still, the results are interesting: basically, ZFS wins in almost all scenarios. However, as the article indirectly points out, ZFS is a complicated beast, and correct configuration is essential.

    What I didn't see in the articles (and perhaps this has changed): I seem to recall that ZFS caches a lot of information in memory, and that a computer crash could seriously damage the file system. Not being a file system expert: is that true? If so, has this weakness been remedied?

    --
    Everyone is somebody else's weirdo.
  • (Score: 0) by Anonymous Coward on Thursday May 21 2020, @01:01PM (9 children)

    by Anonymous Coward on Thursday May 21 2020, @01:01PM (#997352)

    It is true for certain configurations. If you try to enable deduplication for 10 TB and have 1 gigabyte of RAM...

    • (Score: 5, Interesting) by Mojibake Tengu on Thursday May 21 2020, @01:49PM (8 children)

      by Mojibake Tengu (8598) on Thursday May 21 2020, @01:49PM (#997372) Journal

      I would not call 10T of dedup with 1G of RAM a configuration. It is a stupid, badly done misconfiguration, actually. The default minimum recommendation for ZFS on FreeBSD is 8G of RAM, and we all know why.
      Porting it to a toy operating system which lacks important facilities like jails and delegations was a big mistake.
      While ZFS verily is the only true filesystem on Linux, it will not bring joy to its uncultured users.
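
      For the record, you can estimate what dedup would cost before turning it on; a hedged sketch (pool name illustrative):

          # Simulate dedup on an existing pool and print a DDT histogram,
          # without enabling anything:
          zdb -S tank
          # On a pool that already has dedup enabled, show actual DDT stats:
          zpool status -D tank
          # Each DDT entry costs on the order of a few hundred bytes of RAM,
          # which is why multi-TB dedup with 1G of RAM falls over.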

      --
      Respect Authorities. Know your social status. Woke responsibly.
      • (Score: 3, Funny) by Runaway1956 on Thursday May 21 2020, @03:03PM

        by Runaway1956 (2926) Subscriber Badge on Thursday May 21 2020, @03:03PM (#997399) Journal

        it will not bring joy to its uncultured users.

        No worries, mate. Covid19 is here to culture everyone!!

      • (Score: 2, Insightful) by Anonymous Coward on Thursday May 21 2020, @03:10PM

        by Anonymous Coward on Thursday May 21 2020, @03:10PM (#997405)

        ZFS works just fine with less than 8GB as long as you don't enable deduplication. Nowadays it's really a moot issue, though. My next toaster will probably have 32 gigs!

      • (Score: -1, Troll) by Anonymous Coward on Thursday May 21 2020, @05:16PM (4 children)

        by Anonymous Coward on Thursday May 21 2020, @05:16PM (#997447)

        Wow, didn't realize FreeBSD is now so bad it needs 8GB RAM just for a filesystem. Thank goodness the talented developers are on Linux.

        • (Score: 2) by srobert on Thursday May 21 2020, @06:21PM (2 children)

          by srobert (4803) on Thursday May 21 2020, @06:21PM (#997483)

          It doesn't. There's a FreeBSD laptop on the kitchen table. It's got ZFS and only 4G of RAM. My wife uses it for daily tasks. It's a single disk, not mirroring anything, so under those circumstances one might ask, "what's the advantage over UFS?". I'm not sure there's an advantage, but after running that way for years, I can say there's definitely no disadvantage.

          • (Score: 2, Informative) by DECbot on Thursday May 21 2020, @10:41PM (1 child)

            by DECbot (832) on Thursday May 21 2020, @10:41PM (#997627) Journal

            The advantage is zfs send to do backups of her laptop--with encryption on both the dataset and the data in transit. Also, if you take automatic snapshots before any software is installed, updated, or removed, you can always undo a mistake with a quick reboot and rollback instead of hours of fixing a botched install, update, or deletion.

            On performance, you could probably argue UFS is as performant or better, but the administration aspects of ZFS are superior.
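
            A minimal sketch of that workflow, assuming OpenZFS 0.8+ native encryption; all dataset, snapshot, and host names here are made up:

                # Snapshot before an update; roll back if it goes wrong:
                zfs snapshot tank/home@pre-update
                zfs rollback tank/home@pre-update

                # Back up over the network. With native encryption, -w (raw)
                # sends the blocks still encrypted, so the receiving side
                # never sees plaintext:
                zfs send -w tank/home@pre-update | ssh backuphost zfs receive backup/home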

            --
            cats~$ sudo chown -R us /home/base
            • (Score: 0) by Anonymous Coward on Thursday May 21 2020, @11:05PM

              by Anonymous Coward on Thursday May 21 2020, @11:05PM (#997633)

              ZFS wins in performance too, even when benchmarks say otherwise. CoW, caching, etc. make ZFS perform very well as a desktop filesystem.

        • (Score: -1, Troll) by Anonymous Coward on Thursday May 21 2020, @07:05PM

          by Anonymous Coward on Thursday May 21 2020, @07:05PM (#997509)
          Wow, didn't realize FreeBSD is now so bad it needs 8GB RAM just for a filesystem. Thank goodness the talented developers are on Linux.
          "I don't understand the topic but I have stupid things to say"

          ^^I fixed it for you.
  • (Score: 2, Informative) by Anonymous Coward on Thursday May 21 2020, @02:53PM (6 children)

    by Anonymous Coward on Thursday May 21 2020, @02:53PM (#997394)

    ZFS was built for servers, with the mantra that you have a UPS or a battery-backed cache... So yes, it is not as safe across crashes as other filesystems, because of those design choices. Other filesystems add barriers to force syncs to disk at the correct times.
    But note that a crash will not destroy the whole FS; you just have a higher chance of losing the last writes, which on other filesystems might have been saved.

    Also note that ZFS demands ECC RAM (it doesn't strictly require it, but it is one of the first things experts ask about if you have a problem, and they ignore you if you don't have it)... again, because servers have it and the design was built around those features.
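
    How much unsynced data is at risk is also tunable per dataset; a hedged example (dataset name illustrative):

        # sync=standard (the default) honors fsync() as POSIX requires;
        # sync=always pushes every write through the intent log before
        # returning, trading throughput for crash safety:
        zfs set sync=always tank/db
        zfs get sync tank/db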

    • (Score: 2) by hendrikboom on Thursday May 21 2020, @08:09PM (5 children)

      by hendrikboom (1125) Subscriber Badge on Thursday May 21 2020, @08:09PM (#997549) Homepage Journal

      I just happen to have hardware in a box somewhere with, I believe, 1 gibibyte of ECC memory.
      Do I have a chance in the XFS league? I'm interested in high long-term reliability rather than performance. Performance would be nice, but preservation is the priority.

      • (Score: 2) by hendrikboom on Thursday May 21 2020, @08:10PM

        by hendrikboom (1125) Subscriber Badge on Thursday May 21 2020, @08:10PM (#997551) Homepage Journal

        Whoops. Meant ZFS, not XFS. Damn people always pronouncing words like xerox as zerox.

      • (Score: 2) by TheRaven on Tuesday May 26 2020, @08:09AM (3 children)

        by TheRaven (270) on Tuesday May 26 2020, @08:09AM (#999136) Journal
        How much storage are you talking about? The general rule of thumb is 1GB of RAM per TB of storage, 2-4GB if you enable dedup. I have a NAS with 8GB of RAM and 8TB of storage (3x4TiB disks in a RAID-Z configuration) and it's completely fine, but I access it over WiFi, so the disks are rarely the bottleneck. People have used a tuned version of ZFS on machines with 4MB of RAM, though I don't know of anyone who's run the unmodified version on less than 32MB.

        ZFS uses a journal and is a copy-on-write filesystem. The former means that you get write amplification (every write has to be two writes, more if you are adding redundancy); the latter means that you get a lot of fragmentation, which means reads from spinning-rust disks hit their worst case (the same is true for SSDs, but the difference is less pronounced). As such, adding a lot of RAM for caching reads helps a lot, but it isn't essential.

        If you enable deduplication, you need to traverse the dedup tables on write, so that adds more to the amount of cache you need. Somewhat counter-intuitively, you also need more RAM if you add an L2ARC cache device, because you need to keep the indexing structures for the L2ARC in RAM.

        There are some sysctls that let you tune the maximum size of the ARC. It's worth using them on a machine with a higher ratio of storage to RAM. If you're using it as a NAS and not using RAM for much apart from disk cache, you can leave them quite high, otherwise you may find you're swapping a lot because ARC is consuming all of your RAM and not leaving much for applications.
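
        A hedged FreeBSD example of that tuning; the cap value is illustrative, and older releases only honor it from loader.conf at boot:

            # Cap the ARC at 4 GiB. In /boot/loader.conf:
            #   vfs.zfs.arc_max="4G"
            # or at runtime (newer OpenZFS also spells it vfs.zfs.arc.max):
            sysctl vfs.zfs.arc_max=4294967296
            # Check the current ARC size against the cap:
            sysctl kstat.zfs.misc.arcstats.size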

        --
        sudo mod me up
        • (Score: 2) by hendrikboom on Wednesday May 27 2020, @12:17AM (2 children)

          by hendrikboom (1125) Subscriber Badge on Wednesday May 27 2020, @12:17AM (#999429) Homepage Journal

          Currently about 2TB on two 4TB disks, using ext4 on software RAID (just the two-copies-of-everything RAID). Probably more later.
          I'm trying to transition to keeping my personal data in digital form instead of in paper books and on vulnerable analog media.
          I want a checksummed file system as a hedge against long-term undetected disk degradation.
          No need for deduplication.
          There is a need for backups not to become corrupted by being taken from an already-corrupted hard drive.
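
          For that use case the usual habit is a periodic scrub, which is what actually surfaces silent corruption; a short sketch (pool name illustrative):

              # Read every block in the pool, verify checksums, and repair
              # any bad copy from the redundant one:
              zpool scrub tank
              zpool status tank   # shows scrub progress and repaired errors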

          -- hendrik

          • (Score: 2) by TheRaven on Wednesday May 27 2020, @09:11AM (1 child)

            by TheRaven (270) on Wednesday May 27 2020, @09:11AM (#999548) Journal

            If you make sure you tune the ARC sysctls, it should work with ZFS. Given that it's a read-mostly workload, you will see slowdown from the fragmentation but not much from the write amplification. That said, if the motherboard can fit more RAM, it's worth doing. I have a NAS with 8GB of RAM because that's the most that the mini-ITX board will handle, but I'd be very tempted to bump it up to 32GB if the board would take it.

            I'm using zfsbackup-go [github.com] for backups. It takes a snapshot and then uses zfs send to get the stream, chunks it, GPG encrypts it, compresses it, and sends it to cloud storage, so I can do incremental encrypted backups nice and easily.
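
            For a rough idea of the cycle that tool automates, here is a hand-rolled sketch; the snapshot names, GPG key, and bucket are all made up, and this is not zfsbackup-go's actual pipeline:

                # First run: full send. Later runs: incremental between two
                # snapshots, so only changed blocks are in the stream.
                zfs snapshot tank/data@2020-05-27
                zfs send -i @2020-05-26 tank/data@2020-05-27 \
                    | gpg --encrypt -r backup@example.org \
                    | aws s3 cp - s3://my-backups/tank-data-2020-05-27.zfs.gpg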

            --
            sudo mod me up
            • (Score: 2) by hendrikboom on Wednesday May 27 2020, @12:50PM

              by hendrikboom (1125) Subscriber Badge on Wednesday May 27 2020, @12:50PM (#999588) Homepage Journal

              Current situation:

              That 1GB ECC system cannot take more RAM (and it's still in the box, unused).

              I have an ancient machine with two GB of RAM. It's what I'm using now. It has a ten-year-old AMD processor and does not have error-detecting RAM. Not really suitable for a file system that keeps rewriting data after transient residence in unreliable memory (what's reliable over the course of a week or even a year isn't necessarily reliable over a decade).

              I have a laptop with 16G of RAM. Hmmm. It can take two hard drives, though at present it has only one... a vague possibility? I think not, though it is my daily workhorse. Currently using one SSD, which I've heard is not the greatest for long-term data preservation. And the SSD is too small.

              Bulk files and critical files are kept on the main server (the one with two GB of RAM) and accessed through sshfs. Some worries about file locking through sshfs; in particular, worries about simultaneous access to mbox files through a mail reader over sshfs and postfix, so I run my mail reader on the server over ssh -X. Frequently changed important files are kept in distributed revision control (monotone, or, if to be publicly available, monotone *and* github).

              Nor is a machine likely to be left at a coffee shop by mistake, or stolen when I have to visit the coffee shop lavatory, the best place for long-term preservation of data. I'm not *that* worried about others having read access to these data. I am that worried about losing access to it myself.

              Yes, I'll still need backups.

              -- hendrik

              Hmmm.... I could have a backup regime that checksums all files, writes the checksums to the backup, and complains if a file to be backed up has changed in the main file system without a change to its last-modified time... That would probably catch disk misreadings.
              Data might then survive bit flips on disk.

              Wouldn't need a new file system for this -- just a new backup program. And a very slow backup process, as it checksums every file on the hard drives.
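
              A rough sketch of that idea in shell, assuming GNU stat and no whitespace in file names; the paths and manifest names are hypothetical:

                  # Record mtime, checksum, and path for every file:
                  find /data -type f -print0 | while IFS= read -r -d '' f; do
                      printf '%s %s %s\n' "$(stat -c %Y "$f")" \
                          "$(sha256sum < "$f" | cut -d' ' -f1)" "$f"
                  done > manifest.new

                  # A file whose checksum changed while its mtime did not
                  # was silently corrupted, not edited:
                  awk 'NR==FNR { mt[$3]=$1; sum[$3]=$2; next }
                       ($3 in sum) && sum[$3] != $2 && mt[$3] == $1 { print "CORRUPT:", $3 }' \
                      manifest.old manifest.new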

              -- hendrik

  • (Score: 2) by TheRaven on Tuesday May 26 2020, @08:13AM

    by TheRaven (270) on Tuesday May 26 2020, @08:13AM (#999138) Journal

    I seem to recall that ZFS caches a lot of information in memory, and that a computer crash could seriously damage the file system

    I don't think that's true. ZFS does cache a lot in memory, but the cache is there to speed up reads. Writes are written (quite quickly) to the ZFS Intent Log (ZIL), which is basically a circular buffer. They're then written out more slowly to the rest of the disk. ZFS supports using a separate log device to store the ZIL. If you don't mirror this, then a failure of that disk will lose writes if it coincides with a system crash (if the log device fails but the system stays up, the contents of the log will just be reconstructed in the ZIL on the other devices from RAM).

    ZFS uses an adaptive replacement cache (ARC) to improve read performance. This is a mix of LFU / LRU cache, which adapts to the workload. It also supports an L2 ARC on an external disk, so you can use a flash device as a read cache for your pool. If you want to do this, you'll need a bit more RAM, because ZFS keeps the indexing structures for the cache device in RAM.
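
    Hedged examples of both, with illustrative device names:

        # Mirror the SLOG so a single log-device failure can't coincide
        # with a crash and lose synchronous writes:
        zpool add tank log mirror nvd0 nvd1
        # Add an L2ARC cache device (its index lives in RAM, hence the
        # note about needing a bit more memory):
        zpool add tank cache ada2
        zpool iostat -v tank   # log and cache devices listed with the pool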

    Recent versions of ZFS can enable compression for the ARC and L2ARC. This is generally worth doing. It increases the cost of an ARC hit slightly, but also increases the hit rate. There's a script in the FreeBSD package collection called zfs-stats [freshports.org] that lets you see the hit rates for the various caches.

    In general, you won't see any data loss (other than writes that have not yet been fsync'd) from a power loss. Once something is in the ZIL, it is persistent and will be recovered on the next reboot. I've had the power go out a load of times on ZFS systems and not noticed any problems as a result. ZFS has always had this property though, so I'm not sure where you got this belief from.

    --
    sudo mod me up