
posted by martyb on Friday February 19 2016, @03:23AM
from the someday-coming-to-a-phone-near-you? dept.

For those Linux folks out there, imagine merging LVM2, dm-raid, and your file system of choice into an all-powerful, enterprise-ready, checksummed, redundant, containerized, soft-RAID, disk-pooling, RAM-hungry, demigod file system. The FreeBSD Handbook is a good place to start grepping the basic capabilities and functions of ZFS[*].

Ars Technica reports:

A new long-term support (LTS) version of Ubuntu is coming out in April, and Canonical just announced a major addition that will please anyone interested in file storage. Ubuntu 16.04 will include the ZFS filesystem module by default, and the OpenZFS-based implementation will get official support from Canonical.
...
ZFS is used primarily in cases where data integrity is important—it's designed not just to store data but to continually check on that data to make sure it hasn't been corrupted. The oversimplified version is that the filesystem generates a checksum for each block of data. That checksum is then saved in the pointer for that block, and the pointer itself is also checksummed. This process continues all the way up the filesystem tree to the root node, and when any data on the disk is accessed, its checksum is calculated again and compared against the stored checksum to make sure that the data hasn't been corrupted or changed. If you have mirrored storage, the filesystem can seamlessly and invisibly overwrite the corrupted data with correct data.
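
To make that chained-checksum idea concrete, here is a toy illustration using ordinary shell tools. It is not how ZFS stores anything; it just shows the same shape of idea, with file names invented for the example:

printf 'block A' > blockA; printf 'block B' > blockB
sha256sum blockA blockB > pointers.sums   # child checksums stored alongside the "pointers"
sha256sum pointers.sums                   # the checksum list is itself checksummed, on up to a root
printf 'block X' > blockA                 # silently "corrupt" a block...
sha256sum -c pointers.sums                # ...and verification reports: blockA: FAILED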

ZFS was available as a technology preview in Ubuntu 15.10, but the install method was a bit more cumbersome than just apt-get install zfsutils-linux. I, for one, am excited to see ZFS coming to Linux, as it is a phenomenal solution for building NAS devices and for making incremental backups of a file system. Now I just wish Ubuntu would do something about the systemd bug.
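
For the curious, the 16.04 route should look roughly like this; the pool and device names below are just examples:

sudo apt-get install zfsutils-linux
sudo zpool create tank mirror /dev/sdb /dev/sdc   # checksummed, self-healing mirror
sudo zpool scrub tank                             # read every block and verify its checksum
sudo zpool status tank                            # repaired or unrecoverable errors show up here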

[*] According to Wikipedia:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS was originally implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.

OpenZFS is an umbrella project aimed at bringing together individuals and companies that use the ZFS file system and work on its improvements.


Original Submission

 
  • (Score: 2) by captain normal on Friday February 19 2016, @04:27AM

    by captain normal (2205) on Friday February 19 2016, @04:27AM (#306713)

    Is this a part of system D or an option? Is this another step into melding Ubuntu into Win 10?

    --
    When life isn't going right, go left.
    • (Score: 2) by DECbot on Friday February 19 2016, @04:45AM

      by DECbot (832) on Friday February 19 2016, @04:45AM (#306715) Journal

      I'd say, despite systemD, ZFS comes to Ubuntu. ZFS works perfectly fine on BSD without a modern init. You can even run the root filesystem on ZFS.

      --
      cats~$ sudo chown -R us /home/base
    • (Score: 1, Funny) by Anonymous Coward on Friday February 19 2016, @12:03PM

      by Anonymous Coward on Friday February 19 2016, @12:03PM (#306831)

      ZFSystemd- where your binary logs can reside in a better filesystem.

    • (Score: 0) by Anonymous Coward on Saturday February 20 2016, @11:10PM

      by Anonymous Coward on Saturday February 20 2016, @11:10PM (#307565)

      More like in opposition to, as the head honcho has a hardon for BTRFS.

  • (Score: 0) by Anonymous Coward on Friday February 19 2016, @05:25AM

    by Anonymous Coward on Friday February 19 2016, @05:25AM (#306732)

    A single flipped bit in RAM while writing a single file can hose an entire ZFS filesystem, because the error is spread across the whole filesystem through the checksums used to verify file integrity. ZFS is designed not to trust HDDs, but to always trust the rest of the computer.

    Now while I'd love to see ECC RAM in every box, commodity hardware just doesn't support it.

    https://duckduckgo.com/?q=zfs+ecc+ram [duckduckgo.com]

    • (Score: 1) by Francis on Friday February 19 2016, @05:39AM

      by Francis (5544) on Friday February 19 2016, @05:39AM (#306736)

      That's not true. A single flipped bit isn't going to take down an entire zfs file system. It would be nice to see a credible source on the matter.

      • (Score: 1, Touché) by Anonymous Coward on Friday February 19 2016, @06:30AM

        by Anonymous Coward on Friday February 19 2016, @06:30AM (#306752)

        sudo cat $GPP > matter.c

    • (Score: 3, Informative) by Anonymous Coward on Friday February 19 2016, @06:54AM

      by Anonymous Coward on Friday February 19 2016, @06:54AM (#306757)

      This question was answered in a nuanced manner on the BSD Now episode "ZFS in the trenches". Different developers have different opinions, but all recommended ECC if possible. In the episode, Josh Paetzel of FreeNAS explains that if the ZFS spacemaps were corrupted due to memory errors, you could be in a worse situation than with a less sophisticated filesystem, whose corrupt data an fsck would simply destroy in an attempt to recover the rest [youtu.be].

      • (Score: 2, Insightful) by pTamok on Friday February 19 2016, @10:58AM

        by pTamok (3042) on Friday February 19 2016, @10:58AM (#306807)

        Is there a text article somewhere that goes into these issues?

        I'm afraid I have an old-school preference for text over video. It suits the reading and comprehension process I have honed over a lifetime.

        I am trying very hard not to make a value-judgement about the utility of video as an information transfer mechanism, and I know I'm probably partially ossified in my ways.

        • (Score: 2, Insightful) by Francis on Friday February 19 2016, @08:55PM

          by Francis (5544) on Friday February 19 2016, @08:55PM (#307065)

          Not just that, you can skim through quickly to get a rough sense of what's going on before going into things in depth. Greatly improves the comprehension.

        • (Score: 2) by darkfeline on Saturday February 20 2016, @01:33AM

          by darkfeline (1030) on Saturday February 20 2016, @01:33AM (#307203) Homepage

          I used to think that, until I discovered the joys of playing YouTube videos using mpv at 1.33x speed or faster. You can nudge the speed faster or slower depending on the content, but I've found the average informational video works well at 1.33x.

          --
          Join the SDF Public Access UNIX System today!
    • (Score: 0) by Anonymous Coward on Friday February 19 2016, @08:05AM

      by Anonymous Coward on Friday February 19 2016, @08:05AM (#306772)

      Now while I'd love to see ECC RAM in every box, commodity hardware just doesn't support it.

      To be fair, plenty of commodity hardware does. But you got me thinking, so I thought I'd check my semi-recent motherboards.

      My GA-Z77X-D3H (Intel 1155) motherboard doesn't, but my GA-MA770T-UD3P (AMD AM3+) does.

    • (Score: 2) by ledow on Friday February 19 2016, @11:14AM

      by ledow (5567) on Friday February 19 2016, @11:14AM (#306810) Homepage

      A single flipped bit in RAM could hose your EFI and kill your laptop, if recent events are anything to go by.

      A single flipped bit in RAM is BAD. As in most likely you'll blue-screen or kernel-panic no matter what the OS or filesystem, and then you're into data loss and corruption that no RAID BBU will save you from.

      Though ZFS might be technically worse in that scenario, it's already game over for continuing operations: you're probably going to lose all service to that PC, face a disk check on the next boot at minimum, and maybe (or maybe not) lose the database you were writing to, no matter what the filesystem.

      Either way, you want to not be relying on that never happening. So ECC RAM or redundant storage and servers at minimum. Once you get there, ECC RAM is bog-standard and a pitiful percentage of the overall cost.

      Home PCs? Yeah, maybe that's a problem. But to be honest, you're looking at a "power-user" use case to be using Ubuntu LTS anyway, especially if you're using ZFS, and especially if you're using it on a PC without ECC.

      So, really, it's kind of a niche worry.

      I speak as someone who recently had a nice guy from IBM/Lenovo visit with a very expensive (per GB; it was a free replacement for me) RAM chip after one of our blades BSOD'd for safety when it detected an ECC error.
      Services were affected for about a minute until the replicas took over. We didn't trust the server that was serviced, so once it was back up we made it pull back all the replicas from the other machines and checked them. And I'm part of a two-man team in a school, so it's hardly top-end stuff.

      A single-bit flip in your RAID card onboard RAM could do untold damage.
      A single-bit flip in your RAM isn't that much better.

      If it matters to you, only run with ECC and monitoring tools enabled.

    • (Score: 2) by pendorbound on Friday February 19 2016, @01:56PM

      by pendorbound (2688) on Friday February 19 2016, @01:56PM (#306889) Homepage

      In theory sure, ECC is nice, and ZFS makes some assumptions about RAM integrity. But in practice?

      I've run ZFS for about 8 years with around 8TB pool size on non-ECC RAM with no troubles. OS was OpenSolaris, then FreeBSD, and finally Gentoo Linux for about the last four years. No lost pools, and no ZFS scrub errors that weren't immediately attributable to failing drives.

      I've moved to ECC for the last year or so now because I got a deal on a server on eBay I couldn't turn down, but non-ECC RAM isn't going to make all your pr0n evaporate overnight when a cosmic ray hits your motherboard...

    • (Score: 2) by zeigerpuppy on Friday February 19 2016, @10:21PM

      by zeigerpuppy (1298) on Friday February 19 2016, @10:21PM (#307117)

      This is the clearest description of why ZFS needs ECC that I have seen.
      https://pthree.org/2013/12/10/zfs-administration-appendix-c-why-you-should-use-ecc-ram/ [pthree.org]

      I've been running ZFS in two servers for the last 3 years.

      It's been possible to install ZFSonLinux with little difficulty for at least this long (my system is based on Debian - no systemd)

      It is awesome and I've had no major issues but potential users should also be aware that not all hard disk controllers are created equal.
      ZFS must access the drives at a low level to do its checksumming.
      Some controllers and drivers get in the way of this.
      I have had problems with supermicro onboard SAS controllers for instance.

      The LSI SAS controllers are awesome and quite cheap too.
      Here's a bit more discussion: http://zfsguru.com/forum/buildingyourownzfsserver/21 [zfsguru.com]

      In summary, ZFS is fantastic, reduces administration overhead, and is mature on Linux. However, it requires a level of understanding that is not for the casual user: ECC is required, as is a good understanding of how caches and hardware commands interact. Also, if you're using an SSD as an intent log, it needs power-loss protection (an Intel S3x00, for instance).

      By all means use it, but let's put this issue to rest: ZFS should not be considered safe on just any old hardware.

  • (Score: 2, Interesting) by beardedchimp on Friday February 19 2016, @10:39AM

    by beardedchimp (393) on Friday February 19 2016, @10:39AM (#306800)

    Could anyone give me use cases where I would be better off using ZFS instead of BTRFS?

    • (Score: 0) by Anonymous Coward on Friday February 19 2016, @11:40AM

      by Anonymous Coward on Friday February 19 2016, @11:40AM (#306817)

      When you're sending ZFS snapshots to FreeBSD or Solaris...

    • (Score: 2) by isostatic on Friday February 19 2016, @12:38PM

      by isostatic (365) on Friday February 19 2016, @12:38PM (#306834) Journal

      How stable is BTRFS?

      ZFS went through teething problems many years ago -- http://www.uknof.org.uk/uknof13/Bird-Redux.pdf [uknof.org.uk]

      First failure:

      - “Uberblock” and superblock got corrupted
      - The uberblock is for finding the superblock
      - Due to “copy on write”, previous copies of the superblock are still on disk
      - Sun rolled it back to a previous copy, by using `dd` to edit bytes on disk (!!)

      Second failure:

      - This time they had better utilities
      - Fixed with less pain
      - Still not something the end user can do
      - However, stuff was online again!

      • (Score: 2) by rleigh on Friday February 19 2016, @10:11PM

        by rleigh (4887) on Friday February 19 2016, @10:11PM (#307109) Homepage

        While the presentation may be missing some detail, it says that they ran ZFS on top of a hardware RAID array, which is not a recommended configuration; that might have been a contributing factor in the failure. The recommendation is to run on plain discs so ZFS can manage the redundancy itself, e.g. for resilvering. On the server I have, I reflashed the LSI HBA BIOS to put it in IT mode specifically so that it was a dumb controller giving ZFS direct access to all the storage. I could then add the individual discs as components of each vdev, whereas in the hardware RAID case you just see a single block device per volume.
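
        As a rough sketch, with the HBA in IT mode the pool gets built from the bare discs rather than from a single RAID volume (device names here are just examples):

        zpool create tank raidz2 da0 da1 da2 da3 da4 da5
        zpool status tank    # each physical disc shows up as a member of the raidz2 vdev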

    • (Score: 2) by tangomargarine on Friday February 19 2016, @02:53PM

      by tangomargarine (667) on Friday February 19 2016, @02:53PM (#306925)

      The few times I've tried BTRFS on my home desktop it took less than a month to crap its pants so I've been staying on ext4.

      --
      "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
      • (Score: 0) by Anonymous Coward on Saturday February 20 2016, @11:15PM

        by Anonymous Coward on Saturday February 20 2016, @11:15PM (#307568)

        Still on EXT3 here, as EXT4 has had some "oddities" recently.

    • (Score: 2) by rleigh on Friday February 19 2016, @07:45PM

      by rleigh (4887) on Friday February 19 2016, @07:45PM (#307043) Homepage

      When would you be better off using ZFS over Btrfs? I'll be quite blunt and say this: for every case.

      ZFS is simply a much better filesystem. It's more robust, it's more featureful, and it's generally better performing. Btrfs still has major bugs and performance problems, and lacks a lot of the more advanced features of ZFS. I'd say use Btrfs if, and only if, you want to use one of the few features it has which ZFS does not (there are a couple of things, not that I've needed them myself).

  • (Score: 3, Insightful) by VLM on Friday February 19 2016, @12:45PM

    by VLM (445) on Friday February 19 2016, @12:45PM (#306839)

    Wouldn't it be easier to just move to FreeBSD rather than use FreeBSD's docs to run Ubuntu, which is a heavily modified Debian? I had no real problems when I moved to FreeBSD. So you type "pkg" instead of "apt-get", so what?

    It certainly works well. I like to do a zpool scrub zroot before an upgrade, then snapshot, then upgrade, then reboot, then verify all is well, then zap my snapshot. That's a fairly typical desktop workflow.
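
    Roughly, that workflow looks like this on a default FreeBSD install (zroot is the installer's default pool name; the snapshot name is made up):

    zpool scrub zroot
    zpool status zroot                  # check back until the scrub finishes clean
    zfs snapshot -r zroot@pre-upgrade
    pkg upgrade
    shutdown -r now
    # after the reboot, once everything checks out:
    zfs destroy -r zroot@pre-upgrade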

    In the blogosphere there's a stereotypical flamewar about dedupe eating gigs of RAM and exploding when there's a RAM error, which I don't care about because 1) RAM is cheap and 2) SSD is expensive. The price ratio of RAM to spinning rust is painful for ZFS but painless for SSD, so it's trivial to buy more spinning rust than you have dedupe RAM to handle it; but then, I haven't put spinning rust in a desktop in years.... Also, because I don't give a F about dedupe, I don't use it, which kind of eliminates that entire class of problem, LOL.

    • (Score: 3, Informative) by TheRaven on Friday February 19 2016, @03:06PM

      by TheRaven (270) on Friday February 19 2016, @03:06PM (#306931) Journal

      Dedup is really useful if you're doing a lot of backups, because most backup utilities will copy the entirety of any file that's changed. With ZFS, if you modify a byte in a 1GB file (e.g. a VM image), then the next backup will contain a copy of the 1GB file, but will only take one block (typically 128KB) of space. This is, unfortunately, broken with the Windows backup utility, which stores backups as zip files.

      Without dedup, you typically want 1GB of RAM per TB of hard disk. With dedup, you typically want 2GB. This isn't particularly expensive. Most small installations will be under 10TB, and 16GB of RAM is a lot cheaper than the disks (the real annoyance is that a lot of mini-ITX boards can't handle more than 8GB). If you want to cut costs a bit, L2ARC and log device on SSD are a huge performance win. Our dev machines are 24-core rack-mounted systems with 256GB+ of RAM, 512GB of flash and mirrored 3TB spinning rust. The flash is mostly L2ARC, and a couple of GBs acting as a log device. Each developer typically has a clone of the same few git repos in their home directory, so dedup is a big win there (also good for performance, as you only need one copy in the L2ARC and one in the ARC for everyone to share).
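
      A rough sketch of that layout, with pool, dataset, and partition names invented for the example:

      zpool add tank cache ada1p1    # the bulk of the SSD as L2ARC
      zpool add tank log ada1p2      # a couple of GB as the separate log (ZIL) device
      zfs set dedup=on tank/home     # dedup where the repeated data actually lives
      zpool list tank                # the DEDUP column reports the achieved ratio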

      I don't really know why you'd choose to use Ubuntu though. ZFS is fairly mature now, but the integration between ZFS and the rest of the VFS is fairly subtle and it took quite a while for it to stabilise in FreeBSD. I doubt Linux is upstreaming many changes that simplify life for a filesystem that can't be distributed with the kernel, so I would still be quite hesitant to trust data on Linux with ZFS for a while.

      --
      sudo mod me up
      • (Score: 2) by rleigh on Friday February 19 2016, @07:51PM

        by rleigh (4887) on Friday February 19 2016, @07:51PM (#307048) Homepage

        I started out with ZFS on Linux, before moving the discs over into a new FreeBSD server and importing the pool. Easiest data migration I've ever done, particularly between operating systems. Like you I don't bother with dedup.

        While ZFS on Linux is certainly less mature than on FreeBSD, and lacks the nice integration you get there, my experience of it was that it's pretty solid. (I originally got into ZFS from work colleagues who had some pretty serious ZFS pools, all on Linux. They all swear by it and have been using it for several years.)

      • (Score: 2) by darkfeline on Saturday February 20 2016, @01:41AM

        by darkfeline (1030) on Saturday February 20 2016, @01:41AM (#307205) Homepage

        A lot of modern backup utilities do dedup now.

        attic, bup, even rsync offers limited dedup through hard links.

        I think you can see some of that MIT vs New Jersey conflict here. You can either have a simple filesystem like ext4 that lets other tools handle dedup, backup, snapshots ("worse is better") or you can try to put all of that into the filesystem ("the right thing").

        Personally, I'm a fan of the former. When something goes wrong with, e.g., my backups, I like knowing that it's not the filesystem that fucked up. I can debug a backup tool, but debugging a filesystem is a little beyond me, and debugging a filesystem with dedup and snapshots built in is definitely beyond me. Best to keep each abstraction level simple, even if it means we need to add a few more abstraction levels on top.

        --
        Join the SDF Public Access UNIX System today!
        • (Score: 2) by TheRaven on Saturday February 20 2016, @01:04PM

          by TheRaven (270) on Saturday February 20 2016, @01:04PM (#307362) Journal

          attic, bup, even rsync offers limited dedup through hard links.

          This gives you file-level dedup, not block-level dedup. As I said, if you modify a 1GB file, then you now have a new file, even if you've only changed one byte. File-level dedup won't help you, you'll still have a new 1GB copy for each backup. In contrast, with block-level dedup (which can only be done in a filesystem, or in a file that is effectively a filesystem), you will only use disk space for the block that has changed. With ZFS, that's typically 128KB. If you're backing up VM images, or other large files that frequently change by a small amount, then that's a huge win.
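
          As an illustration of the difference (dataset and file names invented), a one-byte change to a large file costs roughly one block once block-level dedup is on:

          zfs create -o dedup=on tank/backups
          cp vm.img /tank/backups/day1.img
          printf 'x' | dd of=vm.img bs=1 seek=123456 conv=notrunc   # change a single byte
          cp vm.img /tank/backups/day2.img
          zpool list tank    # DEDUP ratio climbs; day2.img consumes roughly one extra 128K block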

          When something goes wrong with, e.g., my backups, I like knowing that it's not the filesystem that fucked up.

          Do you? Does your filesystem include block-level checksums? Does your filesystem support storing redundant copies so that it can recover from single-block errors on the disk? Does your filesystem, in fact, provide any guarantees about data integrity? If there's a single-block error in the middle of a file in a complex format that your backup tool is using to implement block-level deduplication, how do you expect to recover?

          I can debug a backup tool

          Can you? Why do you think that debugging a backup tool that is implementing half of a filesystem itself is going to be simpler than debugging a filesystem?

          --
          sudo mod me up
          • (Score: 2) by darkfeline on Sunday February 21 2016, @01:55AM

            by darkfeline (1030) on Sunday February 21 2016, @01:55AM (#307628) Homepage

            No, attic and bup both do "block"-level dedup, for configurable block sizes, of course.

            >checksums, etc
            Look, there will always be classes of hardware failure that cannot be recovered from using integrity checking, etc. You will always need backups. I'd rather those backups not be complicated by other factors. Like that one comment discussing ECC RAM, I'd rather not have a well-meaning error correction routine actually destroying my data. Simple is best.

            >debugging
            Yes. Yes I can. A backup tool is not implementing half a filesystem, for the sole reason that it is running on a complete filesystem. Show me a backup tool that reimplements, e.g., block management, and I'll stop using that tool posthaste.

            Have you written a filesystem before? I have. There's a lot of shit going on, and abstracting that away is the purpose of a filesystem. Snapshots, checksums, etc. are NOT part of that purpose.

            --
            Join the SDF Public Access UNIX System today!
            • (Score: 2) by TheRaven on Sunday February 21 2016, @06:56PM

              by TheRaven (270) on Sunday February 21 2016, @06:56PM (#307817) Journal
              First you say:

              No, attic and bup both do "block"-level dedup, for configurable block sizes, of course.

              And then you say

              Show me a backup tool that reimplements, e.g., block management

              So it sounds as if you're answering your own questions before you even ask them.

              --
              sudo mod me up
              • (Score: 2) by darkfeline on Monday February 22 2016, @12:46AM

                by darkfeline (1030) on Monday February 22 2016, @12:46AM (#307934) Homepage

                Filesystem block management involves allocating, freeing, and tracking storage blocks on a hardware device.

                What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data". attic and bup deduplicate this data by indexing/hashing pieces of files. This is not filesystem block management; attic and bup do not manage allocation and freeing of storage space on hardware devices. However, it achieves the goal of what you refer to as "block-level dedup"; that is, two files sharing the same piece of data will only have that piece stored once, across all backups.

                So yeah, you don't understand filesystems.

                --
                Join the SDF Public Access UNIX System today!
                • (Score: 2) by TheRaven on Monday February 22 2016, @02:55PM

                  by TheRaven (270) on Monday February 22 2016, @02:55PM (#308174) Journal

                  What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data"

                  That is absolutely not what block-level dedup means. Block-level deduplication means building a hash table of all blocks in the system and replacing any block that matches an existing block with a reference to the existing one, irrespective of which file the original block came from. This requires storing the backups using something like an inode structure and creating new blocks of data, and references to them, as needed. If your backup software is doing this, then it's doing close to what a filesystem needs to do (though it may be able to simplify some things because it likely doesn't have to deal with any kind of concurrency). If it's not doing this, then it's not doing block-level deduplication.

                  It sounds like you might need to read a bit more about filesystems before you continue this discussion.

                  --
                  sudo mod me up
  • (Score: 1) by tbuskey on Friday February 19 2016, @02:27PM

    by tbuskey (6127) on Friday February 19 2016, @02:27PM (#306902)

    I've been using ZFS on Solaris, OpenSolaris and later http://zfsonlinux.org/ [zfsonlinux.org] with Ubuntu and now with CentOS.

    The GPL on the Linux kernel and the CDDL on the ZFS code preclude binary distribution of the ZFS code, so on Linux, installation means downloading the source and compiling it into a module against the kernel source. Both the Ubuntu and CentOS installs are simple; for CentOS, it's a yum install or yum update. I have run into some issues doing it too soon after a kernel update and reboot, but overall it just works.
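
    For what it's worth, the CentOS route is roughly the following; this assumes the zfsonlinux.org repository package is already set up, and the package names follow the ZoL docs of the time:

    sudo yum install kernel-devel
    sudo yum install zfs       # DKMS builds the spl/zfs modules against the running kernel
    sudo modprobe zfs
    sudo zpool import          # find any existing pools on the attached disks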

    With Linux, I only do ZFS on my data disks. I can reinstall the OS on ext/xfs. If I did ZFS on the OS disks and there was an issue, I wouldn't be able to boot.

    Dedupe in ZFS isn't worth it 99% of the time, IMO. I was using it for a 2 TB mirror on a 3 GB OpenSolaris system. A power outage corrupted and locked up ZFS, and I needed to put the disks into another system with more RAM in order to reimport the data; the 3 GB system would crash with an out-of-RAM error, and the import took 2-3 days of running as well. Solaris has better RAM management than Linux. Dedupe only saved about 10% in my case. Not worth it.

    ZFS doesn't like power outages. Put a UPS on w/ a clean shutdown before the battery runs down.

    ZFS must have redundancy that it controls. When a checksum on the data is wrong, it will copy the data from the other copy(ies) and fix it. If it can't fix it, it will lock the file system so you don't lose data.

    Hardware RAID gets in the way. Use JBOD and ZFS will heal.

    Compression is worth it. It can speed up operations.

    ZFS will fail more drives on you because it detects errors. SMART errors are not always real. ZFS checks everything and detects real problems. Trust it.

    ZFS recovery is decent. Sometimes a drive is marked bad. I can clear it & resilver so the data gets put on other sectors. That is often enough to get the drive running for awhile longer. I only do that at home, not work. Sometimes a reboot resets things too.
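
    Roughly what "clear it & resilver" amounts to at the command line (pool and device names are just examples):

    zpool status -v tank    # find the drive marked FAULTED or DEGRADED
    zpool clear tank sdd    # reset its error counters
    zpool online tank sdd   # bring it back; ZFS resilvers the affected data
    zpool status tank       # watch the resilver progress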

    NFS/Samba/autofs startup on boot can race with ZFS; I find it happens too soon. I haven't bothered to fix it other than starting them manually once I know ZFS is up properly.

    I've been very happy with ZFS and with it on Linux.

    • (Score: 2) by TheRaven on Friday February 19 2016, @03:13PM

      by TheRaven (270) on Friday February 19 2016, @03:13PM (#306936) Journal

      Dedupe in ZFS isn't worth it 99% of the time IMO

      This is very workload dependent. I replaced 3 2TB disks with 3 4TB disks (RAID-Z) in my home NAS, because they were 90% full (which ZFS likes even less than most UNIX filesystems). I wanted to do a clean reinstall anyway, so I transferred the data with zfs send / zfs receive and sent a load of non-deduplicated filesystems to deduplicated filesystems. The result was saving about 25% of the total space consumption. For some filesystems it was over 50%.
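
      A sketch of that kind of migration, with pool names invented; -R carries the snapshots and locally set properties, and the new datasets inherit dedup from the target pool:

      zfs set dedup=on newtank
      zfs snapshot -r oldtank@migrate
      zfs send -R oldtank@migrate | zfs receive -Fd newtank
      zpool list newtank     # the DEDUP column shows the space win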

      ZFS doesn't like power outages

      Why do you say this? The ZFS intent log allows data recovery in case of a power outage and RAID-Z doesn't suffer from the RAID-5 write hole. I've had the power go out a few times and each time zpool status -v shows that the scrub has done a little bit of cleanup, but I've not lost any data, other than things that were not yet committed to disk.

      Compression is worth it. It can speed up operations.

      FreeBSD now defaults to lz4, I think OpenSolaris does too, not sure about Linux. This is important, because gzip compression can save a bit more disk space but is usually slower. A vaguely modern CPU can decompress lz4 faster than anything other than a high-end PCIe flash device can provide it, so it will improve your disk bandwidth. If you're using compression, it's quite important to use the same compression algorithm everywhere, as the blocks are deduplicated after compression.
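
      The minimal version (dataset name is an example):

      zfs set compression=lz4 tank
      zfs get compression,compressratio tank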

      --
      sudo mod me up
      • (Score: 1) by tbuskey on Sunday February 21 2016, @11:59PM

        by tbuskey (6127) on Sunday February 21 2016, @11:59PM (#307923)
        Dedupe in ZFS isn't worth it 99% of the time IMO

        This is very workload dependent. I replaced 3 2TB disks with 3 4TB disks (RAID-Z) in my home NAS, because they were 90% full (which ZFS likes even less than most UNIX filesystems). I wanted to do a clean reinstall anyway, so I transferred the data with zfs send / zfs receive and sent a load of non-deduplicated filesystems to deduplicated filesystems. The result was saving about 25% of the total space consumption. For some filesystems it was over 50%.

        My number is pulled out of my hat of course. I had a power failure take down my server. I couldn't recover my 1 TB deduped zvol on my 3 GB system. Even running from the OpenSolaris DVD. I had to put the disks into my 8 GB system and it took over 24 hours to import it. I don't think that's ok even at home, so I don't do dedupe.

        ZFS doesn't like power outages

        Why do you say this? The ZFS intent log allows data recovery in case of a power outage and RAID-Z doesn't suffer from the RAID-5 write hole. I've had the power go out a few times and each time zpool status -v shows that the scrub has done a little bit of cleanup, but I've not lost any data, other than things that were not yet committed to disk.

        Every time I've lost power with ZFS, I've had to do a ZFS import to get my data again. I had my data, uncorrupted, but I had to do something special, so now I use a UPS that does a clean shutdown, like I should have been doing all along. So, yes, you don't lose data, but the pool doesn't come back up without a manual import, and I've had that take an hour. I imagine other filesystems might just have silent corruption.
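
        For reference, that recovery typically amounts to the following (pool name assumed):

        zpool import            # scan the attached disks and list importable pools
        zpool import -f tank    # -f if the pool still looks "in use" after an unclean shutdown
        zpool status -v tank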

        Compression is worth it. It can speed up operations.

        FreeBSD now defaults to lz4, I think OpenSolaris does too, not sure about Linux. This is important, because gzip compression can save a bit more disk space but is usually slower. A vaguely modern CPU can decompress lz4 faster than anything other than a high-end PCIe flash device can provide it, so it will improve your disk bandwidth. If you're using compression, it's quite important to use the same compression algorithm everywhere, as the blocks are deduplicated after compression.

        lz4 is in the OpenZFS spec and I think ZoL implements it. I thought the original compression was something other than gzip.

        I've found that compression doesn't slow things down on photos, video, or other compressed files. If the data is compressible, it's always faster.

        • (Score: 2) by TheRaven on Monday February 22 2016, @03:07PM

          by TheRaven (270) on Monday February 22 2016, @03:07PM (#308183) Journal

          Every time I've lost power with ZFS, I've had to do a ZFS import to get my data again

          That sounds like you're doing something very badly wrong. ZFS was explicitly designed to handle abrupt outages. What kind of disks were you using? The only cases where I can think of that would cause this kind of issue are ones where the disk lies to the OS about whether data is actually committed to persistent storage. On a single disk, it should just be a matter of replaying the ZIL (very fast). On a RAID configuration, there will be a background scrub that will eventually fix errors where the disks powered down asymmetrically (these will also be fixed on first read if that happens before the scrub gets to a particular block).

          lz4 is in the OpenZFS spec and I think ZoL implements it. I though the original compression was something other than gzip.

          Originally it used lzjb (fast, not very good compression) or gzip (slow, good compression). lz4 is both fast and provides fairly good compression (not as good as gzip), but it also contains paths to bail early: if your data is very high entropy, then lz4 will not attempt to compress it. This means that you can turn it on for things that are already compressed and you won't burn too many cycles (so compression won't slow down the write path; it won't slow down the read path noticeably for any of the algorithms unless you're using fairly exotic NVRAM hardware).

          --
          sudo mod me up
        • (Score: 2) by rleigh on Tuesday February 23 2016, @10:04AM

          by rleigh (4887) on Tuesday February 23 2016, @10:04AM (#308603) Homepage

          If you need to re-import the pool, that might be because it's not being picked up automatically at boot (is this all the time, or just after an abrupt outage)?

          On FreeBSD, there's a cache of the pool state under /boot (IIRC) which you could regenerate if it's outdated.

          zpool set cachefile=/boot/zfs/zpool.cache your_pool

          Maybe read up on it before doing this; just a suggestion about what might be wrong here.

  • (Score: 0) by Anonymous Coward on Friday February 19 2016, @04:34PM

    by Anonymous Coward on Friday February 19 2016, @04:34PM (#306969)

    I really like the notes at the bottom; they are easy to ignore if I already know what the thing is, and they save me the trouble of Googling it if I don't.