
posted by martyb on Friday February 19 2016, @03:23AM   Printer-friendly
from the someday-coming-to-a-phone-near-you? dept.

For those Linux folks out there, imagine merging LVM2, dm-raid, and your file system of choice into an all-powerful, enterprise-ready, checksummed, redundant, containerized, soft-RAID, disk-pooling, RAM-hungry, demi-god file system. The FreeBSD Handbook is a good place to start to grep the basic capabilities and functions of ZFS[*].

Ars Technica reports:

A new long-term support (LTS) version of Ubuntu is coming out in April, and Canonical just announced a major addition that will please anyone interested in file storage. Ubuntu 16.04 will include the ZFS filesystem module by default, and the OpenZFS-based implementation will get official support from Canonical.
...
ZFS is used primarily in cases where data integrity is important—it's designed not just to store data but to continually check on that data to make sure it hasn't been corrupted. The oversimplified version is that the filesystem generates a checksum for each block of data. That checksum is then saved in the pointer for that block, and the pointer itself is also checksummed. This process continues all the way up the filesystem tree to the root node, and when any data on the disk is accessed, its checksum is calculated again and compared against the stored checksum to make sure that the data hasn't been corrupted or changed. If you have mirrored storage, the filesystem can seamlessly and invisibly overwrite the corrupted data with correct data.

ZFS was available as a technology preview in Ubuntu 15.10, but the install method was more cumbersome than a simple apt-get install zfsutils-linux. I, for one, am excited to see ZFS coming to Linux, as it is a phenomenal solution for building NAS devices and for making incremental backups of a file system. Now I just wish Ubuntu would do something about the systemd bug.
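
For the curious, the install on 16.04 should look something like this sketch (the pool layout and device names are placeholders of my own, not anything Canonical prescribes):

    # Install the ZFS userland tools; the kernel module ships with Ubuntu 16.04
    sudo apt-get install zfsutils-linux

    # Create a mirrored pool from two spare disks and check its health
    sudo zpool create tank mirror /dev/sdb /dev/sdc
    sudo zpool status tank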

[*] According to Wikipedia:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS was originally implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.

OpenZFS is an umbrella project aimed at bringing together individuals and companies that use the ZFS file system and work on its improvements.


Original Submission

 
  • (Score: 1) by tbuskey (6127) on Friday February 19 2016, @02:27PM (#306902)

    I've been using ZFS on Solaris, OpenSolaris, and later with Ubuntu via http://zfsonlinux.org/, and now with CentOS.

    The GPL on the Linux kernel and the CDDL on the ZFS code preclude binary distribution of the ZFS module. So on Linux, installation means downloading the source, compiling it into a module against your kernel, and installing it. Both the Ubuntu and CentOS installs are simple; for CentOS, it's a yum install or yum update. I have run into some issues when doing it too soon after a kernel update and reboot, but overall it just works.
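
    As a rough sketch of the CentOS route (treat the repo and package names as approximate; the release RPM from zfsonlinux.org sets up the repository):

        # EPEL provides DKMS; the zfs package itself comes from the ZoL repo
        sudo yum install epel-release
        sudo yum install zfs       # DKMS compiles the module for the running kernel
        sudo modprobe zfs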

    With Linux, I only do ZFS on my data disks; I can always reinstall the OS on ext/xfs. If I had ZFS on the OS disks and there were an issue, I wouldn't be able to boot.

    Dedupe in ZFS isn't worth it 99% of the time, IMO. I was using it for a 2 TB mirror on an OpenSolaris system with 3 GB of RAM. A power outage corrupted and locked up ZFS, and I had to put the disks into another system with more RAM to reimport the data; the 3 GB system would crash with an out-of-RAM issue. The reimport also took 2-3 days of running, and that's with Solaris, which has better RAM management than Linux. Dedupe only saved about 10% in my case. Not worth it.
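
    If you're tempted anyway, you can estimate the payoff before enabling it; a minimal sketch, assuming a pool named tank:

        # Simulate the dedup table to see the would-be ratio without enabling dedup
        sudo zdb -S tank

        # On a pool that already dedupes, check the realized ratio
        sudo zpool get dedupratio tank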

    ZFS doesn't like power outages. Put the server on a UPS, with a clean shutdown triggered before the battery runs down.

    ZFS must have redundancy that it controls. When a checksum on the data is wrong, it will copy the data from the other copy (or copies) and fix it. If it can't fix the data, it will lock the file system so you don't lose anything.

    Hardware RAID gets in the way. Use JBOD so ZFS can heal itself.
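
    A scrub shows the self-healing in action; a sketch, again assuming a pool named tank:

        # Verify every checksum in the pool; mismatches are repaired from the mirror
        sudo zpool scrub tank

        # The READ/WRITE/CKSUM counters show what was found and fixed
        sudo zpool status -v tank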

    Compression is worth it. It can speed up operations.
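
    Enabling it is a one-liner; a sketch with the same hypothetical pool:

        # Enable lz4 for new writes (existing blocks keep their old encoding)
        sudo zfs set compression=lz4 tank

        # See how much space it's actually saving
        sudo zfs get compressratio tank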

    ZFS will fail more drives on you than other filesystems do, because it actually detects errors. SMART errors are not always real, but ZFS checks everything and finds the real problems. Trust it.

    ZFS recovery is decent. Sometimes a drive is marked bad; I can clear it and resilver so the data gets put on other sectors, and that is often enough to keep the drive running for a while longer. I only do that at home, not at work. Sometimes a reboot resets things, too.
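
    That home-lab recovery dance is roughly this sketch (pool name hypothetical):

        # Clear the error state so the pool puts the drive back in service
        sudo zpool clear tank

        # Watch the resilver progress as the data is rewritten
        sudo zpool status tank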

    NFS/Samba/autofs startup at boot races with ZFS; I find the services come up too soon, before the pools are mounted. I haven't bothered to fix it other than restarting them manually once I know ZFS is up properly.
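
    One possible fix on a systemd distro is a drop-in that orders the service after the ZFS mounts; this is only a sketch, and the zfs-mount.service unit name assumes ZFS on Linux's bundled systemd support:

        # Hypothetical drop-in for Samba; adjust the service name for NFS/autofs
        sudo mkdir -p /etc/systemd/system/smb.service.d
        printf '[Unit]\nAfter=zfs-mount.service\nRequires=zfs-mount.service\n' \
          | sudo tee /etc/systemd/system/smb.service.d/after-zfs.conf
        sudo systemctl daemon-reload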

    I've been very happy with ZFS and with it on Linux.

  • (Score: 2) by TheRaven (270) on Friday February 19 2016, @03:13PM (#306936) Journal

    > Dedupe in ZFS isn't worth it 99% of the time IMO

    This is very workload dependent. I replaced three 2 TB disks with three 4 TB disks (RAID-Z) in my home NAS because they were 90% full (which ZFS likes even less than most UNIX filesystems do). I wanted to do a clean reinstall anyway, so I transferred the data with zfs send / zfs receive, sending a load of non-deduplicated filesystems to deduplicated filesystems. The result was a saving of about 25% of the total space consumption; for some filesystems it was over 50%.
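
    The transfer itself is just a few commands; a sketch with hypothetical pool and dataset names:

        # Create a parent with dedup on; received children inherit it
        sudo zfs create -o dedup=on newpool/data

        # A plain (non-recursive) send doesn't carry source properties,
        # so the received copy picks up the inherited dedup setting
        sudo zfs snapshot oldpool/stuff@migrate
        sudo zfs send oldpool/stuff@migrate | sudo zfs receive newpool/data/stuff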

    > ZFS doesn't like power outages

    Why do you say this? The ZFS intent log allows data recovery in the case of a power outage, and RAID-Z doesn't suffer from the RAID-5 write hole. I've had the power go out a few times, and each time zpool status -v shows that the scrub has done a little bit of cleanup, but I've not lost any data other than things that were not yet committed to disk.

    > Compression is worth it. It can speed up operations.

    FreeBSD now defaults to lz4, and I think OpenSolaris does too; not sure about Linux. This is important, because gzip compression can save a bit more disk space but is usually slower. A vaguely modern CPU can decompress lz4 faster than anything short of a high-end PCIe flash device can feed it data, so it will improve your effective disk bandwidth. If you're using compression along with dedup, it's quite important to use the same compression algorithm everywhere, as blocks are deduplicated after compression.
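
    A sketch of keeping the algorithm consistent through inheritance (pool and dataset names hypothetical):

        # Set the algorithm once at the pool root so every dataset inherits it
        sudo zfs set compression=lz4 tank

        # Audit for local overrides that would break dedup matches
        sudo zfs get -r -s local compression tank

        # Drop an override so the dataset falls back to the inherited value
        sudo zfs inherit compression tank/oddball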

    --
    sudo mod me up
    • (Score: 1) by tbuskey (6127) on Sunday February 21 2016, @11:59PM (#307923)

      >> Dedupe in ZFS isn't worth it 99% of the time IMO

      > This is very workload dependent. I replaced three 2 TB disks with three 4 TB disks (RAID-Z) in my home NAS because they were 90% full (which ZFS likes even less than most UNIX filesystems do). I wanted to do a clean reinstall anyway, so I transferred the data with zfs send / zfs receive, sending a load of non-deduplicated filesystems to deduplicated filesystems. The result was a saving of about 25% of the total space consumption; for some filesystems it was over 50%.

      My number is pulled out of my hat, of course. I had a power failure take down my server, and I couldn't recover my 1 TB deduped zvol on my 3 GB system, even running from the OpenSolaris DVD. I had to put the disks into my 8 GB system, and it took over 24 hours to import the pool. I don't think that's OK even at home, so I don't do dedupe.

      >> ZFS doesn't like power outages

      > Why do you say this? The ZFS intent log allows data recovery in the case of a power outage, and RAID-Z doesn't suffer from the RAID-5 write hole. I've had the power go out a few times, and each time zpool status -v shows that the scrub has done a little bit of cleanup, but I've not lost any data other than things that were not yet committed to disk.

      Every time I've lost power with ZFS, I've had to do a zpool import to get my data back. I had my data, uncorrupted, but I had to do something special, so now I use a UPS that does a clean shutdown, like I should have been doing all along. So yes, you don't lose data, but the pool doesn't come back up without intervention, simple as the command is; I've had it take an hour. I imagine other filesystems might just corrupt silently instead.
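
      The something special was only this, though it could take ages (pool name hypothetical):

          # Bring the pool back by name after the unclean shutdown
          sudo zpool import tank

          # If it isn't found, scan by stable device paths
          sudo zpool import -d /dev/disk/by-id tank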

      >> Compression is worth it. It can speed up operations.

      > FreeBSD now defaults to lz4, and I think OpenSolaris does too; not sure about Linux. This is important, because gzip compression can save a bit more disk space but is usually slower. A vaguely modern CPU can decompress lz4 faster than anything short of a high-end PCIe flash device can feed it data, so it will improve your effective disk bandwidth. If you're using compression along with dedup, it's quite important to use the same compression algorithm everywhere, as blocks are deduplicated after compression.

      lz4 is in the OpenZFS spec, and I think ZoL implements it. I thought the original compression was something other than gzip.

      I've found that compression doesn't slow things down on photos, video, or other already-compressed files, and if the data is compressible, it's always faster.

      • (Score: 2) by TheRaven (270) on Monday February 22 2016, @03:07PM (#308183) Journal

        > Every time I've lost power with ZFS, I've had to do a zpool import to get my data back

        That sounds like you're doing something very badly wrong; ZFS was explicitly designed to handle abrupt outages. What kind of disks were you using? The only cases I can think of that would cause this kind of issue are ones where the disk lies to the OS about whether data is actually committed to persistent storage. On a single disk, recovery should just be a matter of replaying the ZIL (very fast). In a RAID configuration, there will be a background scrub that eventually fixes errors where the disks powered down asymmetrically (these will also be fixed on first read if that happens before the scrub gets to a particular block).

        > lz4 is in the OpenZFS spec, and I think ZoL implements it. I thought the original compression was something other than gzip.

        Originally it used lzjb (fast, not very good compression) or gzip (slow, good compression). lz4 is both fast and provides fairly good compression (not as good as gzip), but it also contains paths to bail out early: if your data is very high entropy, lz4 will not attempt to compress it. This means you can turn it on for things that are already compressed and not burn too many cycles, so compression won't slow down the write path, and it won't slow down the read path noticeably for any of the algorithms unless you're using fairly exotic NVRAM hardware.

        --
        sudo mod me up
      • (Score: 2) by rleigh (4887) on Tuesday February 23 2016, @10:04AM (#308603) Homepage

        If you need to re-import the pool, that might be because it's not being picked up automatically at boot (is this all the time, or just after an abrupt outage?).

        On FreeBSD, there's a cache of the pool state under /boot (IIRC), which you can regenerate if it's outdated:

        zpool set cachefile=/boot/zfs/zpool.cache your_pool

        Maybe read up on it before doing this; just a suggestion about what might be wrong here.