
posted by martyb on Friday February 19 2016, @03:23AM   Printer-friendly
from the someday-coming-to-a-phone-near-you? dept.

For those Linux folks out there, imagine merging LVM2, dm-raid, and your file system of choice into an all-powerful, enterprise-ready, check-summed, redundant, containerized, soft-RAID, disk-pooling, RAM-hungry, demi-god file system. The FreeBSD Handbook is a good place to grep the basic capabilities and functions of ZFS[*].

Ars Technica reports:

A new long-term support (LTS) version of Ubuntu is coming out in April, and Canonical just announced a major addition that will please anyone interested in file storage. Ubuntu 16.04 will include the ZFS filesystem module by default, and the OpenZFS-based implementation will get official support from Canonical.
...
ZFS is used primarily in cases where data integrity is important—it's designed not just to store data but to continually check on that data to make sure it hasn't been corrupted. The oversimplified version is that the filesystem generates a checksum for each block of data. That checksum is then saved in the pointer for that block, and the pointer itself is also checksummed. This process continues all the way up the filesystem tree to the root node, and when any data on the disk is accessed, its checksum is calculated again and compared against the stored checksum to make sure that the data hasn't been corrupted or changed. If you have mirrored storage, the filesystem can seamlessly and invisibly overwrite the corrupted data with correct data.
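
To make the quoted description concrete, here is a toy Python sketch of that checksum-up-the-tree idea. The block size and hash function are arbitrary choices for illustration; real ZFS hangs checksums off block pointers and supports several checksum algorithms, so treat this as a sketch of the concept rather than of ZFS internals.

    import hashlib

    BLOCK_SIZE = 4  # absurdly small blocks so the example stays readable

    def checksum(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def build(data: bytes):
        """Split the data into blocks, checksum each block, then checksum the
        concatenated block checksums to get a single root checksum."""
        blocks = [bytearray(data[i:i + BLOCK_SIZE])
                  for i in range(0, len(data), BLOCK_SIZE)]
        block_sums = [checksum(bytes(b)) for b in blocks]
        root = checksum("".join(block_sums).encode())
        return blocks, block_sums, root

    def verify(blocks, block_sums, root) -> bool:
        """On read, recompute every checksum and compare with the stored ones,
        as the quoted paragraph describes."""
        if any(checksum(bytes(b)) != s for b, s in zip(blocks, block_sums)):
            return False
        return checksum("".join(block_sums).encode()) == root

    blocks, sums, root = build(b"hello zfs world!")
    assert verify(blocks, sums, root)

    blocks[1][0] ^= 0xFF                   # simulate silent on-disk corruption
    assert not verify(blocks, sums, root)  # the bad block is caught on the next read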

ZFS was available as a technology preview in Ubuntu 15.10, but the install method was a bit more cumbersome than just apt-get install zfsutils-linux. I, for one, am excited to see ZFS coming to Linux, as it is a phenomenal solution for building NAS devices and for making incremental backups of a file system. Now I just wish Ubuntu would do something about the systemd bug.

[*] According to Wikipedia:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS was originally implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.

OpenZFS is an umbrella project aimed at bringing together individuals and companies that use the ZFS file system and work on its improvements.


Original Submission

 
  • (Score: 3, Informative) by TheRaven on Friday February 19 2016, @03:06PM

    by TheRaven (270) on Friday February 19 2016, @03:06PM (#306931) Journal

    Dedup is really useful if you're doing a lot of backups, because most backup utilities will copy the entirety of any file that's changed. With ZFS, if you modify a byte in a 1GB file (e.g. a VM image), then the next backup will contain a copy of the 1GB file, but will only take one block (typically 128KB) of space. This is, unfortunately, broken with the Windows backup utility, which stores backups as zip files.

    Without dedup, you typically want 1GB of RAM per TB of hard disk. With dedup, you typically want 2GB. This isn't particularly expensive. Most small installations will be under 10TB, and 16GB of RAM is a lot cheaper than the disks (the real annoyance is that a lot of mini-ITX boards can't handle more than 8GB). If you want to cut costs a bit, L2ARC and log device on SSD are a huge performance win. Our dev machines are 24-core rack-mounted systems with 256GB+ of RAM, 512GB of flash and mirrored 3TB spinning rust. The flash is mostly L2ARC, and a couple of GBs acting as a log device. Each developer typically has a clone of the same few git repos in their home directory, so dedup is a big win there (also good for performance, as you only need one copy in the L2ARC and one in the ARC for everyone to share).
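
    As a back-of-the-envelope version of those rules of thumb (the 128KB block size is the figure from the previous paragraph; the ~320 bytes per dedup-table entry is a commonly quoted ballpark and should be treated as an assumption):

        TB = 10**12

        def ram_rule_of_thumb_gb(pool_tb: float, dedup: bool = False) -> float:
            """Roughly 1 GB of RAM per TB of disk, about 2 GB per TB with dedup."""
            return pool_tb * (2.0 if dedup else 1.0)

        def ddt_ram_gb(pool_tb: float, block_size: int = 128 * 1024,
                       bytes_per_entry: int = 320) -> float:
            """Dedup table in the worst case: one entry per unique block.
            ~320 bytes/entry is an assumed ballpark, not a ZFS-documented figure."""
            entries = pool_tb * TB / block_size
            return entries * bytes_per_entry / 10**9

        print(ram_rule_of_thumb_gb(6))              # ~6 GB for a 6 TB pool
        print(ram_rule_of_thumb_gb(6, dedup=True))  # ~12 GB with dedup enabled
        print(ddt_ram_gb(6))                        # ~14.6 GB if every block were unique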

    I don't really know why you'd choose to use Ubuntu though. ZFS is fairly mature now, but the integration between ZFS and the rest of the VFS is fairly subtle and it took quite a while for it to stabilise in FreeBSD. I doubt Linux is upstreaming many changes that simplify life for a filesystem that can't be distributed with the kernel, so I would still be quite hesitant to trust data on Linux with ZFS for a while.

    --
    sudo mod me up
  • (Score: 2) by rleigh on Friday February 19 2016, @07:51PM

    by rleigh (4887) on Friday February 19 2016, @07:51PM (#307048) Homepage

    I started out with ZFS on Linux before moving the disks over into a new FreeBSD server and importing the pool. Easiest data migration I've ever done, particularly between operating systems. Like you, I don't bother with dedup.

    While ZFS on Linux is certainly less mature than on FreeBSD, and lacks the nice integration you get there, my experience of it was that it's pretty solid. (I originally got into ZFS from work colleagues who had some pretty serious ZFS pools, all on Linux. They all swear by it and have been using it for several years.)

  • (Score: 2) by darkfeline on Saturday February 20 2016, @01:41AM

    by darkfeline (1030) on Saturday February 20 2016, @01:41AM (#307205) Homepage

    A lot of modern backup utilities do dedup now.

    attic, bup, even rsync offers limited dedup through hard links.
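
    For the hard-link case specifically, here is a minimal sketch of what that amounts to; rsync's --link-dest gets a similar effect between successive backup directories. The hashing scheme and paths are purely illustrative.

        import hashlib
        import os

        def hardlink_dedup(root: str) -> None:
            """Hard-link files whose entire contents are identical. This is
            file-level dedup: change one byte and you store a whole new copy."""
            seen = {}  # content hash -> first path seen with that content
            for dirpath, _, filenames in os.walk(root):
                for name in filenames:
                    path = os.path.join(dirpath, name)
                    with open(path, "rb") as f:
                        digest = hashlib.sha256(f.read()).hexdigest()
                    if digest in seen and not os.path.samefile(path, seen[digest]):
                        os.remove(path)
                        os.link(seen[digest], path)  # both names now share one inode
                    else:
                        seen.setdefault(digest, path)

        # hardlink_dedup("/path/to/backups")  # example invocation; path is hypothetical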

    I think you can see some of that MIT vs New Jersey conflict here. You can either have a simple filesystem like ext4 that lets other tools handle dedup, backup, snapshots ("worse is better") or you can try to put all of that into the filesystem ("the right thing").

    Personally, I'm a fan of the former. When something goes wrong with, e.g., my backups, I like knowing that it's not the filesystem that fucked up. I can debug a backup tool, but debugging a filesystem is a little beyond me, and debugging a filesystem with dedup and snapshots built in is definitely beyond me. Best to keep each abstraction level simple, even if it means we need to add a few more abstraction levels on top.

    --
    Join the SDF Public Access UNIX System today!
    • (Score: 2) by TheRaven on Saturday February 20 2016, @01:04PM

      by TheRaven (270) on Saturday February 20 2016, @01:04PM (#307362) Journal

      attic, bup, even rsync offers limited dedup through hard links.

      This gives you file-level dedup, not block-level dedup. As I said, if you modify a 1GB file, then you now have a new file, even if you've only changed one byte. File-level dedup won't help you, you'll still have a new 1GB copy for each backup. In contrast, with block-level dedup (which can only be done in a filesystem, or in a file that is effectively a filesystem), you will only use disk space for the block that has changed. With ZFS, that's typically 128KB. If you're backing up VM images, or other large files that frequently change by a small amount, then that's a huge win.
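
      A toy sketch of the difference, using the 128KB block size mentioned earlier (the hash-keyed block store below is an illustration of the idea, not ZFS's actual data structures):

          import hashlib
          import os

          BLOCK_SIZE = 128 * 1024   # the typical ZFS record size mentioned above
          store = {}                # block hash -> block data, stored exactly once

          def backup(data: bytes) -> list:
              """'Back up' data into the shared block store; return the block references."""
              refs = []
              for i in range(0, len(data), BLOCK_SIZE):
                  block = data[i:i + BLOCK_SIZE]
                  digest = hashlib.sha256(block).hexdigest()
                  store.setdefault(digest, block)   # only unseen blocks consume space
                  refs.append(digest)
              return refs

          image = bytearray(os.urandom(8 * BLOCK_SIZE))   # stand-in for a big VM image
          backup(bytes(image))
          print(len(store))   # 8 unique blocks stored

          image[0] ^= 0xFF    # change a single byte, as in the 1GB example
          backup(bytes(image))
          print(len(store))   # 9: the second backup added exactly one new block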

      When something goes wrong with, e.g., my backups, I like knowing that it's not the filesystem that fucked up.

      Do you? Does your filesystem include block-level checksums? Does your filesystem support storing redundant copies so that it can recover from single-block errors on the disk? Does your filesystem, in fact, provide any guarantees about data integrity? If there's a single-block error in the middle of a file in a complex format that your backup tool is using to implement block-level deduplication, how do you expect to recover?
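
      For contrast, here is a toy sketch of the redundant-copies-plus-checksums behaviour being asked about, conceptually similar to what ZFS does with mirrors or copies=2; it is an illustration, not ZFS's on-disk format:

          import hashlib

          def checksum(data: bytes) -> str:
              return hashlib.sha256(data).hexdigest()

          class MirroredBlock:
              """Two checksummed copies of a block; reads verify the data and
              rewrite a corrupt copy from the good one ('self-healing')."""
              def __init__(self, data: bytes):
                  self.copies = [bytearray(data), bytearray(data)]
                  self.sum = checksum(data)

              def read(self) -> bytes:
                  for good in self.copies:
                      if checksum(bytes(good)) == self.sum:
                          for i, other in enumerate(self.copies):
                              if checksum(bytes(other)) != self.sum:
                                  self.copies[i] = bytearray(good)   # heal the bad copy
                          return bytes(good)
                  raise IOError("every copy failed its checksum; restore from backup")

          blk = MirroredBlock(b"important data")
          blk.copies[0][0] ^= 0xFF                # silently corrupt one copy
          assert blk.read() == b"important data"  # the read still succeeds...
          assert blk.copies[0] == blk.copies[1]   # ...and the corrupt copy was repaired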

      I can debug a backup tool

      Can you? Why do you think that debugging a backup tool that is implementing half of a filesystem itself is going to be simpler than debugging a filesystem?

      --
      sudo mod me up
      • (Score: 2) by darkfeline on Sunday February 21 2016, @01:55AM

        by darkfeline (1030) on Sunday February 21 2016, @01:55AM (#307628) Homepage

        No, attic and bup both do "block"-level dedup, for configurable block sizes, of course.

        >checksums, etc
        Look, there will always be classes of hardware failure that cannot be recovered from using integrity checking, etc. You will always need backups. I'd rather those backups not be complicated by other factors. Like that one comment discussing ECC RAM, I'd rather not have a well-meaning error correction routine actually destroying my data. Simple is best.

        >debugging
        Yes. Yes I can. A backup tool is not implementing half a filesystem, for the sole reason that it is running on a complete filesystem. Show me a backup tool that reimplements, e.g., block management, and I'll stop using that tool posthaste.

        Have you written a filesystem before? I have. There's a lot of shit going on, and abstracting that away is the purpose of a filesystem. Snapshots, checksums, etc. are NOT part of that purpose.

        --
        Join the SDF Public Access UNIX System today!
        • (Score: 2) by TheRaven on Sunday February 21 2016, @06:56PM

          by TheRaven (270) on Sunday February 21 2016, @06:56PM (#307817) Journal
          First you say:

          No, attic and bup both do "block"-level dedup, for configurable block sizes, of course.

          And then you say:

          Show me a backup tool that reimplements, e.g., block management

          So it sounds as if you're answering your own questions before you even ask them.

          --
          sudo mod me up
          • (Score: 2) by darkfeline on Monday February 22 2016, @12:46AM

            by darkfeline (1030) on Monday February 22 2016, @12:46AM (#307934) Homepage

            Filesystem block management involves allocating, freeing, and tracking storage blocks on a hardware device.

            What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data". attic and bup deduplicate this data by indexing/hashing pieces of files. This is not filesystem block management; attic and bup do not manage allocation and freeing of storage space on hardware devices. However, it achieves the goal of what you refer to as "block-level dedup"; that is, two files sharing the same piece of data will only have that piece stored once, across all backups.

            So yeah, you don't understand filesystems.

            --
            Join the SDF Public Access UNIX System today!
            • (Score: 2) by TheRaven on Monday February 22 2016, @02:55PM

              by TheRaven (270) on Monday February 22 2016, @02:55PM (#308174) Journal

              What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data"

              That is absolutely not what block-level dedup means. Block-level deduplication means building a hash table of all blocks in the system and replacing any block that matches an existing one with a reference to the old block, irrespective of which file the original block came from. This requires storing the backups using something like an inode structure, creating new blocks of data or references to existing ones as appropriate. If your backup software is doing this, then it's doing close to what a filesystem needs to do (though it may be able to simplify some things because it likely doesn't have to deal with any kind of concurrency). If it's not doing this, then it's not doing block-level deduplication.
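
              A sketch of what that inode-like structure ends up looking like in a deduplicating backup store, including the reference tracking needed when old backups are deleted; the layout and names here are hypothetical, not any particular tool's format:

                  import hashlib

                  BLOCK_SIZE = 128 * 1024

                  blocks = {}     # block hash -> block data (the shared store)
                  refcount = {}   # block hash -> number of backups referencing it

                  def write_backup(data: bytes) -> list:
                      """Return a manifest: an inode-like list of block references."""
                      manifest = []
                      for i in range(0, len(data), BLOCK_SIZE):
                          chunk = data[i:i + BLOCK_SIZE]
                          h = hashlib.sha256(chunk).hexdigest()
                          blocks.setdefault(h, chunk)
                          refcount[h] = refcount.get(h, 0) + 1
                          manifest.append(h)
                      return manifest

                  def delete_backup(manifest: list) -> None:
                      """Drop references and reclaim blocks no backup uses any more --
                      the allocate/free/track bookkeeping being argued about here."""
                      for h in manifest:
                          refcount[h] -= 1
                          if refcount[h] == 0:
                              del blocks[h]
                              del refcount[h]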

              It sounds like you might need to read a bit more about filesystems before you continue this discussion.

              --
              sudo mod me up