
posted by CoolHand on Tuesday September 11 2018, @03:29PM
from the zed-eff-ess-or-zee-eff-ess dept.

John Paul Wohlscheid over at It's FOSS takes a look at the ZFS file system and its capabilities. He mainly covers OpenZFS, the fork created after Oracle bought Sun and discontinued OpenSolaris, the original home of ZFS. It features pooled storage with RAID-like capabilities, copy-on-write with snapshots, data integrity verification and automatic repair, and it can handle files up to 16 exabytes in size, with file systems of up to 256 quadrillion zettabytes, should you have enough electricity to pull that off. Because it was developed under a license deliberately made incompatible with the GPL, ZFS cannot be integrated directly into the Linux kernel; several distros work around that, however, and provide packages for it. It has been available on FreeBSD since 2008.


Original Submission

 
This discussion has been archived. No new comments can be posted.
  • (Score: 5, Interesting) by bzipitidoo on Tuesday September 11 2018, @04:29PM (13 children)

    by bzipitidoo (4388) on Tuesday September 11 2018, @04:29PM (#733189) Journal

    I used ReiserFS (version 3) for several years, and was looking at ReiserFS 4. We all know what happened to those.

    Tried XFS in the datacenter, and ran into an embarrassing problem with it being incredibly slow at deleting a large directory tree. It took 5 minutes (!) to delete a Linux kernel source tree. On investigation, it turned out that the default parameters for XFS were the absolute worst possible for that particular hardware, with block sizes mismatched in the worst way. I moved everything off and reformatted those partitions with much better XFS parameters, which reduced that operation to a still slow but much more bearable 10 seconds or so.
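
    For the curious, it came down to roughly this kind of thing (the device name and stripe values here are made-up placeholders, not the actual datacenter settings):

        # Inspect the block size and stripe geometry of an existing XFS filesystem
        xfs_info /mnt/data

        # Recreate it with geometry matched to the underlying RAID, e.g. a 64k
        # chunk across 4 data disks (illustrative values only; this destroys the data)
        mkfs.xfs -f -b size=4096 -d su=64k,sw=4 /dev/sdb1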

    When btrfs came along, I gave it a go. It too had a use case at which it was extremely poor: the sync command. Firefox likes to sync, often, so Firefox ran very slowly on a btrfs partition. I hear btrfs has since greatly improved its sync performance, but it is still a bit slow. Another big criticism of btrfs was the lack of repair tools such as fsck.

    ext2 of course takes a long time to fsck because it doesn't use journaling. It is still useful for database storage, where journaling merely gets in the way of the database's own management methods, so it's better to use ext2 than ext3 or 4 for that.
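
    If you want journal-less storage on a modern system, something like this should do it (the device name is a placeholder):

        # Plain ext2, no journal
        mkfs.ext2 /dev/sdc1

        # Or strip the journal from an existing ext4 filesystem
        # (unmount it first, and fsck it afterwards)
        umount /dev/sdc1
        tune2fs -O ^has_journal /dev/sdc1
        e2fsck -f /dev/sdc1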

    For all else, I find ext4 really is the file system with the least nasty surprises, the least likely to have some use case at which it is really, really slow. Sure, it's going to have a rough time dealing with a gigantic directory containing over 64k files, and has to be configured away from the defaults to support that at all, but that's a much less common use case than deleting a large directory tree. Resizing the file system is another operation I find is rarely needed, so whenever file system proponents brag about how quickly their favorite file system can do that, I just shrug. Guess I'm getting old and conservative and cranky when it comes to file systems.
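
    For the record, I believe the relevant ext4 knobs are dir_index/dir_nlink for the old 64k-ish limits and large_dir for truly huge directories; something like the following, assuming a recent kernel and e2fsprogs (the device name is a placeholder):

        # See which directory-related features are already enabled
        dumpe2fs -h /dev/sdd1 | grep -i features

        # Enable the three-level hashed-directory feature for very large directories
        tune2fs -O large_dir /dev/sdd1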

    One thing I do miss about ye olde FAT is the ease of undeleting a file if you accidentally delete it. It's not guaranteed to work, but if you haven't made many changes to the file system since the deletion, odds are it will work. Whereas, after an "rm", it's usually still possible to recover the data but it takes a lot more effort. Like, oh, you have to unmount the partition then scan through the whole freaking thing with grep, looking for whatever unique data the file had that you can remember accurately. And that's if you have not encrypted the partition. As for the trash can, I really do not much like it. "rm" bypasses it, so it's no help there. The biggest reason to delete stuff is that I'm running short of space, and the trash can just gets in the way of that. You think you deleted something, but the amount of free disk space didn't budge. Having to tell the system to empty the trash is an extra step I'd rather not have to do. Anyway, I find backups a much better way to guard against accidental file deletion.
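
    The raw-grep recovery looks roughly like this (device and paths are placeholders, and the output must go to a different disk so you don't overwrite the very blocks you're trying to recover):

        # Unmount (or remount read-only) so the freed blocks aren't reused
        umount /dev/sde1

        # Search the raw device for a phrase you remember from the lost file
        grep -a -C 100 'some phrase from the lost file' /dev/sde1 > /mnt/other/recovered.txt

        # Or find the byte offset first and carve around it with dd
        # (BYTE_OFFSET is whatever grep -b reported)
        grep -a -b -o 'some phrase from the lost file' /dev/sde1
        dd if=/dev/sde1 of=/mnt/other/chunk.bin bs=512 skip=$((BYTE_OFFSET / 512)) count=2048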

    I have not tried ZFS, in large part because it wasn't free.

  • (Score: 4, Insightful) by pendorbound on Tuesday September 11 2018, @04:44PM (7 children)

    by pendorbound (2688) on Tuesday September 11 2018, @04:44PM (#733207) Homepage

    ZFS is "Free" as in speech. It's just that its CDDL license is not compatible with the GPL. You could make the argument that the GPL is less free than the CDDL, since it's a restriction in the GPL that makes the two impossible to combine, not the other way around. CDDL code is freely used in BSD-licensed software without any legal issues.

    The fact that Sun staff are on record as saying the CDDL was based on the MPL explicitly because that made it GPL-incompatible and limited ZFS's use with Linux kind of muddies the situation. Still, the CDDL is free as in speech according to both the FSF and the OSI. https://en.wikipedia.org/wiki/Common_Development_and_Distribution_License [wikipedia.org]

    I went through a similar filesystem evolution as you did and landed on ZFS. I've been using it in home-production since 2008. Its built-in snapshot and checksumming support have saved my bacon several times in the face of both hardware and operator errors. I'd highly recommend giving it a look. The one caveat is look at the minimum hardware requirements, balk at how insane they are for a filesystem, and then make sure you comply with them anyways. ZFS does not operate well in RAM constrained environments.
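
    For anyone curious, the day-to-day use that has saved me is roughly this (pool, dataset, and device names are placeholders):

        # Mirrored pool across two disks, with a dataset for home directories
        zpool create tank mirror /dev/sdb /dev/sdc
        zfs create tank/home

        # Snapshot before risky changes; roll back when the operator fumbles
        zfs snapshot tank/home@before-upgrade
        zfs rollback tank/home@before-upgrade

        # A periodic scrub verifies checksums and repairs from the mirror copy
        zpool scrub tank
        zpool status -v tank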

    • (Score: 3, Informative) by VLM on Tuesday September 11 2018, @07:00PM (5 children)

      by VLM (445) on Tuesday September 11 2018, @07:00PM (#733249)

      The one caveat is look at the minimum hardware requirements, balk at how insane they are for a filesystem, and then make sure you comply with them anyways. ZFS does not operate well in RAM constrained environments.

      The whole 1GB ram per 1TB of storage thing? Not really needed. Dedupe is memory hungry. If you're trying to do an office NAS for many (hundreds?) of users simultaneously, you'll want more than 1G per 1T for caching reasons.

      You can't hurt anything by having more RAM, that's for sure.

      I'm typing this on a FreeBSD box with 32 GB of RAM for fast compiles and only 250GB x2 of storage, so it's very nice, but not necessary.
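
      If you want to see what the ARC is actually doing on a FreeBSD box, something like this (exact sysctl names can vary a bit between releases):

          # Current ARC size and the configured ceiling
          sysctl kstat.zfs.misc.arcstats.size
          sysctl vfs.zfs.arc_max

          # To cap it, e.g. on a box that also does big compiles, set it in /boot/loader.conf:
          # vfs.zfs.arc_max="8589934592"   # 8 GiB in bytes, example value only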

      • (Score: 2) by Entropy on Tuesday September 11 2018, @07:44PM (4 children)

        by Entropy (4228) on Tuesday September 11 2018, @07:44PM (#733273)

        No, that's for dedup (deduplication), which is by no means necessary. I think ZFS doesn't work well if you have 512M of RAM or something, but over a gig should be fine.

        • (Score: 2) by pendorbound on Tuesday September 11 2018, @07:58PM

          by pendorbound (2688) on Tuesday September 11 2018, @07:58PM (#733281) Homepage

          It's life & death critical for dedupe(*), but you're still going to want much more RAM than normal for ZFS. ZFS' ARC doesn't integrate (exactly) with Linux's normal file system caching. I've seen significant performance increases for fileserver and light database workloads by dedicating large chunks of RAM (16GB out of 96GB on the box) exclusively for ARC. It'll *work* without that, but ZFS is noticeably slower than other filesystems if it doesn't have enough ARC space available. Particularly with partial-block updates, having the rest of the block in ARC means ZFS doesn't have to go to disk to calculate the block checksum before writing out the new copy-on-write block. Running with insufficient ARC causes ZFS to frequently have to read an entire block in from disk before it can write an updated copy out, even if it was only changing one byte.

          (*) Source: Once tried to enable dedupe on a pool with nowhere near enough RAM. Took over 96 hours to import the pool after a system crash as it rescanned the entire device for duplicate block counts before it was happy the pool was clean. Had to zfs send/receive to a new pool to flush out the dedupe setting and get a usable system.
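
          The escape hatch is roughly this (pool names are placeholders; note that send -R carries dataset properties along, so the fiddly part is making sure dedup doesn't follow the data onto the new pool):

              # Snapshot everything and replicate it into a fresh pool
              zfs snapshot -r oldpool@migrate
              zfs send -R oldpool@migrate | zfs receive -Fu newpool

              # Verify nothing came across with dedup still enabled
              zfs get -r dedup newpool
              zfs set dedup=off newpool    # if anything did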

        • (Score: 0) by Anonymous Coward on Wednesday September 12 2018, @10:18AM (1 child)

          by Anonymous Coward on Wednesday September 12 2018, @10:18AM (#733536)

          but over a gig should be fine.

          My home network media backup ZFS box had 2GB of RAM and I ran into occasional stability issues (unexplained random reboots); it now has 4GB and runs solid 24/7. I'd say throw as much RAM at it as your motherboard can handle.
          Once I can source a cheap secondhand multi-CPU server with over 32GB of RAM, I'll move the current disk pool over to it and consider firing up the (probably now badly needed) dedupe facility. The SD cards from the various family digital cameras and phones get regularly backed up to the server, and the networked home directories live there as well, so no doubt there are multiple copies of the same images and music files lurking on it.
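
          Before flipping that switch it's worth letting zdb estimate what dedupe would actually buy (the pool name is a placeholder; the simulation itself can take a long time and a lot of RAM on a big pool):

              # Dry run: simulate the dedup table and the estimated ratio without enabling anything
              zdb -S tank

              # Once dedup is actually on, the live ratio shows up here
              zpool list -o name,size,allocated,dedupratio tank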

          • (Score: 0) by Anonymous Coward on Wednesday September 12 2018, @10:42AM

            by Anonymous Coward on Wednesday September 12 2018, @10:42AM (#733543)

            My home network media backup ZFS box had 2GB of RAM and I ran into occasional stability issues (unexplained random reboots)

            I should have added that this was during the testing phase: I ran the thing for a month and seriously hammered it, upped the beastie to 4GB, then hammered it again for another couple of weeks before finally going live with it.

        • (Score: 2) by VLM on Wednesday September 12 2018, @11:09AM

          by VLM (445) on Wednesday September 12 2018, @11:09AM (#733554)

          dedup (deduplication), which is by no means necessary.

          Dedupe is almost never necessary. Under really weird conditions, say you're running over 1000 almost-identical virtual compute nodes (maybe a webhosting farm using virtualization?), you can save some cash on storage. But under normal conditions you're basically trading high-speed RAM, which costs money, heat, and energy, for slightly less bulk storage, which is cheap and getting cheaper; generally not a win.

          A good analogy: dedupe is kinda like the old Windows "autorun on media insertion" feature, which sounds nifty but turns out to be not so great overall.

    • (Score: 2) by hendrikboom on Tuesday September 11 2018, @09:33PM

      by hendrikboom (1125) Subscriber Badge on Tuesday September 11 2018, @09:33PM (#733331) Homepage Journal

      I found the following:

      https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrngar/ [reddit.com]

      Some well meaning people years ago thought that they could be helpful by making a rule of thumb for the amount of RAM needed for good write performance with data deduplication. While it worked for them, it was wrong. Some people then started thinking that it applied to ZFS in general. ZFS' ARC being reported as used memory rather than cached memory reinforced the idea that ZFS needed plenty of memory when in fact it was just used in an evict-able cache. The OpenZFS developers have been playing whack a mole with that advice ever since.

      I am what I will call a second generation ZFS developer because I was never at Sun and I postdate the death of OpenSolaris. The first generation crowd could probably fill you in on more details than I could with my take on how it started. You will not find any of the OpenZFS developers spreading the idea that ZFS needs an inordinate amount of RAM though. I am certain of that.

      And also https://www.reddit.com/r/DataHoarder/comments/5u3385/linus_tech_tips_unboxes_1_pb_of_seagate/ddrh5iv/ [reddit.com]

      A system with 1 GB of RAM would not have much trouble with a pool that contains 1 exabyte of storage, much less a petabyte or a terabyte. The data is stored on disk, not in RAM with the exception of cache. That just keeps an extra copy around and is evicted as needed.

      The only time more RAM might be needed is when you turn on data deduplication. That causes 3 disk seeks for each DDT miss when writing to disk and tends to slow things down unless there is enough cache for the DDT to avoid extra disk seeks. The system will still work without more RAM. It is just that the deduplication code will slow down writes when enabled. That 1GB of RAM per 1TB data stored "rule" is nonsense though. The number is a function of multiple variables, not a constant.

      So I now wonder what the *real* limits are on home-scale systems. In particular, suppose I have only a few terabytes. And a machine with only a half gigabyte of RAM. And used for nothing more bandwidth-intensive than streaming (compressed) video over a network to a laptop.

      What I like about ZFS is its extreme resistance to data corruption. That's essential for long-term storage. My alternative seems to be btrfs. Currently I'm using ext4 on software-mirrored RAID, which isn't great at detecting data corruption.

      -- hendrik

  • (Score: 1) by pTamok on Tuesday September 11 2018, @08:45PM (2 children)

    by pTamok (3042) on Tuesday September 11 2018, @08:45PM (#733298)

    One thing I do miss about ye olde FAT is the ease of undeleting a file if you accidentally delete it.

    This is why I use NILFS2 [wikipedia.org].

    Find checkpoint timestamped before the file's deletion, convert checkpoint to read-only snapshot*, mount snapshot, copy file back, unmount snapshot, convert snapshot back to checkpoint, done.

    *This stops it from being deleted by the garbage collector. You don't have to do this, but you run the risk of the checkpoint being tidied away when the garbage collector runs to free up disk space. NILFS2 treats the disk as a circular buffer of copy-on-write blocks, so it is continuously chasing its tail.
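
    In concrete terms it's something like this (device, checkpoint number, and paths are placeholders):

        # List checkpoints with timestamps; pick one from before the deletion
        lscp /dev/sdf1

        # Pin that checkpoint as a snapshot so the cleaner can't reclaim it
        chcp ss /dev/sdf1 1234

        # Mount the snapshot read-only, copy the file back, then undo
        mount -t nilfs2 -r -o cp=1234 /dev/sdf1 /mnt/snap
        cp /mnt/snap/path/to/lost-file ~/
        umount /mnt/snap
        chcp cp /dev/sdf1 1234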

    • (Score: 3, Informative) by linuxrocks123 on Wednesday September 12 2018, @03:31AM (1 child)

      by linuxrocks123 (2557) on Wednesday September 12 2018, @03:31AM (#733467) Journal

      I used NILFS2 for a while, then one day it oopsed the kernel and flipped the filesystem to read-only while reading from part of the directory tree in my Pale Moon user profile directory.

      Then, after rebooting, I discovered it would now persistently oops and go read-only whenever that location was read. I had to tar everything into a backup file on another machine and reformat the drive back to ext4 to restore normal operation to the machine.

      I did like NILFS2's features for the year or so I used it, but, well, that's it for that.

      • (Score: 1) by pTamok on Wednesday September 12 2018, @07:29AM

        by pTamok (3042) on Wednesday September 12 2018, @07:29AM (#733517)

        I'm sorry to hear that.

        I had one issue with NILFS, probably caused by my ignorant habit of using the power button on my laptop to get out of unresponsive blank screens caused by the display code. I too had to do a backup/restore to resolve that particular issue - but now that I've discovered REISUB [wikipedia.org] (or REISUO), I generally manage semi-graceful shutdowns (or at least ones where an emergency sync has been done).
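
        For anyone unfamiliar, REISUB is Alt+SysRq plus R, E, I, S, U, B typed in order; the same emergency actions can also be fired from a root shell if the magic SysRq interface is enabled:

            echo 1 > /proc/sys/kernel/sysrq     # many distros restrict this by default
            echo s > /proc/sysrq-trigger        # emergency sync
            echo u > /proc/sysrq-trigger        # remount filesystems read-only
            echo b > /proc/sysrq-trigger        # immediate reboot, no clean shutdown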

        I do have backups, and I've had no further problems with NILFS on my hardware. Its performance on my SSD is adequate for my purposes, and having the continuous checkpoint/snapshot capability is quite nice. I can understand you have a different use case and priorities. It would be nice if there were an fsck program - it's on the NILFS2 'todo list' [sourceforge.io], but development on NILFS2 is slow - probably because not a lot of people need it, using ext4's journalling, or BTRFS or ZFS, instead. It's probably worth reading the 'Current Status' document that the todo list is part of, so you can come to a decision as to whether you would use it.

  • (Score: 0) by Anonymous Coward on Tuesday September 11 2018, @11:09PM

    by Anonymous Coward on Tuesday September 11 2018, @11:09PM (#733378)

    XFS version 5 was released a few years ago; it requires a new mkfs.xfs, and you should use -m crc=1,finobt=1. The deletion problem you've mentioned is not really a problem at this point, and the default parameters for XFS have improved massively as well (the XFS people like to say "all the turbo switches are on by default"). The mkfs.xfs voodoo has been largely eliminated; su/sw for RAID is the only other tricky bit you may want to set for better speed.
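
    For the record, roughly what that looks like (the device and RAID geometry below are placeholders):

        # Explicitly request the v5 on-disk features mentioned above
        # (recent mkfs.xfs turns them on by default)
        mkfs.xfs -m crc=1,finobt=1 /dev/sdg1

        # On RAID, also align stripe unit/width to the array, e.g. a 64k chunk
        # across 4 data disks (illustrative values only)
        mkfs.xfs -m crc=1,finobt=1 -d su=64k,sw=4 /dev/sdg1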

  • (Score: 0) by Anonymous Coward on Tuesday September 11 2018, @11:23PM

    by Anonymous Coward on Tuesday September 11 2018, @11:23PM (#733383)

    My evolutionary history of filesystem usage is remarkably similar to yours. I arrived at ZFS around 2010/2011 and had been using OmniOS as a storage backend for a couple of years while my servers were ESXi-based. Now that I'm switching to Proxmox (OmniOS won't boot on KVM) I need to work around ZFS shortcomings on FreeBSD/FreeNAS, but it's still a winning team.

    The CIFS "time machine" functionality based on snapshots alone is worth it. AFAIK Linux can't do that yet; wake me when they're there :)

    For the time being, my Proxmox-based home server actually boots from a ZFSonLinux mirror, but my storage is still on FreeNAS. I'm moving the company servers to Proxmox soon, even though FreeNAS's shortcomings with regard to snapshot staggering are putting me off.