posted by martyb on Friday February 19 2016, @03:23AM
from the someday-coming-to-a-phone-near-you? dept.

For those Linux folks out there, imagine merging LVM2, dm-raid, and your file system of choice into an all-powerful, enterprise-ready, check-summed, redundant, containerized, soft-RAID, disk-pooling, RAM-hungry, demi-god file system. The FreeBSD Handbook is a good place to start grepping the basic capabilities and functions of ZFS[*].

Ars Technica reports:

A new long-term support (LTS) version of Ubuntu is coming out in April, and Canonical just announced a major addition that will please anyone interested in file storage. Ubuntu 16.04 will include the ZFS filesystem module by default, and the OpenZFS-based implementation will get official support from Canonical.
...
ZFS is used primarily in cases where data integrity is important—it's designed not just to store data but to continually check on that data to make sure it hasn't been corrupted. The oversimplified version is that the filesystem generates a checksum for each block of data. That checksum is then saved in the pointer for that block, and the pointer itself is also checksummed. This process continues all the way up the filesystem tree to the root node, and when any data on the disk is accessed, its checksum is calculated again and compared against the stored checksum to make sure that the data hasn't been corrupted or changed. If you have mirrored storage, the filesystem can seamlessly and invisibly overwrite the corrupted data with correct data.
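
To make that checksum-chain idea concrete, here is a minimal Python sketch of the concept. It is purely illustrative; the class and function names are made up, and this is not how OpenZFS is actually laid out on disk.

    import hashlib

    def checksum(data: bytes) -> str:
        # ZFS defaults to fletcher4 and also supports SHA-256; SHA-256 is used
        # here only for simplicity.
        return hashlib.sha256(data).hexdigest()

    class BlockPointer:
        """A block pointer stores the checksum of the block it points to."""
        def __init__(self, block: bytes):
            self.block = block
            self.block_checksum = checksum(block)

    class TreeNode:
        """An interior node checksums its children's pointers, so the chain of
        checksums continues all the way up to the root of the tree."""
        def __init__(self, children: list):
            self.children = children
            self.checksum = checksum(
                b"".join(c.block_checksum.encode() for c in children)
            )

    def read_block(ptr: BlockPointer) -> bytes:
        # On every read the checksum is recomputed and compared against the
        # stored one; a mismatch means corruption (which a mirrored copy could
        # then be used to repair).
        if checksum(ptr.block) != ptr.block_checksum:
            raise IOError("checksum mismatch: block is corrupt")
        return ptr.block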

ZFS was available as a technology preview in Ubuntu 15.10, but the install method was a bit more cumbersome than just apt-get install zfsutils-linux. I for one am excited to see ZFS coming to Linux, as it is a phenomenal solution for building NAS devices and for making incremental backups of a file system. Now I just wish Ubuntu would do something about the systemd bug.
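
As a rough illustration of that incremental-backup workflow (the pool, dataset, and snapshot names below are made up, and this assumes the zfs command-line tools from zfsutils-linux are installed), a snapshot-and-send cycle might look like this:

    import subprocess

    DATASET = "tank/data"                         # hypothetical pool/dataset
    BACKUP_FILE = "/backup/data-incremental.zfs"  # hypothetical output path

    def snapshot(name: str) -> None:
        # 'zfs snapshot' creates a read-only, point-in-time copy of the dataset.
        subprocess.run(["zfs", "snapshot", f"{DATASET}@{name}"], check=True)

    def incremental_send(old: str, new: str) -> None:
        # 'zfs send -i' streams only the blocks that changed between two
        # snapshots, which is what makes ZFS incremental backups cheap.
        with open(BACKUP_FILE, "wb") as out:
            subprocess.run(
                ["zfs", "send", "-i", f"{DATASET}@{old}", f"{DATASET}@{new}"],
                stdout=out, check=True,
            )

    snapshot("monday")
    # ... the filesystem changes ...
    snapshot("tuesday")
    incremental_send("monday", "tuesday")

A stream written this way can later be restored with zfs receive on the same or another pool.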

[*] According to Wikipedia:

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. The features of ZFS include protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z and native NFSv4 ACLs.

ZFS was originally implemented as open-source software, licensed under the Common Development and Distribution License (CDDL). The ZFS name is registered as a trademark of Oracle Corporation.

OpenZFS is an umbrella project aimed at bringing together individuals and companies that use the ZFS file system and work on its improvements.


Original Submission

 
  • (Score: 2) by darkfeline (1030) on Monday February 22 2016, @12:46AM (#307934)

    Filesystem block management involves allocating, freeing, and tracking storage blocks on a hardware device.

    What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data". attic and bup deduplicate this data by indexing/hashing pieces of files. This is not filesystem block management; attic and bup do not manage the allocation and freeing of storage space on hardware devices. However, this approach achieves the goal of what you refer to as "block-level dedup"; that is, two files sharing the same piece of data will only have that piece stored once, across all backups.
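
    As a rough sketch of that idea (illustrative Python only; real tools such as attic and bup use rolling-hash chunking rather than the fixed-size pieces used here), deduplicating pieces of files across backups by hash could look like:

        import hashlib

        CHUNK_SIZE = 4096   # fixed-size pieces for simplicity
        store = {}          # hash -> chunk bytes, shared across all backups

        def backup_file(path: str) -> list:
            """Record a file as a list of chunk hashes; identical pieces, in
            this file or in any other backup, are stored only once."""
            manifest = []
            with open(path, "rb") as f:
                while chunk := f.read(CHUNK_SIZE):
                    digest = hashlib.sha256(chunk).hexdigest()
                    store.setdefault(digest, chunk)   # stored only if unseen
                    manifest.append(digest)
            return manifest

        def restore_file(manifest: list) -> bytes:
            return b"".join(store[d] for d in manifest)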

    So yeah, you don't understand filesystems.

    --
    Join the SDF Public Access UNIX System today!
  • (Score: 2) by TheRaven (270) on Monday February 22 2016, @02:55PM (#308174)

    > What you're looking for, "block"-level dedup, is: "If two files between backups are 90% similar, there shouldn't be two duplicate copies of that 90% data"

    That is absolutely not what block-level dedup means. Block-level deduplication means building a hash table of all blocks in the system and replacing any block that matches an existing one with a reference to the old block, irrespective of which file the original block came from. This requires storing the backups using something like an inode structure, creating either new blocks of data or references to existing ones. If your backup software is doing this, then it's doing close to what a filesystem needs to do (though it may be able to simplify some things because it likely doesn't have to deal with any kind of concurrency). If it's not doing this, then it's not doing block-level deduplication.
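
    A minimal sketch of that bookkeeping (illustrative Python, not any particular backup tool or filesystem): the hash table maps block contents to a single stored copy, and a per-block reference count is exactly the allocate/free tracking a filesystem has to do.

        import hashlib

        class BlockStore:
            """Toy block-level dedup: a hash table of blocks plus refcounts."""
            def __init__(self):
                self.blocks = {}    # hash -> block data
                self.refcount = {}  # hash -> number of references to that block

            def write(self, block: bytes) -> str:
                digest = hashlib.sha256(block).hexdigest()
                if digest not in self.blocks:    # new data: allocate a block
                    self.blocks[digest] = block
                    self.refcount[digest] = 0
                self.refcount[digest] += 1       # duplicate: just add a reference
                return digest

            def free(self, digest: str) -> None:
                # A block is released only when nothing references it any more.
                self.refcount[digest] -= 1
                if self.refcount[digest] == 0:
                    del self.blocks[digest]
                    del self.refcount[digest]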

    It sounds like you might need to read a bit more about filesystems before you continue this discussion.

    --
    sudo mod me up