SUSE Pledges Endless Love for btrfs; Says Red Hat's Dumping Irrelevant

posted by Fnord666 on Monday August 28 2017, @10:08AM   Printer-friendly
from the a-bitter-fs dept.

Submitted via IRC for TheMightyBuzzard

SUSE has decided to let the world know it has no plans to step away from the btrfs filesystem, and intends to make it even better.

The company's public display of affection comes after Red Hat decided not to fully support the filesystem in its own Linux.

Losing a place in one of the big three Linux distros isn't a good look for any package even if, as was the case with this decision, Red Hat was never a big contributor or fan of btrfs.

[Matthias G. Eckermann] also hinted at some future directions for the filesystem. "We just start to see the opportunities from subvolume quotas when managing Quality of Service on the storage level" he writes, adding "Compression (already there) combined with Encryption (future) makes btrfs an interesting choice for embedded systems and IoT, as may the full use of send-receive for managing system patches and updates to (Linux based) 'firmware'." ®

Mmmmmm... butter-fs

Source: https://www.theregister.co.uk/2017/08/25/suse_btrfs_defence/


Original Submission

Related Stories

SUSE Linux Sold for $2.535 Billion

British software company Micro Focus International has agreed to sell SUSE Linux and its associated software business to Swedish private equity group EQT Partners for $2.535 billion.

Also at The Register, Linux Journal, MarketWatch, and Reuters.

Previously: SuSE Linux has a New Owner
HPE Wraps Up $8.8bn Micro Focus Software Dump Spin-Off

Related: SUSE Pledges Endless Love for btrfs; Says Red Hat's Dumping Irrelevant


Original Submission

  • (Score: 0) by Anonymous Coward on Monday August 28 2017, @10:11AM (3 children)

    by Anonymous Coward on Monday August 28 2017, @10:11AM (#560133)

    Mmmmmm... butter-fs

    Am I the only one who normally reads it as "bit rot-fs"?

    • (Score: 2) by c0lo on Monday August 28 2017, @10:34AM (2 children)

      by c0lo (156) Subscriber Badge on Monday August 28 2017, @10:34AM (#560145) Journal

      I'm waiting for bacon-fs to throw my support behind it.
      Granted, a "crispy" partition doesn't sound like a good thing, tho.

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 2) by LoRdTAW on Monday August 28 2017, @02:25PM (1 child)

        by LoRdTAW (3755) on Monday August 28 2017, @02:25PM (#560246) Journal

        I prefer my bacon cooked soft so I'm good.

        • (Score: 3, Insightful) by c0lo on Monday August 28 2017, @02:42PM

          by c0lo (156) Subscriber Badge on Monday August 28 2017, @02:42PM (#560254) Journal

          I prefer my bacon cooked soft so I'm good.

          It's certainly not illegal, however immoral it may be

          (grin)

          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 5, Informative) by KiloByte on Monday August 28 2017, @10:58AM (17 children)

    by KiloByte (375) on Monday August 28 2017, @10:58AM (#560151)

    While btrfs violates KISS to a massive degree, data checksums alone mean you shouldn't even consider using anything but btrfs or ZFS, unless you don't care about the correctness of your data at all or have another means of in-line verification.

    It's like ECC memory, except that disks lie a lot more than memory does. HDDs lie. Controllers lie. Cables lie. SD cards massively lie. eMMC lies.

    In theory, disks are supposed to use complex erasure codes themselves that should make such corruption impossible, but that's theory, not practice.

    Again and again I see someone come to IRC and say "btrfs is shit", which after troubleshooting ends with "oh, indeed my machine was overheating" or "memtest86 worked well for hours but after two days it started throwing errors". Btrfs detects errors that ext4 or xfs are oblivious to.

    And checksums are not the only data safety feature of btrfs. Yes, it's slower, because on HDDs it duplicates metadata by default and never overwrites data in place.

    Or, do you want hourly O(changes) backups on a filesystem that rsync needs half an hour just to stat?
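
    A minimal sketch of that kind of O(changes) backup, assuming a btrfs filesystem mounted at /data and a btrfs-formatted backup drive at /backup (all paths hypothetical):

        # take a read-only snapshot; btrfs send requires a read-only source
        btrfs subvolume snapshot -r /data /data/snap-new
        # incremental send: only extents changed since the parent snapshot
        # cross the pipe, so the cost is O(changes), not O(filesystem)
        btrfs send -p /data/snap-prev /data/snap-new | btrfs receive /backup
        # rotate: the new snapshot becomes the parent for the next run
        btrfs subvolume delete /data/snap-prev
        mv /data/snap-new /data/snap-prev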

    --
    Ceterum censeo systemd esse delendam.
    • (Score: 0) by Anonymous Coward on Monday August 28 2017, @12:29PM (11 children)

      by Anonymous Coward on Monday August 28 2017, @12:29PM (#560188)

      unless you don't care about the correctness of your data at all or have another means of in-line verification.

      ... uh

      It's like ECC memory, except that disks lie a lot more than memory does.

      CRC and other checksums don't lie, they have hash collisions. Bigger hashes mean less chance of collision. The biggest checksum is a copy of the data itself. That's why RAID exists. I prefer to use a hardware solution like RAID5 and a journaling FS like EXT3/4 rather than a chain of hash fingerprints that, if corrupted, themselves lie about the entire subsequent dataset...
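
      For reference, a sketch of that kind of setup with Linux software RAID (mdadm) standing in for a hardware controller; device names are hypothetical:

          # three-disk RAID5: any single-disk failure is recoverable
          mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
          # journaling filesystem on top of the array
          mkfs.ext4 /dev/md0
          mount /dev/md0 /srv/data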

      • (Score: 1, Interesting) by Anonymous Coward on Monday August 28 2017, @12:40PM

        by Anonymous Coward on Monday August 28 2017, @12:40PM (#560194)

        If the simple hash had a collision, would the raid be consulted? I don't think that usually happens.

      • (Score: 3, Interesting) by KiloByte on Monday August 28 2017, @01:00PM (8 children)

        by KiloByte (375) on Monday August 28 2017, @01:00PM (#560209)

        CRC and other checksums don't lie, they have hash collisions.

        Hash collisions are a concern against a malicious human; here we're defending against faulty cable/too hot motherboard/bogus RAM/shit disk firmware/random at-rest corruption/random in-transit corruption/evil faeries shooting magic dust pellets at your disk.

        Bigger hashes mean less chance of collision.

        They are also much slower to compute and take more space to store. It's a tradeoff with an obvious choice here: an attacker who can write to the block device can trivially generate bogus data with a valid checksum, thus a bigger hash gives us almost nothing.

        The biggest checksum is a copy of the data itself. That's why RAID exists. I prefer to use a hardware solution like RAID5 and a journaling FS like EXT3/4 rather than a chain of hash fingerprints that, if corrupted, themselves lie about the entire subsequent dataset...

        RAID solves a different, related problem: it helps you recover data once you know it's bad. It doesn't help with noticing failure (well, you can have a cronjob that reads the entire disk, but by then you may have had a week of using bad data) -- and even worse, without the disk notifying you of an error, you don't even know which of the copies is the good one.

        By "lies" in the GP post I meant "gives you bad data while claiming it's good". A disk may or may not notice this problem -- if it doesn't, the RAID has no idea it needs to try the other copy.

        --
        Ceterum censeo systemd esse delendam.
        • (Score: 2, Interesting) by shrewdsheep on Monday August 28 2017, @01:34PM (7 children)

          by shrewdsheep (5215) on Monday August 28 2017, @01:34PM (#560222)

          Hash collisions are a concern against a malicious human; here we're defending against faulty cable/too hot motherboard/bogus RAM/shit disk firmware/random at-rest corruption/random in-transit corruption/evil faeries shooting magic dust pellets at your disk.

          CRC guarantees detection of bit-flips up to a certain number (depending on the size of the CRC). Beyond that, there is a probability of collision, which seems low enough as long as independent bit flips are assumed. In practice, bit-flips are not independent but highly correlated (a whole block going bad, ...; same problem with ECC). My take is to be careful about CRC and complement it with different measures. I do software RAID1 (based on discussions I had here and elsewhere) and I never overwrite backup drives (they get retired once full, to serve as an extra layer of safety).

          On the more philosophical side, I am amazed how well computers work. To allow error-free operation, the probability of errors in computation/storage has to be incredibly low, per period of time or per operation. So low, in fact (say 10^-15 to 10^-18 per operation, be it compute or read/write), that I can hardly believe it. So errors should be accumulating in our data, but I could not say I have noticed.

          • (Score: 2) by LoRdTAW on Monday August 28 2017, @02:41PM (6 children)

            by LoRdTAW (3755) on Monday August 28 2017, @02:41PM (#560253) Journal

            On the more philosophical side, I am amazed how well computers work.

            Honestly, I sometimes consider these things magic. I know how they work down to the silicon. But I'm still amazed at how far we have come in shrinking them down and speeding them up while packing in features, and they still work >99.9% of the time. And let's not forget the millions of lines of code they jog through on a daily basis to do the most mundane of tasks, like giving us the time or the weather, playing music, or showing TV.

            Just got my 500GB NVMe disk and a PCIe adapter. I couldn't help but marvel over the tiny Bic-lighter-sized board. It holds over a million times the data of a 360KB 5.25" floppy, and 25,000 times as much as the ancient 20MB hard disk in my first 8086 PC. And the bandwidth is staggering. I ripped a very old 2GB WD IDE disk the other day using a Pentium II 400: a whopping 2.7MB/sec using dd through netcat to another Linux box. I get 500MB/s from my SATA SSD to the NVMe disk, and that's not even a third of the NVMe write speed, while reads are twice that. Progress certainly is amazing.
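
            The dd-through-netcat rip described above looks roughly like this (hostname, port, and device names are hypothetical, and netcat option syntax varies between implementations):

                # on the receiving Linux box: listen and write the image out
                nc -l -p 9000 > old-wd-2gb.img
                # on the old machine: stream the raw IDE device over the LAN
                dd if=/dev/hda bs=64k | nc receiver-host 9000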

            • (Score: 2) by frojack on Monday August 28 2017, @07:10PM (5 children)

              by frojack (1554) on Monday August 28 2017, @07:10PM (#560413) Journal

              I'm more amazed at how FEW actual data errors we experience.

              I don't buy the nonsense that everything except btrfs experiences silent data loss and we are none the wiser.
              Programs would crash and burn. The kernel would panic; data corruption would be rampant.

              But all those things are just rare as hell, or caught and handled automatically. KiloByte's dire predictions have the distinctive ring of bullshit to me.

              I've lost more data to btrfs (yup, openSUSE), and had to reinstall entire systems because of it, than with all the other file systems I've ever used. (And yes, I still have some ReiserFS systems in play.)

              --
              No, you are mistaken. I've always had this sig.
              • (Score: 2) by KiloByte on Monday August 28 2017, @07:26PM (2 children)

                by KiloByte (375) on Monday August 28 2017, @07:26PM (#560428)

                Most errors are not noticeable. You'll have a mangled texture in a game, a sub-second series of torn frames in a video, a slight corruption of data somewhere. A bad download you'll retry without investigating the issue, etc.

                On your disk, how much do all executables+libraries take together? It's 4GB or so. The vast majority of the disk is data.

                Being oblivious to badness makes people happy. I don't want a false sense of happiness.

                --
                Ceterum censeo systemd esse delendam.
                • (Score: 3, Insightful) by frojack on Monday August 28 2017, @10:47PM (1 child)

                  by frojack (1554) on Monday August 28 2017, @10:47PM (#560564) Journal

                  Easy for you to rev up the fud. Hard for you to produce any actual statistics.
                  CERN tried, but 80% of the errors they found were attributable to a mismatch between WD drives and 3Ware controllers,
                  and others pointed out to them that they could never separate those errors from any of their other detected errors.

                  How many times has your BTRFS system detected and/or prevented errors?

                  --
                  No, you are mistaken. I've always had this sig.
                  • (Score: 2) by KiloByte on Tuesday August 29 2017, @07:28PM

                    by KiloByte (375) on Tuesday August 29 2017, @07:28PM (#561043)

                    5 out of 8 disk failures I had in recent months involved a silent error. Four of those had nothing but silent errors, while the fifth had a small number of loud errors plus a few thousand silently corrupted blocks. Of the remaining failures, one was an FTL collapse (also silent, but quite hard to miss), while two were HDDs dying completely.

                    Thus, every single failure that didn't take out the whole disk involved silent data corruption.

                    Only one of those had ext4, and I wasted a good chunk of time investigating what was going on -- random segfaults didn't look like a disk failure in the slightest. The others had btrfs; they were either in RAID or I immediately knew exactly what was lost -- and more crucially, no bad data was ever used or hit the backups.

                    Two days ago, I also had a software failure, by stupidly (and successfully) trying to reproduce a fandango-on-core mm kernel bug (just reverted by Linus) on my primary desktop, with one of the loads being a disk balance. All it took to find the corrupted data was to run a scrub; all 30-ish scribbled-upon extents luckily were in old snapshots or a throw-away chroot, so I did not even have to restore anything from backups. The repair did not even require a reboot (not counting the one after the crash, obviously).

                    By "disk" in the above I mean: 1 SD card (arm), 1 eMMC (arm), 1 SSD disk (x86), the rest spinning rust (x86).

                    --
                    Ceterum censeo systemd esse delendam.
              • (Score: 2) by LoRdTAW on Monday August 28 2017, @09:49PM

                by LoRdTAW (3755) on Monday August 28 2017, @09:49PM (#560541) Journal

                That's the magic: almost no data loss. I've lost more data to dumb mistakes (# dd if=... of=/dev/sdc↵ ... fuuuuuuuuuk that was the wrong disk!) than to hardware failure or bit rot. In fact, I think I see less bit rot nowadays than 15+ years ago. I remember dealing with more corrupt compressed archives and video files on older 250GB ATA disks than I do now. Those kinds of files do a lot of sitting around and rot away. Does that mean I am more confident? No. I'm still a bit paranoid and I keep backups.

                As for BTRFS, I have yet to give it a go. I like the idea of a libre file system with all the goodies like built-in RAID and data verification. But without it being blessed as stable and production-ready by the devs, I'm just not interested.

              • (Score: 2) by TheRaven on Tuesday August 29 2017, @08:17AM

                by TheRaven (270) on Tuesday August 29 2017, @08:17AM (#560722) Journal
              One of my colleagues did some work a few years ago on using single-bit flips to escape from a JVM sandbox. The basic idea is that you create a pattern in memory such that there's a high probability a memory error will let you make the JVM dereference a pointer that allows you to escape, and then you heat up the memory chips until it happens. For some of the experiments, they tried pointing a hairdryer at the RAM chips to make the bit flips happen faster. The most interesting thing about that part of the experiment was how high you could get the error rates before most software noticed.
                --
                sudo mod me up
      • (Score: 1, Informative) by Anonymous Coward on Monday August 28 2017, @02:09PM

        by Anonymous Coward on Monday August 28 2017, @02:09PM (#560239)

        Properly done ECC will correct itself, that is what ECC means (error correcting code), this is just as true for errors in the check part as the data.

        Yes there is more data to be corrupted, but you gain the ability to correct it, and detect that it needs correcting.

    • (Score: 0, Informative) by Anonymous Coward on Monday August 28 2017, @12:49PM (4 children)

      by Anonymous Coward on Monday August 28 2017, @12:49PM (#560200)

      BTRFS is trash. I still don't trust it.

      Relatively recently - mid-last year - I fired up a new Arch build as a backup target and decided to give btrfs a try; most people were saying it was stable enough. Formatted, got a bit fancy with multi-disk and resilience, everything looked good, started backing up, and it died part way through (Veeam backing up to Samba CIFS said it had lost contact with the target). Machine unresponsive even from iLO. Reboot, and everything just backed up was gone (empty volume), along with anything left over in tmp locations being corrupt and full of junk.

      Reformatted with a nice stable server distro (CentOS 7), where btrfs is still marked as experimental and the kernel is different. Tried a nice simple btrfs volume setup without any fanciness; similar results.

      XFS works brilliantly (and I've rarely had issues with ext4). The backup target is outfitted with proper ECC RAM, enterprise SAS HDDs, iLO diagnostics and so on - it's not a dodgy workstation or NAS. Veeam and pretty much anything else I throw at it can validate their chain signatures from beginning to end, no worries, across terabytes of data sitting for up to several years.

      I could forgive an experimental filesystem being experimental. I could probably forgive some dodgy experimental features on a stable base if they were clearly marked. I could have coped if the doco was helpful and faults provided useful debugging info. But the documentation for btrfs doesn't clearly mark the parts that (if you search for long enough) you discover aren't recommended or stable, and basic operation of the FS is problematic.

      I really wanted to like btrfs (it was perfect for my use case) but I can't trust it so I won't use it.
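
      For reference, a "fancy with multi-disk and resilience" btrfs setup amounts to something like this (device names and mount point hypothetical):

          # mirror both data and metadata across two disks
          mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
          mount /dev/sdb /srv/backup
          # show how block groups are spread across the devices
          btrfs filesystem usage /srv/backup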

      • (Score: 4, Interesting) by KiloByte on Monday August 28 2017, @01:22PM (3 children)

        by KiloByte (375) on Monday August 28 2017, @01:22PM (#560217)

        a nice stable server distro (CentOS 7)

        And here's your problem. They use Red Hat's frankenkernels, which are notoriously bogus: they mix a truly ancient base (3.10 for 7, 2.6.32 for 6) with backported features from the bleeding edge. Linux's I/O code moves pretty fast; patches tend to have nasty conflicts even three versions apart -- trying to backport complex code from 4.9 to 3.10 just can't possibly end well. And, despite trying something that risky, Red Hat doesn't even have a single btrfs-specific engineer. No wonder it keeps breaking.

        --
        Ceterum censeo systemd esse delendam.
        • (Score: 0) by Anonymous Coward on Monday August 28 2017, @02:52PM

          by Anonymous Coward on Monday August 28 2017, @02:52PM (#560260)

          I wasn't aware Poettering was in charge of the kernel too...

        • (Score: 3, Informative) by hoeferbe on Monday August 28 2017, @04:06PM (1 child)

          by hoeferbe (4715) on Monday August 28 2017, @04:06PM (#560290)
          KiloByte [soylentnews.org] wrote [soylentnews.org]:

          They use Red Hat's frankenkernels, which are notoriously bogus: they mix a truly ancient base (3.10 for 7, 2.6.32 for 6) with backported features from the bleeding edge.

          You have an interesting characterization of Red Hat's compatibility policy [redhat.com].  I believe many of Red Hat's customers appreciate the stability that provides.  As can be seen from Red Hat Enterprise Linux's life cycle [redhat.com], new features and significant new hardware support are only provided in the first 5.5 years of its 10-year life cycle.  (I.e., "Production Phase 1".)  This means new features are not being added to RHEL6.  They are being added to RHEL7's Linux kernel, but I would not go so far as to say 3.10 (which came out in 2013 and is a "longterm" release [kernel.org]) is "ancient".

          Linux's I/O code moves pretty fast; patches tend to have nasty conflicts even three versions apart -- trying to backport complex code from 4.9 to 3.10 just can't possibly end well.

          And yet Red Hat appears to be successful at it!  ;-)

          Red Hat doesn't even have a single btrfs-specific engineer.

          I found this Hacker News thread on Red Hat's Btrfs notice [ycombinator.com] informative since it came from a former Red Hat employee.

          • (Score: 2) by sjames on Monday August 28 2017, @08:15PM

            by sjames (2882) on Monday August 28 2017, @08:15PM (#560467) Journal

            On the other hand, Debian stable isn't known for moving at breakneck speed either but it supports BTRFS just fine.

            RH's continuing dedication to xfs is a bit puzzling as well. xfs made sense in the '90s on IRIX, back when SGI was doing heavy video work on hardware where disk I/O was barely up to the task. In that environment, software willing to go out of its way to work with the FS to maximize I/O could see substantial performance gains. None of that really matters much now, especially since the most interesting features had a hard requirement on specialized hardware that doesn't exist on most servers today. Further, even with the special hardware, those features were never ported to the Linux implementation anyway.

            COW really looks like the future in file systems. It's early(-ish) days for BTRFS, but it really is a credible choice these days as long as you stay away from raid > 1 and according to some, zlib compression (lzo is fine). I could understand if RH wanted to make a play with ZFS instead of BTRFS, but abandoning all of it to keep pushing XFS makes no sense.
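
            A sketch of that conservative configuration, with CoW visible from userspace (devices and paths hypothetical):

                # lzo compression rather than zlib, per the advice above
                mount -o compress=lzo /dev/sdb /srv/data
                # CoW from userspace: a reflink copy shares all extents with
                # the original, so it is near-instant and initially costs no space
                cp --reflink=always /srv/data/big.img /srv/data/big-copy.img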

  • (Score: 2) by bzipitidoo on Monday August 28 2017, @02:45PM (3 children)

    by bzipitidoo (4388) on Monday August 28 2017, @02:45PM (#560256) Journal

    One of the last pieces was file system check for btrfs. I read that it finally exists, but "it is not well-tested in real-life situations yet." Yeesh.

    What's that? "Btrfs is fairly self-healing"?? No, built-in fsck on the fly the way btrfs does it isn't a replacement for offline fsck, you know, with the volumes NOT mounted. How else do you guarantee nothing else is altering the file system during a repair attempt?

    • (Score: 2) by KiloByte on Monday August 28 2017, @07:43PM

      by KiloByte (375) on Monday August 28 2017, @07:43PM (#560442)

      One of the last pieces was file system check for btrfs. I read that it finally exists, but "it is not well-tested in real-life situations yet." Yeesh.

      Eh? Offline fsck for btrfs has existed since 2007; btrfs itself was merged into the mainline kernel in 2009. WhereTF are you pulling this data from?

      No, built-in fsck on the fly the way btrfs does it isn't a replacement for offline fsck, you know, with the volumes NOT mounted. How else do you guarantee nothing else is altering the file system during a repair attempt?

      Unlike most filesystems, btrfs usually does not need to be unmounted in order to check for damage and repair it. Yeah, some damage can't be repaired live, but it's better if the common cases don't require downtime.
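
      Concretely, assuming a filesystem on /dev/sdb mounted at /mnt:

          # online: detect, and where a good copy exists repair, while mounted
          btrfs scrub start /mnt
          # offline: the traditional unmounted check, for damage scrub can't fix
          umount /mnt
          btrfs check /dev/sdb
          # btrfs check --repair exists as well, but its own documentation
          # warns that it should be a last resort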

      --
      Ceterum censeo systemd esse delendam.
    • (Score: 2) by darkfeline on Tuesday August 29 2017, @03:25AM

      by darkfeline (1030) on Tuesday August 29 2017, @03:25AM (#560655) Homepage

      Why do you even need fsck on a CoW file system? In my personal experience, 99% of the use case for fsck is replaying the journal and finding orphaned inodes. CoW doesn't need the former, and the latter can be done online.

      --
      Join the SDF Public Access UNIX System today!
    • (Score: 4, Insightful) by TheRaven on Tuesday August 29 2017, @08:24AM

      by TheRaven (270) on Tuesday August 29 2017, @08:24AM (#560725) Journal
      There's a strange attachment to fsck, but it's a program that really shouldn't need to exist. There are three kinds of errors in a filesystem: those that can be detected automatically, those that can be detected and corrected automatically, and those that can't be detected automatically. The last set can, by definition, not be caught by fsck. The middle set shouldn't need fsck, because the filesystem code should be handling them automatically. The first set definitely shouldn't be relying on an offline tool, because you want to stop using a filesystem as soon as you've detected an unrecoverable error. Tools like fsck exist solely to work around either poor filesystem design or CPU limitations on ancient hardware. They have no place on a modern FS.
      --
      sudo mod me up
  • (Score: 4, Interesting) by requerdanos on Monday August 28 2017, @05:55PM (2 children)

    by requerdanos (5997) Subscriber Badge on Monday August 28 2017, @05:55PM (#560358) Journal

    I use btrfs with compression on an SSD for /home on this machine. Interesting observations:

    First, it's fast.

    Second, I can reliably generate a kernel panic by mounting a btrfs volume with -o compress-force=zlib and then copying a nontrivial amount of data to it. Boom, dead every time. Using lzo instead of zlib results in no problems. Now, is this because btrfs has serious problems, or is that a problem with early Ryzen chips (which this machine has)? I don't know but I have seen other reports of btrfs compression "crashing" so it might not be "just me."

    And finally, for years the btrfs wiki [kernel.org] has said the following:

    Can I determine what compression method was used on a file?
    Not directly, but this is possible from a userspace tool without any special kernel support (the code just has not been written).

    So, the code has just not been written? (Windows NT had this in the mid-90s via the compact [microsoft.com] command.) Without this code, there is essentially no way to see how much space any given file is taking up on disk. It would be very nice for this not to be missing.
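
    The crash repro described above amounts to the following (device, mount point, and data set hypothetical):

        # mounting with forced zlib compression reportedly panics this
        # machine once a nontrivial amount of data is copied in:
        mount -o compress-force=zlib /dev/sdb /mnt
        cp -a ~/testdata /mnt/
        # whereas forced lzo on the same hardware is reported fine:
        # mount -o compress-force=lzo /dev/sdb /mnt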

    • (Score: 3, Informative) by KiloByte on Monday September 04 2017, @07:59PM (1 child)

      by KiloByte (375) on Monday September 04 2017, @07:59PM (#563552)

      'ere you go [github.com]. Alas, the ioctl requires root, and the code is a first working stab, but at least the "working" part seems to be not an exaggeration. And there's a big pull request waiting already, before I even had a chance to clean it up ☺

      --
      Ceterum censeo systemd esse delendam.
      • (Score: 2) by requerdanos on Monday September 04 2017, @08:12PM

        by requerdanos (5997) Subscriber Badge on Monday September 04 2017, @08:12PM (#563555) Journal


        195396 files.
        all   62%   37G /  59G
        none 100%   23G /  23G
        zlib  22%  2.6G /  11G
        lzo   44%   11G /  24G

        I give you my warm thanks. I really appreciate it.

  • (Score: 2) by KritonK on Tuesday August 29 2017, @08:19AM

    by KritonK (465) on Tuesday August 29 2017, @08:19AM (#560723)

    The article's main point seems to be that SUSE is a much bigger contributor to BTRFS than Red Hat and that they will continue to support it in SUSE Linux Enterprise Server as the default file system. That may well be, but Red Hat is a bigger player in the server OS market, with Red Hat Enterprise Linux and, especially, its free variant, CentOS. Although my workstations at work and at home run OpenSUSE Tumbleweed, which I prefer over Fedora, I run CentOS on all our company's servers and don't expect to replace it with anything else any time soon.

    Thus, although Red Hat may not be a big contributor to BTRFS, their decision to stop supporting it does affect a lot of people.
