
posted by Fnord666 on Sunday May 10 2020, @06:47PM
from the stormy-weather dept.

System administrator Chris Siebenmann has found that modern versions of systemd can cause an unmount storm during shutdowns:

One of my discoveries about Ubuntu 20.04 is that my test machine can trigger the kernel's out of memory killing during shutdown. My test virtual machine has 4 GB of RAM and 1 GB of swap, but it also has 347 NFS[*] mounts, and after some investigation, what appears to be happening is that in the 20.04 version of systemd (systemd 245 plus whatever changes Ubuntu has made), systemd now seems to try to run umount for all of those filesystems all at once (which also starts a umount.nfs process for each one). On 20.04, this is apparently enough to OOM[**] my test machine.

[...] Unfortunately, so far I haven't found a way to control this in systemd. There appears to be no way to set limits on how many unmounts systemd will try to do at once (or in general how many units it will try to stop at once, even if that requires running programs). Nor can we readily modify the mount units, because all of our NFS mounts are done through shell scripts by directly calling mount; they don't exist in /etc/fstab or as actual .mount units.

[*] NFS: Network File System
[**] OOM: Out of memory.
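
For readers wondering what a cap on simultaneous unmounts would even look like, here is a minimal C sketch (not systemd's code) that forks one umount per mount point but never keeps more than a fixed number of children running at once. The mount-point list and the MAX_CONCURRENT value are placeholders.

    /* Hypothetical sketch, not systemd code: unmount a list of filesystems
     * while capping how many umount child processes run at the same time.
     * The mount points and MAX_CONCURRENT below are placeholders. */
    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MAX_CONCURRENT 8              /* arbitrary cap for the sketch */

    int main(void)
    {
        const char *mounts[] = { "/mnt/nfs1", "/mnt/nfs2", "/mnt/nfs3" }; /* ...imagine 347 of them */
        size_t n = sizeof(mounts) / sizeof(mounts[0]);
        size_t running = 0;

        for (size_t i = 0; i < n; i++) {
            while (running >= MAX_CONCURRENT) {   /* at the cap: reap a child first */
                if (wait(NULL) > 0)
                    running--;
                else
                    break;
            }
            pid_t pid = fork();
            if (pid < 0) {                        /* fork failed: note it and move on */
                perror("fork");
                continue;
            }
            if (pid == 0) {                       /* child: run umount for this mount */
                execlp("umount", "umount", mounts[i], (char *)NULL);
                _exit(127);                       /* only reached if exec failed */
            }
            running++;
        }
        while (wait(NULL) > 0)                    /* reap whatever is still running */
            ;
        return 0;
    }

According to the post, systemd 245 exposes no equivalent limit, which is the crux of the complaint.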

We've been here before and there is certainly more where that came from.

Previously:
(2020) Linux Home Directory Management is About to Undergo Major Change
(2019) System Down: A systemd-journald Exploit
(2017) Savaged by Systemd
(2017) Linux systemd Gives Root Privileges to Invalid Usernames
(2016) Systemd Crashing Bug
(2016) SystemD Mounts EFI pseudo-fs RW, Facilitates Permanently Bricking Laptops, Closes Bug Invalid
(2015) tmux Coders Asked to Add Special Code for systemd
(2015) A Technical Critique of Systemd
(2014) Devuan Developers Can Be Reached Via vua@debianfork.org
(2014) Systemd-resolved Subject to Cache Poisoning


Original Submission

 
  • (Score: 3, Insightful) by Anonymous Coward on Sunday May 10 2020, @06:58PM (23 children)

    by Anonymous Coward on Sunday May 10 2020, @06:58PM (#992493)

    I'm shocked, just shocked. And you know what Poettering's reaction is going to be: "Fuck you, Chris Siebenmann! Will not fix."

  • (Score: 3, Touché) by DimestoreProstitute on Monday May 11 2020, @12:34AM (22 children)

    by DimestoreProstitute (9480) on Monday May 11 2020, @12:34AM (#992571)

    Nah he'll just say NFS is archaic and mounts should be moved to his own protocol.

    • (Score: 5, Touché) by qzm on Monday May 11 2020, @01:09AM (20 children)

      by qzm (3260) on Monday May 11 2020, @01:09AM (#992594)

      Exactly! He has never had this particular problem on HIS LAPTOP, and all SystemD work is based on what Poettering thinks would make HIS LAPTOP work better this week.

      Should he ever hit this problem, it would of course indicate a problem with all other existing filesystems, and require everyone to switch to Systemd-FS, which would be modeled on how he thinks NTFS works.

      • (Score: 1, Funny) by Anonymous Coward on Monday May 11 2020, @03:10AM

        by Anonymous Coward on Monday May 11 2020, @03:10AM (#992643)

        SystemdFS?
        Stop giving them ideas. Please.

      • (Score: 2) by driverless on Monday May 11 2020, @05:18AM (18 children)

        by driverless (4770) on Monday May 11 2020, @05:18AM (#992687)

        Not wanting to defend systemd here (who has that much asbestos?), but this is a pretty weird situation. It's not that systemd is fatally broken... well OK, we can debate that elsewhere, but presumably no-one working on it ever expected to run into a situation with three hundred and forty-seven NFS mounts. I mean, WTF is this guy doing, NFS mounting the planet? In any case, now that the systemd people know something as outlandish as this is actually possible, they can do something to fix it.

        • (Score: 4, Insightful) by canopic jug on Monday May 11 2020, @08:28AM (5 children)

          by canopic jug (3949) Subscriber Badge on Monday May 11 2020, @08:28AM (#992716) Journal

          When writing programs, test ranges of input and handle extremes and exceptions. If you have an input that allows a minimum through a maximum, it should have an appropriate response when getting something out of range. That used to be common knowledge prior to Bill personally destroying comp sci across the country. Poettering seems to be of this new generation brought up under Bill's values and lack of knowledge. An out-of-memory event is not an appropriate response; the extra unmount requests need to be queued up instead and handled as memory becomes available.

          --
          Money is not free speech. Elections should not be auctions.
          • (Score: 3, Interesting) by driverless on Monday May 11 2020, @11:55AM (3 children)

            by driverless (4770) on Monday May 11 2020, @11:55AM (#992757)

            AFAIK there's no maximum for NFS mounts, so in theory your software is broken if it can't handle 6.022e23 NFS mounts. That's a problem with people writing standards: virtually none of them ever specifies a valid range for anything. It's astounding how much security code you can crash or even get code exec on simply by sending data with a range that's valid according to the spec but unexpected by the implementation. For example, vast amounts of PKI use the notation '(1...MAX)', where MAX means "there's some sort of maximum somewhere but we're not going to tell you what it should be", so when you define MAX = 2^64 - 1 things break. You can do the same with TLS extensions, IPsec options, ...
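
            For what it's worth, the usual mitigation is exactly that kind of sanity check: compare a peer-supplied count against an implementation-defined limit before acting on it. A hedged C sketch; IMPL_MAX_ENTRIES and alloc_entries() are invented for illustration:

                /* Sketch: clamp a peer-supplied count to an implementation limit
                 * before allocating for it. IMPL_MAX_ENTRIES is an arbitrary choice. */
                #include <stdint.h>
                #include <stdlib.h>

                #define IMPL_MAX_ENTRIES 4096        /* our idea of "sane", not the spec's MAX */

                static int *alloc_entries(uint64_t declared_count)
                {
                    if (declared_count == 0 || declared_count > IMPL_MAX_ENTRIES)
                        return NULL;                 /* out of range: refuse, don't allocate */
                    return calloc((size_t)declared_count, sizeof(int));
                }

                int main(void)
                {
                    int *ok  = alloc_entries(257);           /* plausible value: allocated */
                    int *bad = alloc_entries(UINT64_MAX);    /* spec-legal "MAX": rejected */
                    free(ok);
                    return bad == NULL ? 0 : 1;
                }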

            The other problem is that if your code does actually enforce sanity checks then there's always someone who comes along to complain that your idea of what constitutes a sane upper bound doesn't match theirs and therefore your code is broken and you need to fix it.

            • (Score: 4, Insightful) by canopic jug on Monday May 11 2020, @12:06PM

              by canopic jug (3949) Subscriber Badge on Monday May 11 2020, @12:06PM (#992766) Journal

              There is a maximum number it can unmount in parallel, and it depends on memory and other factors. Past that limit, the rest get queued up and then processed only as resources become available.

              --
              Money is not free speech. Elections should not be auctions.
            • (Score: 2, Insightful) by Anonymous Coward on Monday May 11 2020, @09:39PM (1 child)

              by Anonymous Coward on Monday May 11 2020, @09:39PM (#993091)

              in theory your software is broken if it can't handle 6.022e23 NFS mounts

              Not in theory, in practice that is broken. It doesn't need to handle them successfully; error handling is handling. If it is going to crap out after 10, 100, 1e6, whatever mounts, then it should fail gracefully.

              GP's comment about bounds shows insight. I have worked in QA, and the very first thing I would do if a step could be done 0..N times would be to try 0, 1, 2, 32, 256, 257, 1025, 65537, N-1, N, N+1 times.

              In this case, the tests might've included "mount multiple NFS filesystems", but the test set definitely wasn't well planned if it never exhausted resources. What if NFS had sekrit internal workings using an 8-bit byte as the NFS UID? What if NFS has static buffers that I can overflow? The test set needs a "cause resource exhaustion and confirm error behaviour is sane and not 'corrupt everything'" case, and if the code starts to *not* error gracefully (e.g. due to increased structure size, dynamic lists instead of static arrays, whatever) then the test should fail right there, so that the test set can be updated with whatever new values are required to confirm behaviour under error conditions.

              tl;dr: failing to test the scenarios when something SHOULD fail means the testing is not thorough, and likely there are bugs.
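
              A toy version of that boundary-value loop, with a stand-in do_mounts() invented for illustration; the point is only that every count either succeeds or fails cleanly:

                  /* Toy harness: drive a stand-in operation with boundary counts and
                   * require that it either succeeds or fails cleanly, never crashes. */
                  #include <stdio.h>
                  #include <stdlib.h>

                  /* Stand-in for the code under test: pretend each "mount" needs a page. */
                  static int do_mounts(size_t count)
                  {
                      void *tracking = calloc(count, 4096);
                      if (tracking == NULL && count != 0)
                          return -1;                 /* graceful failure under pressure */
                      free(tracking);
                      return 0;
                  }

                  int main(void)
                  {
                      size_t cases[] = { 0, 1, 2, 32, 256, 257, 1025, 65537, 100000000 };
                      for (size_t i = 0; i < sizeof(cases) / sizeof(cases[0]); i++)
                          printf("count=%zu -> %s\n", cases[i],
                                 do_mounts(cases[i]) == 0 ? "ok" : "failed gracefully");
                      return 0;
                  }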

              • (Score: 2) by driverless on Tuesday May 12 2020, @01:24AM

                by driverless (4770) on Tuesday May 12 2020, @01:24AM (#993185)

                tl;dr: failing to test the scenarios when something SHOULD fail means the testing is not thorough, and likely there are bugs.

                Bugs? In systemd? Naaaah, you're pulling my leg...

          • (Score: 0) by Anonymous Coward on Wednesday May 13 2020, @02:12PM

            by Anonymous Coward on Wednesday May 13 2020, @02:12PM (#993755)

            Out of memory and out of disk are both unchecked-for conditions in Firefox since just after they ousted the original developer of Firefox as the project lead (not sure if he is still with Mozilla; I just remember they placed their own PHB as manager over the project, which then moved to XUL, causing memory bloat and performance regressions until the JIT version of their JavaScript engine got released).

            I forget which other applications, but a *LOT* of applications on Linux have no handling for either out-of-memory or out-of-disk errors, including for example Java running on 32-bit (just make it 64-bit and the problem goes away, because nobody will ever hit memory pressure on a 64-bit system, lol! The same was said during the 16->32 bit transition, btw).

        • (Score: 3, Interesting) by Thexalon on Monday May 11 2020, @12:42PM (9 children)

          by Thexalon (636) on Monday May 11 2020, @12:42PM (#992781)

          Good design accounts for weirdness, extremes, and unexpected user behavior.

          For instance, any properly designed piece of software must ensure that if the process is interrupted (e.g. someone just pulled the plug from the computer), it can recover with minimal damage. That means that you have to leave files on disk in a state where they can function or be recovered at all times, and well-written software does just that.

          This sounds like a case where Lennart decided to ignore the simple fact that every single call to malloc() can come back with a null pointer, and if you ignore that you create segfaults. It also sounds like he ignored the fact that if somebody can do a thing more than once, they can do it an arbitrary number of times.

          The best QA guy I ever worked with was absolutely fantastic at finding ways to interact with whatever I had built that nobody except sometimes me had planned for. I'd send him the use cases, he'd poke around and find a bunch more and see whether I'd considered what happened when users did unusual things (e.g. interact with your web-based thing without Javascript turned on). If Lennart were some kid in his parents' basement working on a hobby project I'd consider skipping that step to be excusable, but he's a professional programmer leading a critical project for a billion-dollar company who can definitely afford to have QA and should have somebody involved who thinks of this kind of thing.
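
          A minimal illustration of the malloc() point above (the helper and its error handling are this sketch's own, not systemd's):

              /* Sketch: every allocation can fail; check it and bail out cleanly
               * instead of dereferencing a null pointer somewhere later. */
              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>

              static char *duplicate_mount_path(const char *path)
              {
                  char *copy = malloc(strlen(path) + 1);
                  if (copy == NULL) {                /* report and refuse, don't segfault */
                      fprintf(stderr, "out of memory copying %s\n", path);
                      return NULL;
                  }
                  strcpy(copy, path);
                  return copy;
              }

              int main(void)
              {
                  char *p = duplicate_mount_path("/mnt/nfs1");
                  if (p != NULL) {
                      puts(p);
                      free(p);
                  }
                  return 0;
              }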

          --
          The only thing that stops a bad guy with a compiler is a good guy with a compiler.
          • (Score: 2) by driverless on Monday May 11 2020, @01:00PM (8 children)

            by driverless (4770) on Monday May 11 2020, @01:00PM (#992793)

            Good design accounts for weirdness, extremes, and unexpected user behavior.

            Good hindsight accounts for weirdness, extremes, and unexpected user behavior. The problem with weirdness, extremes, and unexpected user behavior is that it's weird and unexpected until the first time you encounter it, after which you fix your software to handle it the next time round. If it was expected behaviour then your software would be able to handle it already.

            • (Score: 2) by Thexalon on Monday May 11 2020, @01:59PM (7 children)

              by Thexalon (636) on Monday May 11 2020, @01:59PM (#992811)

              You seem to have missed the point of my post.

              Good design assumes that anything that can happen will happen. Really really good design also assumes that anything that can't happen will happen too. There are some limits I'd expect on the ability to adapt, e.g. I don't expect software to be able to keep functioning if the server room is on fire, but handling edge cases gracefully is a hallmark of good software, and I expect software that is at the core of millions of systems to be held to a very high quality standard.

              --
              The only thing that stops a bad guy with a compiler is a good guy with a compiler.
              • (Score: 2) by driverless on Monday May 11 2020, @02:08PM (6 children)

                by driverless (4770) on Monday May 11 2020, @02:08PM (#992817)

                I didn't miss the point, you're saying that theoretically perfect software should be able to handle all possible error and unexpected conditions. I'm saying that such software doesn't exist. The closest we've got is formally verified software, which averages to, from memory, something like $1,000 per line of code to develop. And it still sometimes fails to handle every possible condition.

                • (Score: 3, Informative) by Thexalon on Monday May 11 2020, @02:26PM

                  by Thexalon (636) on Monday May 11 2020, @02:26PM (#992828)

                  The Linux kernel doesn't have these kinds of problems on a regular basis. And a major reason for this is that Linus is notoriously ruthless about demanding contributors think about and address those weird conditions and edge cases before he'll even consider merging their code.

                  I'd expect systemd, with its aspirations to be at least as critical to Linux systems as the kernel is, to be held to similar standards. It's not, and the fact that it's not is a problem.

                  --
                  The only thing that stops a bad guy with a compiler is a good guy with a compiler.
                • (Score: 2) by rleigh on Monday May 11 2020, @05:52PM (3 children)

                  by rleigh (4887) on Monday May 11 2020, @05:52PM (#992966) Homepage

                  In the field I work in, every system requirement has to have an associated FMEA (failure modes effects analysis), which includes all of the hardware and software mitigations to take. It's tedious, but it ensures that all of the common and not-so-common failure modes have been thoroughly explored by a whole team of people, and that the appropriate mitigations have been implemented where appropriate.

                  Do you think the systemd developers have done this, or anything remotely like this? No, neither do I. They don't care about stuff like that.

                  And yet... having deliberately placed themselves in the most safety-critical part of the system, that's exactly what they should be doing.

                  Whenever you parallelise something, you've got to have an upper bound on the parallelisation. Often that's a maximum bound, and you might want to lower it if the system can't cope.

                  Look at how, e.g., ZFS balances I/O. It's continually monitoring the available bandwidth to each device and adjusting the I/O load on them to maximise throughput. It also cares about responsiveness. If you start, e.g., a scrub, it will consume all available disk bandwidth, but it has a very clever algorithm which slowly ramps up the utilisation over several minutes, and it has a fast backoff if any other I/O requests come in.

                  There's no reason that systemd couldn't be doing this on unmount. It doesn't need to parallelise everything; it can start slow, monitor the time each umount takes, the completion rate and the system resources used, and back right off if things start stalling. But as mentioned elsewhere in the thread, this is a place where parallelisation is almost pointless. Sometimes the simple solution is the best solution. You can safely and reliably unmount everything in a one-line shell script, so why can't systemd do something that simple?
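
                  A toy sketch of that ramp-up/back-off idea, with simulated unmount batches; the thresholds and the fake work are invented, and the 347 comes from the article:

                      /* Toy ramp-up/back-off: adjust a concurrency cap from how long the
                       * previous (simulated) unmount batch took. Thresholds are invented. */
                      #include <stdio.h>
                      #include <time.h>
                      #include <unistd.h>

                      static double now_s(void)
                      {
                          struct timespec ts;
                          clock_gettime(CLOCK_MONOTONIC, &ts);
                          return ts.tv_sec + ts.tv_nsec / 1e9;
                      }

                      static void fake_unmount_batch(int size)
                      {
                          usleep(1000 * size);       /* pretend bigger batches take longer */
                      }

                      int main(void)
                      {
                          int cap = 1, max_cap = 64; /* start slow, as described above */
                          int left = 347;            /* the article's mount count */

                          while (left > 0) {
                              int batch = left < cap ? left : cap;
                              double t0 = now_s();
                              fake_unmount_batch(batch);
                              double dt = now_s() - t0;

                              left -= batch;
                              if (dt < 0.05 && cap < max_cap)
                                  cap *= 2;          /* healthy: ramp up */
                              else if (cap > 1)
                                  cap /= 2;          /* stalling: back off fast */
                              printf("did %3d, %3d left, next cap %d\n", batch, left, cap);
                          }
                          return 0;
                      }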

                  • (Score: 0) by Anonymous Coward on Monday May 11 2020, @08:51PM

                    by Anonymous Coward on Monday May 11 2020, @08:51PM (#993065)

                    They definitely didn't do anything near that. They have less than 50% function coverage and less than 38% statement coverage. I can't even imagine them trying to get anywhere near the truly required edge, branch, and condition coverage. Give me some MCDC, fuzzing, and property testing. But no, don't do any of that, and then act surprised when bugs people predicted and warned you about crop up.

                  • (Score: 0) by Anonymous Coward on Monday May 11 2020, @09:02PM

                    by Anonymous Coward on Monday May 11 2020, @09:02PM (#993073)

                    Shell script is anathema for systemd. Any admin can debug that with a plain editor, or even test interactively by copy & paste, then edit lines in shell history (basic REPL). Not in the systemd vision.

                    We already concluded that IBM's (and RH's before them) plan is to add complexity so that only they can touch the steering wheel, and all the rest have to buy support contracts and training and "enjoy" the ride. And as you said in the past, other projects bent over, dreaming they will still matter in the future and that everything will be roses by then, when Rome does not pay traitors (that is the part I add).

                    Free Software licenses like the GPL require distribution of "source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities". Corporations found the loophole: the license does not say anything about making such a big mess that people cannot actually make use of their freedoms. The mess is so big in some projects that even small companies cannot cope with it, much less fork a project to try to get it back into sanity.

                  • (Score: 2) by driverless on Tuesday May 12 2020, @01:29AM

                    by driverless (4770) on Tuesday May 12 2020, @01:29AM (#993188)

                    Your mention of FMEA is exactly the point I was making about software that ends up costing $1,000 per line of code. I've worked on SIL 2 and 3 systems, and the amount of effort required for even a simple system is insane; no standard commercial or open-source system could be developed that way. Sure, it's then highly reliable, but only if you're prepared to invest the massive amounts of time and money into doing it that way.

                    Anyway, as I mentioned earlier, not trying to defend systemd, but pointing out that just because it's in theory possible to build something to (say) SIL 3 doesn't mean it's practical for most software.

                • (Score: 0) by Anonymous Coward on Monday May 11 2020, @09:45PM

                  by Anonymous Coward on Monday May 11 2020, @09:45PM (#993093)

                  The issue here isn't some weird logic bug. The issue is a simple "cannot make call X more than N times."

                  That is not "all possible error and unexpected conditions", as you call it; that is "they failed to test atomic fault handling for a well known and commonly encountered fault" - making sure you check for nulls out of malloc() is a 1990s bug - and it is "their testing wasn't written to find and expose issues", because as Thex and I know very well, this is well within the kind of thing that a good QA person tries.

        • (Score: 0) by Anonymous Coward on Monday May 11 2020, @08:36PM

          by Anonymous Coward on Monday May 11 2020, @08:36PM (#993057)

          > they can do something to fix it

          "Works fine for me, won't fix" ~ Lennart Poettering, every damn time

        • (Score: 3, Interesting) by DeVilla on Tuesday May 12 2020, @06:52AM

          by DeVilla (5354) on Tuesday May 12 2020, @06:52AM (#993267)

          If systemd had umount()ed serially or called `umount -a`, this would not have happened.

          Several people are arguing about whether you should handle weird errors or not. That misses the point. Systemd took over the process of unmounting filesystems and decided to do so in parallel. If you are doing parallel processing, you monitor your resources. Systemd is not doing this.

          Systemd is allowing users to unknowingly define a fork bomb, without checking whether the resources needed are available or providing the user a mechanism to check or prevent it. The fact that it's a umount() is beside the point. If I have a large system (assume the system has the resources to run everything in steady state), can I trigger a fork-bomb-induced OOM by starting tons of small, independent services via unit files? If systemd is doing its job correctly, then the answer should be no.
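
          For comparison, serially unmounting just the NFS mounts, in reverse mount order, is short even in C. A hedged sketch (needs root, skips failures rather than aborting):

              /* Sketch: collect the NFS mount points from /proc/self/mounts and
               * unmount them serially, last mounted first. Needs root to do anything. */
              #include <errno.h>
              #include <mntent.h>
              #include <stdio.h>
              #include <stdlib.h>
              #include <string.h>
              #include <sys/mount.h>

              int main(void)
              {
                  char *targets[1024];
                  int count = 0;

                  FILE *m = setmntent("/proc/self/mounts", "r");
                  if (m == NULL) {
                      perror("setmntent");
                      return 1;
                  }
                  struct mntent *ent;
                  while ((ent = getmntent(m)) != NULL && count < 1024) {
                      if (strncmp(ent->mnt_type, "nfs", 3) != 0)
                          continue;                  /* only NFS mounts in this sketch */
                      char *dir = strdup(ent->mnt_dir);
                      if (dir == NULL)
                          break;                     /* out of memory: stop collecting */
                      targets[count++] = dir;
                  }
                  endmntent(m);

                  for (int i = count - 1; i >= 0; i--) {   /* reverse order, one at a time */
                      if (umount2(targets[i], 0) != 0)
                          fprintf(stderr, "umount %s: %s\n", targets[i], strerror(errno));
                      free(targets[i]);
                  }
                  return 0;
              }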

    • (Score: 2) by DeVilla on Tuesday May 12 2020, @06:30AM

      by DeVilla (5354) on Tuesday May 12 2020, @06:30AM (#993261)

      No. Manually mounting filesystems via direct calls to `mount` in a script is not supported. It's horribly broken, doesn't work, and never has worked. The filesystems should be mounted via unit files, which have a declarative syntax that is clearly superior. (The ancient fstab sort of works too, but that's really only for backwards compatibility and should be avoided as well.)
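
      For anyone who hasn't seen one, a .mount unit for an NFS export looks roughly like the sketch below; the server, export, and path are hypothetical, and the file name must match the systemd-escaped mount path (mnt-data.mount for /mnt/data):

          # /etc/systemd/system/mnt-data.mount  (hypothetical mount of /mnt/data)
          [Unit]
          Description=Example NFS mount (made-up server and export)

          [Mount]
          What=fileserver.example.com:/export/data
          Where=/mnt/data
          Type=nfs
          Options=rw,hard

          [Install]
          WantedBy=remote-fs.target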