posted by janrinok on Sunday December 21 2014, @04:54PM
from the show-stopper-or-rare-event? dept.

Noted Linux expert Chris Siebenmann has described two catastrophic failures involving systemd.

One of the problems he encountered with systemd became apparent during a disastrous upgrade of a system from Fedora 20 to Fedora 21: PID 1 segfaulted during the upgrade process. He isn't the only one to have run into this kind of failure, either. The bug report for this problem is still showing a status of NEW, nearly a month after it was opened.

The second problem with systemd that he describes involves the journalctl utility. It displays long log lines in a way that requires sideways scrolling, and it shows every message since the beginning of time in forward chronological order. Both of these behaviors make the tool much less usable, especially in critical situations where time and efficiency are of the essence.
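
For what it is worth, journalctl does have switches that mitigate some of this; a quick sketch (assuming a reasonably recent systemd, with sshd.service standing in for whatever unit is being debugged):

    journalctl -b                  # only messages from the current boot
    journalctl -r -n 50            # newest 50 entries first, rather than starting at the beginning of time
    journalctl -u sshd.service --no-pager | grep -i error    # plain text on stdout, pipe to grep/tail as usual

Whether those should be the defaults is, of course, part of the argument in the comments below.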

Problems like these raise some serious questions about systemd and its suitability for use by major Linux distros like Fedora and Debian. How can systemd be used if it can segfault in such a way, or if the tools provided to assist with recovery exhibit such counter-intuitive, if not outright useless, behavior?

Editor's Comment: I am not a supporter of systemd, but if there are only 2 such reported occurrences of this fault, as noted in one of the links, then perhaps it is not a widespread fault but actually a very rare one. This would certainly explain - although not justify - why so little interest has apparently been shown by the maintainers. Nevertheless, the fault should still be fixed.

 
  • (Score: 0) by Anonymous Coward on Sunday December 21 2014, @06:37PM

    by Anonymous Coward on Sunday December 21 2014, @06:37PM (#128074)

    Of course you wouldn't understand these problems. You're obviously not a system administrator.

    No, the home user dicking around with Linux in a VM on his MacBook Pro won't notice these problems. But that person isn't really using Linux seriously, either.

    But when you're managing systems that do serious work, often affecting a business' ability to operate and profit, you need log files that work so you can diagnose problems quickly and easily.

    Every sideways scroll could waste time, resulting in thousands of dollars being lost each second.

    Welcome to the real world, bud. We don't have time for your shitty binary log files fucking up the businesses we're trying to run.

  • (Score: 2) by DrMag on Sunday December 21 2014, @06:59PM

    by DrMag (1860) on Sunday December 21 2014, @06:59PM (#128080)

    No need to be condescending and presumptuous.

    No, I'm not a sysadmin by trade, but I've learned a good bit about it from managing my own Linux server. I do use Linux seriously, and suggesting that the only people who do are full-time sysadmins is ridiculous.

    And I would venture to guess that if your business is losing thousands of dollars because you had to scroll to the side for a second, your business is poorly managed to begin with. But perhaps that's me being condescending and presumptuous. =)

    • (Score: 0) by Anonymous Coward on Sunday December 21 2014, @07:23PM

      by Anonymous Coward on Sunday December 21 2014, @07:23PM (#128088)

      A sysadmin contracted to solve a problem may be paid hundreds of dollars per hour to fix that issue. Time spent scrolling to find a problem with this tool, rather than with tried and tested tools like 'tail' and 'grep', can result in larger support bills.

      • (Score: 4, Insightful) by maxwell demon on Sunday December 21 2014, @07:28PM

        by maxwell demon (1608) on Sunday December 21 2014, @07:28PM (#128089) Journal

        So the tools don't allow piping the output to tail or grep? Then that is the real problem, not overly long log lines or overly long logs.

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 5, Insightful) by Arik on Sunday December 21 2014, @07:31PM

          by Arik (4543) on Sunday December 21 2014, @07:31PM (#128090) Journal
          Yes, systemd does not produce text logs that can be read using any of the many mature and robust text tools available - it insists on a binary format which can only be read with its crappy viewer - and yes, no longer producing log files in text is the big problem here. You can't bug-fix your way out of defective-by-design.
          --
          If laughter is the best medicine, who are the best doctors?
          • (Score: 2) by choose another one on Monday December 22 2014, @11:30AM

            by choose another one (515) Subscriber Badge on Monday December 22 2014, @11:30AM (#128284)

            Is there something about "ForwardToSyslog=yes" that doesn't work for you? Does that not produce log files in text?

            Seems to me that going from binary to text is the right way round - it's trivial once you've decided on a format. Going the other way (if you need the binary logs with full metadata, which I presume somebody does...) requires ensuring the text formatting is reversible (quoting etc.) and writing a parser. So binary-first is the right design if someone needs binary logs.
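
            For anyone who hasn't tried it, the relevant stanza lives in /etc/systemd/journald.conf; a minimal sketch (the option names are real journald settings, the values are just one possible choice):

                [Journal]
                ForwardToSyslog=yes
                Storage=volatile    # optional: keep the binary journal in RAM only and let syslog own the on-disk text

            With a traditional syslog daemon running alongside, /var/log fills up with plain text files exactly as before.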

            • (Score: 1) by Arik on Monday December 22 2014, @11:49AM

              by Arik (4543) on Monday December 22 2014, @11:49AM (#128290) Journal
              The problem is that option fails when boot fails - the one time when you most need it.

              Logs should be written in text and converted to binary later if someone really needs binary, not the other way around, because text lets you read the logs after a failed boot and binary does not. You can convert from one to the other to your heart's content after you fix the problem and get the machine back on its feet.
              --
              If laughter is the best medicine, who are the best doctors?
        • (Score: 2) by fnj on Sunday December 21 2014, @07:42PM

          by fnj (1654) on Sunday December 21 2014, @07:42PM (#128094)

          Who told you that, or did you just make it up? Tools exist, but if THE DATA ITSELF IS UNWIELDY (e.g., the long lines mentioned), then dealing with the data using any tools, including tail and grep, is going to be unwieldy.

          • (Score: 2) by maxwell demon on Sunday December 21 2014, @08:03PM

            by maxwell demon (1608) on Sunday December 21 2014, @08:03PM (#128104) Journal

            Who told you that, or did you just make it up?

            Did you actually read the post I replied to?

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 2) by fnj on Sunday December 21 2014, @08:22PM

              by fnj (1654) on Sunday December 21 2014, @08:22PM (#128112)

              Yes, actually, I did. What caused you to leap to "So the tools don't allow to pipe the output to tail or grep?" Because journalctl fully allows such piping.

              • (Score: 2) by maxwell demon on Sunday December 21 2014, @08:42PM

                by maxwell demon (1608) on Sunday December 21 2014, @08:42PM (#128115) Journal

                This:

                Time spent scrolling to find a problem using this rather than the tried and tested tools like 'tail' and 'grep' can result in larger support bills.

                --
                The Tao of math: The numbers you can count are not the real numbers.
                • (Score: 0) by Anonymous Coward on Sunday December 21 2014, @11:45PM

                  by Anonymous Coward on Sunday December 21 2014, @11:45PM (#128155)

                  journalctl -b -u ntpd.service | grep waking
                  Dec 19 00:51:57 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 19 00:51:59 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 19 17:07:45 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 20 00:19:19 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 20 07:53:49 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 20 07:53:51 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                  Dec 21 17:49:27 wheatley ntpd[1765]: new interface(s) found: waking up resolver

                  I don't see the problem, here.

                  • (Score: 0) by Anonymous Coward on Monday December 22 2014, @12:07AM

                    by Anonymous Coward on Monday December 22 2014, @12:07AM (#128160)

                    Let me correct that for you:
                    journalctl -b -u ntpd.service | grep waking
                    journalctl: log corrupted.

                    Because they refuse to fix the automatic log corruption that happens if you turn on log rotation.

                  • (Score: 3, Insightful) by Anonymous Coward on Monday December 22 2014, @12:36AM

                    by Anonymous Coward on Monday December 22 2014, @12:36AM (#128174)

                    I'll highlight the problems for you:

                    journalctl -b -u ntpd.service | grep waking
                    Dec 19 00:51:57 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                    Dec 19 00:51:59 wheatley ntpd[1765]: new interface(s) found: waking up resolverC�h�S�%��͚������ER$��[����f
                    Dec 20 07:53:51 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                    Dec 19 17:07:45 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                    Dec 20 07:53:49 wheatley ntpd[1765]: new interface(s) found: waking up resolver
                    Dec 20 00:19:19 wheatley ntpd[1765]: new interface(s) found: waking up resolverRy}?���c]MKS�\��E�_��

                    The first problem I see is that you need to use journalctl. It should never be a requirement that a log file be filtered through some program before grep can work with it.

                    The second problem I see is that some of the lines are out of order. I don't know if the timestamps are wrong, or if they were logged in the wrong order, or if journalctl screwed them up, but clearly some are where they don't belong.

                    The third problem I see is the random binary data at the end of some of the lines. That probably should not be there as far as I can tell.

  • (Score: 2) by choose another one on Sunday December 21 2014, @09:58PM

    by choose another one (515) Subscriber Badge on Sunday December 21 2014, @09:58PM (#128136)

    > Of course you wouldn't understand these problems. You're obviously not a system administrator.

    If you want to get into a pissing contest about the number of systems managed, then I seriously doubt you are going to win against the RedHat cloud admins unless you admin for Google or Amazon, and for some reason they want systemd and binary logs. No, I don't _know_ why - I don't manage thousands of servers either - but I can take some good guesses.

    Furthermore, binary logs can also be trivially turned into text (oh look, systemd even provides a configuration to do that) - the converse is not true.

    > No, the home user dicking around with Linux in a VM on his MacBook Pro won't notice these problems. But that person isn't really using Linux seriously, either.

    I guess the sysadmin managing a few tens of on-premise or colo-ed servers won't notice the kind of problems that the RedHat guys managing multiple data centres do - maybe that's because they aren't using Linux seriously?

    • (Score: 4, Insightful) by Arik on Sunday December 21 2014, @11:11PM

      by Arik (4543) on Sunday December 21 2014, @11:11PM (#128150) Journal
      "Furthermore, binary logs can also be trivially turned into text (oh look, systemd even provides a configuration to do that)"

      Oh, look, that assumes that you booted correctly.

      This is a system designed to fail just when it is needed most. When the system is working fine, you can export the logs that show it working fine, but when it fails and you really need to see those logs, you cannot.

      --
      If laughter is the best medicine, who are the best doctors?
  • (Score: 2) by darkfeline on Sunday December 21 2014, @10:09PM

    by darkfeline (1030) on Sunday December 21 2014, @10:09PM (#128139) Homepage

    Going from `cat foobar | tail` to `journalctl foobar | fold | tail` (Yes, useless cat, I know)

    Wow, that was hard! Damn that systemd making everything harder!

    --
    Join the SDF Public Access UNIX System today!
    • (Score: 0) by Anonymous Coward on Monday December 22 2014, @12:12AM

      by Anonymous Coward on Monday December 22 2014, @12:12AM (#128164)

      Now try it when the log file is partially corrupted due to a bad disk. If you aren't a dumbass, you can still make out some of the log entries when things are done the proper way and text files are used. Now try doing that with journalctl. Oh, fuck you, there's no output at all! Fold and tail all you want, it's not going to do a fucking ounce of good. You'll sit there scratching your head, wondering where the hell the log entries are.

      • (Score: 1) by Anonymous Coward on Monday December 22 2014, @12:42AM

        by Anonymous Coward on Monday December 22 2014, @12:42AM (#128179)

        No disk problems needed. Just turn on log rotation and systemd corrupts the logs itself a quarter of the time. It's a known bug that they marked as 'won't fix'.

        • (Score: 0) by Anonymous Coward on Monday December 22 2014, @12:57AM

          by Anonymous Coward on Monday December 22 2014, @12:57AM (#128186)

          LOL! Do you have the bug number for this?

  • (Score: 2) by cafebabe on Thursday December 25 2014, @08:29AM

    by cafebabe (894) on Thursday December 25 2014, @08:29AM (#129058) Journal

    I've been in a situation where downtime cost more than US$1 per second. Specifically, on rotation, I was solo 24-hour support for a 4,000-core renderfarm. I believe that RenderMan licences were US$3,000 per processor pair, so that's US$6 million per year. I also believe that the average salary for the 500 people was about US$50,000, so that's about US$25 million per year. There were also nine other applications licenced for the renderfarm, electricity, hardware depreciation (not restricted to the 3,000 hard disks) and business taxes.

    In these circumstances, being deprived of grep and/or tail -f is going to cost money. In more serious circumstances, it could foreseeably cost lives.
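
    (To put rough numbers on that: US$6 million in licences plus US$25 million in salaries is about US$31 million a year, and a year is roughly 31.5 million seconds, so the burn rate of that operation really is in the region of US$1 per second, before the other licences, power and depreciation are even counted.)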

    --
    1702845791×2