Noted Linux expert Chris Siebenmann has described two catastrophic failures involving systemd.
The first problem became apparent during a disastrous upgrade of a system from Fedora 20 to Fedora 21, during which PID 1 segfaulted. He isn't the only one to suffer this kind of failure, either. The bug report for this problem still shows a status of NEW, nearly a month after it was opened.
The second problem he describes involves the journalctl utility. It displays log messages with long lines in a way that requires sideways scrolling, and it displays all messages since the beginning of time, in forward chronological order. Both of these behaviors make the tool much less usable, especially in critical situations where time and efficiency are of the essence.
Problems like these raise some serious questions about systemd, and its suitability for use by major Linux distros like Fedora and Debian. How can systemd be used if it can segfault in such a way, or if the tools that are provided to assist with the recovery exhibit such counter-intuitive, if not outright useless, behavior?
Editor's Comment: I am not a supporter of systemd, but if there are only two reported occurrences of this fault, as noted in one of the links, then perhaps it is not a widespread fault but actually a very rare one. That would certainly explain - although not justify - why the maintainers have shown so little apparent interest. Nevertheless, the fault should still be fixed.
(Score: 0) by Anonymous Coward on Sunday December 21 2014, @06:37PM
Of course you wouldn't understand these problems. You're obviously not a system administrator.
No, the home user dicking around with Linux in a VM on his MacBook Pro won't notice these problems. But that person isn't really using Linux seriously, either.
But when you're managing systems that do serious work, often affecting a business' ability to operate and profit, you need log files that work so you can diagnose problems quickly and easily.
Every sideways scroll could waste time, resulting in thousands of dollars being lost each second.
Welcome to the real world, bud. We don't have time for your shitty binary log files fucking up the businesses we're trying to run.
(Score: 2) by DrMag on Sunday December 21 2014, @06:59PM
No need to be condescending and presumptuous.
No, I'm not a sysadmin by trade, but I've learned a good bit about it for managing my own linux server that I maintain. I do use linux seriously, and suggesting that the only people who do are full-time sysadmins is ridiculous.
And I would venture to guess that if your business is losing thousands of dollars because you had to scroll to the side for a second, your business is poorly managed to begin with. But perhaps that's me being condescending and presumptuous. =)
(Score: 0) by Anonymous Coward on Sunday December 21 2014, @07:23PM
A sysadmin contracted to solve a problem may be paid hundreds of dollars per hour to fix that issue. Time spent scrolling to find a problem with this, rather than with tried and tested tools like 'tail' and 'grep', can result in larger support bills.
(Score: 4, Insightful) by maxwell demon on Sunday December 21 2014, @07:28PM
So the tools don't allow piping the output to tail or grep? Then that is the real problem, not overly long log lines or overly long logs.
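For what it's worth, journalctl does write plain text to stdout, so the usual filters apply. A minimal sketch of that piping, with the log line simulated via printf since a live journal may not be available (on a real system the equivalent would be `journalctl -u ntpd.service --no-pager | grep waking`):

```shell
# journalctl emits ordinary text on stdout, so standard filters work on it.
# Real invocation:  journalctl -u ntpd.service --no-pager | grep waking
# Simulated here with printf so the pipeline runs anywhere:
printf '%s\n' \
  'Dec 19 00:51:57 wheatley ntpd[1765]: new interface(s) found: waking up resolver' \
  'Dec 19 00:52:03 wheatley sshd[902]: Accepted publickey for admin' \
  | grep waking | tail -n 1
```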
The Tao of math: The numbers you can count are not the real numbers.
(Score: 5, Insightful) by Arik on Sunday December 21 2014, @07:31PM
If laughter is the best medicine, who are the best doctors?
(Score: 2) by choose another one on Monday December 22 2014, @11:30AM
Is there something about "ForwardToSyslog=yes" that doesn't work for you - does that not produce log files in text ?
Seems to me that going from binary to text is the right way round - it's trivial once you've decided on a format - whereas going the other way (if you need the binary logs with full metadata, which I presume somebody does...) requires ensuring the text formatting is reversible (quoting etc.) and writing a parser. Seems to me that binary-first is the right design if someone needs binary logs.
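For reference, the option mentioned above lives in journald's own configuration file; with a syslog daemon running alongside, every journal entry is handed on as plain text:

```ini
# /etc/systemd/journald.conf
[Journal]
ForwardToSyslog=yes
# The syslog daemon (rsyslog, syslog-ng, ...) then writes
# the familiar flat text files under /var/log/ as usual.
```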
(Score: 1) by Arik on Monday December 22 2014, @11:49AM
Logs should be in text and converted to binary later if someone really needs binary, not the other way around, because text lets you read the logs after a failed boot and binary does not. You can convert from one to the other to your heart's content after you fix the problem and get the machine back on its feet.
If laughter is the best medicine, who are the best doctors?
(Score: 2) by fnj on Sunday December 21 2014, @07:42PM
Who told you that, or did you just make it up? Tools exist, but if THE DATA ITSELF IS UNWIELDY (e.g., the long lines mentioned), then dealing with the data using any tools, including tail and grep, is going to be unwieldy.
(Score: 2) by maxwell demon on Sunday December 21 2014, @08:03PM
Did you actually read the post I replied to?
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by fnj on Sunday December 21 2014, @08:22PM
Yes, actually, I did. What caused you to leap to "So the tools don't allow to pipe the output to tail or grep?" Because journalctl fully allows such piping.
(Score: 2) by maxwell demon on Sunday December 21 2014, @08:42PM
This:
The Tao of math: The numbers you can count are not the real numbers.
(Score: 0) by Anonymous Coward on Sunday December 21 2014, @11:45PM
journalctl -b -u ntpd.service | grep waking
Dec 19 00:51:57 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 19 00:51:59 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 19 17:07:45 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 20 00:19:19 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 20 07:53:49 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 20 07:53:51 wheatley ntpd[1765]: new interface(s) found: waking up resolver
Dec 21 17:49:27 wheatley ntpd[1765]: new interface(s) found: waking up resolver
I don't see the problem, here.
(Score: 0) by Anonymous Coward on Monday December 22 2014, @12:07AM
let me correct that for you.
journalctl -b -u ntpd.service | grep waking
journalctl: log corrupted.
because they refuse to fix the automatic log corruption if you turn on log rotation.
(Score: 2) by tempest on Monday December 22 2014, @12:11AM
Wow... Fuck.
(Score: 3, Insightful) by Anonymous Coward on Monday December 22 2014, @12:36AM
I'll highlight the problems for you:
The first problem I see is that you need to use journalctl. It should never be a requirement that a log file be filtered through some program before grep can work with it.
The second problem I see is that some of the lines are out of order. I don't know if the timestamps are wrong, or if they were logged in the wrong order, or if journalctl screwed them up, but clearly some are where they don't belong.
The third problem I see is the random binary data at the end of some of the lines. That probably should not be there as far as I can tell.
(Score: 2) by choose another one on Sunday December 21 2014, @09:58PM
> Of course you wouldn't understand these problems. You're obviously not a system administrator.
If you want to get into a pissing contest about number of systems managed then I seriously doubt you are going to win against the RedHat cloud admins, unless you admin for Google or Amazon - and for some reason they want systemd and binary logs. No, I don't _know_ why; I don't manage thousands of servers either, but I can take some good guesses.
Furthermore, binary logs can also be trivially turned into text (oh look, systemd even provides a configuration to do that) - the converse is not true.
> No, the home user dicking around with Linux in a VM on his MacBook Pro won't notice these problems. But that person isn't really using Linux seriously, either.
I guess the sysadmin managing a few tens of on-premise or colo-ed servers won't notice the kinds of problems that the RedHat guys managing multiple data centres do, either - maybe that's because they aren't using Linux seriously?
(Score: 4, Insightful) by Arik on Sunday December 21 2014, @11:11PM
Oh, look, that assumes that you booted correctly.
This is a system designed to fail just when it is needed most. When the system is working fine, you can export the logs that show it working fine, but when it fails and you really need to see those logs, you cannot.
If laughter is the best medicine, who are the best doctors?
(Score: 2) by darkfeline on Sunday December 21 2014, @10:09PM
Going from `cat foobar | tail` to `journalctl -u foobar | fold | tail` (Yes, useless cat, I know)
Wow, that was hard! Damn that systemd making everything harder!
Join the SDF Public Access UNIX System today!
(Score: 0) by Anonymous Coward on Monday December 22 2014, @12:12AM
Now try it when the log file is partially corrupted due to a bad disk. If you aren't a dumbass, you can still make out some of the log entries when things are done the proper way and text files are used. Now try doing that with journalctl. Oh, fuck you, there's no output at all! Fold and tail all you want, it's not going to do a fucking ounce of good. You'll sit there scratching your head, wondering where the hell the log entries are.
(Score: 1) by Anonymous Coward on Monday December 22 2014, @12:42AM
No disk problems needed. Just turn on log rotation and systemd corrupts the logs itself a quarter of the time. It's a known bug that they marked as 'won't fix'.
(Score: 0) by Anonymous Coward on Monday December 22 2014, @12:57AM
LOL! Do you have the bug number for this?
(Score: 2) by cafebabe on Thursday December 25 2014, @08:29AM
I've been in a situation where downtime cost more than US$1 per second. Specifically, on rotation, I was solo 24-hour support for a 4,000-core renderfarm. I believe that RenderMan licences were US$3,000 per processor pair, so that's US$6 million per year. I also believe that the average salary for 500 people was about US$50,000, so that's about US$25 million per year. There were also nine other applications licenced for the renderfarm, plus electricity, hardware depreciation (not restricted to 3,000 hard disks) and business taxes.
In these circumstances, being deprived of grep and/or tail -f is going to cost money. In more serious circumstances, it could foreseeably cost lives.
1702845791×2