Two months ago, I polled the community for advice on the underlying operating system that should power SoylentNews (SN). After reading comments, and some recent experiences in my personal and professional life, we are migrating to Gentoo as the operating system of choice. As of right now, we've already migrated our development box, lithium, over, and using it as a shakedown to see how painful the overall migration will be. I'm pleased to report that, aside from varnish (an HTTP accelerator), the process went relatively smoothly.
For those who weren't here for the original article querying the community (linked above), let me recap the situation. At the time that I wrote that article, SN was mostly standardized on Ubuntu 14.04, with a single CentOS 6.7 box lurking in our midst. In the course of testing updates and other projects, the staff and myself felt that Ubuntu (and Debian) had lost a lot of the advantages that had made it a rock solid choice for the last three years of powering SN, combined with the fact that the upgrade process would not have been trivial due to the systemd migration.
Though greatly disliked by all of us, systemd being part of Ubuntu 16.04 LTS (Long Term Stable) was not a deal breaker. More importantly was the perception that the release lacked stability and we had a serious sense that the upgrade would be problematic. I felt it was time to reopen the scenario to see if we were better off migrating to a different distro, or abandoning Linux entirely.
As such the original article was penned to see what the community's feelings on the subject were. The overwhelming consensus was that I was not alone with my feelings on the latest LTS, and many thought FreeBSD would be a good choice for us. Ultimately, we decided to trial Gentoo over FreeBSD for four reasons
I'll break these first three item by item
FreeBSD is divided into two parts, the core system which has basic utilities, and the ports collection which has all the add on software like Apache and such. In theory, these two components can be updated independently of each other allowing a stable base while migrating to newer software versions with relative ease. Ports can be installed from binaries, or manually compiled to suit one's taste in a relatively automated fashion bringing together the best of a binary and source based distribution. On paper, it looks perfect.
In further research, I've found that port upgrades are fragile at the best of times. Unlike Debian's APT which has strict package dependence and shared library management, port upgrades are very much upgrade and pray and its possible to hose a system in this way. The situation is similar to using EPEL on CentOS, or using Slackware that port upgrades can leave artifacts, and there's often considerable manual intervention to keep things chugging around. This is compounded by the fact that the version of Kerberos we need is in the ports collection due to incompatibility between MIT Kerberos V (which we use) and Heimdal Kerberos which ships out of the box. For those of you familiar with Active Directory, this is roughly on par with the effort required to rebuild AD from scratch along side a pre-existing forest. This meant unless we rebuilt the entire Kerberos domain (a drastic and painful option to say the least) that we could easily break a node because a ports upgrade went sideways.
Furthermore, mixing binary and source ports also have several ways it can go wrong which is problematic. To ease our system maintenance burden, its long been a goal of the admin team to have rehash and its dependencies built and deployed through package management instead of the rather horrorifying script+rsync that we use now. While we could have technically achieved this with Ubuntu by running our own buildd (or using a PPA), the sheer amount of dependencies combined with the pain of rebuilding the world ultimately doomed this to the "would be nice" pile list of ideas.
On top of this, the split architecture of FreeBSD would also mean that upgrades are no longer "one command and done" as they are with Ubuntu and Debian. Instead it becomes a matter of determining what, if any, core system upgrades are available, deploy them, then deploy/rebuild ports as needed. None of this by itself would be deal-breaking, but when compounded with the other reasons it tipped me away from this option.
For any production website, having backups is the thing you must have, not the thing you wish you had. With the exception of our development box, all our systems are backed up to off-site storage on a machine called oxygen via rsnapshot nightly (and yes, we do test our backups). However, due to the way SoylentNews is situated, there is the possibility that if an attacker ever successfully breached SN, its possible they may be able to gain access to oxygen, and rm -rf / everything.
For this reason alone, we used two separate sets of backups in case of system failure or node compromise. As mentioned many times before, SN is hosted on a number of VPSes by Linode who I continue to highly recommend for anyone's VPS needs. One very useful and handy feature of Linode is that they offer snapshotting and node backups as part of their hosting services for reasonable prices, and critical system boxes are backed up with them as a second-level of defense. Unfortunately, Linode's backup services require that their system understand the underlying filesystem format used by the OS so they can snapshot it easily. As of writing, they do not support FreeBSD's UFS or ZFS. A migration would mean we'd have to sink additional costs in a new backup system to supplement oxygen.
I'm going to get flamed for this reason, but recent events have sort of drilled this home for me, both at SoylentNews and as my work as a freelancer. During the last round of security updates, I've been fighting to get several of CentOS's security issues fixed. Red Hat (and CentOS) offer ten year support for their products but in many ways it is the wrong approach to system stability and security. A real-world issue I ran into with CentOS's support is that they ship rather old issues of dovecot, a relatively popular IMAP server.
Now, in theory, as long as security patches are backported, this shouldn't be a problem. In practice however, it means you're essentially tethered to the security features as offered at the time of the release. For example, a good number of our users are likely familiar with the Logjam attack. The mitigation for Logjam is to regenerate DH parameters to larger sizes, and change to a non-common prime. Relatively straightforward, right?
Well, not so much. Dovecot 2.0 (which is what CentOS 6.7 ships with) doesn't allow for setting of custom DH parameters, or even tweaking anything beyond the most basic TLS settings. To a lesser extent, we also had this problem with Postfix (we can't disable client side negotiation). The solution in both cases is to upgrade. That would be great, if we could in-place upgrade CentOS, or reliably upgrade the RPMs without hosing YUM at a later date. In practice, we've been forced into doing a number of arcane hacks to get most of the survey tools to report anything better than a "C" grade, with the situation worsening as time goes on. Before people say "well that's a problem with dovecot", and not CentOS, you can't get OCSP stapling (which is an important security feature to help fix SSL's revocation system) with Apache out of the box. You need to either patch Apache 2.2 in place, or upgrade to 2.4.
This problem also has shown its head on Ubuntu. To Canonical's credit, their security team actually has gone through the work of mainlining newer security features in popular products; Ubuntu 14.04's Apache 2.2 supports OCSP stapling because they patch Apache in their binaries. However this practice only goes so far. Deploying CAA records to SoylentNews in the last round of tweaks was an exercise in frustration because only the most recent versions of BIND knows how to handle the CAA record type. Once again, we're in serious voodoo territory if we tried to upgrade BIND outside of a distro release.
This brings me to my final point: trying to follow industry best-practices falls apart if you can't easily update your stack. Release based distros at best (with Ubuntu) update once every six months, or once every year or so for longer term support from other distros. That's a very long time in the security world. Furthermore, each major upgrade is an event and a large time sink in and of itself. As such I've (grudgingly) come to the conclusion that if you actually want to have real security, you need to update frequently. Furthermore, by having smaller upgrades at a given time instead of them in one large pile, you have a better chance of not getting overwhelmed at those release points.
Gentoo ultimately won by being both rolling-release based, and source based. It meant that we could easily upgrade the entire stack (including rehash's special dependencies) as a single emerge world, and then deploy. It also edged out the other options by not forcing systemd on us (and OpenRC is an absolute pleasure to debug and maintain in comparison). We've also discussed the issue at length and have determined how we're going to approach the rather daunting task ahead of us.
The first step, which was already completed, was to migrate our development system over to Gentoo to get an idea of how much pain we're going to be in. This was accomplished by booting the system in rescue mode, moving "/" (i.e. the root of our filesystem) to "/old-rootfs", extracting a stage3, cooking the kernel, and rebooting. audioguy and TheMightyBuzzard worked out the correct set of USE flags for our environment, and I used the serial console to do the actual changeover. Aside from Varnish breaking, the migration was actually relatively smooth if time consuming. Right now, we're still wrestling with varnish, but after kicking MySQL cluster's init scripts and copying configs, it sputtered to life and dev.soylentnews.org popped back onto the internet.
The next steps is to create ebuilds for hesinfo (a Hesiod support tool that Gentoo doesn't ship in their hesiod package), and then to create a custom stage3 with our kernel config and base system with catalyst. We're going to work out the set of packages we need and configure lithium to work as a binary package source for portage. In effect, every package we need will be compiled once on lithium, then published via a private portage repository. Other machines will simply be able to emerge world and download the pre-tested and compiled binaries in one fell swoop keeping the software stack across SoylentNews consistent across the organization. As an added bonus, we can now easily migrate our custom set of compilation scripts to ebuilds and have sane package and dependency tracking for the entire site infrastructure.
Since most of the site infrastructure is fully redundant, we don't expect too much downtime or breakage as we begin migrating other boxes from Ubuntu. As usual, we'll keep the community apprised of our status, as well as if we need to schedule actual site downtime during this period. While some of us might thing we're insane, I will just note for the record we took a similarly drastic step of migrating to a IPv6-only backend two years ago in the name of administration sanity, and serving SN needs best. As always, I'll be reading and commenting below.
The only weirdness I have personally seen so far is that the filesystem check did not run properly on a dirty boot. Can't have a 5 minute Fsck hold back parallel booting, now can we? The Knoppix live DVD got rid of systemd because parallel boot slows things down considerably when seeking costs up to 300ms.
There was that bug with systemd re-implementing the rm utility...poorly [github.com]
Systemd is behaving badly by trying to do everything: with a development team that apparently does not take serious bugs seriously.
The advantage of "the UNIX way" is that you can rip out and re-implement troublesome components. With systemd, it is largely an "all or nothing" proposition.
Wow ... hadn't heard of THIS one lol. I guess bricking the machines of people who did rm -rf --no-preserve-root / wasn't stupid enough for them.
Do you know what exactly "R!" is, why it exists, etc.? I can't find too much documentation on this command -- if it is a command -- not unusual for SystemD's documentation to be poor, of course.
Good question. I had to go back to the bug report to find it.
R is used in configuration files to tell systemd what directories to routinely delete.
It is actually documented here: tmpfiles.d(5) [freedesktop.org]
Good find. Thanks for the info.
The bug you linked is an excellent example of the problem with their style and attitude. Reimplementing "rm" is NIH to an absurd and terrifying extreme.
They apparently needed a way to clean up files without using simple shell scripts....