
Meta
posted by martyb on Sunday July 24 2016, @12:36PM   Printer-friendly
from the ups-and-downs dept.

We just learned that our VM provider, Linode, has had to perform some emergency reboots. Three of our servers have already been taken care of, but more are still to come. The reboots left our site unavailable for approximately an hour. Here is the reboot schedule:

Identified - Linode has received a Xen Security Advisory (XSA) that requires us to perform updates to our legacy Xen host servers. In order to apply the updates, hosts and the Linodes running on them must be rebooted. The XSAs will be publicly released by the Xen project team on July 26th. We must complete this maintenance before then.

Here's the schedule and status:

Server     Purpose             Maintenance Schedule (UTC)
lithium    Development         Completed
magnesium  Frontend Proxy      Completed
sodium     Frontend Proxy      Completed
fluorine   Production Cluster  Completed
helium     Production Cluster  Completed
hydrogen   Production Cluster  Completed
neon       Production Cluster  Completed
beryllium  Services Cluster    Completed
boron      Services Cluster    Completed

We apologize for any inconvenience.

[Update: It appears the second round of reboots has completed successfully, and, thanks to the advance notice, the site stayed up throughout. We anticipate that the site will still continue to operate normally through the last-scheduled reboot. Many thanks for your understanding and patience.]

[Update #2: We are taking advantage of a free offer from Linode, our hosting provider, to convert our VPSs (Virtual Private Servers) from Xen to KVM. The rebooting was required to repair a Xen vulnerability. As a bonus, the Xen to KVM conversion gives us a free upgrade to twice as much memory. The additional memory will provide much-needed additional headroom on our servers and possibly provide a performance improvement. Thanks to our redundancy, the changes should not be noticeable when we reboot/upgrade, except for the IRC and e-mail servers as they are single-hosted.]

[Update #3: Thanks to the tireless efforts of paulej72 well into the wee hours of this morning with able assistance by audioguy in straightening out some IP issues as well as Deucalion and TheMightyBuzzard providing guidance and support, all but two of our Xen servers have been upgraded to KVM. This free upgrade doubled the amount of memory available to our VMs, giving us some much-needed headroom. That leaves beryllium (IRC and email) and boron (DNS, Hesiod name service) as the two servers that have not been upgraded yet. Date/time is TBD.]

[Update #4: Boron will be reconfigured shortly, and then Beryllium after that. Plan on an hour or two, though obviously we'll try to keep the downtime to a minimum!]

[Update #5: Boron's second upgrade (for the RAM) sat in the queue for several hours, so Beryllium had to wait until paulej72 got up and finished it this morning (0830 EDT).]

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Friday July 22 2016, @04:45AM

    by Anonymous Coward on Friday July 22 2016, @04:45AM (#378339)

    So you don't have to log in and launch stuff manually in a panic. If you forget a startup script we'll enjoy some extra downtime. You know, for kids!

    • (Score: 0) by Anonymous Coward on Friday July 22 2016, @08:56PM

      by Anonymous Coward on Friday July 22 2016, @08:56PM (#378773)

      This is my systemd configuration on Gentoo. I've found that it makes Linode automated reboots a breeze.

      File: /etc/portage/package.mask


      # Lennart Poettering
      sys-apps/systemd
      net-misc/networkmanager
      media-sound/pulseaudio

    • (Score: 0) by Anonymous Coward on Friday July 22 2016, @10:00PM

      by Anonymous Coward on Friday July 22 2016, @10:00PM (#378814)

      I start so much stuff @reboot in root's crontab it's not even funny. OK fine the bare-bones webserver runs @reboot out of nobody's crontab instead. I wrote a lot of custom code and I'm too lazy to package it and I'm too lazy to write proper init scripts so I use cron @reboot instead.
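
      For what it's worth, the @reboot approach described above can be sketched as crontab fragments; the script names and paths here are purely hypothetical:

      ```
      # root's crontab (edit with `crontab -e`); runs once, shortly after boot
      @reboot /usr/local/bin/start-custom-daemons.sh >> /var/log/boot-jobs.log 2>&1

      # nobody's crontab (`crontab -u nobody -e`) for the unprivileged webserver
      @reboot /usr/local/bin/barebones-httpd --port 8080
      ```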

  • (Score: 0) by Anonymous Coward on Friday July 22 2016, @04:50AM

    by Anonymous Coward on Friday July 22 2016, @04:50AM (#378340)

    All the larger Xen hosters got advance warning; mine allowed some input into when the host was going to get booted.

  • (Score: 2) by Gravis on Friday July 22 2016, @05:01AM

    by Gravis (4596) on Friday July 22 2016, @05:01AM (#378341)

    Can thinking do? щ(゚Д゚щ)

    • (Score: 0) by Anonymous Coward on Friday July 22 2016, @12:57PM

      by Anonymous Coward on Friday July 22 2016, @12:57PM (#378469)

      noyes

  • (Score: 2) by mhajicek on Friday July 22 2016, @05:25AM

    by mhajicek (51) on Friday July 22 2016, @05:25AM (#378348)

    Is the funding goal bar on the main page broken, or is no one actually contributing?

    --
    The spacelike surfaces of time foliations can have a cusp at the surface of discontinuity. - P. Hajicek
    • (Score: -1, Troll) by Anonymous Coward on Friday July 22 2016, @06:17AM

      by Anonymous Coward on Friday July 22 2016, @06:17AM (#378357)

      I pledge never to contribute funding because I believe cyberbegging is immoral and also I secretly want to see this site die.

      • (Score: 0) by Anonymous Coward on Friday July 22 2016, @06:19AM

        by Anonymous Coward on Friday July 22 2016, @06:19AM (#378359)

        >secretly

        Sure, scum.

        • (Score: -1, Troll) by Anonymous Coward on Friday July 22 2016, @06:55AM

          by Anonymous Coward on Friday July 22 2016, @06:55AM (#378367)

          Oooops did I tap that out loud. I hate soystain so much. Stop it fingers stop tapping my secrets.

          • (Score: 0) by Anonymous Coward on Friday July 22 2016, @09:00PM

            by Anonymous Coward on Friday July 22 2016, @09:00PM (#378775)

            Kill yourself.

      • (Score: 0) by Anonymous Coward on Friday July 22 2016, @12:59PM

        by Anonymous Coward on Friday July 22 2016, @12:59PM (#378470)

        I see we found Buzzards scorned, ex, gay lover.

      • (Score: -1, Troll) by Anonymous Coward on Friday July 22 2016, @04:01PM

        by Anonymous Coward on Friday July 22 2016, @04:01PM (#378593)

        cyberfag

    • (Score: 4, Informative) by JNCF on Friday July 22 2016, @07:30AM

      by JNCF (4317) on Friday July 22 2016, @07:30AM (#378376) Journal

      It's not broken, it just updates less frequently than you're expecting.

      Effective: 2016-June to 2016-December

      Updated: 2016-07-03

      It will move in one big chunk to show recent contributions... when it feels like it.

    • (Score: 3, Informative) by The Mighty Buzzard on Friday July 22 2016, @10:28AM

      What JNCF said. mrcoolbp updates it something like once a month manually.

      --
      My rights don't end where your fear begins.
  • (Score: 3, Informative) by isostatic on Friday July 22 2016, @07:46AM

    by isostatic (365) on Friday July 22 2016, @07:46AM (#378384) Journal

    Why not migrate to KVM? From the linode email:

    Upgrading to KVM will allow you to avoid this maintenance entirely. You can use the “Upgrade to KVM” link in your Linode’s dashboard to move to KVM. Please note that KVM upgrades are not available in Tokyo at this time. More KVM upgrading information can be found here:

    • (Score: 0) by Anonymous Coward on Friday July 22 2016, @09:50AM

      by Anonymous Coward on Friday July 22 2016, @09:50AM (#378418)

      Switching to KVM changes a few things (like device paths) which could break the Linode.
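
      One concrete example of such breakage, assuming a typical single-disk Linode: under Xen the root disk appears as /dev/xvda, while under KVM it shows up as /dev/sda, so an fstab that hardcodes the Xen name fails to mount after the switch. Referencing the filesystem by UUID (find it with `blkid`) sidesteps the rename; the UUID below is a placeholder:

      ```
      # /etc/fstab
      # Xen-era entry that breaks once the disk becomes /dev/sda under KVM:
      # /dev/xvda  /  ext4  errors=remount-ro  0 1

      # Device-name-agnostic replacement:
      UUID=00000000-0000-0000-0000-000000000000  /  ext4  errors=remount-ro  0 1
      ```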

    • (Score: 2) by The Mighty Buzzard on Friday July 22 2016, @11:23AM

      Free time and the desire to debug have both been short.

      --
      My rights don't end where your fear begins.
    • (Score: 2) by bziman on Friday July 22 2016, @02:52PM

      by bziman (3577) on Friday July 22 2016, @02:52PM (#378544)

      Thank you! I'm already on the KVM system, but I didn't know that they were offering free upgrades to 2x original memory, so I just logged in and got my free upgrade. Fantastic!

  • (Score: 0) by Anonymous Coward on Friday July 22 2016, @02:40PM

    by Anonymous Coward on Friday July 22 2016, @02:40PM (#378536)

    The new hotness: Dedicated hosting.

    For example:

    https://www.1and1.com/server-dedicated-tariff?__lf=Order-Product#stage-end [1and1.com]

    That L4i option would, for example, pummel your Linode mercilessly into submission (performance wise) while providing you full control over your uptime. You'd also be exempted from strange and often undiscovered classes of cross-VM vulns.

    Join the revolution!

    • (Score: 0) by Anonymous Coward on Friday July 22 2016, @03:04PM

      by Anonymous Coward on Friday July 22 2016, @03:04PM (#378550)

      And before anyone says "b-b-b-b-b-but look at the server list! we got almost ten dev/production/test/proxy machines!" I say: You wouldn't need em with the ole L4i. Trust me.

    • (Score: 2, Funny) by mechanicjay on Friday July 22 2016, @06:49PM

      Fuck that. We should be porting the whole thing to a series of docker containers.

      --
      My VMS box beat up your Windows box.
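
      If anyone ever did containerize the stack, a first cut might be a compose file along these lines; every image name here is hypothetical (there is no official rehash image), and real services would also need volumes, networks, and secrets:

      ```yaml
      # Hypothetical layout only -- a sketch, not a working deployment
      services:
        proxy:
          image: nginx:stable            # SSL termination / load balancing
          ports: ["443:443"]
          depends_on: [web1, web2]
        web1:
          image: example/rehash:latest   # hypothetical app image
        web2:
          image: example/rehash:latest
        db:
          image: mysql:5.7               # rehash is MySQL-backed; version assumed
      ```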
      • (Score: 0) by Anonymous Coward on Friday July 22 2016, @08:37PM

        by Anonymous Coward on Friday July 22 2016, @08:37PM (#378761)

        Docker is for brogrammers who want to impress the team. Real hackers work alone in the basement with naked bare metal.

    • (Score: 4, Insightful) by tibman on Friday July 22 2016, @07:55PM

      by tibman (134) Subscriber Badge on Friday July 22 2016, @07:55PM (#378746)

      Dedicated hosting is a throwback to ~2002. When a fan fails and your server goes down for two days waiting for an underpaid tech to pull it from the rack to fix it, lol.

      --
      SN won't survive on lurkers alone. Write comments.
      • (Score: 3, Interesting) by archfeld on Sunday July 24 2016, @04:22AM

        by archfeld (4650) <treboreel@live.com> on Sunday July 24 2016, @04:22AM (#379290) Journal

        Speaking as an underpaid tech, I relish when something this basic and fixable happens. It sure beats the shell out of endless project meetings dominated by moron PMs and retarded managers. One of the greatest feelings is when I get a text message during one of those forever meetings that a simple disk or power supply or fan has failed, and I can excuse myself and run gratefully into the cool quietness of the server farm or lab space to do some actual work.

        --
        For the NSA : Explosives, guns, assassination, conspiracy, primers, detonators, initiators, main charge, nuclear charge
  • (Score: 2) by opinionated_science on Friday July 22 2016, @04:01PM

    by opinionated_science (4031) on Friday July 22 2016, @04:01PM (#378594)

    some proactive security patching, rather than the media panic that is usually the opening volley...

    Of course, it makes you wonder how many exploits are still present but have not been found/released....

  • (Score: 3, Funny) by Runaway1956 on Friday July 22 2016, @04:03PM

    by Runaway1956 (2926) Subscriber Badge on Friday July 22 2016, @04:03PM (#378597) Journal

    ~blame

  • (Score: 2) by archfeld on Friday July 22 2016, @09:20PM

    by archfeld (4650) <treboreel@live.com> on Friday July 22 2016, @09:20PM (#378785) Journal

    for about an hour, haven't verified via the router logs yet but everything seems to be happy now.

    --
    For the NSA : Explosives, guns, assassination, conspiracy, primers, detonators, initiators, main charge, nuclear charge
  • (Score: 2) by maxwell demon on Friday July 22 2016, @09:29PM

    by maxwell demon (1608) on Friday July 22 2016, @09:29PM (#378794) Journal

    Hydrogen has a star after "completed". Why?

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by martyb on Friday July 22 2016, @10:23PM

      by martyb (76) Subscriber Badge on Friday July 22 2016, @10:23PM (#378822) Journal

      Hydrogen has a star after "completed". Why?

      Short answer: Sleep deprivation; fixed.

      Long answer: I intended to use the [*] to flag the last remaining server that had been slated for a reboot... but for which it was no longer necessary. Two reasons: the server got rebooted anyway when we took advantage of the double-your-memory-for-free offer for converting from Xen to KVM, since the conversion itself requires a reboot; and once converted, there was no longer any need to reboot it to apply a Xen fix, because it was no longer on Xen. In the course of updating the story to keep the community informed, I accidentally left in the star and omitted the footnote. The star is now removed.

      Thanks for the hawk eyes!

      --
      Wit is intellect, dancing.
  • (Score: 1) by cngn on Friday July 22 2016, @10:51PM

    by cngn (1609) on Friday July 22 2016, @10:51PM (#378831)

    Keep up the good work guys, we need more selfless people on the net, and I'm grateful to you.

    If ever one of you is in Sydney where I am drop me a line we can always meet up and I'll buy the beer.

  • (Score: 0) by Anonymous Coward on Friday July 22 2016, @11:14PM

    by Anonymous Coward on Friday July 22 2016, @11:14PM (#378839)

    If you don't mind me asking, why were the reboots spread over several hours rather than doing them around the same time?

    • (Score: 2) by martyb on Saturday July 23 2016, @03:18AM

      by martyb (76) Subscriber Badge on Saturday July 23 2016, @03:18AM (#378895) Journal
      It's not within our control. Linode schedules the reboots of the bare-metal machines their hypervisor runs on, and thus on which many, many VMs run. So they might reboot and reload a whole rack at one time, for all I know.

      Apparently our servers (VMs) are not all on the same physical server / rack / whatever unit they reboot and reload, so some go down at the same time and others don't.

      Point is, they schedule and we try to work around it. We just didn't see the notice until after the reboots started. :(

      Hope that helps!

      --
      Wit is intellect, dancing.
      • (Score: 0) by Anonymous Coward on Saturday July 23 2016, @05:53PM

        by Anonymous Coward on Saturday July 23 2016, @05:53PM (#379100)

        Some VPS providers also spread out customers for a few reasons. The big two are load balancing, like a compute server sharing space with a storage node, and preventing random failures from taking out big customers with multiple machines all at once, similar to why they don't reboot all machines at once, even if they theoretically could.

        • (Score: 2) by martyb on Monday July 25 2016, @03:37PM

          by martyb (76) Subscriber Badge on Monday July 25 2016, @03:37PM (#379853) Journal

          Excellent points! Thanks to our having redundancy in our configuration, with sufficient advance notice, we can deal with a server or two going down without issue. Were all of our VPSs hosted on the same physical machine, we'd lose that ability. It is actually to our benefit to have things spread out across multiple physical servers. Sure, there are degenerate cases, but there is at least the possibility that in many cases we can remain up even with some number of our servers down.

          --
          Wit is intellect, dancing.
  • (Score: 2) by Snotnose on Sunday July 24 2016, @04:06AM

    by Snotnose (1623) on Sunday July 24 2016, @04:06AM (#379287)

    OK, I'm a lowly embedded device driver and linux kernel guy who doesn't know squat about making webpages like this one. But it takes 9 servers to handle this site? WTF? What the hell are they all doing?

    Not trying to be sarcastic or anything, I honestly don't get it. I've written software using embedded linux that did more than I imagine this website does, but my stuff all ran on a single CPU with path to RAM and HDD storage.

    --
    Why shouldn't we judge a book by it's cover? It's got the author, title, and a summary of what the book's about.
    • (Score: 2) by NCommander on Sunday July 24 2016, @06:04AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Sunday July 24 2016, @06:04AM (#379307) Homepage Journal

      We actually get a lot of traffic and a lot of requests per visitor, and right now manage about 40-50% of capacity between the two of them. The main servers are in a 2x2 configuration, allowing us to offline one without taking down the entire site. Magnesium is our current front-end load balancer between the two, doing SSL termination, with sodium as a hot standby (these are Linode 1024s, or were). Lithium is an independent development site, and IRC/wiki/etc. are isolated on separate servers, as rehash can't easily co-exist with other things on the same box.

      --
      Still always moving
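
      The front-end arrangement described above (an SSL-terminating load balancer in front of a redundant pair of web frontends) maps roughly onto a reverse-proxy config like this sketch; the backend hostnames, ports, and certificate paths are all assumptions, not the site's actual configuration:

      ```nginx
      # Assumed: two rehash frontends listening internally on port 8080
      upstream rehash_frontends {
          server web1.internal:8080;
          server web2.internal:8080;
      }

      server {
          listen 443 ssl;
          server_name soylentnews.org;
          ssl_certificate     /etc/ssl/certs/site.pem;    # placeholder path
          ssl_certificate_key /etc/ssl/private/site.key;  # placeholder path

          location / {
              proxy_pass http://rehash_frontends;
              proxy_set_header Host $host;
              proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
          }
      }
      ```

      Offlining one frontend then just means removing (or marking down) its server line and reloading the proxy.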