
SoylentNews is people

posted by NCommander on Friday March 14 2014, @06:44AM   Printer-friendly
from the timebombs-are-exciting dept.
We had an hour or so of downtime today. After debugging, we traced the root cause to the SSL certificates we use to establish a database connection from the webserver to the actual DB. As a prelude to go-live, we migrated from unencrypted connections to encrypted connections, as we have to cross the Linode internal LAN. In an attempt to improve data security, we generated a set of SSL certificates and used those to encrypt the MySQL connections. In the flurry of go-live, no one thought to check the expiry date on said certificates. Out of the box, OpenSSL generates certificates with a one-month expiry unless manually changed.

As you might expect, one month later, the certificates expired, and the database stopped accepting remote connections. New certificates were generated with a ten-year expiration, and we continue to work towards better documenting our internal processes on the wiki to prevent this sort of thing from happening again. Apache and slashd are running again, and we appear to be back to the status quo in terms of site operation.
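
A check along the following lines could flag an expiring certificate before it takes the database connection down. This is only a minimal sketch, not the site's actual tooling: it assumes the expiry line has already been obtained with `openssl x509 -noout -enddate -in <certificate>`, and the function names, the 14-day warning threshold, and the sample date are all hypothetical.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(enddate_line, now=None):
    """Return the days left before a certificate expires.

    enddate_line is the output of `openssl x509 -noout -enddate`,
    e.g. 'notAfter=Apr 13 06:44:00 2014 GMT'. A bare timestamp
    without the 'notAfter=' prefix is accepted as well.
    A negative result means the certificate has already expired.
    """
    # Drop the 'notAfter=' prefix if present and keep the timestamp.
    not_after = enddate_line.strip().split("=", 1)[-1]
    # The standard library parses OpenSSL's timestamp format into epoch seconds.
    expires_at = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires_at - now) / SECONDS_PER_DAY


def needs_renewal(enddate_line, warn_days=14, now=None):
    """True when the certificate expires within warn_days (assumed threshold)."""
    return days_until_expiry(enddate_line, now) <= warn_days


if __name__ == "__main__":
    # Made-up expiry date, for illustration only.
    sample = "notAfter=Apr 13 06:44:00 2014 GMT"
    print(round(days_until_expiry(sample), 1), "days left")
    print("renew now" if needs_renewal(sample) else "still fine")
```

Run from cron against each certificate, something like this would turn a silent expiry into a warning a couple of weeks ahead of time.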

A full incident report will be written up and posted to the wiki in the next few days.
 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Insightful) by GungnirSniper on Friday March 14 2014, @07:07AM

    by GungnirSniper (1671) on Friday March 14 2014, @07:07AM (#16188) Journal

    If NCommander hadn't been on IRC, what would have been the appropriate staff response?

  • (Score: 3, Funny) by crutchy on Friday March 14 2014, @07:11AM

    by crutchy (179) on Friday March 14 2014, @07:11AM (#16189) Homepage Journal

    If NCommander hadn't been on IRC, what would have been the appropriate staff response?

    bacon++

  • (Score: 5, Informative) by NCommander on Friday March 14 2014, @07:29AM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday March 14 2014, @07:29AM (#16196) Homepage Journal

    *cough*

    This was a problem identified during the last crisis (aka yesterday): we don't have much in terms of "OH SHIT" plans, and we were discussing it on the staff mailing list when THIS happened. The honest truth was we got lucky. I literally woke up five minutes before the site soyled itself. Theoretically, all staff who are on dev and sys should be able to access the boxes via SSH. The master Linode account can only be accessed by myself and robind. In practice, SSH access only works for the web and services box; the database node is more locked down.

    Compounding the issue, slash gives a useless error message when DBIx::Password fails to connect (ASN: time is in the past!), which means unless you knew in advance we were using self-signed certificates, *and* you knew OpenSSL's expiry behavior, there was no obvious sign that this was the issue. Parts of the infrastructure are simply not documented properly, and we've got a staff effort to get everything slash related on the wiki, but we hadn't finished that by the time this clusterfuck happened.

    We've been in crunch mode for the last week, and I've technically been on vacation (and in Asia on an itinerary that was mostly set in place before I was ever involved with SoylentNews). Things fell through the gaps, and we got bit in the ass. The only saving grace here is this happened during non-peak hours on the site, but it shouldn't have happened, and my failure to document the full slash setup during private beta compounded it. I return to the United States on Sunday, and then everything I know about Slash is going on the wiki so if I am unavailable, this *won't* be a problem.

    Once we have the emergency plan fully hammered out, it will be on the wiki, and a post will go up here on the site so the jury can review it and poke the obvious holes in it.

    --
    Still always moving
    • (Score: 3, Funny) by frojack on Friday March 14 2014, @08:06AM

      by frojack (1554) on Friday March 14 2014, @08:06AM (#16216) Journal

      First time I genned my own cert (for a similar purpose) I made the same mistake.
      Luckily, I had three different sites to set up, discovered it while doing the third one.
      With less than a week left, I revisited the other sites, and bluffed my way back in for "security upgrades".

      --
      No, you are mistaken. I've always had this sig.
    • (Score: 1) by yarp on Friday March 14 2014, @08:24AM

      by yarp (2665) on Friday March 14 2014, @08:24AM (#16224)

      If it makes you feel any better, Steam was down for what seemed like frickin' ages yesterday.

    • (Score: 2) by juggs on Friday March 14 2014, @08:31AM

      by juggs (63) on Friday March 14 2014, @08:31AM (#16229) Journal

      In short - teething pains.

      I'm sure you guys will outgrow them. It's been a very fast journey down a very rough road, and you've done well to get to where you have so soon. Applaud yourselves for your successes so far rather than dwelling on the negatives; just put in place methods to prevent them happening again and move on.

      Obligatory car analogy:-
      You're put into the driving seat of a WRC (World Rally Championship) car at the starting line of a 30Km gravel stage having never driven on a loose surface, or anything so feisty as a WRC car. The countdown is already at 1 second, your co is shouting something incoherent into your earpiece along the lines of "Go! Go! Go! And in 60 5 left then over crest 4 right then 20 2 right through gate then 100 4 left opening to 6 left 400 CAUTION jump into 1 right 20 and into 1 right over crest to 4 left"

      Well if you survived that without hitting a tree you did well as that was just 20 seconds into the stage. Reality is you already hit a tree, lots of them.

      I think the lack of trees hit so far is laudable.

      • (Score: 2) by NCommander on Friday March 14 2014, @08:33AM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday March 14 2014, @08:33AM (#16230) Homepage Journal

        To be honest, it was nice to have a crisis right now that was completely technical rather than the recent drama. It really says something about the state of recent events that I can say that with a straight face.

        --
        Still always moving
        • (Score: 2) by Reziac on Saturday March 15 2014, @05:04AM

          by Reziac (2489) on Saturday March 15 2014, @05:04AM (#16751) Homepage

          Is this why yesterday I got the "503 guru meditation varnish cache" gibberish?

          Very glad too that it was just a technical glitch and not anything Dreadful.

          --
          And there is no Alkibiades to come back and save us from ourselves.
          • (Score: 2) by NCommander on Saturday March 15 2014, @12:01PM

            by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Saturday March 15 2014, @12:01PM (#16810) Homepage Journal

            Yeah. Apache (due to mod_perl) shat itself when the database went away, so Varnish started complaining about guru meditation due to ENOBACKEND

            --
            Still always moving
    • (Score: 1, Funny) by Anonymous Coward on Friday March 14 2014, @09:54AM

      by Anonymous Coward on Friday March 14 2014, @09:54AM (#16252)

      before the site soyled itself.

      Ah, now I understand the name "SoylentNews" ... ;-)

    • (Score: 1) by Magic Oddball on Friday March 14 2014, @11:08AM

      by Magic Oddball (3847) on Friday March 14 2014, @11:08AM (#16271) Journal

      Yikes -- thank you for working on this through vacation, let alone while in a totally different part of the world from most (all?) of us.

      The thing to keep in mind during the "oh SHIT" moments is that most (if not all) of the visitors here have the basic knowledge needed to have realistic expectations. :-)

      Adding after a preview: any odd characters alongside spaces in my posts are because of some odd bug in Slashcode that evidently only my system sets off.

      • (Score: 4, Informative) by NCommander on Friday March 14 2014, @12:12PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday March 14 2014, @12:12PM (#16294) Homepage Journal

        The distortion is due to Slash's blasted UTF-8 bitrot. We enabled it during testing, but it was buggy. There's no "off" switch for UTF-8, so my guess is whatever magic the other site uses never got committed to the public branch, as there's no filter that I can find in the public codebase. Fixing UTF-8 to work properly remains on the TODO, but at least it semi-works if you're careful.

        And yeah, my travel schedule was epically ill timed. The management handover happened while I was at a conference in Macau, so I've been running around like a chicken without a head.

        --
        Still always moving
        • (Score: 2) by Pslytely Psycho on Friday March 14 2014, @06:16PM

          by Pslytely Psycho (1218) on Friday March 14 2014, @06:16PM (#16547)

          "so I've been running around like a chicken without a head."

          Should we start calling you Mike then?

          --
          Alex Jones lawyer inspires new TV series: CSI Moron Division.
        • (Score: 2) by zigbigadoorlue on Friday March 14 2014, @07:59PM

          by zigbigadoorlue (1092) on Friday March 14 2014, @07:59PM (#16605)

          Good gracious, you all are doing a lot of good work for free (and on your vacation!). Do you have a full-time job in addition to running this marvelous and confounded site? You all are doing an excellent job, particularly as you are currently not getting paid for any of it. Thanks for all that you've given this community.

          • (Score: 2) by NCommander on Saturday March 15 2014, @02:04AM

            by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Saturday March 15 2014, @02:04AM (#16722) Homepage Journal

            I can't speak for anyone else, but I work full time in FOSS technologies. This vacation was set up before I was involved with SoylentNews, which has caused me a lot of grief in hindsight (but then again, hindsight is always 20/20). I've been trying to manage the site, my sanity, and a crazy travel schedule all at once, but I've cleared out my schedule until September to try and get the business side of things assembled.

            --
            Still always moving
  • (Score: 0) by Anonymous Coward on Friday March 14 2014, @01:24PM

    by Anonymous Coward on Friday March 14 2014, @01:24PM (#16343)

    Do the unwashed masses, aka the 'audience', also get to use the Red Phone? I guess it could cause crying-wolf problems, but sometimes it might come in handy too.