
SoylentNews is people

posted by NCommander on Friday March 14 2014, @06:44AM
from the timebombs-are-exciting dept.
We had an hour or so of downtime today. After debugging, we traced the root cause to the SSL certificates we use to establish the database connection from the webserver to the actual DB. As a prelude to golive, we migrated from unencrypted to encrypted connections, as we have to cross the Linode internal LAN. In an attempt to improve data security, we generated a set of SSL certificates and used those to encrypt the MySQL connections. In the flurry of golive, no one thought to check the expiry date on said certificates. Out of the box, OpenSSL generates certificates with a one month expiry unless manually changed.

As you might expect, one month later, the certificates expired, and the database stopped accepting remote connections. New certificates were generated with a ten year expiration, and we continue to work towards better documenting our internal processes on the wiki to prevent this sort of thing from happening again. Apache and slashd are running again, and we appear to be back to the status quo in terms of site operation.
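One way to catch this kind of timebomb early is a small check that turns a certificate's expiry date into "days remaining". The following is only a minimal sketch, not what we actually run: the function names and the 14-day warning threshold are made up for illustration, and it assumes the date string is the `notAfter` value printed by `openssl x509 -noout -enddate`, with the leading `notAfter=` stripped.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return the whole days left before a certificate's notAfter time.

    `not_after` is a string such as "Jan  5 09:34:43 2018 GMT", the format
    OpenSSL prints for certificate dates. `now` is a Unix timestamp and
    defaults to the current time. A negative result means the certificate
    has already expired.
    """
    if now is None:
        now = time.time()
    expires_at = ssl.cert_time_to_seconds(not_after)
    return int((expires_at - now) // SECONDS_PER_DAY)


def needs_renewal(not_after, warn_days=14, now=None):
    """True when fewer than `warn_days` days remain (hypothetical threshold)."""
    return days_until_expiry(not_after, now) < warn_days
```

Run from cron against each certificate's `notAfter` value, a check like this would start complaining a couple of weeks before the database stops accepting connections, rather than after.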

A full incident report will be written up and posted to the wiki in the next few days.
 
  • (Score: 1, Interesting) by crutchy on Friday March 14 2014, @07:16AM

    by crutchy (179) on Friday March 14 2014, @07:16AM (#16192) Homepage Journal

    Thanks Fluffeh.

    In that case, how many webservers is Soylent hosted on?

  • (Score: 3, Informative) by NCommander on Friday March 14 2014, @07:31AM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday March 14 2014, @07:31AM (#16197) Homepage Journal

    One at the moment. We've been meaning to address this now that we've got the Linode situation resolved, though in this case it wouldn't have actually fixed crap, since it was the www<->DB interchange that went snap. This setup was done mostly so we could relatively easily spin up new instances.

    --
    Still always moving
    • (Score: 0) by crutchy on Friday March 14 2014, @07:33AM

      by crutchy (179) on Friday March 14 2014, @07:33AM (#16199) Homepage Journal

      ah ok. cool. thanks NC :-)

    • (Score: 2) by Fluffeh on Friday March 14 2014, @10:57AM

      by Fluffeh (954) Subscriber Badge on Friday March 14 2014, @10:57AM (#16269) Journal

      Hey, been curious, I see your ID as 2, I would have thought that Barabas would have been 1, but he was 22. Is 1 a special ID for the code/site or is there a sneaky little monkey that got in before you and nabbed the number 1 ID in the database before you managed to complete registration?

      • (Score: 1) by NickFortune on Friday March 14 2014, @11:24AM

        by NickFortune (3267) on Friday March 14 2014, @11:24AM (#16281)

        Never mind one; who got zero? ;)

      • (Score: 1) by cmn32480 on Friday March 14 2014, @12:07PM

        by cmn32480 (443) <reversethis-{moc.liamg} {ta} {08423nmc}> on Friday March 14 2014, @12:07PM (#16292) Journal

        I believe that bastard Anonymous Coward cheated us all out of UID 1.

        --
        "It's a dog eat dog world, and I'm wearing Milkbone underwear" - Norm Peterson
      • (Score: 5, Informative) by NCommander on Friday March 14 2014, @12:09PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday March 14 2014, @12:09PM (#16293) Homepage Journal

        It's actually due to bitrot in Slash. When slash was coded, you could assign negative UIDs to autoincrement fields, so the AC was -1, and the first user (Taco) was 1. I'm not sure if zero was used in early slash. Around MySQL 3.x, that behavior became invalid, so the AC migrated to 666 on the other site, and auto_increment would start at 0. Somewhere in the 4.x days, auto_increment changed again, and now starts at 1. The database layer used by Slash is "special" for all the wrong reasons, including efforts by VA Linux to port us to Oracle and some attempts to port slash to postgresql (which I wish worked, I much prefer that to MySQL).

        As of right now, the AC has UID #1, and I have the lowest registered UID (as it was created automagically by install-slashsite). While it wasn't by design, I think it's kinda fitting. The most important people on the site are the users, and thus the AC being UID #1 represents that view. In theory, you can have a UID of 0, but MySQL doesn't *really* like that (and it causes issues if you don't dump/reimport the database with exactly the right options with mysqldump).

        I did try and modify slash to grab UID 1 for myself (2 is not my favourite number), but eventually decided it was for the best. Of the 9 single digit UIDs, one of them is a test account, and perhaps when we have our tenth anniversary, we'll auction off UID #6. Accounts 1-100 represent people who had access from before golive (I think there are a few past 100 that are included on this), but we had to get the moderation system tested, so we basically grabbed a ton of people from ##altslashdot to test.

        --
        Still always moving
        • (Score: 2, Funny) by Random2 on Friday March 14 2014, @03:00PM

          by Random2 (669) on Friday March 14 2014, @03:00PM (#16413)

          Wait, you're telling me I'd have to compete with AC for the preferred ID number? Well there goes that opportunity...

          --
          If only I registered 3 users earlier....
      • (Score: 3, Informative) by mechanicjay on Friday March 14 2014, @12:18PM

        Anonymous Coward is UID 1 and is set up as part of the initial Slash install. Though that changed at some point in Slash's history, as AC was referenced as UID 0 in some spots. This caused a bunch of issues when first trying to spin up SN as some modules expected AC to be UID 0, others expected UID 1. I honestly think the dev team could write a really interesting post about all the stuff that went boom and went fixed in the 10 or so days it took us to get the site running. Really, that was just one of the challenges that was met in trying to rehab an abandoned code base on a tight schedule.

        Disclaimer: Memory is fuzzy from those first days, someone should correct me if the above is wrong.
        --
        My VMS box beat up your Windows box.
        • (Score: 1, Insightful) by Anonymous Coward on Friday March 14 2014, @01:39PM

          by Anonymous Coward on Friday March 14 2014, @01:39PM (#16356)

          I honestly think the dev team could write a really interesting post about all the stuff that went boom and went fixed in the 10 or so days it took us to get the site running.

          The dev team SHOULD have documented everything as a matter of process. If that hasn't been done yet, then it must be done soon before memories fade. That information is not just of historical interest, but would also help future troubleshooting.