
SoylentNews is people

Meta
posted by martyb on Thursday January 10 2019, @06:07PM
from the we're-baaaaaack! dept.

[Updated to correct time of neon CPU's spiking. --martyb]

We experienced an unexpected outage of the site this morning (20190110 00:15-07:45 UTC). At approximately 0415 (UTC), CPU usage on neon suddenly pegged at 400% and things went downhill from there. Am not sure at this point what happened between 0015 and 0415.

Root cause is being investigated, but for now it seems the site is back up and working. Please let us know if you have any issues.

Note: you may need to have your browser ignore its cache (e.g. refresh with Ctrl+F5) and bring down everything fresh.

FWIW, system came back up after we rebooted neon (using the Linode manager page), and then bounced varnishd on fluorine and hydrogen (/home/bob/bin/bounce on each.)

Many thanks go to SemperOSS and cosurgi for problem determination and steps to rectify and FatPhil for his cheerleading!

[Update: TMB] So, the deal was that some unknown time in the past the ndb database node on helium had gone down. This wasn't a problem since we run a clustered database but nobody noticing it was. Then last night something caused neon to lose its cheese. Since it hosts the other node of the db, we had no db for a while. Bytram(martyb) has sysadmin powers for when unpleasant substances of various types hit the fan and thankfully he knew enough to get the neon db node back up and bounce apache/varnish on the web frontends, so kudos to him and all the folks who were backseat driving at the time due to lack of admin perms on their parts.

My brain's currently fried from going from asleep to OMGWTFBBQ without so much as a cup of coffee and a cigarette first, so I'm not going to dig into the root causes until it unfries itself but as a stopgap we have four more staff with shiny, new admin access that I'll be emergency bootcamping in the very near future. There's also going to be some monitoring reimplemented very soon so we notice this kind of nonsense before it blows up in our faces again. I'll either update and bump this story or post a new one if we manage to figure out what the root causes were but at the moment the logs aren't being particularly helpful.
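
To illustrate the kind of check that would flag a silently dead node (such as the ndb node on helium) before the second node goes too, here is a minimal sketch. It is not the site's actual monitoring, which has not been chosen or described here. The hostnames and ports below are placeholders, and the helper names `is_reachable` and `find_down` are made up for this example. It only tests whether a TCP port accepts connections, and it would need to run from a machine outside the infrastructure it watches.

```python
import socket
from typing import Iterable, List, Tuple

# (label, host, port) for each service to probe.
Target = Tuple[str, str, int]


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def find_down(targets: Iterable[Target], timeout: float = 3.0) -> List[str]:
    """Return the labels of all targets that did not accept a connection."""
    return [
        label
        for label, host, port in targets
        if not is_reachable(host, port, timeout)
    ]


if __name__ == "__main__":
    # Placeholder hosts and ports: assumptions for illustration only.
    targets = [
        ("db node 1", "db1.example.org", 3306),
        ("db node 2", "db2.example.org", 3306),
        ("web frontend 1", "web1.example.org", 80),
        ("web frontend 2", "web2.example.org", 80),
    ]
    down = find_down(targets)
    if down:
        # A real setup would send mail or a chat message instead of printing.
        print("DOWN: " + ", ".join(down))
```

Run from cron every few minutes, something like this reports when one half of a redundant pair has gone away, which is exactly the case that redundancy otherwise hides.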



This discussion has been archived. No new comments can be posted.
  • (Score: 0) by Anonymous Coward on Thursday January 10 2019, @01:04PM

    by Anonymous Coward on Thursday January 10 2019, @01:04PM (#784478)

    Thank you for taking care of it!

  • (Score: 4, Insightful) by Anonymous Coward on Thursday January 10 2019, @01:24PM

    by Anonymous Coward on Thursday January 10 2019, @01:24PM (#784485)

    Please keep us informed on the root cause analysis; those meta posts are my favorite.
    Thanks for the work

  • (Score: 5, Insightful) by realDonaldTrump on Thursday January 10 2019, @01:29PM (15 children)

    by realDonaldTrump (6614) on Thursday January 10 2019, @01:29PM (#784486) Homepage Journal

    And possibly many people thought so. But, it wasn't. Very important lesson!!!!

    The modern digital is something you can't count on. Something always goes wrong. And you almost have to be Einstein to figure it out. Crazy!

    • (Score: 4, Funny) by bzipitidoo on Thursday January 10 2019, @01:54PM (9 children)

      by bzipitidoo (4388) on Thursday January 10 2019, @01:54PM (#784491) Journal

      Russian hackers? If anyone would know, it's those who are colluding with them. How much money did they ask? What's SolyentNews worth?

      • (Score: 4, Funny) by ewk on Thursday January 10 2019, @02:40PM (7 children)

        by ewk (5923) on Thursday January 10 2019, @02:40PM (#784498)

        "What's SolyentNews worth?"

        Not sure, but SoylentNews is priceless :-)

        --
        I don't always react, but when I do, I do it on SoylentNews
        • (Score: 2) by Runaway1956 on Thursday January 10 2019, @03:36PM (3 children)

          by Runaway1956 (2926) Subscriber Badge on Thursday January 10 2019, @03:36PM (#784519) Journal

          I think it's worth a buck two-eighty.

          • (Score: 2) by bob_super on Thursday January 10 2019, @05:44PM (2 children)

            by bob_super (1357) on Thursday January 10 2019, @05:44PM (#784574)

            Loch Ness monster offers three-fifty

            All those meta posts recently, it's like website management is hard, or something. Haven't you considered yet that outsourcing it to some Indians would be better for the balance sheet and my stock?

            • (Score: 2) by Runaway1956 on Thursday January 10 2019, @05:48PM (1 child)

              by Runaway1956 (2926) Subscriber Badge on Thursday January 10 2019, @05:48PM (#784576) Journal

              But - but - but - I thought Buzzard was Indian? Surely he's not faking it like certain congress critters?

              • (Score: 2) by maxwell demon on Thursday January 10 2019, @07:03PM

                by maxwell demon (1608) on Thursday January 10 2019, @07:03PM (#784615) Journal

                You mean, SoylentNews was outsourced to India? :-)

                --
                The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by DeathMonkey on Thursday January 10 2019, @05:46PM (2 children)

          by DeathMonkey (1380) on Thursday January 10 2019, @05:46PM (#784575) Journal

          Not sure, but SoylentNews is priceless :-)

          Wait, what, I thought it was people? Sheesh, how am I supposed to eat priceless!

      • (Score: 3, Insightful) by Gaaark on Thursday January 10 2019, @04:41PM

        by Gaaark (41) on Thursday January 10 2019, @04:41PM (#784538) Journal

        What's it worth?
        Less than $3000 for those who haven't subscribed?!

        SUBSCRIBE!

        --
        --- Please remind me if I haven't been civil to you: I'm channeling MDC. ---Gaaark 2.0 ---
    • (Score: 2) by FatPhil on Thursday January 10 2019, @02:14PM

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Thursday January 10 2019, @02:14PM (#784492) Homepage
      Yeah, we've got better cyber than the Russians, but alas our best cyber was asleep. Or was he drugged? By the Russians?

      Anyway, on a serious note - remember that the IRC channels exist at times like these.
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 3, Touché) by DannyB on Thursday January 10 2019, @02:46PM (3 children)

      by DannyB (5839) Subscriber Badge on Thursday January 10 2019, @02:46PM (#784501) Journal

      Who needs Russian hackers when we've got systemd and Intel Management Engine?

      --
      To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
      • (Score: 0) by Anonymous Coward on Thursday January 10 2019, @03:28PM

        by Anonymous Coward on Thursday January 10 2019, @03:28PM (#784515)

        Devuan users on Raspberry Pi?

      • (Score: 0) by Anonymous Coward on Thursday January 10 2019, @08:33PM (1 child)

        by Anonymous Coward on Thursday January 10 2019, @08:33PM (#784660)

        They're running gentoo so I'm thinking openrc must have shit the bed while it was supposed to be managing processes.

  • (Score: 5, Touché) by pTamok on Thursday January 10 2019, @02:53PM

    by pTamok (3042) on Thursday January 10 2019, @02:53PM (#784504)

    I guess since you only got 71.3% funded in the last 6 months, you only need to be up 71.3% of the time, so you are still ahead of the game...

    Thank you for sorting things out and continuing with a poorly rewarded effort. I appreciate it.

  • (Score: 1, Funny) by Anonymous Coward on Thursday January 10 2019, @04:13PM (2 children)

    by Anonymous Coward on Thursday January 10 2019, @04:13PM (#784531)

    Anything to do with your "late X-mas present"? ;-)

    • (Score: 3, Informative) by The Mighty Buzzard on Thursday January 10 2019, @05:25PM (1 child)

      by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Thursday January 10 2019, @05:25PM (#784564) Homepage Journal

      Nah, I'll update the story shortly as to what we've tracked down so far. Right now my brain hurts and my cup and coffee pot are both empty though. I'll get to it after those are all resolved.

      --
      My rights don't end where your fear begins.
      • (Score: 2) by edIII on Thursday January 10 2019, @11:33PM

        by edIII (791) on Thursday January 10 2019, @11:33PM (#784723)

        Totally understandable :)

        Thank you for helping.

        --
        Technically, lunchtime is at any moment. It's just a wave function.
  • (Score: 4, Funny) by RandomFactor on Thursday January 10 2019, @06:40PM

    by RandomFactor (3682) Subscriber Badge on Thursday January 10 2019, @06:40PM (#784607) Journal

    it was the Index of the OPID that caused it!

    --
    В «Правде» нет известий, в «Известиях» нет правды
  • (Score: 1, Insightful) by Anonymous Coward on Thursday January 10 2019, @07:25PM (3 children)

    by Anonymous Coward on Thursday January 10 2019, @07:25PM (#784622)

    If a service is down that shouldn't be, you should have been notified. You may want to set up a nagios service to send you messages (with a critical path outside of the infrastructure you are monitoring).

    • (Score: 2) by The Mighty Buzzard on Thursday January 10 2019, @07:39PM (2 children)

      by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Thursday January 10 2019, @07:39PM (#784631) Homepage Journal

      We used to have monitoring software (icinga). No idea why we don't anymore. I vaguely remember hearing paulej72 and NCommander bitching about it being a pain in the ass and giving up on it but not the specifics. By the time I got roped into doing any admin work at all it was long gone and I was the junior most admin.

      --
      My rights don't end where your fear begins.
      • (Score: 0) by Anonymous Coward on Friday January 11 2019, @02:04AM (1 child)

        by Anonymous Coward on Friday January 11 2019, @02:04AM (#784796)

        Whatever happened to Ncommander?

  • (Score: 3, Touché) by DannyB on Thursday January 10 2019, @07:28PM

    by DannyB (5839) Subscriber Badge on Thursday January 10 2019, @07:28PM (#784623) Journal

    The shop will be adopting the latest hipster trend of FOP.

    (Failure Oriented Programming, or maybe Fear Oriented Programming?)

    Fortunately, several new frameworks were hastily written, and one of them will be randomly selected by management.

    The proof of value in these new frameworks and this new methodology is how quickly and efficiently a single developer can create a Hello World website. The default configuration and plumbing takes care of most of the work. So it must be good.

    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.