
SoylentNews is people

Meta
posted by on Sunday August 09 2020, @05:51AM
from the SNAFU dept.

The Mighty Buzzard writes:

Yeah, so, failure to babysit the db node that was scheduled for a reboot on the 5th resulted in a bit of database FUBAR that left us temporarily losing everything from then to now. Fortunately we had a backup less than six hours old, restored from it, and appear to be copacetic now. Except for the missing five hours and change.

I'd usually make some sort of dumb joke here but it was already four hours past my bedtime when I found out about the problem. My brain is no work good anymore. Fill in whatever dad joke or snark about getting a do-over for a change strikes your fancy.

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Informative) by Runaway1956 on Sunday August 09 2020, @06:15AM (17 children)

    by Runaway1956 (2926) Subscriber Badge on Sunday August 09 2020, @06:15AM (#1033675) Journal

    Personally, it's a disappointment that things happened when they did. But, it looks good now! Only lost a few hours, as opposed to 3 days!!

  • (Score: 5, Interesting) by martyb on Sunday August 09 2020, @08:48AM (16 children)

    by martyb (76) Subscriber Badge on Sunday August 09 2020, @08:48AM (#1033715) Journal

    Personally, it's a disappointment that things happened when they did. But, it looks good now! Only lost a few hours, as opposed to 3 days!!

    My sentiments, exactly.

    It's hard enough to get things going again. Doubly so when it's late at night and your body is screaming for sleep and concentration is required to avoid an even greater FUBAR.

    Thanks Buzz!!!!

    From my (editorial) perspective, there were a couple stories that I had pushed out that got lost. I know this thanks to "stale" browser tabs I still had open. I don't know (yet) if any other stories got pushed out after those two and were subsequently lost.

    My "lost" stories have been recreated and pushed back out. Let's hope that was it.

    I do want to acknowledge lost comments, moderations, and potentially journal postings... that's a pain. Don't know what else to say. Truly sorry it happened but that won't bring things back. Ugh.

    It's way too early in the morning for me to be up and my brain is rebelling at being awake at this hour. /me wanders back to bed.

    --
    Wit is intellect, dancing.
    • (Score: 2, Interesting) by Anonymous Coward on Sunday August 09 2020, @09:43AM (10 children)

      by Anonymous Coward on Sunday August 09 2020, @09:43AM (#1033724)

      I don't know if this is confirmation bias, y'all being more public about this, or an actual increase, but you guys seem to keep having problems related to the database processes as of late. Perhaps you should think about adding a watchdog daemon to your system, giving the database itself some maintenance and optimization, making sure everything is up to date, and checking your logs for some sort of attack on your system.

      • (Score: 5, Interesting) by The Mighty Buzzard on Sunday August 09 2020, @01:45PM (9 children)

        Funny how the db clustering system that's supposed to save us headaches has caused significant data loss twice now when boring old master/slave replication never did, ain't it? I'd have to do the math to see if occasionally restoring from backups has cost us more downtime than actually having to down the site when maintenance was required but I know for sure it's more annoying.

        --
        My rights don't end where your fear begins.
        • (Score: 0) by Anonymous Coward on Sunday August 09 2020, @07:18PM (4 children)

          by Anonymous Coward on Sunday August 09 2020, @07:18PM (#1033964)

          is there a post or posts that describe how everything is set up for SN? would make for an interesting read and other admins could weigh in with their 2 cents/$denomination.

          • (Score: 4, Funny) by The Mighty Buzzard on Monday August 10 2020, @04:31AM

            The other admins have better sense than to talk to users. I'm the dumb one.

            --
            My rights don't end where your fear begins.
          • (Score: 2) by The Mighty Buzzard on Monday August 10 2020, @05:13AM (2 children)

            Oh, if you really want to know the detailed network setup, drop me an email to remind me (I don't care if it's a real address. Throwaway is fine.) and I'll post it up as a journal entry when I get time. I've been running on busy days and four hours or so of sleep a night for what seems like about thirty years though, so don't go thinking I've forgotten about it until it doesn't show up within a week.

            --
            My rights don't end where your fear begins.
            • (Score: 2) by martyb on Monday August 10 2020, @10:14PM (1 child)

              by martyb (76) Subscriber Badge on Monday August 10 2020, @10:14PM (#1034556) Journal

              Consider me interested. :)

              If I may suggest, if you follow through in writing up something... put it up on the Wiki and then link to that in your journal. (There's probably some stuff up there to start from, anyway!)

              /me wishes there were a way to auto-explore and document (textually and graphically) connections between servers and the processes that run on each one.

              --
              Wit is intellect, dancing.
        • (Score: 0) by Anonymous Coward on Sunday August 09 2020, @10:13PM (1 child)

          by Anonymous Coward on Sunday August 09 2020, @10:13PM (#1034046)

          Are you anywhere close to the load limit on a replication setup? And a two node cluster is basically worthless because you can't get a quorum with only two nodes. Another benefit of a replication scheme for you seems to be that in the current setup, failure requires manual intervention anyway. So you can STONITH with a watchdog and degrade read-only to the replica at the first sign of trouble until you sort it out or when under maintenance.
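
The quorum point in the comment above comes down to a little arithmetic. A minimal sketch, assuming a plain majority-vote quorum rule; the thread does not say which clustering software or quorum rule SN actually uses, and the function names here are made up for illustration:

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of votes that is a strict majority of `nodes`.

    Assumption: simple majority voting, no tie-breaker or arbiter node.
    """
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can drop out while the rest still hold a majority."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Compare a two-node cluster with the next few odd-sized clusters.
    for n in (2, 3, 5):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"can lose {tolerated_failures(n)}")
```

Under this rule a two-node cluster needs both nodes for a majority, so losing either one loses quorum; three nodes is the smallest cluster that survives a single failure.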

          • (Score: 2) by The Mighty Buzzard on Monday August 10 2020, @04:56AM

            Two nodes is plenty for our purposes. Our network load vs. the bandwidth between our boxes makes replication essentially instant unless you have to completely restore a node, so mostly what we need is for the web frontends to not have to give a shit what db server they're dealing with in the event that one of them crashes. If we were looking to fail to read-only, we'd have stuck with master/slave. We consider read-only to be failure though.

            --
            My rights don't end where your fear begins.
        • (Score: 2) by gawdonblue on Monday August 10 2020, @02:40AM (1 child)

          by gawdonblue (412) on Monday August 10 2020, @02:40AM (#1034153)

          Yeah, in the last 3 years we've had to restart the DB at work twice because of "high-availability" clustering getting out of sync. These are the only fatal DB software failures that we have had.
          Seems the more dependencies you add the more brittle things become.

    • (Score: 0) by Anonymous Coward on Monday August 10 2020, @08:02PM (4 children)

      by Anonymous Coward on Monday August 10 2020, @08:02PM (#1034486)

      Boy, won't aristarchus be annoyed to find out that three of his stories were approved, as-is with no edits to them, but unfortunately were released in that window!

      • (Score: 2) by aristarchus on Tuesday August 11 2020, @01:53AM (3 children)

        by aristarchus (2645) on Tuesday August 11 2020, @01:53AM (#1034677) Journal

        Wouldn't be the first time! Eds want deniable plausibility.

        • (Score: 2) by The Mighty Buzzard on Tuesday August 11 2020, @02:48AM (2 children)

          by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Tuesday August 11 2020, @02:48AM (#1034694) Homepage Journal

          No, they want potable decantibility.

          --
          My rights don't end where your fear begins.
          • (Score: 2) by aristarchus on Tuesday August 11 2020, @05:54AM (1 child)

            by aristarchus (2645) on Tuesday August 11 2020, @05:54AM (#1034762) Journal

            No, they want aristarchus vacuity! Unfortunately, I have been here since the beginning, and contrary to khallow's wishes, will probably be here until the end. That is how it is, with us near immortals, having to live through humanity committing the same stupid mistakes again, and again, until in the school of ages, someone like the TMB comes to a realization of a cosmic perspective. Think, TMB, if you were not a coder, and not a former assassin in the US Army. How would things appear, and what would reality, and fundamental truths be? Just a question.