SoylentNews is people

posted by on Sunday August 09 2020, @05:51AM
from the SNAFU dept.

The Mighty Buzzard writes:

Yeah, so, failure to babysit the db node that was scheduled for a reboot on the 5th resulted in a bit of database FUBAR that left us temporarily losing everything from then to now. Fortunately we had a backup less than six hours old, restored from it, and appear to be copacetic now. Except for the missing five hours and change.

I'd usually make some sort of dumb joke here but it was already four hours past my bedtime when I found out about the problem. My brain is no work good anymore. Fill in whatever dad joke or snark about getting a do-over for a change strikes your fancy.

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Monday August 10 2020, @08:49PM (#1034518)

    You really only need one more. That way you still have a quorum if one fails. With that in place, the database nodes should be able to survive a rolling restart without being "babysat" the whole time. However, the one located on its own machine should be the arbitrator. It might also help to set up unused slots in your cluster for all node types to allow easier expansion: you can just assign new nodes to those slots later instead of rolling the whole thing.

    I hope my attempts to help don't feel like I'm piling on or intentionally demeaning.
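A rough illustration of the quorum argument in the comment above, assuming a plain strict-majority rule. This is a generic sketch, not the cluster software's actual algorithm, and the function names `has_quorum` and `survives_single_failure` are made up for the example.

```python
def has_quorum(total_voters: int, alive_voters: int) -> bool:
    """Return True if the surviving voters form a strict majority.

    Assumption: quorum means "more than half of all configured voters".
    """
    return alive_voters > total_voters // 2


def survives_single_failure(total_voters: int) -> bool:
    """Return True if the group still has quorum after losing one voter."""
    return has_quorum(total_voters, total_voters - 1)


if __name__ == "__main__":
    # Compare a two-voter setup with the "one more" three-voter setup.
    for voters in (2, 3):
        print(voters, "voters, survives one failure:",
              survives_single_failure(voters))
```

Under this assumption, two voters cannot lose one and keep a majority, while three can, which is the sense in which "one more" is enough.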

  • (Score: 2) by The Mighty Buzzard (18) <themightybuzzard@proton.me> on Tuesday August 11 2020, @02:53AM (#1034696)

    Nah. One server assigned as arbitrator = single point of failure. Not having that is the entire reason we're clustered to begin with, so it's a non-starter.

    --
    My rights don't end where your fear begins.
    • (Score: 0) by Anonymous Coward on Tuesday August 11 2020, @07:16AM (#1034791)

      If you didn't disable arbitration, they elect one. You want a management node that isn't on the data nodes to be the preferred arbitrator with the highest rank, but you can set one of the others as a fallback and even include your SQL nodes if you are paranoid. That way you require at least two failures, or as many as four, to bring the entire cluster down without losing data or degrading the cluster to the point of complete failure. But you state that is a SPOF. That, the fact that your other data node stayed up, and your comments lead me to believe either that you disabled arbitration, which isn't a good idea because it leaves you with no protection from split brain and other problems, or that I wasn't clear enough.
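To make the tie-breaking role described in the comment above concrete, here is a toy model. It assumes the simplest possible rule: a partition holding a strict majority keeps running, and an exact half keeps running only if it can reach the arbitrator. The function name `partition_outcome` and the rule itself are illustrative assumptions, not the cluster software's real logic.

```python
def partition_outcome(total_nodes: int, reachable_nodes: int,
                      arbitrator_reachable: bool) -> str:
    """Decide what one side of a network partition does (toy model).

    Assumptions for this sketch only:
      - a strict majority of nodes may always continue;
      - an exact half may continue only if it can reach the arbitrator;
      - anything else shuts down to avoid a split brain.
    """
    if 2 * reachable_nodes > total_nodes:
        return "continue"
    if 2 * reachable_nodes == total_nodes and arbitrator_reachable:
        return "continue"
    return "shut down"


if __name__ == "__main__":
    # Two data nodes split 1/1; only side A can still reach the arbitrator.
    print("side A:", partition_outcome(2, 1, arbitrator_reachable=True))
    print("side B:", partition_outcome(2, 1, arbitrator_reachable=False))
    # No split at all: the arbitrator being down does not matter.
    print("whole cluster:", partition_outcome(2, 2, arbitrator_reachable=False))
```

In this model an evenly split pair never ends up with both halves running, and an intact cluster keeps going even when the arbitrator itself is unavailable.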

      • (Score: 2) by The Mighty Buzzard (18) <themightybuzzard@proton.me> on Tuesday August 11 2020, @10:11PM (#1035207)

        Point being, if there is only one non-data management node and it fails or is otherwise down for some reason, we're right back where we are now. I thought not being where we are right now was the entire idea.

        --
        My rights don't end where your fear begins.
        • (Score: 0) by Anonymous Coward on Wednesday August 12 2020, @02:13AM (#1035350)

          No, if it fails, there is no arbitrator but you still have a quorum, which means they just elect a new one or continue without one. But you also would have survived the incident you described automatically, without the resulting split brain, had you had one. I could go on, but you are going to do whatever you want anyway, so I'm not going to bother anymore.