The Mighty Buzzard writes:
Yeah, so, failure to babysit the db node that was scheduled for a reboot on the 5th resulted in a bit of database FUBAR that left us temporarily losing everything from then to now. Fortunately we had a backup less than six hours old, restored from it, and appear to be copacetic now. Except for the missing five hours and change.
I'd usually make some sort of dumb joke here but it was already four hours past my bedtime when I found out about the problem. My brain is no work good anymore. Fill in whatever dad joke or snark about getting a do-over for a change strikes your fancy.
(Score: 2, Interesting) by Anonymous Coward on Sunday August 09 2020, @09:43AM (10 children)
I don't know if this is confirmation bias, y'all being more public about this, or an actual increase, but you guys seem to keep having problems related to the database processes as of late. Perhaps you should think about adding a watchdog daemon to your system, giving the database itself some maintenance and optimization, making sure everything is up to date, and checking your logs for some sort of attack on your system.
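A watchdog daemon along those lines boils down to a polling loop: check the service, and after a few consecutive failures run a recovery action. A minimal, self-contained sketch, where the check and recovery callables are stand-ins (a real one would probe mysqld and restart it through the init system):

```python
import time

def watchdog(check, recover, rounds, max_failures=3, interval=0):
    """Poll check() up to `rounds` times; after `max_failures`
    consecutive failures, call recover() and reset the counter.
    Returns how many times recovery was triggered."""
    failures = recoveries = 0
    for _ in range(rounds):
        if check():
            failures = 0
        else:
            failures += 1
            if failures >= max_failures:
                recover()  # e.g. restart the db service, page someone
                recoveries += 1
                failures = 0
        if interval:
            time.sleep(interval)
    return recoveries

# Toy demo: a "database" that stays down until recovery kicks it.
state = {"up": False}
hits = watchdog(lambda: state["up"],
                lambda: state.update(up=True),
                rounds=5, max_failures=3)
print(hits)  # → 1 (recovered once, then the remaining checks pass)
```

A production version would check with a real connection attempt (something like `mysqladmin ping`) and sleep a sane interval between polls instead of flipping a flag.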
(Score: 5, Interesting) by The Mighty Buzzard on Sunday August 09 2020, @01:45PM (9 children)
Funny how the db clustering system that's supposed to save us headaches has caused significant data loss twice now when boring old master/slave replication never did, ain't it? I'd have to do the math to see if occasionally restoring from backups has cost us more downtime than actually having to down the site when maintenance was required, but I know for sure it's more annoying.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Sunday August 09 2020, @07:18PM (4 children)
is there a post or posts that describe how everything is set up for SN? would make for an interesting read and other admins could weigh in with their 2 cents/$denomination.
(Score: 4, Funny) by The Mighty Buzzard on Monday August 10 2020, @04:31AM
The other admins have better sense than to talk to users. I'm the dumb one.
My rights don't end where your fear begins.
(Score: 2) by The Mighty Buzzard on Monday August 10 2020, @05:13AM (2 children)
Oh, if you really want to know the detailed network setup, drop me an email to remind me (I don't care if it's a real address. Throwaway is fine.) and I'll post it up as a journal entry when I get time. I've been running on busy days and four hours or so of sleep a night for what seems like about thirty years though, so don't go thinking I've forgotten about it unless it doesn't show up within a week.
My rights don't end where your fear begins.
(Score: 2) by martyb on Monday August 10 2020, @10:14PM (1 child)
Consider me interested. :)
If I may suggest, if you follow through in writing up something... put it up on the Wiki and then link to that in your journal. (There's probably some stuff up there to start from, anyway!)
/me wishes there were a way to auto-explore and document (textually and graphically) connections between servers and the processes that run on each one.
Wit is intellect, dancing.
(Score: 2) by The Mighty Buzzard on Tuesday August 11 2020, @02:47AM
It's already on the wiki [soylentnews.org]. It's not entirely up to date but I'm not putting Aluminum up there until it's actually in service doing things.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Sunday August 09 2020, @10:13PM (1 child)
Are you anywhere close to the load limit on a replication setup? And a two-node cluster is basically worthless because you can't get a quorum with only two nodes. Another benefit of a replication scheme in your case seems to be that in the current setup, failure requires manual intervention anyway. So you could STONITH with a watchdog and degrade to read-only on the replica at the first sign of trouble, or during maintenance, until you sort it out.
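To put numbers on the quorum point (simple majority arithmetic, independent of any particular cluster stack):

```python
def quorum(n):
    """Minimum votes for a majority in an n-node cluster."""
    return n // 2 + 1

def tolerated_failures(n):
    """How many nodes can fail while the rest still hold quorum."""
    return n - quorum(n)

for n in (2, 3, 5):
    print(f"{n} nodes: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failure(s)")
# 2 nodes: quorum 2, tolerates 0 failure(s)
# 3 nodes: quorum 2, tolerates 1 failure(s)
# 5 nodes: quorum 3, tolerates 2 failure(s)
```

With two nodes, quorum is both of them, so a single failure (or a network split) stalls the cluster; three nodes is the smallest setup that survives losing one.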
(Score: 2) by The Mighty Buzzard on Monday August 10 2020, @04:56AM
Two nodes is plenty for our purposes. Our network load vs. the bandwidth between our boxes makes replication essentially instant unless you have to completely restore a node, so mostly what we need is for the web frontends to not have to give a shit what db server they're dealing with in the event that one of them crashes. If we were looking to fail to read-only, we'd have stuck with master/slave. We consider read-only to be failure though.
My rights don't end where your fear begins.
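The "frontends shouldn't have to care which db server they hit" behavior is essentially try-the-next-host failover. A hypothetical sketch (not how rehash actually wires it up; `connect` stands in for whatever driver call is in use):

```python
def connect_any(hosts, connect):
    """Try each DB host in order; return the first live connection.
    Raises the last error if every host is down."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as err:
            last_err = err  # remember why this host failed, try the next
    raise last_err

# Toy demo: db1 is down, db2 answers.
def fake_connect(host):
    if host == "db1":
        raise ConnectionError("db1 down")
    return f"connected:{host}"

print(connect_any(["db1", "db2"], fake_connect))  # → connected:db2
```

In practice this logic usually lives in a proxy or the driver's multi-host connection string rather than application code, but the effect is the same: the frontend keeps working when one node crashes.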
(Score: 2) by gawdonblue on Monday August 10 2020, @02:40AM (1 child)
Yeah, in the last 3 years we've had to restart the DB at work twice because of "high-availability" clustering getting out of sync. These are the only fatal DB software failures that we have had.
Seems the more dependencies you add the more brittle things become.
(Score: 2) by The Mighty Buzzard on Monday August 10 2020, @04:58AM
Yeah, I'm sure there must be cluster ninjas out there that know every pitfall ahead of time and never have these problems but there aren't any on staff here.
My rights don't end where your fear begins.