SoylentNews is people

Meta
posted by martyb on Friday May 21 2021, @12:25AM

As many of you noticed, we had a site crash today, from around 1300 until 2200 UTC (2021-05-20).

A HUGE thank you goes to mechanicjay who spent the whole time trying to get our ndb (cluster) working again. It's an uncommon configuration, which made recovery especially challenging... there's just not a lot of documentation about it on the web.

I reached out and got hold of The Mighty Buzzard on the phone, then put him in touch with mechanicjay, who got us back up and running using backups.

Unfortunately, we had to go all the way back to April 14 to get a working backup. (I don't know all the details, but it appears something went sideways on neon.)

We're all wiped out right now. When we have rested and had a chance to discuss things, we'll post an update.

In the meantime, please join me in thanking mechanicjay and TMB for all they did to get us up and running again!

 
  • (Score: 1) by khallow (3766) on Sunday May 23 2021, @01:14PM (#1137942)

    And then you wonder why a system that has a five nines guarantee in a 2/2/2 setup doesn't even have two.

    I already know of real-world systems, such as the Space Shuttle, that failed that hard. There's no wondering over here.

  • (Score: 0) by Anonymous Coward on Monday May 24 2021, @12:37AM (#1138082)

    Because operating on the edge of science and technology, at the extremes of risk, with single points of failure meeting the Swiss Cheese model of reality, is directly analogous to running an incorrectly deployed, bog-standard cluster that is failing to meet its uptime guarantees, despite hundreds of thousands of deployments operating successfully in worse conditions when deployed correctly.

    Right.