Meta
posted by martyb on Monday January 04 2021, @05:10AM

Summary:
It wasn't just you; SoylentNews.org was down today (Sunday, 2021-01-03) for a few hours, from mid-morning to early afternoon UTC. It seems to be back up and running, but there are some minor artifacts.

Background:
The first sign was some CSS and slashbox issues appearing on Saturday night. I was editing a story, and when I tried to preview it, I saw that the slashboxes that normally appear on the LHS (Left-Hand Side) of the page were missing. A page refresh or two later, things looked okay again. A bit later, I went to view a story and saw the same symptoms. This time, a hard reload that ignored any cache on my system (Ctrl+F5) did the trick.

I popped onto IRC (Internet Relay Chat), reported these symptoms, and asked if anyone else was seeing the same thing. Received a couple of confirmations.

Oh. Joy. And TMB (The Mighty Buzzard) still seemed to be away on vacation.

Oh well. Skipped on over to boron and ran a script to bounce the Apache servers on fluorine and hydrogen. Popped back onto IRC, reported what I did, and asked if things were better. Got some affirmations. Yay!

Just in case, I hung around for another half hour or so to confirm the site was staying up and running okay. Looking good! After thanking everyone for their help, I wished everybody a good night and then headed to bed.

Sunday:
Shortly after I woke and attempted to visit the site, I was greeted by a message explaining the site was down due to DB issues. When I got back onto IRC, found that TMB was already hands-on. The site had crashed early in the morning. With the site already down, and it being Sunday morning, he decided to take advantage of the opportunity to make some backups and then do some maintenance work.

Status:
Site is back up, system loads seem back to normal, and things seem to be pretty much as they should be. Except... the Older Stuff slashbox that appears on the RHS (Right-Hand Side) of the main page seems to be missing some entries. The newest entry as I write this is "YouTube Class Action: Same IP Address Used To Upload 'Pirate' Movies and File DMCA Notice."

I suspect the missing entries will eventually start to stream in and repopulate the list.

tl;dr:
The DB crashed and took the site with it. TMB was soon on the scene and fixed the DB and did some other work. We're back up and running.

Thanks TMB!


  • (Score: 0) by Anonymous Coward on Tuesday January 05 2021, @02:00AM (#1094805)

    Did they disconnect from each other, the mgmt nodes, the API nodes, or all of the above? What message did they give as to why they refused to reconnect? Did you have to roll the whole cluster? What did the interleaving of the logs look like? Your ndbinfo tables and SHOW output should have been more than enough to diagnose the various issues you have.

    That said, based on what little information you've given so far in this post and those of the past few weeks, it sounds like your various nodes are having trouble communicating with each other cleanly. That would roughly explain the problems of the past few weeks, as far as they were described. You may want to monitor the traffic between systems for packet drops, jitter, and the like. If you record those logs, it might help to escalate the issue to your VPS host, as the true problem may be on their end.

  • (Score: 2) by The Mighty Buzzard (18) <themightybuzzard@proton.me> on Wednesday January 06 2021, @06:58AM (#1095528)

    The data nodes disconnected and either shut down or crashed with no messages other than one each telling me the node was disconnecting (within a minute of each other). Started them back up and they connected like they hadn't had any problems. The management servers kept chugging right along the whole time.

    The reason I gave so little information is that there was so little information. There was literally nothing else out of the ordinary in the logs aside from those two messages. When I have time, I'll be spending many a morning poking everything with a stick, but we're still a little ways off from being moved into the church. Thankfully, the drywall guys are moving right along, and we're going to be able to start painting this weekend. After that, the hot water tank, some temporary kitchen counters that'll later be repurposed into workbenches, outlets/switches/fixtures, trim, and hanging a few doors should just about do it.

    --
    My rights don't end where your fear begins.