
SoylentNews is people

posted by martyb on Tuesday April 23 2019, @03:19PM
from the and-then-there-were-some dept.

We had a minor site hiccup today. All seems to be working now.

We have always been open and upfront about the site, so in the interests of full disclosure here is a summary of the problem and steps taken to fix it.

tl;dr: Comment counts shown for each story on the main page appear to have stopped updating at about midnight this morning; they seem to be working now. Our apologies to anyone who was inconvenienced.

Read on past the fold for details.

Problem: Comment counts on the main page showed "0" comments on recent stories, but opening a story showed the correct number of comments for it.

Actions Taken:

1.) Try bouncing the front-end servers to restart apache (This is a low-risk step that seems to fix a surprising number of issues).

No joy.

2.) Ask for help on the #dev channel on IRC.

Ncommander replied asking if slashd (an overseeing daemon for the site) was running.

Looked through my log files and on the site wiki; determined that slashd should be running on server: fluorine

ps -AF | grep slashd | wc showed 32 processes

Ncommander suggested: killall -9 slashd

Try: killall -9 slashd

"No process found."

Inspection of the output of ps -AF suggested this one-liner should do it:
$(ps -AF | grep slashd | awk '{print "kill -9 " $2}' )

Got most of the processes, but there still seemed to be some stragglers.
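
A note on why the one-liner above leaves stragglers, plus a minimal sketch of a tidier approach. The command substitution joins every output line into a single command, so the shell runs one kill whose arguments include the literal words "kill" and "-9" alongside the PIDs, and the grep process itself is on the list. The sketch below extracts only the PIDs, tries TERM first, and falls back to KILL. The helper names select_pids and kill_matching are hypothetical, and it assumes the daemon's command line contains the string "slashd".

```shell
# select_pids PATTERN
# Reads "PID COMMAND..." lines on stdin and prints the PID of every line
# whose command text contains PATTERN as a plain substring.
select_pids() {
    # The pattern is passed through the environment rather than argv, so the
    # awk process does not carry the pattern in its own command line.
    SELECT_PAT=$1 awk '{
        pid = $1
        $1 = ""    # drop the PID so only the command text is searched
        if (index($0, ENVIRON["SELECT_PAT"]) > 0) print pid
    }'
}

# kill_matching PATTERN (hypothetical helper, not part of slashcode)
# Sends TERM to every matching process, waits, then sends KILL to any left.
kill_matching() {
    # Exclude this shell's own PID in case its command line matches too.
    pids=$(ps -e -o pid= -o args= | select_pids "$1" | grep -vx "$$")
    [ -n "$pids" ] || return 0
    kill $pids 2>/dev/null
    sleep 2
    kill -9 $pids 2>/dev/null
    return 0
}

# Example (assumes the daemon's command line contains "slashd"):
#   kill_matching slashd
```

Where procps is installed, pkill -f with the same pattern does much the same job in one command; the version above just keeps the matching step visible and testable.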

/etc/init.d/./slash stop
/etc/init.d/./slash restart

Conclusion:

Looked like it might have worked... reloaded main page... see updated comment counts!

Looks like all is working again.

It's a credit to the staff here that the site has been running so smoothly and without crashing or hiccups for... I can't remember when we last had an outage. Given that in the early days of the site we had maybe a few hours of uptime between crashes, we have come a long way!

I'm going to assume this is one of those "have you tried turning it off and back on again" kind of problems, and unless the problem recurs, assume it is solved.

Need to hurry to get to work, so I apologize for the brevity of this posting.

--martyb


  • (Score: 3, Informative) by The Mighty Buzzard on Wednesday April 24 2019, @12:37PM (2 children)

    by The Mighty Buzzard (18) <themightybuzzard@proton.me> on Wednesday April 24 2019, @12:37PM (#834302)

    Bouncing is just stopping and restarting the apache/varnish processes. I even wrote a script named "bounce" so folks not comfortable with the system can do everything properly and in the correct order.

    The actual cause was slashd though, which is basically a silly-assed reinvention of a cron daemon by the folks who wrote slashcode in the first place. It only takes about a minute to fix, even counting sshing in and such, if you've done it a time or three. If it gave us problems more than once a year or so, I'd look into fixing it. I was out fishing/camping, or it wouldn't have caused poor martyb any headaches this time around.

    --
    My rights don't end where your fear begins.
  • (Score: 2) by RS3 on Wednesday April 24 2019, @02:57PM (1 child)

    by RS3 (6367) on Wednesday April 24 2019, @02:57PM (#834357)

    > I was out fishing/camping this time...

    Good! I need to make time for something outdoors more than an hour here and there.

    Someday the really big fish will reel YOU in. Then martyb, et al, will learn what "we're fu....." means!

    slashd is the systemd of slashcode? Someday I'm gonna download that slashcode and marvel...

    What would happen if you ran your "bounce" script in cron.weekly or monthly just for the heck of it? Or maybe the problem occurs randomly, well, due to an unknown problem, and bounce needs to be run on demand. So maybe a cron script that scans every 5 minutes or so for whatever the problem's symptoms are and calls (or does) bounce?
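
The periodic check suggested above could be sketched roughly as follows. Everything specific here is an assumption: the thread does not say what symptom could be tested automatically, so the sketch imagines a heartbeat file that the daemon touches while healthy, and the file path, script path, and bounce location are all made up for illustration.

```shell
# is_stale FILE MINUTES
# Succeeds if FILE is missing or was last modified more than MINUTES ago.
# Uses find's -mmin test, which GNU and BSD find provide (not strict POSIX).
is_stale() {
    [ -e "$1" ] || return 0
    [ -n "$(find "$1" -prune -mmin +"$2" 2>/dev/null)" ]
}

# watchdog FILE MINUTES COMMAND [ARGS...]
# Runs COMMAND only when FILE looks stale; otherwise does nothing.
watchdog() {
    heartbeat=$1
    max_age=$2
    shift 2
    if is_stale "$heartbeat" "$max_age"; then
        "$@"
    fi
}

# Hypothetical crontab entry, checking every 5 minutes (paths are made up):
#   */5 * * * * /usr/local/bin/slash-watchdog.sh
# where that script sources these functions and runs something like:
#   watchdog /var/run/slashd.heartbeat 10 /usr/local/bin/bounce
```

Keeping the staleness test separate from the action means the check can be dry-run with a harmless command (echo, say) before it is trusted to restart anything.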