Site Hiccup (Non-Update of Comment Counts on Main Page) -- Fixed!

Accepted submission by martyb at 2019-04-23 12:29:25 from the and then there were some dept.

We had a minor site hiccup today. All seems to be working now.

We have always been open and upfront about the site, so in the interests of full disclosure here is a summary of the problem and steps taken to fix it.

tl;dr: Comment counts shown for each story on the main page appear to have stopped updating at about midnight this morning; they appear to be updating again now. Our apologies to anyone who was inconvenienced.

Read on past the fold for details.

Problem: Comment counts on the main page showed "0" comments on recent stories, but opening a story showed the correct number of comments for it.

Actions Taken:

1.) Try bouncing the front-end servers to restart Apache. (This is a low-risk step that seems to fix a surprising number of issues.)

No joy.

2.) Ask for help on the #dev channel on IRC.

Ncommander replied asking if slashd (an overseeing daemon for the site) was running.

Looked through my log files and the site wiki; determined that slashd should be running on the server fluorine.

ps -AF | grep slashd | wc showed 32 processes
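A pipeline like the one above will usually count the grep process itself as well, and plain wc prints line, word, and byte counts rather than a single number. A minimal sketch of a tidier count follows; count_slashd is a made-up helper name for illustration, not part of the site's tooling, and it assumes ps output arrives on stdin:

```shell
# Hypothetical helper (not part of Slash): count the lines of ps output
# that mention slashd. The [s] bracket keeps this grep's own command
# line, "grep -c [s]lashd", from matching the pattern.
count_slashd() {
    grep -c '[s]lashd'
}

# Intended use on the server (not executed here):
#   ps -AF | count_slashd
```

The result is a single number, with no need to remember to subtract one for the grep.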

Ncommander suggested: killall -9 slashd

Try: killall -9 slashd

"No process found."

Inspection of the output of ps -AF suggested this one-liner should do it:
$(ps -AF | grep slashd | awk '{print "kill -9 " $2}' )

Got most of the processes, but there still seemed to be some stragglers.
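For reference, the one-liner above expands to a single command of the form "kill -9 PID kill -9 PID ...", so kill is also handed the words "kill" and "-9" as if they were PIDs, and the list includes the short-lived grep process. A sketch that passes only PIDs along, assuming GNU/Linux tools; slashd_pids is a made-up helper name, not something that exists on the servers:

```shell
# Hypothetical helper (not part of Slash): read `ps -AF` output on stdin
# and print the PID (second column) of each line that mentions slashd.
# The [s] bracket stops awk from matching its own command line.
slashd_pids() {
    awk '/[s]lashd/ { print $2 }'
}

# Intended use on the server (not executed here); -r is a GNU xargs
# option that skips running kill when no PIDs were found:
#   ps -AF | slashd_pids | xargs -r kill -9
```

Running the ps and slashd_pids stages without the xargs stage first gives a chance to eyeball the PID list before anything is killed.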

/etc/init.d/./slash stop
/etc/init.d/./slash restart

Looked like it might have worked... reloaded main page... see updated comment counts!

Looks like all is working again.

It's a credit to the staff here that the site has been running so smoothly and without crashing or hiccups for... I can't remember when we last had an outage. Given that in the early days of the site we had maybe a few hours of uptime between crashes, we have come a long way!

I'm going to assume this is one of those "have you tried turning it off and back on again" kind of problems and, unless the problem recurs, assume it is solved.

Need to hurry to get to work, so I apologize for the brevity of this posting.


