
from the Constants-aren't-and-variables-won't dept.
[2021-02-14 15:53:00 UTC: UPDATE added need to check apache log before doing a slash -restart]
We seem to have experienced some difficulties with the SoylentNews site.
I've noticed that both the number of hits and comments for each story do not seem to be updating.
Corrective measures taken:
- "Bounce" the Servers
I doubted it would help, but it causes no harm to try it, so why not? And, as expected, it did not help, either.
This is my personal "bounce" script:
cat ~/bin/bounce
#!/bin/bash
servers='hydrogen fluorine'
for server in ${servers} ; do echo Accessing: ${server} && rsh ${server} /home/bob/bin/bounce ; done
Which, in turn, runs the following script on each of the above servers:
cat /home/bob/bin/bounce
#!/bin/bash
sudo /etc/init.d/varnish restart
sudo -u slash /srv/soylentnews.org/apache/bin/apachectl -k restart
- Restart slash
For those who are unaware, slash has its own internal implementation of what is, effectively, cron. It periodically fires off tasks that support the site's operations. But restarting it potentially has side effects, so first check the apache error_log.
# Go to the appropriate server:
ssh fluorine
# Ensure the apache log is not showing issues:
tail -f /srv/soylentnews.org/apache/logs/error_log
# Restart slash:
sudo /etc/init.d/slash restart
>> slashd slash has no PID file
>> Sleeping 10 seconds in a probably futile attempt to be clean: ok.
>> Starting slashd slash: ok PID = 3274
NB: this failed to run to a successful conclusion when I originally tried it a few hours ago. I gave it one more try while writing this story... it seemed to run okay this time?!
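For anyone who would rather script the "check the log, then restart" step, here is a rough sketch. The log path and init script are the ones shown above; the function names, the 50-line window, and the 'error]' pattern are my own assumptions and may not match how the real error_log is formatted.

```bash
#!/bin/bash
# Sketch only. Log path and init script are taken from the story above;
# function names, line count, and the match pattern are assumptions.

ERROR_LOG="${ERROR_LOG:-/srv/soylentnews.org/apache/logs/error_log}"

# Return 0 if the last N lines of the log contain no error-level entries,
# 1 if they do, 2 if the log cannot be read.
log_looks_clean() {
    local log="$1" lines="${2:-50}"
    if [ ! -r "${log}" ]; then
        echo "Cannot read ${log}" >&2
        return 2
    fi
    # 'error]' is meant to catch both "[error]" and "[core:error]" style tags.
    if tail -n "${lines}" "${log}" | grep -q -F 'error]'; then
        return 1
    fi
    return 0
}

# Restart slash only when run as root and the log looks quiet.
restart_slash_if_clean() {
    if [ "$(id -u)" -ne 0 ]; then
        echo "The init script has to be run as root (e.g. via sudo)." >&2
        return 1
    fi
    log_looks_clean "${ERROR_LOG}"
    case "$?" in
        0) /etc/init.d/slash restart ;;
        1) echo "Recent errors in ${ERROR_LOG}; have a look first." >&2
           return 1 ;;
        *) return 1 ;;
    esac
}
```

Nothing runs until restart_slash_if_clean is called, so the file can be sourced and the log check tried on its own first.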
Things appear to be running okay now. Please reply in the comments if anything else is amiss. Alternatively, mention it in the #dev channel on IRC (Internet Relay Chat), or send an email to admin (at) soylentnews (dot) org.
We now return you to the ongoing discussion of: teco or ed?
(Score: 4, Funny) by DannyB on Sunday February 14 2021, @04:58AM (8 children)
Can moisture get in the servers?
Aircraft, for example, are designed to keep moisture out. Except aircraft in Spain. Because the rain in Spain stays mainly in the plane. This has been going on since the days when conductors had to walk the length of the aircraft and ask everyone for their tickets. They would ask a woman: "Hey, where's my fare, lady?"
Moisture in the servers might not only be water, but can be snot from a nose, or ethanol. (aka, alky haul) Someone could have hidden some in there back during prohibition.
Microsoft tried putting a small data center under water a few years back. But it didn't make their software work any better.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by sjames on Sunday February 14 2021, @05:12AM (1 child)
The Elks on the other hand live up in the hills and in the spring they come down for their annual convention. It is very interesting to watch them come to the water hole. And you should see them run when they find that it's only a water hole. What they're looking for is an Elk-a-hole.
(Score: 3, Funny) by DannyB on Sunday February 14 2021, @01:53PM
People are complaining that fuel prices are back up to what they previously were.
They are confused about the difference between a backup and a restore.
Fuel prices have been restored to what they previously were.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 3, Insightful) by c0lo on Sunday February 14 2021, @06:57AM (5 children)
Way better than a naked conductor running above the train.
https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
(Score: 2) by DannyB on Sunday February 14 2021, @01:44PM (4 children)
I would have modded that shocking.
Discover the shocking secret behind why electrical cables have insulation.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by c0lo on Sunday February 14 2021, @02:25PM (3 children)
Even if just medium voltage, I can't quite call 25kV moddely shocking, no.
https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
(Score: 2, Interesting) by fustakrakich on Sunday February 14 2021, @02:50PM (2 children)
I can generate 25kV by walking on a shaggy carpet.
La politica e i criminali sono la stessa cosa..
(Score: 2) by c0lo on Sunday February 14 2021, @10:25PM (1 child)
It still doesn't make the voltage moderately shocking, even if the charge amount is too small to be deadly.
https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
(Score: 2) by DannyB on Monday February 15 2021, @04:26PM
Cables running above trains should be required to have insulation. Some federal department will require this. For safety. Think of the children!
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by RS3 on Sunday February 14 2021, @05:03AM (9 children)
Sorry, I don't know the SN slash code or SN admin, but does one of those things restart the database? I'd try restarting mysql (or whatever it is). And maybe do a db integrity check... IIRC there are 2 in a cluster or rsync or something? Not something I keep track of. Heck, filesystem check never hurts, but I don't know how you'd accomplish that.
(Score: 5, Informative) by The Mighty Buzzard on Sunday February 14 2021, @05:11AM (8 children)
The db is fine. Slashd has always been slightly persnickety though, so it needs restarted a couple times a year. When the comment counts aren't updating, it's always slashd. This was just martyb's first time trying to restart it and he wasn't aware that the init script had to be run as root and that the user slash (the user that all the SN-specific stuff on the web frontends runs as) doesn't have sudo perms of any sort.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Sunday February 14 2021, @09:29AM (4 children)
If slash seems to require restarting a few times a year, does there seem to be any pattern of how many days before that seems to happen? Or is it completely random? Because if it is the former, maybe some preventative scheduled reboot would be in order. But you've probably already thought of that.
(Score: 0) by Anonymous Coward on Sunday February 14 2021, @11:04AM (3 children)
I've suggested various watchdogs in the past but the hiccups seem to be too random and far between to be worth the trouble.
(Score: 3, Interesting) by The Mighty Buzzard on Sunday February 14 2021, @11:32AM (2 children)
Bingo. The "a couple times a year" is on average. I don't remember having to restart it in 2019 at all aside from it being restarted on server reboots.
My rights don't end where your fear begins.
(Score: 2) by RS3 on Sunday February 14 2021, @12:06PM (1 child)
cron.monthly job? Maybe with a file that contains a countdown variable so the restart only happens every so many months?
Hate those kinds of workarounds though. If I had time I'd look into the code...
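A rough sketch of what that countdown could look like, in case it helps. The state file path, the three-month interval, and the function name are all made up for illustration; only the /etc/init.d/slash script comes from the story.

```bash
#!/bin/bash
# Hypothetical /etc/cron.monthly/slash-restart.
# STATE_FILE and INTERVAL_MONTHS are assumptions, not anything SN actually uses.

STATE_FILE="${STATE_FILE:-/var/lib/slash-restart/countdown}"
INTERVAL_MONTHS="${INTERVAL_MONTHS:-3}"

# Decrement the counter stored in a file.
# Returns 0 (and resets the counter) when it reaches zero, 1 otherwise.
countdown_tick() {
    local file="$1" interval="$2" remaining
    if [ -r "${file}" ]; then
        remaining="$(cat "${file}")"
    else
        remaining="${interval}"
    fi
    # Treat an empty or non-numeric state file as a fresh start.
    case "${remaining}" in
        ''|*[!0-9]*) remaining="${interval}" ;;
    esac
    # 10# forces base 10 so a stray leading zero is not read as octal.
    remaining=$(( 10#${remaining} - 1 ))
    if [ "${remaining}" -le 0 ]; then
        echo "${interval}" > "${file}"
        return 0
    fi
    echo "${remaining}" > "${file}"
    return 1
}

# The cron script itself would then end with something like:
#   countdown_tick "${STATE_FILE}" "${INTERVAL_MONTHS}" && /etc/init.d/slash restart
```

With an interval of 3 the restart fires on every third monthly run. The directory holding the state file has to exist and be writable by whoever runs the cron job.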
(Score: 3, Interesting) by The Mighty Buzzard on Sunday February 14 2021, @02:16PM
Don't know that the issue is time-based but it wouldn't hurt anything to restart it once in a while.
My rights don't end where your fear begins.
(Score: 3, Insightful) by martyb on Sunday February 14 2021, @02:26PM (2 children)
/me makes a mental note of this for future reference.
I think I may have done this once before, but it's certainly not something I am entirely comfortable with.
Was not aware that user slash "doesn't have sudo perms of any sort". And... now I know; thanks!
Wit is intellect, dancing. I'm too old to act my age. Life is too important to take myself seriously.
(Score: 3, Insightful) by The Mighty Buzzard on Sunday February 14 2021, @03:02PM (1 child)
Nod nod, we don't give slash sudo perms so we don't have to worry as much about it being an attack vector that could compromise the entire server. Not that it'd be terribly easy anyway being as the web frontends are behind an nginx reverse proxy that we're using as a load balancer. But any bit of extra security that doesn't slow stuff down or take too much effort is worth doing.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Sunday February 14 2021, @09:49PM
Just one reverse proxy? Why not a double reverse?
(Score: 2) by DavePolaschek on Sunday February 14 2021, @12:23PM (5 children)
Evening MST yesterday, about half of the time I would try to load a page it would hand me an unstyled page, as if the CSS was failing to load (though I was on my iPad and didn’t really have a good way to debug). Don’t know if that helps at all or not, but it was a symptom...
(Score: 3, Informative) by drussell on Sunday February 14 2021, @01:05PM (1 child)
Ah, so it was still doing that into the evening? I guess nobody had fixed it still by then. I first saw it acting up at 4:something PST.
You posted while I was posting that post below this post. :)
(Score: 2) by DavePolaschek on Monday February 15 2021, @12:50PM
Well, that was 5:something MST, which I’d call evening. But then back in the days when dining out was something people did, we frequently had dinner before the blue-hair crowd, so my clock may be skewed.
(Score: 3, Informative) by martyb on Sunday February 14 2021, @02:37PM (2 children)
Yes, I'd seen a couple reports of "CSS failing to load" on IRC. Whenever I tried to reproduce it, all my attempts loaded successfully with no issues. I inquired of others there, and someone else confirmed things were loading okay for them, too. I'd seen that happen a few times before, so figured whatever went sideways had somehow righted itself and gotten back in line.
That said, thanks for mentioning it here. I'm starting to see a pattern. Every CSS "burp" does not necessarily lead to non-updating counts, *but* it does seem that every incident of non-updating counts was preceded by CSS issues. Can't prove a negative, of course, but I'll add this idea to my bag-o-tricks. Thanks again for the report!
Wit is intellect, dancing. I'm too old to act my age. Life is too important to take myself seriously.
(Score: 3, Informative) by The Mighty Buzzard on Sunday February 14 2021, @03:06PM (1 child)
I wouldn't necessarily connect those dots too quickly. For some reason dev has the slashd issues without ever having the CSS issues. It's not a mystery we couldn't look into and fix, it's just not a dire emergency if it's a thirty second fix less often than once a month.
My rights don't end where your fear begins.
(Score: 2) by martyb on Sunday February 14 2021, @04:13PM
Wit is intellect, dancing. I'm too old to act my age. Life is too important to take myself seriously.
(Score: 2) by drussell on Sunday February 14 2021, @12:27PM (4 children)
Was your need to bounce the system later still related to the issues/symptoms I was reporting like lack of CSS rendering off and on that I first mentioned at something like 4:30am yesterday?
(Score: 2) by The Mighty Buzzard on Sunday February 14 2021, @02:17PM (1 child)
Yeah, the missing CSS issue is cleared up if you restart varnish and apache on the web frontends.
My rights don't end where your fear begins.
(Score: 2) by martyb on Sunday February 14 2021, @02:49PM
For those following along at home, I believe that's the "bounce hydrogen and fluorine" which was mentioned above.
Wit is intellect, dancing. I'm too old to act my age. Life is too important to take myself seriously.
(Score: 2) by martyb on Sunday February 14 2021, @02:45PM
Yes, it is starting to look that way. See my earlier reply [soylentnews.org].
And, thanks so much for the earlier report on IRC! I encourage anyone who sees something strange about the site's behavior to check on IRC. There's often someone idling there who can help try to corroborate an issue. And, if necessary, try to rouse someone to help out.
teamwork++
Wit is intellect, dancing. I'm too old to act my age. Life is too important to take myself seriously.
(Score: 2) by krishnoid on Sunday February 14 2021, @08:03PM
Noting somewhere that the servers were bounced *and* that it didn't help is useful, with all the complaints about science not valuing negative results [plos.org]. It's also good practice to actually log it because per Adam Savage (and his ballistics expert, Alex Jason): "The difference between screwing around and science is writing it down."
(Score: 1, Funny) by Anonymous Coward on Sunday February 14 2021, @04:59PM (1 child)
Rajneesh in tech support says that works for his customers most of the time.
(Score: 2) by Fnord666 on Sunday February 14 2021, @05:38PM
We need a "laughing to keep from crying" moderation.
(Score: 0) by Anonymous Coward on Sunday February 14 2021, @05:39PM (2 children)
Just spotted a little typo. Instead of
you have to do
You'r' welcome
(Score: 2) by Fnord666 on Sunday February 14 2021, @07:27PM (1 child)
systemd is evil and is the daemon that shall not be named.
(Score: 3, Funny) by inertnet on Sunday February 14 2021, @10:41PM
Plus AC tried to start 'shlash', whatever that is. Maybe a Sean Connery version of slash?
(Score: 0) by Anonymous Coward on Monday February 15 2021, @12:16AM (1 child)
Soylentnews is just trying to create its own problems so that it can create its own stories and headlines to run with and break its own news stories. Now Soylentnews gets to do its own investigative journalism on itself and claim that it is the one breaking the news ;)
(j/k obviously).
(Score: 1) by Eratosthenes on Monday February 15 2021, @01:49AM
Later, this was generally seen as a catastrophic mistake.
(Score: 0) by Anonymous Coward on Monday February 15 2021, @05:18PM
I don't mean to sound grumpy, but how fucking difficult is it to document your interdependencies and write a runbook?
Figure it out! What depends upon what? Draw some diagrams. You built this thing and you can't draw a simple hierarchical diagram of its operation?
Basically, you are looking for dependencies.
I prefer to illustrate the dependencies as a 'stack', in case I am being managed by someone who understands wedding cakes better than they do software and hardware interdependencies.
At the bottom of the stack is the grounding. No reliable ground, no reliable power. Several times I've been involved in diagnosing computer problems in buildings built along the bayshore. Those salty marshes and tides play hell with electrical grounding.
Next is the power. You can't boot without power. It can't be spiky power and the power needs to be delivered as sine waves, not triangles or squares.
Next is the hardware. When is the last time you ran memtest86 on your servers? When is the last time you did a dd(1) of each hard drive in its entirety to assure yourself there were no bad blocks in use? Do you have diagnostic CDROMs for your servers? Do you ever use them? Memtest86 is the FIRST thing I do on ALL of my computers.
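A minimal sketch of the dd(1) read-through, assuming GNU dd (for the bs=1M spelling) and a hypothetical function name. It only reads, and dd exits nonzero if it hits a read error, which is the signal of interest here.

```bash
#!/bin/bash
# Read a device (or file) from start to finish, throwing the data away.
# read_check is a made-up name; pass it something like /dev/sda as root.
read_check() {
    local target="$1"
    if dd if="${target}" of=/dev/null bs=1M 2> /dev/null; then
        echo "${target}: read OK"
    else
        echo "${target}: READ ERROR" >&2
        return 1
    fi
}
```

This is slow on big drives and says nothing about blocks that are readable but failing, so treat it as a first pass rather than a verdict.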
Make sure the computer isn't clogged with dust! Make sure the fans are running! Make sure the hard drives aren't making horrible noises, too.
Next, system resources. Make sure you aren't running out of disk space! Cleanup can and should be automated.
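As a sketch of the disk space check, something like the following could go in a cron job. The function name and the 90% threshold are arbitrary choices; it parses "df -P" output from stdin and will get confused by mount points that contain spaces.

```bash
#!/bin/bash
# Print every mount point whose usage is at or above a percentage limit.
# Expects POSIX "df -P" output on stdin:
#   Filesystem 1024-blocks Used Available Capacity Mounted on
over_threshold() {
    local limit="$1"
    awk -v limit="${limit}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # "95%" -> "95"
            if (use + 0 >= limit) print $6
        }'
}

# Typical use:
#   df -P | over_threshold 90
```

Reading from stdin keeps the parsing separate from the real disks, so it can be tried against canned df output before being trusted.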
Next, the database. No point in starting a web server when your database is tits up. The database comes before the web server.
Next, the business logic (AKA 'middleware'). Are you using Java? Is the JVM running? Make sure the business logic is in communication with the database. Refer to your diagram. Identify test points and create tests for those test points. Automate it. You should be able to run a shell script and see that your business logic is working, that all of the required processes are in the process table and acting normally. Ideally it should be written as an /etc/init.d or /etc/rc.d script. Refer to other such scripts for tips on how to achieve quality start/stop scripts.
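Here is a small sketch of the "are all the required processes in the process table" test. The function name is invented and the process names in the usage line are guesses; substitute whatever your diagram says should be running.

```bash
#!/bin/bash
# Print the name of each required process that is NOT running.
# Returns 0 only when every named process is present.
missing_processes() {
    local name missing=0
    for name in "$@"; do
        # pgrep -x matches the process name exactly.
        if ! pgrep -x "${name}" > /dev/null; then
            echo "${name}"
            missing=1
        fi
    done
    return "${missing}"
}

# Hypothetical use (process names are assumptions):
#   missing_processes mysqld slashd httpd || echo "something is down"
```

This only proves a process exists, not that it is acting normally; a real test point would also poke the service and check the answer.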
(Odds are good that the business logic is where the startups get complicated; that would indicate that a better understanding of your business logic's interdependencies is called for. It may also be appropriate to invest in a Nagios server, to monitor interdependencies in a graphical fashion. And some cron jobs, to make sure certain pieces are running and to restart them if they are not.)
Finally, the web server. If you're convinced you have correctly started your database and your business logic is working correctly and you have content to serve, then you can start your web server.
There are other processes I have not addressed such as DNS and user authentication. If you are using an LDAP database to manage users, for instance, and that creates another dependency, IE, you can't start processes until you can log in and you can't log in until the LDAP database is restarted, then you need to include those in your diagrams and startup sequences and runbooks.
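The stack idea can be sketched as a script that starts things bottom-up and refuses to go further once a layer fails. The function names are invented, the services are assumed to have /etc/init.d scripts, and the service names in the usage line are placeholders.

```bash
#!/bin/bash
# Start one service; assumes an init script that accepts "start".
start_service() {
    "/etc/init.d/$1" start
}

# Start services in the order given (lowest layer first) and stop at the
# first failure, so nothing is started on top of a broken dependency.
start_in_order() {
    local svc
    for svc in "$@"; do
        echo "Starting ${svc}"
        if ! start_service "${svc}"; then
            echo "Failed to start ${svc}; leaving everything above it alone." >&2
            return 1
        fi
    done
}

# Hypothetical use, database first and web server last:
#   start_in_order mysql slash apache
```

The order on the command line is the runbook: if LDAP or DNS has to come up before anything else, it goes at the front of the list.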
Programmers adore complexity and messing with new versions but sysadmins adore consistency and reliability. When programmers are in charge of things, they tend to get horribly complicated, and when things go tits up, programmers tend to stand around and say "it SHOULD do this", relying upon some written document somewhere, whereas the sysadmin will observe, "it is NOT doing what it says it will do", and will happily rip it out and replace it with a small shell script, which is more reliable.
I haven't followed Soylent News' architectural design that closely but I hope there is a staging environment, and maybe a bug-tracking infrastructure.
My $0.02
The goal of your documentation should be to make it simple for you to restart the system after a night of heavy drinking OR to walk a clever ten-year-old child through doing the same thing.