2021-07-22 12:14:55 ..
2021-07-29 11:57:17 UTC
2021-07-30 13:44:35 UTC --martyb
Some of you will recall that I recently underwent several bouts of surgery and, despite your welcome comments and good wishes, your best guesses that I was having a sex change, having my breasts enhanced in either size or number, or receiving some fairly radical treatment for hemorrhoids, were all fairly wide of the mark. The surgery is now long past and I have made a reasonable recovery for someone of my age. But I do wish to thank you all for your comments because, almost without exception, they raised a smile when things were not going too well for me.
My wife suffers from a severe medical condition and I have been her full-time carer for over a decade now. Although I wish she had never developed the condition, I expend great effort, and also receive considerable personal satisfaction, in providing many hours of care each day so that she can remain in our home and we can continue our lives to the fullest extent possible. However, she has recently suffered from a deterioration in her condition. This was not unexpected but no-one could say when the next problems would affect her. But the result is that I now have to provide more support to her and my free time is reduced.
I joined this site at its inception and have enjoyed every minute of my time here. But I cannot dedicate the time that the role of Editor-in-Chief (E-in-C) deserves and, several weeks ago, I made the difficult decision to stand down from the post. (I can hear the cheering from some in our community even here in France!) The fact that most of you will not have noticed any of this means that the transition has been successful. The entire editorial team (which is nowhere near as large as that phrase makes it sound!) has stepped up to the plate and has maintained the output as it was before, ably led by Martyb, who has assumed the role of E-in-C in addition to his numerous other roles on this site. I am grateful to them for their efforts and support, both during my time as E-in-C and more recently in their work in editing the stories that we read each day. Thanks, guys, you do a tremendous job with relatively little recognition. I've asked the powers-that-be to increase your salaries by an appropriate percentage.[*] I am also grateful to the other folk who do so much in the background keeping this site on-line. You have all become good friends, although we could be standing next to each other and wouldn't know it.
Equally important to the site's success are you - the community. You provide the submissions, the comments, the funding, and you are the reason that we have a site at all. I thank each and every one of you for your contribution, from the regular submitters, the ACs, and the 'characters' to those of you who just visit to read the stories that we publish. If I have offended anyone, then I apologise, but being E-in-C has been likened to herding cats in the dark: an almost impossible task and one in which you are certain to make a few mistakes.
Hopefully, I will remain on the site as an editor making whatever contribution I can. But, for the next few months at least, that contribution will be minimal as I have to solve several practical problems on how to make our home function satisfactorily for us both. I trust that you will give Martyb and the team the same support in the future that I have been fortunate enough to enjoy over the past couple of years. Thank you.
[*] Clarification: This is an inside joke among the staff. Nobody here receives any kind of payment for their efforts on the site; we are strictly volunteers. So, a 20% raise on zero is... still zero. --martyb
As you probably have noticed, our site has been a bit sluggish lately.
We are aware of the issue and are developing plans for dealing with it. The primary issue lies in the database structure and contents. On-the-fly joins across multiple tables cause a performance hit which is exacerbated by the number of stories we have posted over the years (yes, it HAS been that long... YAY!). Further, stories which have been "archived" — allowing no further comments or moderation — are still sitting in the in-RAM DB and could be offloaded to disk for long-term access. Once offloaded, there would be much less data in the in-RAM database (queries against empty DBs tend to be pretty quick!) so this should result in improved responsiveness.
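For the curious, the offloading idea can be sketched in a few lines. This is only an illustration and not the site's actual code or schema: the table names `stories` and `stories_archive` and the `archived` column are made up for the example, and an in-memory SQLite database stands in for the real in-RAM DB.

```python
import sqlite3


def offload_archived(conn: sqlite3.Connection) -> int:
    """Move archived stories out of the hot table; return how many rows moved.

    Table and column names are hypothetical, chosen only for this sketch.
    """
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(
            "INSERT INTO stories_archive (id, title) "
            "SELECT id, title FROM stories WHERE archived = 1"
        )
        cur = conn.execute("DELETE FROM stories WHERE archived = 1")
    return cur.rowcount


def demo() -> tuple:
    """Build a tiny stand-in database and offload its archived rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT, archived INTEGER)"
    )
    conn.execute("CREATE TABLE stories_archive (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO stories VALUES (?, ?, ?)",
        [(1, "old story", 1), (2, "older story", 1), (3, "fresh story", 0)],
    )
    moved = offload_archived(conn)
    hot = conn.execute("SELECT COUNT(*) FROM stories").fetchone()[0]
    cold = conn.execute("SELECT COUNT(*) FROM stories_archive").fetchone()[0]
    return moved, hot, cold
```

Calling `demo()` moves the two archived rows and leaves only the one live story in the hot table, which is the whole point: fewer rows in RAM for the everyday queries to wade through.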
A complicating factor is that changing the structure on a live, replicated database would cause nearly every page load to return a 500 error. So the database has to be offlined and the code updated. That would likely entail downtime on the order of the better part of a day. Obviously, shorter is better. On the other hand, "The longest distance between two points is a short cut." We're aiming to do it right, the first time, and be done with it, rather than doing it quick-and-dirty, which usually ends up being not quick and quite dirty.
So, we ARE aware of the performance issues, are working towards a solution, and don't want to cause any more disruption than absolutely necessary.
We will give notice well in advance of taking any actions.
Gift subscriptions from ACs (Anonymous Cowards) are working again. If you're curious what was broken, have a look.
If you attempted to make a gift subscription as an AC since early to mid May, and received an error, please try again at: https://soylentnews.org/subscribe.pl (Or click the link in the "Navigation" Slashbox).
As is standing SN policy, martyb is to blame for anything warranting blame. =) You can go about your business. Move along.
We strive for openness about site operations here at SoylentNews. This story continues in that tradition.
tl;dr: We believe all services are now functioning properly and all issues have been attended to.
Problem Symptoms: I learned at 1212 UTC on Sunday, 2018-08-19, that some pages on the site were returning 50x error codes. Sometimes, choosing 'back' in the browser and trying to resubmit the page would work. Oftentimes, it did not. We also started receiving reports of problems with our RSS and Atom feeds.
Read on past the break if you are interested in the steps taken to isolate and correct the problems.
Problem Isolation: As many of you may be aware, TheMightyBuzzard is away on vacation. I logged onto our IRC (Internet Relay Chat) Sunday morning (at 1212 UTC) and saw that chromas had posted (at 0224 UTC) that there had been reports of problems with the RSS and Atom feeds we publish. I also noticed that one of our bots, Bender, was double-posting notifications of stories appearing on the site.
While I was investigating Bender's loquaciousness, chromas popped in to IRC (at 1252 UTC) and informed me that he was getting 502 and 503 error codes when he tried to load index.rss using a variety of browsers. I tried and found no issues when using Pale Moon. We tried a variety of wget requests from different servers. To our surprise we received incomplete replies which then caused multiple retries even when trying to access it from one of our SoylentNews servers. So, we surmised, it was probably not a communications issue.
At 1340 UTC, SemperOss (our newest sysadmin staff member... Hi!) joined IRC and reported that he, too, was getting retry errors. Unfortunately, his account setup has not been completed, leaving him with access to only one server (boron). Fortunately for us, he has a solid background in sysops. We combined his knowledge and experience with my access privileges and commenced to isolate the problem.
(Aside: If you have ever tried to isolate and debug a problem remotely, you know how frustrating it can be. SemperOss had to relay commands to me through IRC. I would pose questions until I was certain of the correct command syntax and intention. Next, I would issue the command and report back the results; again in IRC. On several occasions, chromas piped up with critical observations and suggestions — plus some much-needed humorous commentary! It could have been an exercise in frustration with worn patience and frazzled nerves. In reality, there was only professionalism as we pursued various possibilities and examined outcomes.)
From the fact we were receiving 50x errors, SemperOss surmised we were probably having a problem with nginx. We looked at the logs on sodium (which runs Ubuntu), one of our two load balancers, but nothing seemed out of the ordinary. Well, let's try the other load balancer, on magnesium (running Gentoo). Different directory structure, it seems, but we tracked down the log files and discovered that access.log had grown to over 8GB... and thus depleted all free space on /dev/root, the main file system of the machine.
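The check we eventually did by hand (how full is the file system, and which files are eating it?) can be sketched in Python. This is a minimal illustration, not what we actually ran; the function names `percent_used` and `largest_files` are invented for the example.

```python
import os
import shutil


def percent_used(path: str) -> float:
    """Return how full the file system containing `path` is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root: str, top: int = 5) -> list:
    """Return up to `top` (size_in_bytes, path) pairs under `root`, biggest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top]
```

Pointed at a log directory, `largest_files` would have put an 8GB access.log at the top of the list straight away.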
That's not a good thing, but at least we finally knew what the problem was!
Problem Resolution: So, we renamed the original access.log file and created a new one for nginx to write to. Next up came a search for a box with sufficient space that we could copy the file to. SemperOss reported more than enough space free on boron. We had a few hiccups with ACLs and rsync, so we moved the file to /tmp and tried rsync again, which resulted in the same ACL error messages. Grrrr. SemperOss suggested I try to pull the file over to /tmp on boron using scp. THAT worked! A few minutes later and the copy was completed. Yay!
But we still had the original, over-sized log file to deal with. No problemo. I ssh'd back over to magnesium and did an rm of the renamed copy of access.log and... we were still at 100% usage. Doh! We needed to bounce nginx so that it would release its hold on the file's inode and the space could actually be reclaimed. Easy peasy: /etc/init.d/nginx restart and... voila! We were now back down to 67% in use.
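If the "rm did nothing" part seems odd, here is a small sketch of the effect on a POSIX system (it will not behave this way on Windows). Removing a file only removes its name; the data stays on disk for as long as some process still has the file open. The function name is made up for the example, and a temporary file stands in for the log.

```python
import os
import tempfile


def deleted_but_open_demo() -> tuple:
    """Delete a file that is still open and inspect it through the open descriptor.

    Returns (name_is_gone, link_count, size_in_bytes).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * 1024)
        os.unlink(path)                  # like rm: removes the directory entry only
        name_is_gone = not os.path.exists(path)
        info = os.fstat(fd)              # the open descriptor still reaches the data
        return name_is_gone, info.st_nlink, info.st_size
    finally:
        os.close(fd)                     # only after this can the space be reclaimed
```

The name is gone and the link count is zero, yet the 1024 bytes are still there. In our case nginx was the process holding the descriptor, which is why restarting it finally freed the space.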
Finally! Success! We're done, right?
Did you see what we missed? The backup copy of access.log was now sitting on boron on /tmp which means the next system restart would wipe it. So, a simple mv from /tmp to my ~/tmp and now the file was in a safe place.
By 1630 UTC, we had performed some checks with loads of various RSS and Atom feeds and all seemed well. We were unable to reproduce the 50x errors, either.
And we're still not done.
Why/how did the log file get so large in the first place? There was no log rotation in place for it on magnesium. That log file had entries going back to 2017-06-20. At the moment, we have more than sufficient space to allow us to wait until TMB returns from vacation. (We checked free disk space on all of our servers.) The plan is we will look over all log files and ensure rotation is in place so as to avoid a recurrence of this issue.
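To show what "rotation" means here, the following is a rough size-based sketch. It is not what will be deployed on magnesium (a system tool such as logrotate is the usual choice for that); the function names and the `keep` and `max_bytes` parameters are assumptions made for the illustration.

```python
import os


def rotate(path: str, keep: int = 3) -> None:
    """Shift path -> path.1 -> path.2 ..., keeping at most `keep` old copies."""
    oldest = f"{path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)                      # drop the copy that falls off the end
    for i in range(keep - 1, 0, -1):
        src = f"{path}.{i}"
        if os.path.exists(src):
            os.replace(src, f"{path}.{i + 1}")
    if os.path.exists(path):
        os.replace(path, f"{path}.1")
    open(path, "a").close()                    # recreate an empty live log


def rotate_if_large(path: str, max_bytes: int, keep: int = 3) -> bool:
    """Rotate `path` once it reaches `max_bytes`; return True if it was rotated."""
    if os.path.exists(path) and os.path.getsize(path) >= max_bytes:
        rotate(path, keep)
        return True
    return False
```

One caveat, which is the same lesson as above: a server that already has the log open keeps writing to the renamed file until it is told to reopen its logs (nginx documents a USR1 signal for this) or is restarted.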
Problem Summary: We had a problem with an oversized logfile taking up all free space on one of our servers but believe we have fixed it and that all services are now functioning properly and all issues have been attended to.
Conclusion: Please join me in thanking chromas and SemperOss for all the time they gave up on a Sunday to isolate the problem and come up with a solution. Special mention to Fnord666 who, we later learned, silently lurked but was willing to jump in had he sensed we needed any help. Thank you for having our backs! Further, please join me in publicly welcoming SemperOss to the team and wishing him well in his efforts here!
Lastly, this is an all-volunteer, non-commercial site; nobody is paid anything for their efforts in support of the site. We are, therefore, entirely dependent on the community for financial support. Please take a moment and consider subscribing to SoylentNews, either with a new subscription, by extending an existing subscription, or by making a gift subscription to someone else on the site. Any amount entered in the payment amount field above and beyond the minimum is especially appreciated!