Gift subscriptions from ACs (Anonymous Cowards) are working again. If you're curious what was broken, have a look.
If you attempted to make a gift subscription as an AC since early to mid May, and received an error, please try again at: https://soylentnews.org/subscribe.pl (Or click the link in the "Navigation" Slashbox).
As is standing SN policy, martyb is to blame for anything warranting blame. =) You can go about your business. Move along.
We strive for openness about site operations here at SoylentNews. This story continues in that tradition.
tl;dr: We believe all services are now functioning properly and all issues have been attended to.
Problem Symptoms: I learned at 1212 UTC on Sunday 2018-08-19, that some pages on the site were returning 50x error codes. Sometimes, choosing 'back' in the browser and trying to resubmit the page would work. Oftentimes, it did not. We also started receiving reports of problems with our RSS and Atom feeds.
Read on past the break if you are interested in the steps taken to isolate and correct the problems.
Problem Isolation: As many of you may be aware, TheMightyBuzzard is away on vacation. I logged onto our IRC (Internet Relay Chat) Sunday morning (at 1212 UTC) when I saw chromas had posted (at 0224 UTC) there had been reports of problems with the RSS and Atom feeds we publish. I also noticed that one of our bots, Bender, was double-posting notifications of stories appearing on the site.
While I was investigating Bender's loquaciousness, chromas popped in to IRC (at 1252 UTC) and informed me that he was getting 502 and 503 error codes when he tried to load index.rss using a variety of browsers. I tried and found no issues when using Pale Moon. We tried a variety of wget requests from different servers. To our surprise we received incomplete replies which then caused multiple retries even when trying to access it from one of our SoylentNews servers. So, we surmised, it was probably not a communications issue.
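For readers who want to repeat this kind of check from several machines, here is a minimal Python sketch of the idea. It is not what we ran (we used wget); the function names are made up for illustration, and the URL you pass in is up to you.

```python
import urllib.error
import urllib.request


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET of `url`, including error codes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read the whole body: a truncated reply raises
            # http.client.IncompleteRead here instead of passing silently.
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib reports 4xx/5xx as exceptions; the code is what we want.
        return err.code


def is_upstream_error(status: int) -> bool:
    """True for 502/503/504, the codes a proxy returns when its backend fails."""
    return status in (502, 503, 504)
```

Running `fetch_status` against the same feed URL from a few different hosts, including one inside the server cluster, is a quick way to tell a network problem apart from a server-side one.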
At 1340 UTC, SemperOss (Our newest sysadmin staff member... Hi!) joined IRC and reported that he, too, was getting retry errors. Unfortunately, his account setup has not been completed leaving him with access to only one server (boron). Fortunately for us, he has a solid background in sysops. We combined his knowledge and experience with my access privileges and commenced to isolate the problem.
(Aside: If you have ever tried to isolate and debug a problem remotely, you know how frustrating it can be. SemperOss had to relay commands to me through IRC. I would pose questions until I was certain of the correct command syntax and intention. Next, I would issue the command and report back the results; again in IRC. On several occasions, chromas piped up with critical observations and suggestions — plus some much-needed humorous commentary! It could have been an exercise in frustration with worn patience and frazzled nerves. In reality, there was only professionalism as we pursued various possibilities and examined outcomes.)
From the fact we were receiving 50x errors, SemperOss surmised we were probably having a problem with nginx. We looked at the logs on sodium (which runs Ubuntu), one of our two load balancers, but nothing seemed out of the ordinary. Well, let's try the other load balancer, on magnesium (running Gentoo). Different directory structure, it seems, but we tracked down the log files and discovered that access.log had grown to over 8GB... and thus depleted all free space on /dev/root, the main file system of the machine.
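A hunt like this can also be scripted. The following is a minimal Python sketch, not the commands we actually used: it reports how full a filesystem is and lists the largest files under a directory. The function names are hypothetical, and the percentage can differ slightly from what df shows because df accounts for reserved blocks.

```python
from __future__ import annotations

import os
import shutil


def percent_used(path: str) -> float:
    """Return the percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def largest_files(root: str, top: int = 5) -> list[tuple[int, str]]:
    """Walk `root` and return the `top` largest regular files as (size, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if os.path.islink(full):
                    continue  # do not count symlink targets twice
                sizes.append((os.path.getsize(full), full))
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top]
```

For example, `percent_used("/")` followed by `largest_files("/var/log")` would have pointed straight at the culprit, assuming the logs live under /var/log (the layout differs between distributions, as we found out).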
That's not a good thing, but at least we finally knew what the problem was!
Problem Resolution: So, we renamed the original access.log file and created a new one for nginx to write to. Next up came a search for a box with sufficient space that we could copy the file to. SemperOss reported more than enough space free on boron. We had a few hiccups with ACLs and rsync, so we moved the file to /tmp and tried rsync again, which resulted in the same ACL error messages. Grrrr. SemperOss suggested I try to pull the file over to /tmp on boron using scp. THAT worked! A few minutes later and the copy was completed. Yay!
But, we still had the original, over-sized log file to deal with. No problemo. I ssh'd back over to magnesium and did an rm of the renamed original access.log and... we were still at 100% usage. Doh! We needed to bounce nginx so it would release its hold on the file's inode and the space could actually be reclaimed. Easy peasy; /etc/init.d/nginx restart and... voila! We were now back down to 67% in use.
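The "rm did not free anything" surprise is standard POSIX behavior: removing a file only removes its name, and the data stays allocated until the last process holding it open lets go. A minimal Python sketch that reproduces the effect on a throwaway temp file (POSIX systems only; the function name is made up for illustration):

```python
import os
import tempfile


def unlink_while_open(size: int = 1024 * 1024) -> dict:
    """Delete a file that is still open and report what the kernel sees.

    The directory entry disappears at once, but the inode and its disk
    blocks survive until the last open descriptor is closed.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        os.unlink(path)      # the equivalent of `rm`
        info = os.fstat(fd)  # still works: we hold an open descriptor
        return {
            "name_exists": os.path.exists(path),  # False: the name is gone
            "link_count": info.st_nlink,          # normally 0 after unlink
            "bytes_still_held": info.st_size,     # data is still allocated
        }
    finally:
        os.close(fd)  # only now can the filesystem reclaim the space
```

In our case nginx played the part of the open descriptor, which is why restarting it is what finally released the space.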
Finally! Success! We're done, right?
Did you see what we missed? The backup copy of access.log was now sitting on boron on /tmp which means the next system restart would wipe it. So, a simple mv from /tmp to my ~/tmp and now the file was in a safe place.
By 1630 UTC, we had performed some checks with loads of various RSS and Atom feeds and all seemed well. We were unable to reproduce the 50x errors, either.
And we're still not done.
Why/how did the log file get so large in the first place? There was no log rotation in place for it on magnesium. That log file had entries going back to 2017-06-20. At the moment, we have more than sufficient space to allow us to wait until TMB returns from vacation. (We checked free disk space on all of our servers.) The plan is we will look over all log files and ensure rotation is in place so as to avoid a recurrence of this issue.
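To make "rotation" concrete, here is a minimal Python sketch of size-based rotation. It is only an illustration of the technique, not the mechanism we will deploy (a standard tool such as logrotate is the more likely choice), and the function names and thresholds are assumptions.

```python
import os


def rotate(log_path: str, keep: int = 4) -> None:
    """Shift log_path -> log_path.1 -> ... -> log_path.<keep>, dropping the oldest."""
    oldest = f"{log_path}.{keep}"
    if os.path.exists(oldest):
        os.remove(oldest)
    # Move the numbered backups up by one, highest number first.
    for n in range(keep - 1, 0, -1):
        src = f"{log_path}.{n}"
        if os.path.exists(src):
            os.replace(src, f"{log_path}.{n + 1}")
    if os.path.exists(log_path):
        os.replace(log_path, f"{log_path}.1")
    # Start a fresh, empty log at the original name.
    open(log_path, "a").close()


def rotate_if_large(log_path: str, max_bytes: int, keep: int = 4) -> bool:
    """Rotate only when the log has reached max_bytes; return True if rotated."""
    if os.path.exists(log_path) and os.path.getsize(log_path) >= max_bytes:
        rotate(log_path, keep)
        return True
    return False
```

One caveat: a running daemon keeps writing to the renamed file until it reopens its logs. nginx does so when it receives the USR1 signal (`nginx -s reopen`), so a real rotation job has to send that after the rename.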
Problem Summary: We had a problem with an oversized logfile taking up all free space on one of our servers but believe we have fixed it and that all services are now functioning properly and all issues have been attended to.
Conclusion: Please join me in thanking chromas and SemperOss for all the time they gave up on a Sunday to isolate the problem and come up with a solution. Special mention to Fnord666, who we later learned silently lurked but was willing to jump in had he sensed we needed any help. Thank you for having our backs! Further, please join me in publicly welcoming SemperOss to the team and wishing him well on his efforts here!
Lastly, this is an all-volunteer, non-commercial site -- nobody is paid anything for their efforts in support of the site. We are, therefore, entirely dependent on the community for financial support. Please take a moment and consider subscribing to SoylentNews, either with a new subscription, by extending an existing subscription, or by making a gift subscription to someone else on the site. Any amount entered in the payment amount field above and beyond the minimum is especially appreciated!
First the good news. I just received word that janrinok, our Editor-in-Chief, is finally out of the hospital and back in his own home! He is very tired and has severe restrictions on his activities but is otherwise in excellent spirits. He very much appreciated the kind thoughts and wishes expressed by the community in our prior stories. It will still be many weeks or months before he can resume his prior level of activities on SoylentNews, but he hopes to pop in once in a while to "second" stories that are in the story queue. Please join me in welcoming him back home!
Next, the good news. In janrinok's absence, the other editors have stepped up to the challenge. I'd like to call out chromas, fnord666, mrpg, and takyon who have all freely given from their spare time to make sure we have a steady stream of stories appearing here. I even saw CoolHand pop in on occasion to second some stories! teamwork++
Then, I have to bring up the good news that our development and systems staff have kept this whole thing running so smoothly. Besides the site, there is e-mail, the wiki, our IRC server, and a goodly number of other processes and procedures that make this all happen. That they are largely invisible attests to how well they have things set up and running!
Lastly, the good news. This is what's known in the press as the "silly season". Summer in the Northern Hemisphere means most educational institutions are on break, so less research is done and reported. Other ventures are closed or running on reduced staffing levels. In short, the amount of news to draw from is greatly diminished. Yet, even in that environment, the vast majority of the time finds us with a selection of stories in the submissions queue to draw from.
We recently hit a low spot where I combed the web for a couple quick stories I could submit, but that has been the exception rather than the rule. Generally, we look for stories that have some kind of tech-related angle to them. The community has spoken loud and clear that there are plenty of other sites to read about celebrities, politics, and religion. We make a slight nod to politics insofar as it affects technical areas or has large-scale ramifications (e.g. a story about President Trump having a meeting with Russian president Vladimir Putin would fit that description). Even then we generally try to keep it down to one story per day.
That said, if you see a story on the 'net that catches your fancy, please send it in! Feel free to draw upon titles listed on our Storybot page, then pop onto IRC (Internet Relay Chat) and simply issue the command ~arthur $code where $code is taken from the second column on the Storybot page.
Whether you contribute by submitting a story, buying a subscription, writing in your journal, moderating, or making a comment, we continue to provide a place where people can discuss, share knowledge and perspectives, and maybe learn a thing or two, too!
As you might recall, in an earlier story we noted that SoylentNews' Editor-in-Chief, janrinok, was scheduled for a medical procedure.
I have just received word that there were some (not totally unanticipated, but thought to be very unlikely) complications and the expected 3 day hospital stay has now lasted over a week. In his own words:
No idea what will happen next is anybody's guess. My first objective is to be well enough to get home again but that looks like being the end of the week at the earliest.
He seemed to be in good spirits. In his inimitable style of humor, he noted the internet connection available to him in Hospital would lie in the bottom-most tier of our current poll!
I am torn about revealing personal details that were shared with me, and wish to not sound overly alarmist. I'll just leave here that I am reminded of a saying by Ralph Waldo Emerson: "You cannot do a kindness too soon, for you never know how soon it will be too late."
JR has tirelessly (and tiredly, too) gone over and above in support of this site -- please keep him in your thoughts and, if you are of a mind to, your prayers. --martyb
This is a followup to: SoylentNews Site Certificates Expiring... We ARE Working on It, But... [Updated]
and: Site Services Restored
Certs (Not Just a Breath Mint):
Thanks to the efforts of The Mighty Buzzard and NCommander we now have valid certs, issued by Let's Encrypt, installed on all of our servers. Except that the IRC server needs to be bounced to make its cert active on the backup daemon, all should now be in effect. As in our original story, you can check our certificate status with these links:
The past few days have brought into focus a situation that has been building for several months: We really only have a single person who is working on developing features for the site, The Mighty Buzzard. As with any large and on-going undertaking, this burden is taking its toll. I try to help out as I can, but as I am the primary QA/Test guy who is much better at the user-facing things than what all happens "under the covers", my abilities and assistance are limited. If you have any spare time and would like to lend a hand (and every bit helps), please reply in the comments or contact The Mighty Buzzard directly on IRC.
I recall in the early days of this site when things would fall over several times a day. That has largely become a thing of the past... to the point where it is unusual for any issues to appear on the site and the support services we maintain (email, wiki, IRC, etc.). The baseline code on which this site was founded (open-sourced, out-of-date, back-level, and non-functional) was not promising, but the staff managed to bludgeon it into shape and we now have a solid foundation. That it continues to run as smoothly as it has is a testament to our SysOps folk who toil largely in the background and just keep things working... as well as the continued care-and-feeding that TMB so generously provides. To all of you, please accept my heartfelt thanks and appreciation!
Some numbers: we are approaching the 23,000th story posted; have recently passed 700,000 comments submitted; have had over 3,300 journal articles posted; and are on the cusp of having our 120th Poll!
Though all numbers are approximate and unofficial, it appears we surpassed our funding goal for the first half of the year ($3,000) with a net subscription tally of just over $3,250! I'll leave it to our treasurer to collate and post the official numbers. I'll leave the "Funding Goal" side bar as is for a week or so to commemorate this accomplishment. Do note that subscriptions are still being accepted and will count towards the second half of the year's funding needs.
Folding@Home: Not all of you may be aware, but our soylentnews team for Folding@Home is currently at 240th place... in the world! It started with a single story posted to this site. Just over four years ago, we were at 230,319th place! If you have any spare computes you would like to contribute, especially GPU-based, we'd love to have you sign up! Just reply in the comments and I'm sure someone will get back to you.
Whenever I write one of these stories, I always fear I'll have omitted someone or something important. Please accept my humble apologies if I have done so as there is no intent to slight any contributor.
To the community, I offer my thanks for your contributions to the site as well as your patience and understanding during the challenges of the past few days. Contributions are not just financial (though we wouldn't be here without them -- Thank You!), but also submitting stories and comments, and moderating comments, too! The community continues to impress me with your wide-ranging knowledge and expertise; I have learned much from the exchanges in the story comments!
Lastly, please keep janrinok (our Editor-in-Chief) in your thoughts and wishes while he undergoes a medical procedure and attendant recovery period. Best of luck, JR!
Just a quick note: as previously noted, our SSL certificates were due to expire. Due to various headaches involving issues changing our DNS records, and my personal unavailability, we were unable to renew our certificates in time. Right now, soylentnews.org is running on a Let's Encrypt certificate that was issued after quite a bit of pain. We're still trying to fix the fundamental issues that prevented us from being issued a two-year Gandi certificate. Currently, I'm unable to dig into this issue in more depth, but I've granted access to TheMightyBuzzard so he can help handle the issues that caused the downtime. I will try to get a full writeup of the situation, but for the time being, the main site is up. Secondary services remain down due to the same renewal issues.
Apologies for any inconvenience,
[Updated 2018/06/30 15:19:00 UTC. I've received word that "certificate renewed, but hasn't been issued" and that it will be installed as soon as we receive the new cert. Original story follows. --martyb]
We've been up front with the community right from the start... and intend to keep doing so in the future.
Where we are at:
We have encountered an issue with Let's Encrypt (LE), the certificate issuer for the majority of our [sub]domains. Even though we can 'see' these domains from any number of different servers... for some unknown reason, LE fails to see them. So, at the moment, we are unable to get them to generate certs for us.
Separately, the cert for soylentnews.org is handled by Gandi. As far as I understand it (and I'm no sysadmin so take this with a healthy dose of salt) there are only two members of our staff who have the ability to update that cert. (We obviously don't want to let world+dog have access to that, right? My guess is that at that time, having a couple people seemed sufficiently redundant and secure.)
What it means to you:
You may encounter a warning from your browser when trying to access the site that a certificate has expired. I cannot speak for all browsers, but I've generally seen that along with the warning is an option to trust the cert anyway. (Note: along with allowing that exception, I've seen at least one browser default a checkbox to make the exception permanent. It's entirely up to you, but I see no reason to make it a permanent exception at this point.)
We are working on it, and obviously hope to have things straightened out sooner rather than later! On the other hand, should things go sideways, I want to keep the community informed about what's up, what's happening, and what you can expect.
If you would like another means to check on the status of the certs, Comodo makes it easy with queries such as these:
If the site becomes unavailable because of an expired cert, yeah, we know and we're working on it. Accept a temporary exception in your browser and we'll let you know when things are back to normal.
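If you would rather check a certificate's expiry yourself than rely on a third-party page, here is a minimal Python sketch using only the standard library. The function names are made up for illustration, and the sample timestamp in the docstring is an illustrative value, not taken from one of our certs.

```python
from __future__ import annotations

import socket
import ssl
import time


def days_remaining(not_after: str, now: float | None = None) -> float:
    """Days until a certificate's notAfter timestamp (negative once expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    for example "Jun 30 12:00:00 2018 GMT" (illustrative value).
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the notAfter field of the certificate served by host:port.

    With default verification, an already-expired certificate makes the
    handshake raise ssl.SSLCertVerificationError, which is itself the answer.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Something like `days_remaining(fetch_not_after("soylentnews.org"))` run from a daily job, with an alert when the result drops below a few weeks, would give plenty of warning before the next renewal deadline.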
P.S. Our Editor-in-Chief, janrinok, is currently undergoing preparations for a medical procedure... it's hard to say at this point, but it's likely he may be unavailable to help with the site for a couple weeks. Please join me in wishing him well for the procedure and for a speedy recovery!