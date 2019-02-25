from the Oops,-we've-done-it-again dept.
I expect that many noticed that the site went down and, if you are reading this, you will also realise that it is now back up.
The entire server died, leaving a wake of Out-Of-Memory messages, which resulted in the site itself, IRC and our email all failing. We (and by that I really mean kolie!) have restarted the server and doubled the amount of memory available to it.
Of course, that doesn't tell us why it ran out of memory, although we knew that it was a bit tight, nor what specifically happened today to push it over the edge. That will probably take a while to work out.
It might take us a while to put more stories in the queue but you should be able to comment on many of today's stories that have only just appeared on your screens.
We are sorry for the inconvenience and we are getting back on our feet again. As always, a big THANK YOU to kolie for his efforts.
(Score: 5, Funny) by Tork on Wednesday February 19, @07:10PM (21 children)
Speaking of out-of-memory messages... what were we talking about?
(Score: 2) by janrinok on Wednesday February 19, @07:24PM (19 children)
I don't know yet - kolie is still recovering various elements.
I think we were running the site on 16GB and it has been good for several months, but we have switched up to 32GB now. I do not yet know what caused the OOM. I don't want to interrupt kolie at this point as he is still investigating and recovering.
I am not interested in knowing who people are or where they live. My interest starts and stops at our servers.
(Score: 2) by Tork on Wednesday February 19, @07:31PM (15 children)
(Score: 2) by Unixnut on Wednesday February 19, @07:37PM (12 children)
It sounds really generous to me, unless they have really complicated the setup with things like containers (resulting in multiple copies of an entire userland in RAM, all a bit different) and/or other inefficiencies gobbling up RAM.
Of course, it could also just be a bog-standard memory leak somewhere which ate up all the RAM until the OOM killer got involved. I guess we wait to see what they find during the post-mortem debugging :-)
(Score: 2) by drussell on Wednesday February 19, @07:46PM (9 children)
I think the 16/32GB is referring to the entire amount of memory for the overall VM that then runs all the various docker containers for the various servers, yes...
(Score: 2) by drussell on Wednesday February 19, @07:48PM (8 children)
... I meant various services ... (which are in the various separate docker containers)
You know what I mean. :)
(Score: 5, Interesting) by Unixnut on Wednesday February 19, @08:37PM (7 children)
wow... each service in its own docker container? No wonder 16GB of RAM to host a website isn't enough XD
In seriousness, I can't comment on the rationale for doing such a thing. Lately a lot of my work has been to move companies away from Docker/K8s after they all jumped on the "it's the future of tech" bandwagon and ended up with Docker creating more problems than it solved. Funnily enough, the OOM killer randomly terminating containers is an issue I've been debugging at work; the current plan is to disable memory overcommit.
That way, rather than processes being randomly OOM killed, they will just have to deal with failed mallocs themselves, or if they are poorly written they will segfault. In either case it will help us work out exactly what software is problematic, and then decisions can be made on how to deal with it.
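For anyone curious what strict overcommit would mean on their own box, here is a minimal sketch, not anyone's actual tooling. It assumes a Linux host and text in the format of /proc/meminfo, which reports CommitLimit and Committed_AS in kB. The function names are made up for illustration. With vm.overcommit_memory set to 2, the kernel refuses new allocations once Committed_AS would exceed CommitLimit, so the difference between the two is a rough measure of headroom.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(meminfo_text):
    """Return CommitLimit - Committed_AS in kB.

    A negative result means the system has already promised more memory
    than strict overcommit (mode 2) would allow.
    """
    fields = parse_meminfo(meminfo_text)
    return fields["CommitLimit"] - fields["Committed_AS"]
```

On a Linux host this could be called as `commit_headroom_kb(open("/proc/meminfo").read())`. A negative number before switching modes is a hint that something would start seeing failed allocations straight away.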
(Score: 3, Interesting) by drussell on Wednesday February 19, @08:49PM (2 children)
Well, I don't think it's really every service in a separate container. It's more that what was essentially running on each of the crazy number of separate virtual machines on Linode is now a container on the same VM, but with a bit more granularity: the backend database stuff is in one container while the frontend web servers / caching etc. are in another container, etc.
Someday I'm sure kolie et al. will have time to update the wiki or something to show an overview of how it's all configured. I do know that actually properly documenting the system backend again is a goal to be undertaken at some point, so I assume there will also be a summary somewhere for the community members to grok. :)
(Score: 5, Informative) by kolie on Wednesday February 19, @08:58PM (1 child)
It's all under the SoylentNews GitHub account on the infrastructure project.
(Score: 2) by Unixnut on Thursday February 20, @10:33AM
Nice, good to know. May have a perusal one of these days when I have some time.
Also a good moment to thank you for your work in getting the site back up and running. Much appreciated from me (and I am sure from the rest of the SN community!).
(Score: 2) by JoeMerchant on Wednesday February 19, @11:53PM
I have had OOM issues where they will snowball once they start.
One question is: is it just too much stuff or is it a slow leak? Do you have enough logs to track something like that?
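One rough way to tell "too much stuff" from "slow leak", assuming periodic memory samples have been logged somewhere (whether the site has such logs is not stated here): fit a straight line to used memory over time and look at the slope. This is only a sketch with a hypothetical function name. A slope near zero at a high level points to plain overload, while a steady positive slope over days points to a leak.

```python
def growth_per_hour(samples):
    """Least-squares slope of memory use over time.

    samples: list of (timestamp_seconds, used_megabytes) pairs.
    Returns the fitted growth in MB per hour.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    # cov / var_t is MB per second; scale to MB per hour.
    return cov / var_t * 3600.0
```

Short spikes will skew a simple fit like this, so it works best on samples spread over many hours or days.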
(Score: 3, Insightful) by gnuman on Thursday February 20, @12:03AM (2 children)
And what's really magical about that? Containers are really low overhead and allow for easy snapshotting and rollback of various configurations. It really is a no-brainer these days to run everything in containers. Adding orchestration on top, that tends to be overkill, but podman or docker are *not* the problem here.
For people that have no clue: you can think of "containers" as just the current iteration of chroots, or of what came after them and was called "vserver" or similar. And if you think of a container as a VM, you are thinking wrong.
(Score: 2) by Unixnut on Thursday February 20, @10:30AM (1 child)
Probably because based on my understanding containers are not "low overhead" at all. While they are low overhead in execution (unlike VMs) the fact they each have their own isolated userland means you end up loading multiple copies of libraries per container.
For example unlike before where you load one copy of GLIBC into RAM and every running process links to that version, now each container has its own GLIBC loaded (sometimes even different versions, as long as they are ABI compatible with the kernel). Repeat this for every single other library in the container (and for each of the containers you are running on a host) and you will see that containers are responsible for massive memory bloat.
To me it's the equivalent of statically linking every single thing in an OS into one large binary blob, then loading it into RAM just to do one task: horribly wasteful. While snapshotting/rollback may be of use (and you can have that without needing containers at all), the rest of the features of containers are not universally beneficial, and should be used sparingly and when it makes sense to (i.e. definitely a bad idea to run everything in containers).
chroots from what I remember would still link to the same libraries as the host as there was no namespace isolation within the kernel. As such you did not have the memory bloat of containers, making them a better option in most cases if you need isolation.
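How much memory is really duplicated is something that can be measured rather than argued. A minimal sketch, assuming a Linux host and text in the format of /proc/<pid>/smaps_rollup: Rss counts every resident page a process maps, while Pss divides each shared page among the processes mapping it. The helper names are hypothetical, and whether libraries are actually shared across containers depends on things like the storage driver and whether the images use the same layers.

```python
import re

# Matches lines such as "Rss:                2000 kB".
_FIELD = re.compile(r"^(\w+):\s+(\d+)\s+kB\s*$")


def rss_pss_kb(smaps_rollup_text):
    """Return (Rss, Pss) in kB from /proc/<pid>/smaps_rollup-style text."""
    values = {}
    for line in smaps_rollup_text.splitlines():
        m = _FIELD.match(line)
        if m:
            values[m.group(1)] = int(m.group(2))
    return values["Rss"], values["Pss"]


def shared_fraction(smaps_rollup_text):
    """Rough share of resident memory attributed to other processes.

    0.0 means nothing is shared; values near 1.0 mean most resident
    pages are also mapped by other processes.
    """
    rss, pss = rss_pss_kb(smaps_rollup_text)
    return 0.0 if rss == 0 else 1.0 - pss / rss
```

Summing Pss over all processes on the host gives a total that does not double count shared pages, which is the number that matters for this argument.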
(Score: 3, Interesting) by janrinok on Thursday February 20, @11:02AM
Containers also provide another level of security. Even if one container is compromised then the intruder cannot get outside of that container very easily. He is limited to only the ports that are open as exits from that container and in the format of the data that the port will accept. He cannot reconfigure anything outside the container so he cannot send the data to another network. The containers communicate with each other on a separate internal network.
The size of the base image that is included can be as small as 20MB - only what is absolutely necessary for that container to work is included. Even Debian is somewhere between 80 and 120MB for quite complicated containers.
Finally, containers can be set to automatically restart in the event of a temporary interruption to the overall system. Unfortunately, in our case the OOM didn't allow the container management software to do its job but that would have been a problem no matter how our servers had been configured. It required a complete reset.
(Score: 2) by gnuman on Thursday February 20, @12:11AM
and what would "entire userland" be? parts of glibc and openssl? and if you use same OS for images, those actually de-duplicate making your point even less relevant..
(Score: 2) by VLM on Thursday February 20, @05:02PM
They tend to be VERY small userlands. Like you don't even get a "ping" command, for example.
If the data is much larger than the userland (DB, etc) it tends to be a rounding error.
(Score: 5, Informative) by kolie on Wednesday February 19, @08:13PM
We run a ton of services. The site itself would be more than happy on far less than 16GB. The containers etc. are actually fairly lightweight, most of the images are based on the same layers and there's not much overhead in the processes themselves.
All the services together, including the dev site, use almost exactly 16GB.
We had set the VM to use 32 after the next reboot as we essentially knew an OOM issue was imminent. It kind of forced the schedule by happening before we had a window to address it.
(Score: 2) by corey on Thursday February 20, @08:49PM
Yeah I was a LAMP guy back when e-commerce was all the rage too.
Thanks to kolie!
I was having trouble with reception on my home wifi and tested SN, to find it wasn’t loading so I thought it was my network. Then happened to try somewhere else and that worked so I noticed SN had issues.
(Score: 3, Funny) by drussell on Wednesday February 19, @07:51PM (2 children)
Whoosh! :)
(Score: 3, Funny) by janrinok on Wednesday February 19, @08:55PM (1 child)
Yep, I missed that one!
(Score: 3, Informative) by ls671 on Thursday February 20, @02:42AM
Email me if you think about moving to something more reliable. I host sites with more traffic than soylent, have redundancy in 2 datacenters with replication and full backups in third data center and I supervise multiple websites, mailservers etc. running on the infrastructure. I have WAF (mod_security) dns_based blacklist, spam filtering, my own dns servers etc. etc. Cheers and good luck!
(Score: 3, Funny) by c0lo on Wednesday February 19, @09:37PM
DOGE visits and loss of services?
(Score: 4, Insightful) by tekk on Wednesday February 19, @07:35PM
Thanks for all the work y'all put into the site <3
(Score: 2, Interesting) by dioxide on Wednesday February 19, @07:44PM (6 children)
:(){ :|:& };:
(Score: 2) by Tork on Wednesday February 19, @08:55PM (5 children)
I'm not 100% certain but I think I made each one of those faces while watching Section 31.
(Score: 2) by drussell on Wednesday February 19, @09:08PM (4 children)
Off topic, I know, but "modern" Star Trek is NOT Star Trek.
They tortured and killed the real thing a couple decades ago, unfortunately. RIP. Sniff... Sniffle!
(Score: 3, Interesting) by Tork on Wednesday February 19, @10:07PM (2 children)
One thing that has changed for me in the intervening decades is my wife. She's not deep into scifi. She does like Doctor Who, HitchHiker's, Red Dwarf, even Transformers! But Star Wars, and Star Trek never settled well with her. HOWEVER, she found JJ's Trek movie watchable and that led to her giving Disco, Picard, and SNW a try. Sadly SNW is the only Trek show continuing to make the cut, but at least I'm not watching it by myself anymore!
🏳️‍🌈 Proud Ally 🏳️‍🌈
(Score: 2) by Hyper on Friday February 21, @07:52AM (1 child)
Doctor Who? The latest seasons, or the older ones?
https://www.lbc.co.uk/news/entertainment/doctor-who-faces-axe-nucti-gatwa-quits-show-drop-ratings-woke-storylines/ [lbc.co.uk]
They seem to have alienated the dedicated fanbase, but could not pick up enough new eyeballs to justify the new direction. It would be nice to know what someone who doesn't have decades of dedication to the series thinks of it.
(Score: 0) by Anonymous Coward on Friday February 21, @02:15PM
How much of it is actually because the show is bad, and how much of it is people being biased because someone else said the show is woke and bad?
I know someone who's seen every existing and reconstructed serial from classic Doctor Who, and a good portion of the stories from the new series through Matt Smith's time as the Doctor. He said he didn't like that the show was going to have the Doctor regenerate with a female body, and he had previously said he wasn't fond of that idea for the Master, either. He also has very conservative views and watches a lot of right-wing content online. I got him some of the episodes from Jodie Whittaker's time as the Doctor, but never told him that the show had developed a reputation of being woke, and a lot of people complained about Whittaker. I expected him to tell me he didn't like Whittaker, but it was the opposite. Aside from one story that he said grossed him out because of an aversion to spiders, he really liked Whittaker, and he said he was sorry when she left the show.
My theory is based on anecdotal evidence. But nobody bothered to tell this person that the show was considered woke and people didn't like it, and he ended up really liking Whittaker. I bet he would have reached a very different conclusion if I told him in advance that people were complaining about the show. By the way, I suspect the opposite is true as well. If people are told that a lot of their peers really like a show, they'll come back giving it great reviews even if the show wasn't actually very good. I suspect I'd probably view the recent episodes with a lot of bias as well because of how so many people praised Russell T Davies for his work on the first few series. I'd watch with the bias that the episodes are good, even if the stories aren't actually that good.
Just be glad we didn't have internet message boards when Plato's Stepchildren aired. I bet a lot of people would have demanded that NBC immediately cancel Star Trek. Instead, despite the fears from network executives, there was very little outrage over the kiss. Maybe almost nobody got angry because they didn't have other people suggesting that they should be angry about the kiss.
(Score: 4, Interesting) by epitaxial on Thursday February 20, @12:29AM
Strange New Worlds is classic Trek and it's awesome.
(Score: 2, Insightful) by pTamok on Wednesday February 19, @08:18PM (3 children)
That was an Oops. They happen.
I thought IRC was (historically, at least) on a different server, so in the event of the main site taking a breather, we still had IRC to keep people in contact/informed. Perhaps we have over-consolidated a tad.
Just a thought.
Meanwhile, thanks to all those tidying up in the foreground/background. It is much appreciated.
(Score: 5, Informative) by janrinok on Wednesday February 19, @08:47PM
We have another Soylent IRC on Libera.chat for the very purpose you have described. It doesn't matter which server we run our own IRC on, if it were to go down then we would lose IRC which is used during all recoveries.
I thought everyone knew about it but perhaps I need to publicise it again.
The privilege of running every service on a different server was costing us around $6000 a year. The main site now uses a single server. Kolie generously provides a server and the data feed for free (at least for the current levels of activity), and Fliptop currently provides another free server for an off-site backup. That has saved us $6000 per annum. The balance is that we do not have the redundancy that we once had but we are running the site at a significant financial saving. Our annual running costs are almost trivial in comparison with what they once were. We have to pay for our domain registration and other basic running costs but nothing like we were paying in the past.
(Score: 2) by kolie on Wednesday February 19, @08:57PM
There's another docker host that's currently idle for SN, we can probably split services across them.
(Score: 2) by driverless on Thursday February 20, @01:30AM
Is the server running OpenSSH [thehackernews.com]?
(Score: 2) by acid andy on Wednesday February 19, @10:45PM
I had an issue with my internet connection where some sites were inaccessible, maybe an ISP issue. When everything else had come back, SN was giving me 404s so I thought the two were related.
(Score: 3, Insightful) by DBCubix on Thursday February 20, @12:09AM (2 children)
If you don't mind, could you post the server specs?
And do y'all need a few bucks for upgrades?
(Score: 3, Informative) by janrinok on Thursday February 20, @09:06AM
I'm afraid that only kolie can answer the first question. It is his server.
As to your second question, subscriptions or donations are always welcome. You can pay using Stripe, Paypal or by wire transfer.
(Score: 5, Informative) by kolie on Thursday February 20, @04:57PM
System is a Dell R720
Dual Xeon E5-2650 v2
128 GB of RAM
10 Gb networking
SSD Storage
I've got a few similar nodes.
(Score: 3, Insightful) by datapharmer on Thursday February 20, @01:28AM (3 children)
I know maybe it should be too obvious to ask, but are you using swap? I’m always surprised how many cloud containers/VMs are configured by default without swap enabled. You can do so much more on a memory limited platform with a sizable swap and often little performance impact given how fast storage is these days.
I’d absolutely confirm you all have swap configured if at all possible!
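Checking whether swap is actually configured is quick. A minimal sketch, assuming a Linux host and text in the format of /proc/swaps (a header line followed by one line per swap device, with sizes in KiB); the function names are only illustrative:

```python
def parse_swaps(text):
    """Return a list of (device, size_kib, used_kib) from /proc/swaps-style text."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        cols = line.split()
        if len(cols) >= 4:
            # Columns: Filename, Type, Size, Used, Priority
            entries.append((cols[0], int(cols[2]), int(cols[3])))
    return entries


def swap_summary(text):
    """Summarise swap: number of devices, total size and amount in use (KiB)."""
    entries = parse_swaps(text)
    return {
        "devices": len(entries),
        "total_kib": sum(size for _, size, _ in entries),
        "used_kib": sum(used for _, _, used in entries),
    }
```

On a live system, `swap_summary(open("/proc/swaps").read())` returning zero devices means no swap is enabled at all.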
(Score: 0) by Anonymous Coward on Thursday February 20, @04:55AM
I was coming to say something similar. I use the zram mechanisms for swap, and in low memory conditions, the response of the computer slows significantly, so I am able to notice being nearly out of memory before things fail critically.
(Score: 2) by bzipitidoo on Thursday February 20, @05:35AM (1 child)
Used to be that running swap on an SSD was a Very Bad Idea. Caused premature drive failure. In more recent years, I read that the number of write cycles that SSDs can handle is now far greater, great enough that it's okay to run swap on them.
(Score: 1, Informative) by Anonymous Coward on Thursday February 20, @02:28PM
Yes, never run swap on SSD. Plug in good old spinning rust HD for swap space.
(Score: 3, Interesting) by Snospar on Thursday February 20, @08:32AM (2 children)
I meant to renew my subscription yesterday but I thought the forces of evil had stepped in to make that impossible. Glad you're back! Subscription renewed :)
Huge thanks to all the Soylent volunteers without whom this community (and this post) would not be possible.
(Score: 2) by janrinok on Thursday February 20, @09:04AM (1 child)
(Score: 3, Insightful) by Snospar on Thursday February 20, @03:15PM
No, all your hard work is appreciated. Putting a paltry sum in the tip jar is easy.
