
from the crack-team-of-flying-monkeys dept.
General Information
Our nodes are named after elements of the periodic table, starting with hydrogen and going up from there, roughly in the order they were brought online. With two exceptions, we're standardized on Ubuntu 12.04 Precise Pangolin. Nodes dedicated to running slashcode are Linode 4096s, with everything else being Linode 2048s thanks to Linode's recent free upgrade.
Where possible, all services (with the exception of MySQL) are high-availability and can survive any node suddenly flaking out. This includes our internal DNS, LDAP, Kerberos, gluster, web frontends, and slashd*. Our goal is to reach a 100% HA configuration so we can offline nodes or upgrade systems without any interruption in service, though we're still somewhat short of that (mostly due to limitations in MySQL).
User management and single sign-on are handled by a combination of Kerberos and LDAP, with users' SSH keys stored in the LDAP backend and a bit of voodoo to allow them to be dynamically loaded whenever staff wish to access a machine. Service accounts (e.g., slash or icinga) use Kerberos keytabs for passwordless authentication, which lets us centrally revoke and replace any compromised keys instead of playing the age-old game of editing authorized_keys in 20 places.
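The article doesn't spell out the "voodoo", but a common way to pull SSH keys out of LDAP at login time is OpenSSH's AuthorizedKeysCommand pointed at a small lookup script. Purely as an illustration (the hostname, base DN, and schema below are placeholders, not SN's actual configuration), a minimal sketch in Python using the ldap3 library might look like this:

    #!/usr/bin/env python3
    # Hypothetical AuthorizedKeysCommand helper: print a user's public keys
    # straight from LDAP so no node needs a local authorized_keys file.
    # Illustrative sshd_config lines (not SN's actual config):
    #   AuthorizedKeysCommand /usr/local/bin/ldap-ssh-keys
    #   AuthorizedKeysCommandUser nobody
    import sys
    from ldap3 import Server, Connection, ALL

    LDAP_URI = "ldaps://ldap.example.li694-22"  # placeholder host
    BASE_DN = "ou=people,dc=example,dc=org"     # placeholder base DN

    def main():
        if len(sys.argv) != 2:
            print("usage: ldap-ssh-keys <username>", file=sys.stderr)
            return 2
        # Read-only anonymous bind; a real setup would likely bind with a
        # service account or use SASL/GSSAPI instead.
        conn = Connection(Server(LDAP_URI, get_info=ALL), auto_bind=True)
        conn.search(BASE_DN,
                    "(&(objectClass=ldapPublicKey)(uid=%s))" % sys.argv[1],
                    attributes=["sshPublicKey"])
        for entry in conn.entries:
            for key in entry.sshPublicKey.values:
                # sshd expects one public key per line on stdout
                print(key.decode() if isinstance(key, bytes) else key)
        return 0

    if __name__ == "__main__":
        sys.exit(main())

With something along these lines, revoking a compromised key becomes a single LDAP edit rather than a sweep through authorized_keys files on every node.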
Furthermore, we use AppArmor quite extensively internally to try and keep ourselves relatively well protected. It's no secret that we're currently stuck on an outdated Perl and Apache which no longer receive security updates. While we have plans to work through this and migrate to mod_perl 2, the frontend is horribly tied to Apache (including hooks into various stages of the httpd lifecycle). I plan to run a dedicated article about this, but let's just say it's a bit in-depth.
The li694-22 Domain
I've mentioned this in comments, and it's on the wiki as well, but we use an internal TLD for referencing nodes throughout the backend. Every node can reach every other node at hostname.li694-22. The name itself is a reference to the original private URL we used for bringing up Slashcode, back before SN was decided on as our temporary name. We have full forward and reverse resolution available, and only publish AAAA records for normal services. Oh yeah, about that ...
Use of IPv6 internally
Yeah, we were serious when we axed IPv4 internally. Since that article was written, we've had to re-introduce IPv4 addressing for the internal webservices (via ipv4.hostname.li694-22) due to compatibility issues with gluster. Using IPv6 internally allows Kerberos and other IP-dependent services to work properly from multiple places across the internet, such as our off-site backup box.
Anyway, enough of that; let's take a look at the machines themselves:
Production Cluster
- hydrogen/fluorine - web frontends
- helium/neon - database backends
- beryllium - wiki host + mail accounts; runs CentOS 6
- boron - gluster+slashd
Services Cluster
- carbon - IRC server
- nitrogen - tor proxy (also runs staff slash)
- oxygen - off-site backup
Development
- lithium - dev.soylentnews.org, running Ubuntu 14.04
As you can tell, it's quite a bit of virtual iron that keeps this site up and running. We've got considerable excess capacity at the moment, so I'm not too worried about having to bring up additional frontends any time soon, and we're aiming for a setup where, even if half our web/DB servers went offline, we'd still remain up and functional. Perhaps it's a bit overkill, but you never know when you'll need to take a node offline.
The next article is going to go somewhat in-depth into the system administration aspects, with a hands-on look at our Kerberos, LDAP, and Icinga instances and a brief overview of each of these technologies in turn. Many who have worked on staff had no previous experience with Kerberos in a UNIX-like environment, which I consider unfortunate, since it can drastically simplify administration burdens. Drop your questions below, and I'll either answer them inline, or later in this series of articles. Until the next time, NCommander, signing off.
(Score: 3, Interesting) by goodie on Monday May 05 2014, @08:21PM
Love this idea. Rather than saying "we won't tell you how we're built, it would be a security issue", disclosing how it's set up shows you guys trust your systems and don't work through obfuscation. I don't have many questions on this yet, but I may as you unpeel the layers of the SN onion :).
But really cool idea IMHO, keep it going! ;)
(and yes, people will give their $0.02 on why this is like that and not like this etc., which may at times be constructive but will most often be about bitching nerdgasms; it's all good)
(Score: 2, Informative) by poutine on Monday May 05 2014, @08:37PM
While I'm definitely glad the days of soylentnews being down due to a poorly planned DNS change are over, I have to say this is a bit overkill. What kind of traffic are you seeing that you need a 10-node setup? Seems like you guys had good intentions but went a little overboard, and that's not a very sustainable business pattern.
Just to give some idea of price: a Linode 4096 (which they claim to have 8 of) costs $40/mo, so $320/mo, and the other two are $20/mo, which means they're paying $360/mo just for the base packages. Who knows what upgrades they have on top of that. $360 is way too much for a site this size.
Also I think it's fairly dishonest to claim this is a decently large website. This is by any measure a SMALL website, with a small userbase.
(Score: 5, Insightful) by NCommander on Monday May 05 2014, @08:43PM
Says the person who complains that we had downtime to do site updates. The 2x2 configuration allows us to update and manage our hardware without taking the site down, which is something you blasted me for on IRC on numerous occasions.
Still always moving
(Score: 2) by Fluffeh on Monday May 05 2014, @09:57PM
The other things I have seen are planets and moons, LoTR characters or cities, and my favourite, though it didn't make life easy: physicists who had made great contributions to the field.
(Score: 5, Insightful) by Ethanol-fueled on Monday May 05 2014, @10:47PM
I know I'm going to sound like a lackey-ass sycophantic cocksucker-of-the-admins in saying this (I have some of Janrinok's come on the corner of my mouth and N1 is next tonight for a blowjob), but engaging the community in this manner is in my opinion one of the things this site gets right. There's an informative frankness that makes the admins of this place much more credible, unlike the disengaged "ivory-tower" mentality the admins at The Other Site had when we left.
It's unbelievable that some readers would even complain about these "meta" articles, especially while the site is in its infancy and we're better off knowing what exactly the hell is going on. The admins at the other site didn't even pretend to give a shit about what you all thought about Beta, in fact the official response was one of impatience, hostility, and condescension. My only regret is not yet knowing jack shit about web programming and site maintenance and being too lazy to learn.
(Score: 1) by lcklspckl on Monday May 05 2014, @11:51PM
I wish I had mod points right now. Too funny, eh, so I write. Well done, Ethanol-fueled. I had to read it twice.
I too appreciate the meta articles. It's been an interesting journey from the break and seeing it as it unfolds has been a great experience. Thanks NCommander.
(Score: 2) by Reziac on Tuesday May 06 2014, @02:26AM
Same here. And I've learned more about how a site like this is run from these articles than from, well, the whole rest of my life combined (sadly, I too know less than jack shit, so this is progress). 'Tis better to light one candle than to curse the darkness.
oxygen - off-site backup -- that's hilariously appropriate. What do you do if the working copy goes completely tits-up? Give it oxygen! :D
And there is no Alkibiades to come back and save us from ourselves.
(Score: 5, Insightful) by tibman on Monday May 05 2014, @11:15PM
$360 a month is overkill? Get out of here. You are probably one of those developers who shuns backups, version control systems, and a separate test environment. After all, they aren't needed to ship product. Right?
This is a small website? You may wish it was small, but that doesn't make it true. Hundreds of unique visitors a day is doing pretty well. Even so, the site is not declining in size, which means you'd want to purchase more resources than you need at present. A high user count does not translate directly to high revenue. My company, for example, only gets a dozen unique visitors a day (who spend thousands of dollars a month for access).
If you wanted to be helpful you could just ask how long funding will last at this level. If there is a need to scale back, is it possible? How quick to scale back up? Your criticism is lacking anything useful. So, in your opinion, how many nodes should SN have, what services should be hosted on them, and how much is a sane monthly budget? Let's hear your suggestions.
SN won't survive on lurkers alone. Write comments.
(Score: 2) by Tork on Tuesday May 06 2014, @04:07AM
*;)
🏳️🌈 Proud Ally 🏳️🌈
(Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @10:32AM
In all fairness, yeah: hundreds of visitors per day is small. Near as I see, SN posts 6-8 stories/day, which get 20 or so comments. That's fewer than 200 database updates per day, fewer than 1 every seven minutes. Granted, they likely have 1000 page views for every comment. You should still be able to run that on a single 1GHz core without breaking a sweat.
Sure $360/month is an objectively small number. And if the team here are passionate enough about this project to donate those dollars, or they can coax together those dollars from the hundreds of visitors, then that's fine. The point is that it's more than they have to spend. Even if they're really excited about the project now, $4500/year will weigh on one's enthusiasm after a while.
They could probably condense all of the production and services clusters to a single node, or two redundant nodes if failover really is critical. The admins would gain the experience of building a real cluster (and in their place, I would put a high premium on that). That one or two node setup should work for a site 5x what the load here looks like from the outside.
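For what it's worth, here is that back-of-envelope arithmetic spelled out (all inputs are the commenter's guesses above, not measured figures):

    # Rough load estimate from the figures guessed above
    stories_per_day = 8
    comments_per_story = 20
    views_per_comment = 1000
    seconds_per_day = 86400

    db_writes = stories_per_day * comments_per_story    # ~160 writes/day
    page_views = db_writes * views_per_comment          # ~160,000 views/day

    print("one DB write every %.0f seconds" % (seconds_per_day / db_writes))  # ~540 s
    print("%.1f page views per second" % (page_views / seconds_per_day))      # ~1.9/s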
(Score: 4, Informative) by NCommander on Tuesday May 06 2014, @06:11PM
We're actually quite a bit more active than that. During peak hours (roughly 5-9pm EST), we're averaging 100+ connections a second, and a fair bit of the backend SQL is not optimized (memcache is the only thing that prevents the site from going snap). We actually could run on Linode 2048s (we did right up until the free Linode upgrade, but I had already purchased fluorine + neon, and I didn't want to downgrade).
Most of these machines were brought up when the lowest tier was a Linode 1024, and we've learned that 1024 MiB isn't a lot. I found it was easier just to add a new node vs. upgrading, since they would be the same price (nitrogen only exists because at the time beryllium was essentially maxed out; slash loads a lot of stuff into memory due to the sheer number of perl modules it uses, and will eat almost an entire gig by itself).
Still always moving
(Score: 2) by VLM on Tuesday May 06 2014, @12:37AM
There is of course a hidden assumption that the linode guys are charging list price.
They're good guys in general (aside from one legendary security issue) and I would not be surprised if they were providing a discount, especially knowing you can get the purchaser to tell people all day they have a 4096 even if it never runs above 1% utilization, so "giving it away" for 5% of list price still turns a profit if SN is only using 1% of it.
I have no connection with linode at all, other than being a very long term customer (since about the time they started up, years and years and years ago) and being a fan of caker, other than that time he lost my CC and charges to some UK gambling site appeared. If we ever get really bored, it would be humorous to try to figure out which SN admin / notable happens to be caker, assuming he is here.
(Score: 2) by VLM on Monday May 05 2014, @08:42PM
In before the first claim kerberos doesn't work over ipv6.
Yes, it was a total shotgun as to which specific features and abilities worked under ipv6, and it took from like 1.4 till 1.9 to support everything, but by 1.9 absolutely everything works under ipv6 as far as I know. There never was a forklift upgrade WRT ipv6 where it went from no-work to all-work; it was one little thing at a time. If I recall correctly the last thing fixed was being able to change your password over ipv6, or maybe it was KDC replication; whatever, it's all history now.
It's kerberos's cool friend openafs which is still sorta kinda ipv4-only, despite ipv6 support being under occasional development for a decade or so. Frankly openafs is gonna have to forklift, I think, to get ipv6 implemented all at once everywhere. That kinda sucks. I like AFS and use it extensively at home.
(Score: 2) by NCommander on Monday May 05 2014, @08:45PM
Actually, it works better when using IPv6 addressing, because you can have globally-routable addresses and don't run into stupid issues when multiple private IP spaces overlap; I've seen some "clever" hacks to deal with kerberos across multiple sites that were using private IPv4 address space. The biggest headache is setting up IPv6 reverse DNS, but there are zone file generators which help with that.
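For anyone who hasn't fought with it: the pain is that every IPv6 address expands to a 32-nibble name under ip6.arpa, which is exactly the tedium a zone file generator automates. A toy illustration using Python's standard ipaddress module (the hostnames and addresses below are made up, not SN's real ones):

    import ipaddress

    # Made-up addresses purely for illustration
    hosts = {
        "hydrogen.li694-22.": "2600:3c00::f03c:91ff:fe6e:1",
        "helium.li694-22.": "2600:3c00::f03c:91ff:fe6e:2",
    }

    for fqdn, addr in hosts.items():
        ip = ipaddress.ip_address(addr)
        # .reverse_pointer yields the full nibble name, e.g.
        # 1.0.0.0. ... .0.0.6.2.ip6.arpa -- painful to type by hand
        print("%s.  IN  PTR   %s" % (ip.reverse_pointer, fqdn))
        print("%s  IN  AAAA  %s" % (fqdn, ip.compressed))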
Still always moving
(Score: 2) by VLM on Monday May 05 2014, @08:54PM
"allow us to be able to centrally revoke and replace any compromised keys instead of playing the age-old game of editing authorized_keys in 20 places."
I use kerberos the same way you do, for a totally different situation of course. Before that, in the 00s or late 90s, I had a directory full of keys, one to a file, and a pretty dumb script that cat'd them together and sent them out via puppet (before puppet I had my own wanna-be puppet, like everyone else).
At a former employer they had an interesting hybrid solution where everyone's ssh keys were cat'd together like my strategy and scp'd out in a long script that invariably hung up or crashed halfway thru. Those refusing to implement puppet are doomed to reinvent it, poorly.
For extra fun you can centralize and distribute host keys too. Of course sysadmins have been trained for decades to ignore when a host key changes and just Y thru or rm ~/.ssh/known_hosts and try connecting again, so the obvious advantage is quickly eliminated.
You can also store ssh keys in individual files in a git repo, and have a crontab on each machine that builds and then installs a new file if a change is detected. So you create a new candidate authorized_keys, and if its md5sum doesn't match the md5sum of the local copy, you replace the real file with the new one. This is because if you replace it every 5 minutes regardless of change, or just cat a new one into place every 5 minutes, you end up crashing cron jobs that started during the instant of creating the new file in place or whatever.
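A minimal sketch of that check-then-swap step, in Python rather than shell and with made-up paths; the key detail is the atomic rename, which avoids the half-written-file race described above:

    #!/usr/bin/env python3
    # Build authorized_keys from per-user key files in a checked-out repo,
    # and only swap it into place when the content actually changed.
    # Meant to be run from cron, e.g. */5 * * * *
    import hashlib
    import os
    from pathlib import Path

    KEY_DIR = Path("/srv/ssh-keys")               # hypothetical checkout of the key repo
    TARGET = Path("/root/.ssh/authorized_keys")   # the file being managed

    def main():
        # Concatenate the individual public key files, sorted for stability
        candidate = b"".join(p.read_bytes() for p in sorted(KEY_DIR.glob("*.pub")))
        current = TARGET.read_bytes() if TARGET.exists() else b""

        # Same idea as the md5sum comparison above (sha256 here)
        if hashlib.sha256(candidate).digest() == hashlib.sha256(current).digest():
            return  # nothing changed; don't touch the live file

        # Write a temp file next to the target, then rename it into place.
        # The rename is atomic, so nothing ever reads a half-written file.
        tmp = TARGET.with_suffix(".new")
        tmp.write_bytes(candidate)
        os.chmod(tmp, 0o600)
        os.replace(tmp, TARGET)

    if __name__ == "__main__":
        main()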
(Score: 4, Interesting) by codemachine on Monday May 05 2014, @10:41PM
Interesting that SN gives us a behind-the-scenes look the same day as Pipedot releases their source.
Different approaches, but similar goals.
(Score: 2) by NCommander on Tuesday May 06 2014, @03:47AM
I actually wrote this last night and just scheduled it to go live when it did. For the record, our full source has been available on github since day 1, for anyone to lose their sanity with :-)
Still always moving
(Score: 3) by kaszz on Tuesday May 06 2014, @04:27AM
It would be useful if there was a forum to communicate found bugs or missing features.
(Score: 2) by NCommander on Tuesday May 06 2014, @08:34AM
We have a github page (Bug List link on the left) for this purpose. Perhaps a QA topic would be more ideal, though ...
Still always moving
(Score: 0) by Anonymous Coward on Tuesday May 06 2014, @06:55AM
Ask my Mrs!
(Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @10:30AM
Great article. I'm looking forward to learning more about how a website like SN works, as the only experience I have with running websites is a site built in Netscape Composer and hosted on Geocities in 1997.
How do you find out how many servers to rent and how powerful they should be? Is there a specific plan you can use to calculate this?
(Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @12:54PM
I remember when my lab was small enough to use fun names for servers. We used country names. When we had 15 it wasn't bad, but when we grew past 20 I started hearing shouts from developers like, "How do you spell 'Liechtenstein'?" and "Is 'Austria' a database or an app server?" Now that we have 50-some physical, and god knows how many virtual, we use boring, functional names like DEV-DB03, PROD-WEB06, and TEST-PROXY01.
(Score: 2) by GlennC on Tuesday May 06 2014, @12:30PM
I'd much rather read more about the nuts and bolts of the site than about the name voting or where to incorporate.
Those are administrative minutiae, barely worthy of a two-line blurb.
Sorry folks...the world is bigger and more varied than you want it to be. Deal with it.
(Score: 2) by tynin on Tuesday May 06 2014, @12:33PM
How big is your gluster volume, and how has it treated you so far? I imagine you are using it to keep the site content current across slashd/web server instances? Have you run into any file locking / gluster self healing problems?
(Score: 3, Informative) by NCommander on Tuesday May 06 2014, @06:18PM
Aside from the fact that gluster's IPv6 support is more myth than reality, we've had relatively few issues. We're not using it as it's intended, more as NFS on steroids, and have it set up as a 3x1 brick configuration (that is to say, every machine mirrors the entire gluster setup). The biggest issue is that write speed took a dive. Buildout and deployment used to take 1-2 minutes tops; now it takes 15, because slash touches every file when it installs (we can optimize the makefile to be less ugly in this regard, but ...)
Gluster has the advantage that it's relatively simple compared to ceph, and doesn't require the madness that DRBD + NFS requires, which seems to be the alternative for HA NFS. If we add more web nodes in the future, I'll likely reconfigure gluster to a 2x2 brick configuration, but with only three nodes, the 3x1 is saner.
Still always moving
(Score: 3, Informative) by NCommander on Tuesday May 06 2014, @06:21PM
Oops, didn't fully answer the question. According to comments in slashcode, the other site runs with a single NFS server, with the web frontends mounting it read-only. We *could* use geo-replication for this, but in case we have to offline boron, I just want to be able to start slashd on the webnodes, which requires r/w access to the web folder. The entire server stack (perl 5.10/apache 1.3/slash/etc.) lives on gluster.
We're not using a separate partition or dedicated drive for this (which I know is against recommended practice) since we're on a VPS, and when I discussed it with others on the team, none of us saw any real advantage in repartitioning the web frontends. The gluster storage lives in /srv/glusterfs-replication, and the gluster client connects via loopback to the server running on each node. I think we've got 2-3 GiB of stuff on the gluster volume, mostly CPAN + source files; slash itself is only about 10 MiB.
Still always moving
(Score: 1) by pTamok on Tuesday May 06 2014, @05:08PM
Thank-you for a very interesting article. Please ignore criticism from the peanut gallery.
If you can afford it, I'm strongly of the opinion that one should have a site that is overkill on hardware/compute capacity for normal situations, so that one has plenty in reserve for when things are under pressure for whatever reason. Better overkill than underperformance when one really needs it.