posted by NCommander on Monday May 05 2014, @08:00PM   Printer-friendly
from the from-the-crack-team-of-flying-monkeys dept.
So, in a bid to keep my sanity while working on the manifesto, I felt an interesting side-project would be to do a series of articles going in-depth on how our backend is put together, and what goes into the nuts and bolts of a decently large website. I'm sort of writing this as I get writer's block on the manifesto, and I have no set agenda, so if interesting questions come up, I may dedicate an article or two to them. For this first one, though, I wanted to give a relatively broad overview of our backend, followed by an article about each major component that comes together to form SN.
As of this writing, SoylentNews is hosted almost entirely on Linode (in the Dallas, TX datacenter), and split across ten nodes. Four nodes are dedicated to our slash instance, with the others serving various auxiliary purposes.

General Information
Our nodes are named after elements of the periodic table, starting with hydrogen and going up from there, roughly in the order they were brought online. With two exceptions, we're standardized on Ubuntu 12.04 (Precise Pangolin). Nodes dedicated to running slashcode are Linode 4096s, with everything else being Linode 2048s thanks to Linode's recent free upgrade.

Where possible, all services (with the exception of MySQL) are high-availability and can survive any node suddenly flaking out. This includes our internal DNS, LDAP, Kerberos, gluster, web frontends, and slashd*. Our goal is a 100% HA configuration so we can take nodes offline or upgrade systems without any interruption in service, though we're still somewhat short of that (mostly due to limitations in MySQL).

User management and single sign-on are handled by a combination of Kerberos and LDAP, with SSH keys for users stored in the LDAP backend and a bit of voodoo to allow them to be dynamically loaded whenever staff wish to access a machine. Service accounts (e.g., slash or icinga) use Kerberos keytabs to perform passwordless authentication, to allow us to be able to centrally revoke and replace any compromised keys instead of playing the age-old game of editing authorized_keys in 20 places.
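To make that a bit more concrete, here's a rough sketch of what the pieces look like from a shell; the LDAP host, base DN, attribute, and keytab path below are illustrative placeholders, not our actual layout:

    # Pull a user's public keys out of LDAP (sshPublicKey is the attribute
    # used by the common openssh-lpk schema; our tree may differ).
    ldapsearch -x -H ldap://ldap.li694-22 \
        -b "ou=People,dc=li694-22" "(uid=someuser)" sshPublicKey

    # Service accounts authenticate with a keytab instead of a password;
    # revoking access is a matter of replacing the keytab centrally.
    kinit -kt /etc/keytabs/slash.keytab slash/boron.li694-22
    klist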

Furthermore, we use AppArmor quite extensively internally to try and keep ourselves relatively well protected. It's no secret that we're currently stuck on outdated versions of Perl and Apache which no longer receive security updates. While we have plans to work through this and migrate to mod_perl 2, the frontend is horribly tied to Apache (including hooks into various stages of the httpd lifecycle). I plan to run a dedicated article about this, but let's just say it's a bit in-depth.
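As a rough illustration of the AppArmor side (the profile file name and paths here are hypothetical, not our actual policy), wrapping a legacy httpd looks something like this:

    # Load the profile in complain mode first, so violations are only logged.
    sudo cp usr.local.apache.sbin.httpd /etc/apparmor.d/
    sudo aa-complain /etc/apparmor.d/usr.local.apache.sbin.httpd

    # Once the logs look clean, flip it to enforce mode and verify.
    sudo aa-enforce /etc/apparmor.d/usr.local.apache.sbin.httpd
    sudo aa-status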

The li694-22 Domain
I've mentioned this in comments, and it's on the wiki as well, but we use an internal TLD for referencing nodes throughout the backend. Every node can reach every other node at hostname.li694-22. The name itself is a reference to the original private URL which we used for bringing up Slashcode, way back before SN was decided on as our temporary name. We have full forward and reverse resolution available, and only publish AAAA records for normal services. Oh yeah, about that ...

Use of IPv6 internally
Yeah, we were serious when we axed IPv4 internally. Since that article was written, we've had to re-introduce IPv4 addressing for the internal webservices (via ipv4.hostname.li694-22) due to compatibility issues with gluster. Using IPv6 internally allows us to have Kerberos and other IP-dependent services work properly from multiple places across the internet, such as our off-site backup box.
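To make that concrete, resolution from any node looks roughly like this (the addresses below are made-up documentation values, not our real ones):

    # Forward lookups: internal services publish AAAA records only ...
    dig +short AAAA hydrogen.li694-22
    # -> 2001:db8:694:22::10 (example)

    # ... with an ipv4.* name re-added where gluster and friends insist on v4.
    dig +short A ipv4.boron.li694-22
    # -> 192.0.2.33 (example)

    # Reverse resolution works too, which keeps Kerberos happy.
    dig +short -x 2001:db8:694:22::10
    # -> hydrogen.li694-22.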

Anyway, enough of that; let's take a look at the machines themselves:

Production Cluster
  • hydrogen/fluorine - web frontends
  • helium/neon - database backends
  • beryllium - wiki host + mail accounts; runs CentOS 6
  • boron - gluster+slashd

Services Cluster

  • carbon - IRC server
  • nitrogen - tor proxy (also runs staff slash)
  • oxygen - off-site backup

Development

  • lithium - dev.soylentnews.org, running Ubuntu 14.04

As you can tell, it's quite a bit of virtual iron that keeps this site up and running. We've got considerable excess capacity at the moment, so I'm not too worried about having to bring up additional frontends any time soon, and we're trying to keep it so that if half our web/DB servers went offline, we'd still remain up and functional. Perhaps it's a bit of overkill, but you never know when you'll need to bring a node offline.

The next article is going to go somewhat in-depth into the system administration aspects, with a hands-on look at our Kerberos, LDAP, and Icinga instances and a brief overview of each of these technologies in turn. Many who have worked on staff had no previous experience with Kerberos in a UNIX-like environment, which I consider unfortunate, since it can drastically simplify administration burdens. Drop your questions below, and I'll either answer them inline or later in this series of articles. Until next time, NCommander, signing off.

Related Stories

We've Killed IPv4! 71 comments
As part of wanting to be part of a brighter and sunnier future, we've decided to disconnect IPv4 on our backend, and go single-stack IPv6. Right now, reading this post, you're connected to our database through shiny 128-bit IP addressing that is working hard to process your posts. For those of you still in the past, we'll continue to publish A records which will allow a fleeting glimpse of a future without NAT.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Interesting) by goodie on Monday May 05 2014, @08:21PM

    by goodie (1877) on Monday May 05 2014, @08:21PM (#39926) Journal

    Love this idea. Rather than saying "we won't tell you how we're built, it would be a security issue", disclosing how it's set up shows you guys trust your systems and don't work through obfuscation. I don't have many questions on this yet, but I may as you unpeel the layers of the SN onion :).

    But really cool idea IMHO, keep it going!
    (and yes people will give their $0.02 on why this is like that and not like this etc. which may at times be constructive but will most often be about bitching nerdgasms, it's all good ;) )

  • (Score: 2, Informative) by poutine on Monday May 05 2014, @08:37PM

    by poutine (106) on Monday May 05 2014, @08:37PM (#39932)

    While I'm definitely glad the days of soylentnews being down due to a poorly planned DNS change are over, I have to say this is a bit overkill. What kinds of traffic are you seeing that you need a 10 node setup? Seems like you guys had good intentions but went a little overboard, and that's not a very sustainable business pattern.

    Just to give some idea of price, a linode 4096 (which they claim to have 8 of) costs $40/mo, so $320/mo, and the other two are $20/mo, which means they're paying $360/mo just for the base packages. Who knows what upgrades they have to that. $360 is way too much for a site this size.

    Also I think it's fairly dishonest to claim this is a decently large website. This is by any measure a SMALL website, with a small userbase.

    • (Score: 5, Insightful) by NCommander on Monday May 05 2014, @08:43PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday May 05 2014, @08:43PM (#39935) Homepage Journal

      Says the person who complains that we had downtime to do site updates. The 2x2 configuration allows us to update and manage our hardware without taking the site down, which is something you blasted me for on IRC on numerous occasions.

      --
      Still always moving
    • (Score: 2) by crutchy on Monday May 05 2014, @09:42PM

      by crutchy (179) on Monday May 05 2014, @09:42PM (#39945) Homepage Journal

      This is by any measure a SMALL website, with a small userbase.

      ...but it is also a nerd paradise, and what would nerd paradise be without HA and the periodic table?

      • (Score: 2) by Fluffeh on Monday May 05 2014, @09:57PM

        by Fluffeh (954) Subscriber Badge on Monday May 05 2014, @09:57PM (#39949) Journal

        and the periodic table?

        The other naming schemes I have seen are planets and moons, LoTR characters or cities, and my favourite (though it didn't make life easy): physicists who had made great contributions to the field.

      • (Score: 5, Insightful) by Ethanol-fueled on Monday May 05 2014, @10:47PM

        by Ethanol-fueled (2792) on Monday May 05 2014, @10:47PM (#39960) Homepage

        I know I'm going to sound like a lackey-ass sycophantic cocksucker-of-the-admins in saying this (I have some of Janrinok's come on the corner of my mouth and N1 is next tonight for a blowjob), but engaging the community in this manner is in my opinion one of the things this site gets right. There's an informative frankness that makes the admins of this place much more credible, unlike the disengaged "ivory-tower" mentality the admins at The Other Site had when we left.

        It's unbelievable that some readers would even complain about these "meta" articles, especially while the site is in its infancy and we're better off knowing what exactly the hell is going on. The admins at the other site didn't even pretend to give a shit about what you all thought about Beta, in fact the official response was one of impatience, hostility, and condescension. My only regret is not yet knowing jack shit about web programming and site maintenance and being too lazy to learn.

        • (Score: 1) by lcklspckl on Monday May 05 2014, @11:51PM

          by lcklspckl (830) on Monday May 05 2014, @11:51PM (#39970)

          I wish I had mod points right now. Too funny, eh, so I write. Well done, Ethanol-fueled. I had to read it twice.

          I too appreciate the meta articles. It's been an interesting journey from the break and seeing it as it unfolds has been a great experience. Thanks NCommander.

        • (Score: 2) by Reziac on Tuesday May 06 2014, @02:26AM

          by Reziac (2489) on Tuesday May 06 2014, @02:26AM (#40001) Homepage

          Same here. And I've learned more about how a site like this is run from these articles than from, well, the whole rest of my life combined (sadly, I too know less than jack shit, so this is progress). 'Tis better to light one candle than to curse the darkness.

          oxygen - off-site backup -- that's hilariously appropriate. What do you do if the working copy goes completely tits-up? Give it oxygen! :D

          --
          And there is no Alkibiades to come back and save us from ourselves.
    • (Score: 5, Insightful) by tibman on Monday May 05 2014, @11:15PM

      by tibman (134) Subscriber Badge on Monday May 05 2014, @11:15PM (#39966)

      $360 a month is overkill? Get out of here. You are probably one of those developers who shuns backups, version control systems, and a separate test environment. After all, they aren't needed to ship product. Right?

      This is a small website? You may wish it was small but that doesn't make it true. Hundreds of unique visitors a day is doing pretty well. Even so, the site is not declining in size which means you would want to purchase more resources than you need at present. A high user count does not translate directly to high revenue. My company, for example, only gets a dozen unique visitors a day (who spend thousands of dollars a month for access).

      If you wanted to be helpful you could just ask how long funding will last at this level. If there is a need to scale back, is it possible? How quick to scale back up? Your criticism is lacking anything useful. So, in your opinion, how many nodes should SN have, what services should be hosted on them, and how much is a sane monthly budget? Let's hear your suggestions.

      --
      SN won't survive on lurkers alone. Write comments.
      • (Score: 2) by Tork on Tuesday May 06 2014, @04:07AM

        by Tork (3914) on Tuesday May 06 2014, @04:07AM (#40016)
        Why don't you just migrate it to the cloud*?


        *;)
        --
        🏳️‍🌈 Proud Ally 🏳️‍🌈
      • (Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @10:32AM

        by Anonymous Coward on Tuesday May 06 2014, @10:32AM (#40090)

        In all fairness, yeah: hundreds of visitors per day is small. Near as I see, SN posts 6-8 stories/day, which get 20 or so comments. That's fewer than 200 database updates per day, fewer than 1 every seven minutes. Granted, they likely have 1000 page views for every comment. You should still be able to run that on a single 1GHz core without breaking a sweat.

        Sure $360/month is an objectively small number. And if the team here are passionate enough about this project to donate those dollars, or they can coax together those dollars from the hundreds of visitors, then that's fine. The point is that it's more than they have to spend. Even if they're really excited about the project now, $4500/year will weigh on one's enthusiasm after a while.

        They could probably condense all of the production and services clusters to a single node, or two redundant nodes if failover really is critical. The admins would gain the experience of building a real cluster (and in their place, I would put a high premium on that). That one or two node setup should work for a site 5x what the load here looks like from the outside.

        • (Score: 4, Informative) by NCommander on Tuesday May 06 2014, @06:11PM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday May 06 2014, @06:11PM (#40254) Homepage Journal

          We're actually quite a bit more active than that. During peak hours (roughly 5-9pm EST), we're averaging 100+ connections a second, and a fair bit of the backend SQL is not optimized (memcache is the only thing that prevents the site from going snap). We actually could run on Linode 2048s (we were, right up until the free Linode upgrade, but I had already purchased fluorine + neon, and I didn't want to downgrade).
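          (If you want to see how much work memcache is soaking up, something like this against a frontend tells the story; default host/port shown, which may not match our setup:)

              # Dump memcached's counters and eyeball the hit rate vs. misses.
              printf 'stats\nquit\n' | nc localhost 11211 | grep -E 'get_hits|get_misses|curr_connections'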

          Most of these machines were brought up when the lowest tier was a Linode 1024, and we've learned that 1024 MiB isn't a lot. I found it was easier to just add a new node vs. upgrading, since they would be the same price (nitrogen only exists because at the time beryllium was essentially maxed out; slash loads a lot of stuff in memory due to the sheer number of perl modules it uses, and will eat almost an entire gig by itself).

          --
          Still always moving
    • (Score: 2) by VLM on Tuesday May 06 2014, @12:37AM

      by VLM (445) on Tuesday May 06 2014, @12:37AM (#39985)

      There is of course a hidden assumption that the linode guys are charging list price.

      They're good guys in general (aside from one legendary security issue) and I would not be surprised if they were providing a discount, especially knowing you can get the purchaser to tell people all day they have a 4096 even if it never runs above 1% utilization, so "giving it away" for 5% of list price still turns a profit if SN is only using 1% of it.

      I have no connection with linode at all, other than being a very long term customer (since about the time they started up, years and years and years ago) and being a fan of caker, other than that time he lost my CC and charges to some UK gambling site appeared. If we ever get really bored, it would be humorous to try to figure out which SN admin / notable happens to be caker, assuming he is here.

  • (Score: 2) by VLM on Monday May 05 2014, @08:42PM

    by VLM (445) on Monday May 05 2014, @08:42PM (#39934)

    In before the first claim kerberos doesn't work over ipv6.

    Yes, it was a total shotgun as to which specific features and abilities worked under ipv6, and it took from like 1.4 till 1.9 to support everything, but by 1.9 absolutely everything works under ipv6 as far as I know. There never was a forklift upgrade WRT ipv6 where it went from no work to all work; it was one little thing at a time. If I recall correctly the last thing fixed was being able to change your password over ipv6, or maybe it was KDC replication; whatever, it's all history now.

    It's kerberos's cool friend openafs which is still sorta kinda ipv4-only, despite being under occasional development as a feature for a decade or so. Frankly openafs is gonna have to forklift, I think, to get ipv6 implemented all at once everywhere. That kinda sucks. I like AFS and use it extensively at home.

    • (Score: 2) by NCommander on Monday May 05 2014, @08:45PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday May 05 2014, @08:45PM (#39936) Homepage Journal

      Actually, it works better when using IPv6 addressing due to the fact that you can have globally-routable addresses, and don't run into stupid issues when you have multiple private IP spaces overlapping; I've seen some "clever" hacks to deal with kerberos across multiple sites that were using private IPv4 address space. The biggest headache is setting up IPv6 reverse-dns, but there are zone file generators which help with that.

      --
      Still always moving
  • (Score: 2) by VLM on Monday May 05 2014, @08:54PM

    by VLM (445) on Monday May 05 2014, @08:54PM (#39937)

    "allow us to be able to centrally revoke and replace any compromised keys instead of playing the age-old game of editing authorized_keys in 20 places."

    I use kerberos the same way you do, for a totally different situation of course. Before that, in the 00s or late 90s, I had a directory full of keys, one per file, and a pretty dumb script that cat'd them together and sent them out via puppet (before puppet I had my own wanna-be puppet, like everyone else).

    At a former employer they had an interesting hybrid solution where everyone's ssh keys were cat together like my strategy and scp'd out in a long script that invariably hung up or crashed halfway thru. Those refusing to implement puppet are doomed to reinvent it, poorly.

    For extra fun you can centralize and distribute host keys too. Of course sysadmins have been trained for decades to ignore when a host key changes and just Y thru or rm ~/.ssh/known_hosts and try connecting again, so the obvious advantage is quickly eliminated.

    You can also store ssh keys in individual files in a git repo, and have a crontab on each machine that creates and then installs a new file if a change is detected. So you create a new authorized_keys, and if its md5sum doesn't match the md5sum of the local copy, you replace the real file with the new one. The reason for the check is that if you replace it every 5 minutes regardless of change, or just cat a new one into place every 5 minutes, you end up crashing cron jobs that started during the instant the new file was being created.
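    Something like this rough sketch of the cron side (paths made up, obviously):

        #!/bin/sh
        # Rebuild authorized_keys from the per-user key files in the git checkout,
        # but only install it when the checksum actually changes.
        cd /srv/ssh-keys && git pull -q
        cat keys/*.pub > /root/.ssh/authorized_keys.new
        new=$(md5sum < /root/.ssh/authorized_keys.new)
        old=$(md5sum < /root/.ssh/authorized_keys)
        if [ "$new" != "$old" ]; then
            # mv within the same directory is atomic, so nothing sees a half-written file
            mv /root/.ssh/authorized_keys.new /root/.ssh/authorized_keys
        else
            rm /root/.ssh/authorized_keys.new
        fi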

  • (Score: 4, Interesting) by codemachine on Monday May 05 2014, @10:41PM

    by codemachine (1333) on Monday May 05 2014, @10:41PM (#39959)

    Interesting that SN gives us a behind-the-scenes look the same day as Pipedot releases their source.

    Different approaches, but similar goals.

    • (Score: 2) by NCommander on Tuesday May 06 2014, @03:47AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday May 06 2014, @03:47AM (#40014) Homepage Journal

      I actually wrote this last night, and just scheduled it to go live when it did. For the record, our full source has been available on github since day 1, for anyone to lose their sanity with :-)

      --
      Still always moving
  • (Score: 3) by kaszz on Tuesday May 06 2014, @04:27AM

    by kaszz (4211) on Tuesday May 06 2014, @04:27AM (#40024) Journal

    It would be useful if there was a forum to communicate found bugs or missing features.

  • (Score: 0) by Anonymous Coward on Tuesday May 06 2014, @06:55AM

    by Anonymous Coward on Tuesday May 06 2014, @06:55AM (#40050)

    Ask my Mrs!

  • (Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @10:30AM

    by Anonymous Coward on Tuesday May 06 2014, @10:30AM (#40088)

    Great article. I'm looking forward to learning more about how a website like SN works, as the only experience I have with running websites is a site built in Netscape Composer and hosted on Geocities in 1997.

     
    How do you find out how many servers to rent and how powerful they should be? Is there a specific plan you can use to calculate this?

    • (Score: 1, Interesting) by Anonymous Coward on Tuesday May 06 2014, @12:54PM

      by Anonymous Coward on Tuesday May 06 2014, @12:54PM (#40128)

      I remember when my lab was small enough to use fun names for servers. We used country names. When we had 15 it wasn't bad, but when we grew past 20 I started hearing shouts from developers like, "How do you spell 'Liechtenstein'?" and "Is 'Austria' a database or an app server?" Now that we have 50-some physical, and god knows how many virtual, we use boring, functional names like DEV-DB03, PROD-WEB06, and TEST-PROXY01.

  • (Score: 2) by GlennC on Tuesday May 06 2014, @12:30PM

    by GlennC (3656) on Tuesday May 06 2014, @12:30PM (#40121)

    I'd much rather read more about the nuts and bolts of the site than about the name voting or where to incorporate.

    Those are administrative minutiae, barely worthy of a two-line blurb.

    --
    Sorry folks...the world is bigger and more varied than you want it to be. Deal with it.
  • (Score: 2) by tynin on Tuesday May 06 2014, @12:33PM

    by tynin (2013) on Tuesday May 06 2014, @12:33PM (#40122) Journal

    How big is your gluster volume, and how has it treated you so far? I imagine you are using it to keep the site content current across slashd/web server instances? Have you run into any file locking / gluster self healing problems?

    • (Score: 3, Informative) by NCommander on Tuesday May 06 2014, @06:18PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday May 06 2014, @06:18PM (#40256) Homepage Journal

      Aside from the fact that gluster's IPv6 support is more myth than reality, we've had relatively few issues. We're not using it as it's intended, more as NFS on steroids, and have it set up as a 3x1 brick configuration (that is to say, every machine is mirroring the entire gluster setup). The biggest issue is that write speed took a dive. Buildout and deployment used to take 1-2 minutes tops; now it takes 15 because slash touches every file when it installs (we can optimize the makefile to be less ugly in this regard, but ...)

      Gluster has the advantage that it's relatively simple compared to ceph, and doesn't require the madness that DRBD-backed NFS requires, which seems to be the alternative for HA NFS. If we add more web nodes in the future, I'll likely reconfigure gluster to a 2x2 brick configuration, but with only three nodes, the 3x1 is saner.
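      For reference, a 3x1 (replica 3) layout is created with something along these lines; the volume name, hosts, and brick path here are illustrative, not necessarily our exact setup:

          # One brick per node, every node holding a full copy of the data.
          gluster volume create slashvol replica 3 transport tcp \
              hydrogen:/srv/glusterfs-replication \
              fluorine:/srv/glusterfs-replication \
              boron:/srv/glusterfs-replication
          gluster volume start slashvol
          gluster volume info slashvol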

      --
      Still always moving
    • (Score: 3, Informative) by NCommander on Tuesday May 06 2014, @06:21PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday May 06 2014, @06:21PM (#40258) Homepage Journal

      Oops, didn't fully answer the question. According to comments in slashcode, the other site runs with a single NFS server, with the web frontends mounting it read-only. We *could* use georeplication for this, but in case we have to offline boron, I just want to be able to start slashd on the webnodes, which requires r/w permissions to the web folder. The entire server stack (perl 5.10/apache 1.3/slash/etc.) lives on gluster.

      We're not using a separate partition or dedicated drive for this (which I know is against recommended practice) since we're on a VPS, and when I discussed it with the rest of the team, none of us saw any real advantage to repartitioning the web frontends. The gluster storage lives in /srv/glusterfs-replication, and the gluster client connects via loopback to the server running on each node. I think we've got 2-3 GiB of stuff on the gluster volume, mostly CPAN + source files; slash itself is only about 10 MiB.
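      The loopback mount itself is nothing exotic; roughly (volume name and mountpoint illustrative):

          # Each node mounts the volume from its own local gluster daemon.
          mount -t glusterfs localhost:/slashvol /srv/slash
          df -h /srv/slash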

      --
      Still always moving
  • (Score: 1) by pTamok on Tuesday May 06 2014, @05:08PM

    by pTamok (3042) on Tuesday May 06 2014, @05:08PM (#40231)

    Thank you for a very interesting article. Please ignore criticism from the peanut gallery.

    If you can afford it, I'm strongly of the opinion that one should have a site that is overkill on hardware/compute capacity for normal situations, so that one has plenty in reserve for when things are under pressure for whatever reason. Better overkill than underperformance when one really needs it.