SoylentNews is people

About Today's Site Explosion

Posted by NCommander on Thursday April 17 2014, @04:07AM (#304)
7 Comments
Soylent

Since we've got a fair number of complaints about us running too many site news articles, I'm going to condemn this to my journal, then link it next time we *do* post something about the site. For a large portion of today (4/16), SoylentNews users had issues with commenting, and moderation was completely hosed. This was due to a backend change; we shifted the site behind a loadbalancer in preparation for bringing up a new frontend, and to give us considerably more redundancy and latitude in working with the backend.

This change had been set up on dev for the last week with us testing it to see what (if anything) broke, and it was discussed and signed off by all of the staff. Last night, I flipped the nodebalancer to connect to production instead of dev, then changed the DNS A record for the site to point at the loadbalancer.

I stayed up for several hours at this point to ensure nothing odd was going on, and satisfied that the world would keep spinning, I went to bed. What I found, though, was that I had broken the formkeys system. Slash knows about the X-Forwarded-For (XFF) header, a mechanism for relaying client IP information when a site is behind a proxy (this mechanism was already used by both varnish and nginx). However, for security reasons, we strip out the XFF header from inbound connections unless the connection comes from an IP on a specific whitelist. On both dev and production, we had whitelisted the nodebalancer to pass this header in properly.

Or so we thought. Linode's documentation doesn't mention it, but the IP address listed in the admin interface is *not* the IP used to connect to the site; instead it uses a special internal IP address which isn't listed or documented anywhere. Our security precautions stripped out the X-Forwarded-For header, and made it appear that all inbound users were coming from the same IP. This wasn't noticed on dev as slash ignores the formkeys system for admins, and the few of us beating on it with non-admin accounts weren't able to do enough abuse to trigger the formkey limiters.
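To make the failure mode concrete, here is a minimal Python sketch of the whitelist check described above. It is not Slash's actual code (Slash is Perl); the function name `client_ip`, the whitelist contents, and every IP address below are made up for illustration. It assumes a single trusted proxy that appends the client address to XFF (`X-Forwarded-For`).

```python
import ipaddress

# Hypothetical whitelist holding only the address shown in the admin interface.
TRUSTED_PROXIES = {ipaddress.ip_address("192.0.2.10")}


def client_ip(peer_ip, headers, trusted=TRUSTED_PROXIES):
    """Return the address that per-IP limits (such as formkeys) should see."""
    peer = ipaddress.ip_address(peer_ip)
    forwarded = headers.get("X-Forwarded-For")
    if forwarded is None or peer not in trusted:
        # Untrusted peer: ignore the header and fall back to the socket address.
        return str(peer)
    # Trusted proxy: the right-most entry is the address the proxy itself saw.
    candidate = forwarded.split(",")[-1].strip()
    try:
        return str(ipaddress.ip_address(candidate))
    except ValueError:
        # Malformed header value: fall back to the socket address.
        return str(peer)


# Two different users arrive through a balancer that really connects from an
# internal address (10.0.0.5 here, made up) that is missing from the whitelist.
requests = [
    ("10.0.0.5", {"X-Forwarded-For": "198.51.100.7"}),
    ("10.0.0.5", {"X-Forwarded-For": "203.0.113.9"}),
]
collapsed = {client_ip(peer, hdrs) for peer, hdrs in requests}

# After whitelisting the internal address, the users are distinct again.
fixed = TRUSTED_PROXIES | {ipaddress.ip_address("10.0.0.5")}
distinct = {client_ip(peer, hdrs, fixed) for peer, hdrs in requests}
```

With the incomplete whitelist, `collapsed` holds a single address, so any per-IP limiter treats every visitor as the same user. With the internal address added, `distinct` holds both client addresses.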

Our peak hours are generally evenings EDT, which means the low traffic at night wasn't enough to trip it either (or at least no one on IRC poked me about it, nor were there any bugs on it on our github page). However, once traffic started picking up, users began to clobber each other, commenting broke, and the site went straight to hell. When I got up, debugging efforts were underway, but it took considerable time to understand the cause of the breakage; simply reverting LBing wasn't an easy fix since we'd still have to wait for DNS to propagate and we needed the load balancer anyway. After a eureka moment, we were able to locate the correct internal IPs and whitelist them, which got the site partially functional again. (We have informed Linode about this, and they said our comments are on their way to the appropriate teams; hopefully no other site will ever have this same problem.)

The last remaining item was SSL; we had originally opted out of terminating SSL on the loadbalancer, preferring to do it on the nginx instance, so port 443 was set to TCP loadbalancing. This had the same effect, as there is no way for us to see the inbound IP (I had assumed it would do something like NAT to make connections appear like they were coming from the same place). The fix was ultimately installing the SSL certificate on the load balancer, then modifying varnish to look for the X-Forwarded-Proto header to know if a connection was SSL or not. I'm not hugely happy about this as it means wiretapping would be possible between the load balancer and the node, but until we have a better system for handling SSL, there isn't a lot we can do about it.
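For illustration, here is a rough Python sketch of the X-Forwarded-Proto check. The real change lives in the varnish configuration, which isn't shown here; the helper name `is_ssl_request` and the addresses are hypothetical. The point is that the header should only be believed when it comes from the load balancer.

```python
def is_ssl_request(peer_ip, headers, trusted_proxies):
    """Hypothetical helper: did the original client connect over HTTPS?"""
    if peer_ip not in trusted_proxies:
        # Never believe the header when it comes from an arbitrary client.
        return False
    proto = headers.get("X-Forwarded-Proto", "")
    return proto.strip().lower() == "https"


# Made-up address standing in for the load balancer.
balancer = {"10.0.0.5"}
via_balancer = is_ssl_request("10.0.0.5", {"X-Forwarded-Proto": "https"}, balancer)
spoofed = is_ssl_request("198.51.100.7", {"X-Forwarded-Proto": "https"}, balancer)
```

Here `via_balancer` is true and `spoofed` is false. Note that the hop between the balancer and the node is still plain HTTP, which is the wiretapping concern mentioned above.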

As always, leave comments below, and I'll leave my two cents.

Why The Proxy Detection Code Pissed Me Off

Posted by NCommander on Thursday April 10 2014, @12:54AM (#277)
11 Comments
Soylent

Now that I've had some time to clear my head, I want to expand on my original feelings. I'm pissed off about this, and my temper flared through on the original post. I'm leaving it as is because I'm not going to edit it to make myself look better, and because it sums up my feelings pretty succinctly. How would you feel if something you worked on under the promise of building the best site for a community was regularly and routinely causing corporate firewalls and IDS systems to go off like crazy?

You'd be pissed. Had we known about this behaviour in advance, it would have been disabled at golive or in a point release, and a minor note would have gone up about it. Instead, I found out because we were tripping a user's firewall causing the site to get autoblocked. I realize some people feel this is acceptable behaviour, but a website should *never* trigger IDS or appear malicious in any way. Given the current state of NSA/GCHQ wiretapping and such, it means that anything tripping these types of systems is going to be looked at suspiciously to say the least. I'm not inherently against such a feature (IRC networks check for proxying for instance), but it's clearly detailed in the MOTD of basically every network that does it.

There wasn't a single thing in the FAQ that suggested it, and a Google search against the other site didn't pop something up that detailed what was being done; just a small note that some proxies were being blocked. Had the stock FAQ file, or documentation, or anything detailed this behaviour, while I might still have thought it wrong, at least I wouldn't have gotten upset about it. I knew that there was proxy scanning code in slashcode, but all the vars in the database were set to off; as I discovered, they're ignored, leading me to write a master off switch in the underlying scanning function.

Perhaps in total, this isn't a big deal, but it felt like a slap in the face. I know I have a temper, and I've been working to keep it under wraps (something easier said than done, but nothing worthwhile is ever easy). CmdrTaco himself commented on this on hackernews and I've written a reply to him about it. Slashdot did what they felt was necessary to stop spam on their site, and by 2008, slashcode only really existed for slashdot itself; other slash sites run on their own branches of older code. Right or wrong, such behaviour should be clearly documented, as it's not something you expect, and can cause (and has caused) issues for users and concerns due to lack of communication. Transparency isn't easy, but I have found it's the only way to have a truly healthy community. Perhaps you disagree. I'll respond to any comments or criticisms left below.

rainbow irc

Posted by crutchy on Friday April 04 2014, @07:26AM (#251)
0 Comments
Code

https://github.com/crutchy-/test/blob/master/karma_published.php

bacon+
(only single +/- to differentiate from bender)

~karma bacon

~rainbow pretty text

etc

todo: quotes

Bacon trick

Posted by Yog-Yogguth on Wednesday April 02 2014, @01:14AM (#245)
9 Comments
/dev/random

Don't use a skillet for your bacon, use your oven! In my case 225 degrees Celsius for about 10 minutes results in perfectly crisp bacon simmering in its own fat.

I use a sheet pan in the middle of the oven, two layers of baking paper under the bacon, and leave room for some half-baked small baguettes that I add when the remaining time is right.

Take it all out, slice the baguettes, put on bacon, put on cheese (maybe some cheddar slices) = simple and quick filling hot bacon & cheese sandwich.

Next time I do this I'll try wrapping the paper around the bacon to minimize any grease splatter. I might have to add a bit more baking time to get it as crisp since it's loosely covered.

Ovens are also great for making super-crisp sausages but I've only tried it with the thick kind that are about 3cm or 1 and 1/2 inches across: bake them until they rupture! Exploded sausages taste a lot better but be careful as they're really hot.

crunch has been re-tasked

Posted by crutchy on Saturday March 29 2014, @02:14PM (#239)
0 Comments
Code

no more searching

reset color:

~color -1

bold white:

~color 00

change color per mirc values: http://www.mirc.com/colors.html

~color 01

thru

~color 15
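As a rough idea of what a `~color` command has to emit, here is a small Python sketch of mIRC-style colour codes (the bot itself is PHP, and its real implementation is in the linked repository, not shown here). The function name `colorize` is made up. It relies on the mIRC convention of a `\x03` control character followed by a colour number, and treats `-1` as "no colour" the way the command list above does.

```python
COLOR = "\x03"  # mIRC colour control character
RESET = "\x0f"  # resets all formatting


def colorize(text, code):
    """Wrap text in an mIRC colour code; -1 means leave the text uncoloured."""
    if code == -1:
        return text
    if not 0 <= code <= 15:
        raise ValueError("mIRC colour codes run from 00 to 15")
    # Always emit two digits so text starting with a digit is not misread
    # as part of the colour number.
    return f"{COLOR}{code:02d}{text}{RESET}"


red = colorize("bacon", 4)
plain = colorize("bacon", -1)
```

The two-digit padding is a deliberate choice: `\x034` followed by text such as `2 eggs` would otherwise be parsed as colour 42.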

requote last in weird and wonderful ways (or show about):

~

bot doesn't quote itself (shows about)
atm only verbs ending in "ing" and a small set of nouns recognised, but this will grow

if you're interested in contributing (even just to the arrays) have a squiz at:
https://github.com/crutchy-/test/blob/master/bacon.php

anyone new to git, have a squiz at http://wiki.soylentnews.org/wiki/User:Crutchy#Git.2FGitHub
you can also edit directly on github (ideally only for simple changes such as additions to arrays).

todo: add collective noun substitution
todo: add ability to append arrays from within irc

thanks heaps mrbluze... ideas man and english extraordinaire

Site Backend Changes

Posted by NCommander on Friday March 28 2014, @09:15AM (#237)
4 Comments
Soylent

We're testing a new configuration between the site and the database. There may be unexpected issues with the site while we're testing. Keep calm and carry on.

crunch irc search bot

Posted by crutchy on Thursday March 27 2014, @12:18PM (#233)
0 Comments
Code

https://github.com/crutchy-/test/blob/master/crunch.php

designed to quote either the last thing said by a nick or the last thing said by a nick containing a search query

usage:
~
quotes a little about string including github source link
~q or ~quit
tells bot to quit
~find nick
quotes last thing said by nick (in local recorded log files)
~find nick query
quotes last thing said by nick that contains query (in local recorded log files)
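The two `~find` forms boil down to a backwards scan of the log. Here is a minimal Python sketch of that idea (the bot itself is PHP; see the link above for the real code). The log line format `[hh:mm] <nick> message` and the case-insensitive matching are assumptions, and `find_last` is a made-up name.

```python
import re

# Assumed log format: "[22:27] <nick> message", with an optional @ or + prefix.
LINE = re.compile(r"^\[[^\]]+\] <[@+]?(?P<nick>[^>]+)> (?P<msg>.*)$")


def find_last(lines, nick, query=None):
    """Return the last message by nick, optionally one containing query."""
    for line in reversed(lines):
        match = LINE.match(line)
        if match is None:
            continue  # not a chat line (join/part notices and so on)
        if match["nick"].lower() != nick.lower():
            continue
        if query is not None and query.lower() not in match["msg"].lower():
            continue
        return match["msg"]
    return None


log = [
    "[22:25] <@aqu4> crutchy: s/tim/blaat/",
    "[22:27] <crutchy> hello bacon",
    "[22:31] <NCommander> O_o;",
    "[22:34] <crutchy> more text",
]
last = find_last(log, "crutchy")
last_with_query = find_last(log, "crutchy", "bacon")
```

Here `last` is the most recent line from crutchy, and `last_with_query` skips back to the most recent one that mentions bacon.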

code is fairly short and (hopefully) sweet. no comments sorry.

TODO: search online logs @ http://logs.sylnt.us/

php irc bot for posting wiki content

Posted by crutchy on Tuesday March 25 2014, @12:29PM (#226)
1 Comment
Code

i was inspired to work on this after i saw mention of piping irc to the wiki @ http://wiki.soylentnews.org/wiki/CommunitySupport#Projects

it's been tested some but is still a work in progress.
getting around the anti-spam/anti-bot features of wiki is something i'll have to consult a wizard on.

https://github.com/crutchy-/test/blob/master/bot.php

i'm not a professional programmer so it probably sucks.
any criticisms etc are welcome, and if i can be bothered i may even take them on board, or you can do a pull request if you feel like having a play.

this is my first open source code file :-)

Overhaul of Server Backend

Posted by NCommander on Monday March 24 2014, @06:48AM (#222)
0 Comments
Soylent

So I'm pretty sure you're all aware, but I've gone through and done a massive amount of work on the backend and infrastructure in the name of sanity, proper user permissions and such, and documenting as much as I can.

As a note, a lot of this was brought on by the fact we have a relatively credible threat against the site, so I wanted to go through and make sure everything was in good shape and hardened (there's a lot of good bits here). I might have gone overboard. Here's the cliff notes version of what was done.

  * Static Status Page

http://status.soylentnews.org

This is on boron in /var/www. We should probably move it to oxygen in case the entire linode DC goes down, but it's fine there for now.

  * Thorough documentation on node access, SSH, etc.

Basically, the links here http://wiki.soylentnews.org/wiki/SystemAdministration are required reading for all staff who play with dev, or production.

There are still gaps: varnish, slash, and apache only have limited documentation, which is outdated, but I'll try and get those written in the next few days.

  * Node renaming

This one might seem silly, but it's sometimes hard to know what we're referring to when we talk about webserver/etc and a specific node. While at the moment we have no redundancy, I changed the hostnames of everything. The original soylent-* names are aliased in the internal DNS. List is here:

http://wiki.soylentnews.org/wiki/SystemAdministration/TheHitchhikersGuideToTheli694-22Domain

  * Internal DNS

Major thanks to xlefay for getting this up and running. All nodes exist in an internal li694-22 TLD, and are both forward and reverse resolvable (needed to make kerberos work properly, and make life easier).

  * Dev server

Announced, but falls into stuff done this weekend :-).

  * Varnish

I drastically reworked the varnish configuration file for better performance. The server is considerably more responsive than it used to be, with apache hit considerably less. As a side effect, slash hitcounts will be skewed as ACs will not be counted.

Rate limiting to prevent DOS was implemented, and xlefay pounded the dev server with some impressive apachebench numbers to confirm we won't go down. The dev server is much more loaded than production due to sharing the database, so I'm optimistic it will take a serious effort to pound us into oblivion with just ab or similar tools from a few nodes.

  * Disabled static page generation

This has been a PITA and on the TODO for a while. Dynamically generated pages are now used for articles and comments. Varnish caches for ACs on a 5 minute basis. Logged in users get access to the site directly.
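The caching rule in this item (ACs get a copy that is at most 5 minutes old, logged in users always hit the backend) can be sketched in Python as below. This only illustrates the policy; the real behaviour is in the varnish configuration, which isn't reproduced here, and the class and parameter names are made up.

```python
import time

TTL_SECONDS = 5 * 60  # ACs see pages cached for up to five minutes


class AnonymousPageCache:
    """Toy model of the policy: cache for ACs, pass logged in users through."""

    def __init__(self, render, ttl=TTL_SECONDS, clock=time.monotonic):
        self._render = render  # callable that builds a page for a URL
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # url -> (expires_at, body)

    def get(self, url, logged_in):
        if logged_in:
            # Logged in users bypass the cache and hit the backend directly.
            return self._render(url)
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: serve the cached copy
        body = self._render(url)
        self._store[url] = (now + self._ttl, body)
        return body
```

One consequence, which matches the note about skewed hitcounts above: the backend never sees most AC requests, so anything it counts per request will undercount them.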

  * SSL on Production

Doesn't fully work, but I reworked the nginx termination and the varnish configuration so it is possible to log in and use SSL. slash redirects the login to http, but the cookie gets properly set now, so if you log in over SSL then reload the SSL page, it works. Need someone to dig into slash and figure out why ConnectionIsSSL is returning false. Need a volunteer to set up nginx termination on dev to debug.

  * LDAP setup

God, this was a pain, but we have a full LDAP setup on helium now. Replication to boron is on the TODO list, so if helium goes down, SSH authentication goes down, which is a bad thing. People with linode accounts can access the console and log in as root directly.

Documentation (with pictures!) here: http://wiki.soylentnews.org/wiki/SystemAdministration/LDAPManagementForDummies

  * Passwords logged and recorded

Went through, made sure every password is saved in a master PW file which is on helium in root's home directory. sysops should keep a local copy of this file as it's needed to use lish to access boxes should LDAP be down. Other important passwords like mysql, LDAP, and kerberos are also in this file.

  * Centralized ACLs

All machines require that a user be in the correct POSIX group to access them. List of groups is available here. This ensures that only people who should have access can have it.

http://wiki.soylentnews.org/wiki/SystemAdministration/GroupPermissions

  * SSH Policy

This one is probably going to cause me some flak, but you need to go through the staff box (boron) to access anything any more. I don't like having open SSH ports on any of our nodes because it feels like we have our balls in the wind and a misconfiguration can leave us vulnerable.

I'm not kidding on that last bit. On production for the last month, slash:slash has worked as a username and password to log into the slash account. Using LDAP doesn't solve this as we still have local accounts for things LIKE slash.

Everyone must use an SSH public key to authenticate; keys are stored in LDAP and are pulled on the fly by OpenSSH (this required updating OpenSSH on all nodes with a backport).

I know that due to slashd seizing up at a bad time this caused people to get locked out as I haven't gotten SSH keys from most people. I've got 8 users now with keys in LDAP. Right now, I don't have all the sudo files fully massaged, so if you have access to the dev server, you also have full sudo on all nodes. This isn't really desirable as I believe in limiting permissions, but this is a case of preventing us from going SNAP. Looking for someone to work out the necessary sudo voodoo

Also need someone to write upstart files for apache 1.3 so it comes back on a restart (xlefay is doing this, but feel free to work with him)

  * New Node Bringups

lithium (dev server), carbon (IRC server), and oxygen (offsite backup) were brought up this weekend. Bringup documentation was written here: http://wiki.soylentnews.org/wiki/SystemAdministration/TheRiseAndFallOfNewNodeManagement

  * OpenVPN

Set up an OpenVPN server on boron with magic iptables setup to allow oxygen to access all nodes. There's a fair bit of magic going on here, and I don't have the setup documented yet, but it's basically following the Ubuntu Server documentation for OpenVPN, plus a few iptables rules (saved in /etc/iptables.rules) on boron. Should be pretty self-explanatory.

  * Kerberos

To handle users that can't use ProxyCommand, to make life easier for internode stuff, and to be sexy, kerberos was set up to allow single signon. As most people probably never have managed Kerberos, the quick start guide is here: http://wiki.soylentnews.org/wiki/SystemAdministration/KerberosAdministration

Kerberos replication is set up, but not running as I need to make sure everything is sane. KDC master is helium, slave is boron.

  * AppArmored Apache

This was the real reason for the scheduled downtime last week as we had to migrate to apparmor capable kernels. AppArmor is basically SELinux but less braindead, and I handwrote a config that essentially puts Slash in a straightjacket. This should prevent things like process exploitation or a bug in slash from getting any traction. The apparmor config is installed on both lithium and hydrogen and is in /etc/apparmor.d. If you take a look, Apache can't take a piss without explicit permission :-).

(Note, this doesn't do much to help us with SQL injections, but every bit helps. Nothing short of a full rewrite of MySQL.pm to use stored procedures will fix this. Any takers? Or migrating us to pgSQL then doing this?)

There's more to do here. slashd should be apparmored as well, but that's more difficult, and as it's not directly user accessible, I'm less concerned than with apache itself. Ideally, every userfacing component should be apparmored (nginx, varnish, and slashd), but the former two run under very restrictive user accounts, and slashd only works with data in the database that already passed through Apache, and for the most part is just simple maintenance scripts, so it's not that easy to attack.

I need to write up and document apparmor like I did for other things, but it's relatively idiot proof to write files, and it makes good logs in /var/log/syslog.

  * Preparations for offsite backup

We've got a dedicated server (oxygen) with a 500 GiB HDD from http://www.kimsufi.com/en/ for €10 a month in France. This will be used for offsite backups. xlefay is looking into this and will be implementing it for all nodes.

  * Ubuntu package repo

As we need to maintain at least one backport, and need other things packaged, I setup a Launchpad PPA to do package building and binary distribution to all nodes: https://launchpad.net/~li69422-staff/+archive/backports-for-precise

This repo is added on all nodes. As you need to know how to do Debian packaging to use it, build an example package or two, and then I'll add you to the team. It's pretty straightforward how to do this.

  * Staff userdir

Any staff can generate a userdir on boron by creating a public_html and using staff.soylentnews.org/~username

bot talk

Posted by crutchy on Thursday March 20 2014, @12:24PM (#207)
0 Comments
/dev/random

some notes & snippets from fun with the chat bots in IRC.
times are australian eastern daylight saving time.

[22:25] <@aqu4> crutchy: s/tim/blaat/
[22:27] <crutchy> $sr /i/u/s
[22:27] <@aqu4> s/u/i/
[22:27] <SedBot> <aqu4> /taalb/mit/s :yhctirc

[22:31] <NCommander> O_o;
[22:34] <crutchy> $sr /O_o/o_O/s :rednammoCN
[22:34] <@aqu4> NCommander: s/O_o/o_O/
[22:34] <SedBot> <aqu4> <NCommander> o_O;

[22:39] <crutchy> $sr /O_o/o_O/s :rednammoCN ## yas sb/
[22:39] <@aqu4> /bs say ## NCommander: s/O_o/o_O/

$sr ++nocab
/bs say ## $sr ++nocab
/bs say ## bacon++

yet to try (bender+aqu4+sedbot?):
xyz say first: bacon++
/bs say ## $sr /--/++/s :zxy
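a guess at what the two bots are doing in the 22:31 to 22:34 exchange above, written as a python sketch. this is inferred from the log only, not from either bot's source. the function names are made up, and the real bots may well use regex matching rather than the literal replacement used here.

```python
def say_reversed(command):
    """What `$sr` appears to do: say the rest of the line reversed."""
    prefix = "$sr "
    if not command.startswith(prefix):
        return None
    return command[len(prefix):][::-1]


def apply_sed(command, last_said):
    """Apply `nick: s/old/new/` to the last line recorded for nick."""
    target, sep, expr = command.partition(": ")
    parts = expr.split("/")
    # A well-formed expression splits into ["s", old, new, ""].
    if not sep or len(parts) != 4 or parts[0] != "s" or parts[3] != "":
        return None
    old, new = parts[1], parts[2]
    if old == "" or target not in last_said:
        return None
    # Literal, first-occurrence replacement (an assumption, see above).
    return f"<{target}> {last_said[target].replace(old, new, 1)}"


relayed = say_reversed("$sr /O_o/o_O/s :rednammoCN")
corrected = apply_sed(relayed, {"NCommander": "O_o;"})
```

`relayed` comes out as the line aqu4 said at 22:34, and `corrected` matches the quoted part of SedBot's reply.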