We had an hour or so or downtime today. After debugging, the root cause came from the SSL certificates we use to establish a database connection from the webserver to the actual DB. As a prelude GoLive, we migrated from unencrypted connections to encrypted connections as we have to cross the Linode internal LAN. In an attempt to improve data security, we generated a set of SSL certificates and used those to encrypt the MySQL connections. In the flurry of golive, no one thought to check the expiry date on said certificates. Out of the box, OpenSSL generates certificates with a one month expiry unless manually changed.
As you might expect, one month later, the certificates expired, and the database stopped accepting remote connections. New certificates were generated with a ten year expiration, and we continue to work towards better documenting our internal processes on the wiki to prevent this sort of thing from happening again. Apache, and slashd are running again, and we appear to be back to status-quo in terms of site operation.
A full incident report will be written up and posted to the wiki in the next few days.