
Meta
posted by on Sunday October 30 2016, @02:45PM
from the no-fishing-for-me-this-morning dept.

Right, so there's currently a DDoS aimed specifically at our site. Part of me is mildly annoyed; part of me is proud that we're worth DDoS-ing now. Since it's only slowing us down a bit and not actually shutting us down, I'm half tempted to just let their botnet run its course. I suppose we should tweak the firewall a bit, though. Sigh, I hate working on weekends.

Update: Okay, that appears to have mitigated it; the site is responding at a reasonable rate again.

Update2: Attack's over for now. You may go about your business.

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Interesting) by blackhawk on Sunday October 30 2016, @09:14PM

    by blackhawk (5275) on Sunday October 30 2016, @09:14PM (#420643)

    That would really just saturate the outgoing bandwidth, and if the site is hosted on Amazon or similar, it's going to take a lot of traffic to do that.

    It's more effective to find the pages that take the most server resources to serve up and slam those. You want to target the ones that hit the database a lot and really tie up those back-end resources; that will stop pretty much every page that needs DB access from responding in a reasonable time-frame.

    A textbook example for me was back in the '00s, when I was working with a company that made online software for large corporations. There were two teams: I was in charge of one, and the other was writing a CMS for a very popular newspaper.

    Their team had followed all the standard textbook code examples: liberally using factories, running all the SQL queries through a small section of code that ensured connection pooling, and using small, well-defined requests to the DB to mark up their pages. Everything was factored so the code was lovely, readable C# that performed exactly the function you'd expect. There was only one problem: when they went to speed-test it on a pair of very fast (for the time) web servers and a massive quad-CPU Xeon database server... they got unexpectedly (for them) slow results. They called me over to help them speed it up.

    I fired up the web testing software, benchmarked the main page and a couple of typical pages, and the result was laughable: they could only serve up 4 pages/second on average - so 2 pages/second per web server!

    It took just a couple of minutes to figure out why. Hitting refresh on the main page with the SQL monitor open showed them making something like 30 database calls to mark up that one page. Each call added at least 50-100ms of processing time, no matter how simple it was. The menu alone added something like 4 calls to the DB. Most of the calls were for data that could be served up a little cold, so rather than getting them to rewrite the whole damn lot, I suggested they identify those parts and use a caching scheme on the returned data. That alone had them up to maybe 16 pages/sec. With a few more sensible changes to the simplistic design path they had chosen, it was up to 40 pages/sec, which was fast enough to ship. It really should have been capable of serving up closer to 200-400 pages/sec, but the project was well overdue and 40 was the stated target.
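
    The pattern is just a cache-aside wrapper around the data-access layer. Their code was C# and is long gone, so this is only a minimal Python sketch of the idea; the query, TTLs and helper names are all made up for illustration:

        import time

        # Hypothetical stand-in for the real data-access call; in the original
        # system this was a C# layer issuing one SQL query per page fragment.
        def query_db(sql, params=()):
            ...

        _cache = {}  # (sql, params) -> (expires_at, rows)

        def cached_query(sql, params=(), ttl=60):
            """Serve slightly stale ('a little cold') data from memory instead
            of hitting the database on every single page render."""
            key = (sql, params)
            now = time.time()
            hit = _cache.get(key)
            if hit and hit[0] > now:
                return hit[1]
            rows = query_db(sql, params)
            _cache[key] = (now + ttl, rows)
            return rows

        # e.g. the menu, which only changes when an editor touches it, can
        # easily tolerate a five-minute TTL:
        # menu_items = cached_query("SELECT id, label FROM menu", ttl=300)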

    A second, more salient example is from another database-driven website, where there were a couple of queries that took a lot more resources to complete than average. The worst took 20 seconds to complete its SQL when it landed on my desk, but a few hours with the query planner and some ad-hoc SQL showed it was the T-SQL handling null fields very badly. In the end I could run it in perhaps 0.5 seconds.

    If you can identify pages with queries like those on them - ones that slam the DB and have long execution times (bonus points for locking the tables) - you can cripple most active websites far, far more effectively than by pulling in some JPG files.

  • (Score: 5, Informative) by NCommander on Sunday October 30 2016, @10:37PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Sunday October 30 2016, @10:37PM (#420682) Homepage Journal

    In practice, that attack would be extremely difficult to pull off on SN. Right now, only logged-in users actually touch our application servers; everything else is shunted into a varnish cache (ACs basically get 5-minute snapshots of the site at any given moment). If push came to shove, we could enable that for logged-in users as well to increase our cache hits, at the cost of updates no longer being in real time. Even taking that into account, each application server is backed by an independent mysqld instance which operates as a hot cache against NDBd (which in turn has its own hot caches). The database is small enough (since it's 98% text) that, between the shards, the entire thing can be stored in memory across two machines, with the option of spinning up additional DB stores as we need them. Furthermore, we're always overprovisioned; our basic rule is that no single service can be above 50% average capacity, in case a node suddenly craps out on us, which means we're well in excess of what we need at any given moment and always have redundancy available.
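
    Roughly, the request path splits like this. This is only an illustrative Python sketch of the routing rule, not our actual varnish configuration; the cookie name and TTL here are assumptions:

        import time

        SNAPSHOT_TTL = 300   # ACs get roughly five-minute snapshots of a page
        _snapshots = {}      # path -> (expires_at, rendered_html)

        def handle_request(path, cookies, render_page):
            """Sketch of the split described above: logged-in users hit the
            application servers directly, anonymous users get a cached copy."""
            if "session" in cookies:          # assumed session-cookie name
                return render_page(path)      # real-time view for logged-in users
            now = time.time()
            hit = _snapshots.get(path)
            if hit and hit[0] > now:
                return hit[1]                 # serve the cached snapshot
            html = render_page(path)
            _snapshots[path] = (now + SNAPSHOT_TTL, html)
            return html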

    If it really came down to it, we're architected in such a way that if we needed to spin up additional frontends to weather the storm, we could have them up in an hour or so, counting replication, setup and reconfiguration. Same with the backend. We did a lot of code optimization on Rehash, to the point that, as a codebase, I suspect it scales better than the other site given the same hardware.

    --
    Still always moving
    • (Score: 2) by blackhawk on Monday October 31 2016, @03:04AM

      by blackhawk (5275) on Monday October 31 2016, @03:04AM (#420754)

      It sounds like SN is architected a lot better than the example I gave. To be fair, some of those things wouldn't have been an option at the time, or would have been expensive, but still - a little internal caching goes a long way.

      Their DB was also mostly text, given it was a CMS; they just didn't pay attention to how they were accessing it. Most content was static enough that they could have marked up the pages, written them to a file cache, and just kept serving them from there - invalidating that cache entry whenever an edit was made. Ironically, the CMS *was* capable of serving up a static copy of the site - code they had written for another client - but they never put two and two together. I think they were just too blinkered by making everything dynamic content to ever stop and ask whether it needed to be dynamic at all.
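
      Something as dumb as this would have done it. Their code was C# and is long gone, so this is just a rough Python sketch; the paths and names are invented:

          from pathlib import Path

          CACHE_DIR = Path("page_cache")        # invented cache location
          CACHE_DIR.mkdir(exist_ok=True)

          def _cache_file(page_id):
              return CACHE_DIR / f"{page_id}.html"

          def get_page(page_id, render_page):
              """Serve the marked-up page from disk if we have it; otherwise
              do the expensive, DB-heavy render once and write it out."""
              f = _cache_file(page_id)
              if f.exists():
                  return f.read_text(encoding="utf-8")
              html = render_page(page_id)
              f.write_text(html, encoding="utf-8")
              return html

          def invalidate(page_id):
              """Called from the edit path: drop the cached copy so the next
              request re-renders it with the new content."""
              _cache_file(page_id).unlink(missing_ok=True)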

      They also never considered options like breaking from third normal form or using temporary tables to get the data moving faster. The lead developer was just too rigid and tunnel-visioned on whatever the latest wisdom from the Microsoft bloggers happened to be.

      • (Score: 3, Informative) by NCommander on Monday October 31 2016, @05:03AM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 31 2016, @05:03AM (#420787) Homepage Journal

        Slashcode originally had this functionality. The database and themes were essentially exported as HTML files and served up to ACs; only accessing the .pl files directly (which happened when logged in) would give you a real-time view (this is what the original "delay in updating static page" meant when you posted on the site). Due to code rot, and the fact that generating and syncing this HTML across multiple web frontends was a massive headache, we eventually just migrated everything to varnish and put a bullet in the static content generation (most of the code is still there, but disabled).

        One thing that constantly drives me up a wall, though, is poor use of existing tools. Originally, this site used memcached as a hot cache and standard MySQL for backend storage, which led to an entire class of bugs where the two would disagree with each other and occasionally blow crap up. While memcached is very good at what it does, it adds a massive layer of complexity, as you essentially have to map structured data to KV pairs and deal with dirty reads. We had a lot of pain in the earlier days of the site because of this: memcached isn't a distributed datastore; the assumption is that you have a central "cache", which then introduces network latency into the mix and tends to cause performance to go down the crapper. We got around this by running memcached on each web frontend, but the dirty read problem got a lot worse, because now different parts of the stack might have different ideas about what the current state of everything was.
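
        To make the "map structured data to KV pairs" point concrete, here's a toy Python sketch; a plain dict stands in for each frontend's memcached, and the key layout and field names are invented, not our actual schema:

            import json

            # Each web frontend ran its own memcached; a dict stands in here.
            frontend_a_cache = {}
            frontend_b_cache = {}

            def cache_story(cache, story_row):
                """Flatten a structured DB row into a single KV entry."""
                cache["story:%d" % story_row["id"]] = json.dumps(story_row)

            def get_story(cache, story_id, fetch_from_db):
                raw = cache.get("story:%d" % story_id)
                if raw is not None:
                    return json.loads(raw)    # possibly stale: a dirty read
                row = fetch_from_db(story_id)
                cache_story(cache, row)
                return row

            # The failure mode described above: an update lands in frontend A's
            # cache, but frontend B keeps serving its old copy until that entry
            # expires or someone remembers to invalidate it on every frontend.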

        The entire problem got solved by moving everything to MySQL Cluster, which allowed for in-memory/disk-based durability for CP-important data such as posts, stories, and user data. Cluster is amazing for high read performance and allows for easy multi-master operation; data is cached in the frontend mysqld cluster instances and stored in the NDB instance backing them. If we needed even better performance, we'd likely migrate things like templates, statistics and such from MySQL to an AP datastore like Cassandra, which is ideal for cases where the data is non-essential but very useful to have "at hand" (I know TMB has experimented with Redis, but I don't think it made it to production). Even without Cluster, it would have been possible to use MySQL MEMORY tables as a hot cache backed by InnoDB ones to get the same effect (and we will likely have to do something similar with NDB in the future).

        What always staggers me is people praising things like Postgres, but when I look at their code, it's basically being used as a dumb store with no concept of using FOREIGN KEYs, or stored procedures, or anything their DB is *good* at; instead they're reinventing the wheel.

        --
        Still always moving
    • (Score: 0) by Anonymous Coward on Monday October 31 2016, @11:19AM

      by Anonymous Coward on Monday October 31 2016, @11:19AM (#420828)

      Furthermore, we're always overprovisioned; our basic rule is that no single service can be above 50% average capacity, in case a node suddenly craps out on us, which means we're well in excess of what we need at any given moment and always have redundancy available.

      Quite obviously you don't have an MBA supervising your performance, or else you'd surely not be allowed to "waste" so many resources. ;-)

  • (Score: 3, Informative) by NCommander on Sunday October 30 2016, @10:39PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Sunday October 30 2016, @10:39PM (#420684) Homepage Journal

    Oh, and to answer: we're hosted on Linode. Amazon might be better able to weather a DDoS (we've gone into degraded performance before because someone blasted Linode's data centers off the net), but we'd lose IPv6 capability. We could also deploy a CDN or a LOT of other things before we'd run out of options to weather a storm.

    --
    Still always moving
  • (Score: 2) by The Mighty Buzzard on Sunday October 30 2016, @11:00PM

    by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Sunday October 30 2016, @11:00PM (#420692) Homepage Journal

    Heh, right now I'm pretty certain I could knock us off the net entirely with as few as half a dozen hosts. But then, I know exactly what hits the DB the hardest, and there's a pull request fixing it sitting on GitHub as we speak.

    --
    My rights don't end where your fear begins.
  • (Score: 0) by Anonymous Coward on Monday October 31 2016, @12:02AM

    by Anonymous Coward on Monday October 31 2016, @12:02AM (#420714)

    Ah, the "it's pure" design. Well, yeah, it is readable. But it runs like crap.

    It usually takes a mess like that to teach someone that 'yeah 50 round trip calls across the network are not good'. Cache it and aggregate it.

    With one system I worked on, a stored proc was digging through the same LARGE table 15 times to get different columns. With a bit of caching into a temp table and some shuffling around, it went from 1.5 million reads to about 12 reads.

    • (Score: 2) by NCommander on Monday October 31 2016, @04:48AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 31 2016, @04:48AM (#420783) Homepage Journal

      Honestly, you can get both without having a ton of crap as your codebase. Build an interface to do what you want, then slide a caching layer below it. If you've done it right, you get the best of both worlds. While it's not completely invisible, if you're using caching sanely and know exactly how it operates, you can drastically reduce cache misses while rarely having to explicitly code around the problem of dirty reads (MySQL Cluster handles this for us). Rehash's core is basically proof; it's not the most brilliant pile of Perl ever written, but CmdrTaco actually knew what he was doing from a software development perspective. Most of the garbage we stripped out was later additions that had been tacked on.
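
      As a trivial Python sketch of what I mean by sliding a caching layer below the interface (the names are illustrative, not Rehash's actual code, which is Perl anyway):

          import time

          class StoryStore:
              """The interface the rest of the code programs against."""
              def get_story(self, story_id):
                  raise NotImplementedError

          class DbStoryStore(StoryStore):
              def __init__(self, db):
                  self.db = db
              def get_story(self, story_id):
                  return self.db.fetch_story(story_id)   # hypothetical DB call

          class CachingStoryStore(StoryStore):
              """Slides in underneath; callers never know it's there."""
              def __init__(self, inner, ttl=60):
                  self.inner = inner
                  self.ttl = ttl
                  self._cache = {}
              def get_story(self, story_id):
                  now = time.time()
                  hit = self._cache.get(story_id)
                  if hit and hit[0] > now:
                      return hit[1]
                  story = self.inner.get_story(story_id)
                  self._cache[story_id] = (now + self.ttl, story)
                  return story

          # store = CachingStoryStore(DbStoryStore(db))  # same interface, cached reads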

      --
      Still always moving