
SoylentNews is people

posted by janrinok on Tuesday October 29 2019, @11:59PM
from the stop-looking-at-the-wrong-thing dept.

From the following story:

Amazon has still not provided any useful information or insights into the DDoS attack that took down swathes of websites last week, so let's turn to others that were watching.

One such company is digital monitoring firm Catchpoint, which sent us its analysis of the attack in which it makes two broad conclusions: that Amazon was slow in reacting to the attack, and that tardiness was likely the result of its looking in the wrong places.

Even though cloud providers go to some lengths to protect themselves, the DDoS attack shows that even a company as big as Amazon is vulnerable. Not only that but, thanks to the way that companies use cloud services these days, the attack has a knock-on impact.

"A key takeaway is the ripple effect impact when an outage happens to a third-party cloud service like S3," Catchpoint noted.

The attack targeted Amazon's S3 - Simple Storage Service - which provides object storage through a web interface. It did not directly target the larger Amazon Web Services (AWS) but for many companies the end result was the same: their websites fell over.

[...] Amazon responded by rerouting packets through a DDoS mitigation service run by Neustar but it took hours for the company to respond. Catchpoint says its first indications that something was up came five hours before Amazon seemingly noticed: it saw "anomalies" that it says should have served as early warning signs.

When it had resolved the issue, Amazon said the attack happened "between 1030 and 1830 PST," but Catchpoint's system shows highly unusual activity from 0530. We should point out that Catchpoint sells monitoring services for a living so it has plenty of reasons to highlight its system's efficacy, but that said, assuming the graphic we were given [PDF] is accurate - and we have double-checked with Catchpoint - it does appear that Amazon was slow to recognize the threat.

Catchpoint says the problem is that Amazon - and many other organizations - are using an "old" way of measuring what's going on. They monitor their own systems rather than the impact on users.
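
A minimal sketch of the distinction drawn above, assuming a simple z-score check against a short baseline. This is not Catchpoint's method, and all the numbers are invented for illustration. The function `first_anomaly` and both sample series are hypothetical. The point is only that a provider-side metric can stay flat while a user-side metric has already gone wrong.

```python
from statistics import mean, stdev


def first_anomaly(samples, baseline_size=6, threshold=3.0):
    """Return the index of the first sample that deviates from the
    baseline mean by more than `threshold` standard deviations,
    or None if no sample does."""
    baseline = samples[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return None  # a perfectly flat baseline gives no scale to compare against
    for i in range(baseline_size, len(samples)):
        if abs(samples[i] - mu) / sigma > threshold:
            return i
    return None


# Hypothetical hourly response times in ms, as seen from the user's side.
user_side_ms = [120, 118, 125, 122, 119, 121, 480, 950, 1200]

# Hypothetical internal health metric over the same hours.
server_side_ms = [20, 21, 19, 20, 22, 20, 21, 20, 21]

print(first_anomaly(user_side_ms))    # 6
print(first_anomaly(server_side_ms))  # None
```

With these made-up series, the user-side check fires at the seventh sample while the internal metric never trips, which is the gap the article describes.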

"It is critical to primarily focus on the end-user," Catchpoint argues. "In this case, if you were just monitoring S3, you would have missed the problem (perhaps, being alerted first by frustrated users)."

-- submitted from IRC


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Interesting) by redneckmother on Wednesday October 30 2019, @02:21AM (6 children)

    by redneckmother (3597) on Wednesday October 30 2019, @02:21AM (#913546)

    I'm not as "sharp" as I used to be. That being said....

    This incident has echoes from a book I read long ago... "Normal Accidents", by Charles Perrow.

    Tightly coupled / complexly interactive systems fail in unpredictable ways.

    --
    Mas cerveza por favor.
    • (Score: 0) by Anonymous Coward on Wednesday October 30 2019, @02:45AM (4 children)

      by Anonymous Coward on Wednesday October 30 2019, @02:45AM (#913565)

      Attacks aren't accidents.

      • (Score: 2, Informative) by redneckmother on Wednesday October 30 2019, @03:42AM (3 children)

        by redneckmother (3597) on Wednesday October 30 2019, @03:42AM (#913583)

        Methinks you missed my point.

        While an attack isn't an accident, the failure of the provider / developer to design a system resistant to such an attack IS.

        --
        Mas cerveza por favor.
        • (Score: -1, Troll) by Anonymous Coward on Wednesday October 30 2019, @03:56AM (2 children)

          by Anonymous Coward on Wednesday October 30 2019, @03:56AM (#913589)

          OK, I'll just punch you in the face, and we'll see how your broken nose is your own failure to design yourself an unbreakable face. Nothing is my fault for punching you. It's a normal accident, completely the system's fault.

          • (Score: 3, Interesting) by c0lo on Wednesday October 30 2019, @07:54AM

            by c0lo (156) Subscriber Badge on Wednesday October 30 2019, @07:54AM (#913621) Journal

            Where's my "Don't you dare to touch me" ** mod when I need it.

            ** In a complex and interactive systems environment, the continuation could be "... or else accidents may happen to you and it will be your fault. Fair warning"

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 5, Insightful) by c0lo on Wednesday October 30 2019, @02:13PM

            by c0lo (156) Subscriber Badge on Wednesday October 30 2019, @02:13PM (#913702) Journal

            Before being smug, you may stop and think a bit. Here are some points:

            If a complex system is designed for resilience, it has high chances to survive a partial failure, no matter the cause of it - DDoS included. If only you were more interested, or just curious, than keen to "display your muscles", who knows what you might have discovered? Maybe something like Resilience Design Patterns [arxiv.org] from Oak Ridge National Laboratory (yeap, the ones maintaining and using several of the Top 50 supercomputers. I have this nagging feeling they know a bit more than you about resilience).

            Now, if those complex systems are not designed for resilience, very interesting things may happen. Things like cascading failures [wikipedia.org], in which a network goes down because of a single node failure, in spite of the entire network having more than enough capacity to handle the load.
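
            A toy sketch of that effect, assuming the crudest possible model: when a node fails, its load is split evenly among the survivors, and any survivor pushed past its capacity fails too. The function `cascade`, the node names and all the numbers are made up for illustration; real networks redistribute load along their topology, not evenly.

```python
def cascade(loads, capacities, initially_failed):
    """Simulate even load redistribution after node failures.
    Returns the set of all nodes that end up failed."""
    loads = dict(loads)  # work on a copy
    failed = set(initially_failed)
    pending = sorted(failed)  # failed nodes whose load is not yet shed
    while pending:
        node = pending.pop()
        alive = [n for n in loads if n not in failed]
        if not alive:
            break  # nobody left to take the load
        share = loads[node] / len(alive)
        loads[node] = 0
        for n in alive:
            loads[n] += share
        for n in alive:
            if loads[n] > capacities[n]:
                failed.add(n)
                pending.append(n)
    return failed


# Four nodes carrying 50 units each: total load 200.
loads = {"A": 50, "B": 50, "C": 50, "D": 50}

# Total capacity 330, well above the load, but little headroom on A, B, C.
fragile = {"A": 60, "B": 60, "C": 60, "D": 150}

# Total capacity 270, less than above, but every survivor has headroom.
roomy = {"A": 60, "B": 70, "C": 70, "D": 70}

print(sorted(cascade(loads, fragile, ["A"])))  # ['A', 'B', 'C', 'D']
print(sorted(cascade(loads, roomy, ["A"])))    # ['A']
```

            In this model the first layout loses every node after a single failure despite having more total capacity than the second, which survives. Where the spare capacity sits matters more than how much of it there is.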

            And this is far from being specific to computers. In spite of the idiots wanting to see the world burn down in ashes (I have moments when I'm among them), there actually are things like too big to fail [wikipedia.org] (if you're too lazy to follow the link, think of cascading bank runs).

            Now, the problem is that designing for resilience and operating resilient systems will require extra cost. Be it only because such designs will apply to large/complex systems, handling complexity requires brains (as opposed to punching muscles) and those brains aren't cheap.

            So yeah I can see as plausible a statement like "AWS went down because of systemic faults, it just happened this time to be triggered by a DDoS, but it was bound to happen sooner or later starting from a large variety of triggers". Things like fat fingers taking down half the Web [soylentnews.org].

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 1, Interesting) by Anonymous Coward on Wednesday October 30 2019, @03:04PM

      by Anonymous Coward on Wednesday October 30 2019, @03:04PM (#913728)

      > "Normal Accidents", By Charles Perrow

      Enough of the book is available on Google Books to give a good flavor of the thesis. Starts with an everyday sequence of "bad luck" that cascades into a failure to make an important appointment -- tl;dr version is: car won't start, neighbor who normally would loan a car happens to have that car in for service, public transport (not normally used by this car owner) is surprisingly down for a strike, taxi system overloaded due to bus driver strike (and all this was written well before "ride sharing" was an option).

      Systems fail for dumb reasons, in ways that are hard to project. Only gets worse with more complexity and more optimization.

      I'm keeping my stick shift car with wind up windows and manual door locks! It does have electronic fuel injection & ignition, so starts up consistently in all weather, but that system isn't coupled to any of the other parts of the car.

  • (Score: 2) by jmichaelhudsondotnet on Wednesday October 30 2019, @07:04PM

    by jmichaelhudsondotnet (8122) on Wednesday October 30 2019, @07:04PM (#913832) Journal

    When the cloud breaks, the report on the failure is equally cloudy.

    Can I claim this as a law of technology?

    I personally would be more comfortable calling the cloud the 'crystal palace of self-serving fantasy' or 'moongel catacomb of over-simplified wishes'

    You can trust these systems to the extent that you say nothing their owners want you to say, and to the extent this powers most of the internet, your chances for censorship on your vacation just rose significantly.

    And of course, to the extent they did not change what you are allowed to say in the last 5 minutes. Or no one in their vast corporate hierarchy leaked your credentials to someone who wants to.

    Or, also, when Jeff Bezos pisses off anyone. Hard to imagine eh? Heck, his wife could buy her own cloud at this point and simply ddos him at critical moments. She might even be able to afford some zero day cpu flaws with that kind of cash.

    We might be better off with Zeus just being able to strike us with lightning rather than this international rich douchebag duel we are all tied to the mast of presently.

    But what do I know. Oh, yeah,

    thesesystemsarefailing.net
