
SoylentNews is people

posted by Fnord666 on Tuesday September 01 2020, @04:17PM
from the was-anything-of-value-lost? dept.

CenturyLink outage led to a 3.5% drop in global web traffic:

US internet service provider CenturyLink suffered a major technical outage on Sunday after a misconfiguration in one of its data centers created havoc all over the internet.

Due to the technical nature of the outage -- involving both firewall and BGP routing -- the error spread outward from CenturyLink's network and also impacted other internet service providers, ultimately causing connectivity problems for many other companies.

The list of tech giants that had services go down because of the CenturyLink outage includes big names like Amazon, Twitter, Microsoft (Xbox Live), EA, Blizzard, Steam, Discord, Reddit, Hulu, Duo Security, Imperva, NameCheap, OpenDNS, and many more.

Cloudflare, which was also severely impacted, said CenturyLink's outward-propagating issue led to a 3.5% drop in global internet traffic, which would make this one of the biggest internet outages ever recorded.

If someone can cause this much chaos accidentally, how much damage could someone deliberately cause?



 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0, Offtopic) by fustakrakich on Tuesday September 01 2020, @06:18PM (5 children)

    by fustakrakich (6150) on Tuesday September 01 2020, @06:18PM (#1045000) Journal

    Looks like the internet doesn't have near the redundancy that we should expect.

    Client/server is the problem. Too centralized.

    --
    La politica e i criminali sono la stessa cosa..
  • (Score: 2, Informative) by Anonymous Coward on Tuesday September 01 2020, @07:50PM (4 children)

    by Anonymous Coward on Tuesday September 01 2020, @07:50PM (#1045060)

    Looks like the internet doesn't have near the redundancy that we should expect.

    Client/server is the problem. Too centralized.

    Yet again you spout off on something about which you have no clue. And, as usual, you're flat wrong.

    The issue that took down AS3356 [peeringdb.com] was directly related to the *non-centralized*, distributed nature of BGP. Specifically, it involved one of its extensions called flowspec [ietf.org], which allows things other than routes (packet filtering rules) to be distributed via the BGP protocol.

    In fact, it was the distributed nature of BGP that both caused the issue and prevented its quick resolution. It was also the distributed nature of BGP that ultimately solved the problem.

    Cloudflare has an interesting writeup [cloudflare.com] on this.

    I'll explain, and I'll use small words so you'll be sure to understand. Although I won't hold my breath.

    Cloudflare speculates (since Level3/CenturyLink hasn't issued any sort of post-mortem yet) that overly broad flowspec rule(s) (generally used for DDOS [wikipedia.org] mitigation) were distributed to the routers in AS3356 [peeringdb.com] (an Autonomous System [wikipedia.org]; these are also completely decentralized).
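
    To illustrate what "overly broad" means here, the following is a toy sketch, not the actual rule that was deployed (that has not been published). It assumes a flowspec-style rule is just a set of match fields plus a drop action; the FlowRule class, its field names, and the sample packets are all hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical stand-in for a flowspec rule: match fields plus an implied drop."""
    dst_prefix: str
    protocol: Optional[str] = None  # None means "any protocol"
    dst_port: Optional[int] = None  # None means "any port"

    def matches(self, dst_ip: str, protocol: str, dst_port: int) -> bool:
        # Every field that is set must match; unset fields match everything.
        if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(self.dst_prefix):
            return False
        if self.protocol is not None and self.protocol != protocol:
            return False
        if self.dst_port is not None and self.dst_port != dst_port:
            return False
        return True


def dropped(rule, packets):
    """Return the packets the rule would discard."""
    return [p for p in packets if rule.matches(*p)]


# A narrow rule of the kind used for DDoS mitigation: one victim, one service.
narrow_rule = FlowRule("203.0.113.7/32", protocol="udp", dst_port=53)
# An overly broad rule: every destination, any protocol, any port.
broad_rule = FlowRule("0.0.0.0/0")

packets = [
    ("203.0.113.7", "udp", 53),     # attack traffic aimed at the victim
    ("198.51.100.20", "tcp", 443),  # unrelated web traffic
    ("192.0.2.1", "tcp", 179),      # a BGP session (TCP port 179)
]

print(len(dropped(narrow_rule, packets)), "dropped by narrow rule")
print(len(dropped(broad_rule, packets)), "dropped by broad rule")
```

    The narrow rule touches only the attack traffic, while the broad one also discards unrelated traffic, including the routers' own BGP sessions in this toy model.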

    The overly broad flowspec rule(s) *appear* (again, this is speculation, although Cloudflare provides some reasonably convincing evidence that this is the case) to have caused high CPU utilization on the affected routers.

    Such high CPU utilization had several impacts:
    1. The routers were unable to route traffic, causing the initial outage;
    2. The routers were unable to update their BGP tables.

    Under normal circumstances, when a router is unable to service traffic (see 1 above), the networks around it will switch network links to pass traffic via a different path, clearing the problem for each connected network. I'd note that this is *not* centralized at all. Each impacted network needs to modify its own network configuration (specifically the BGP routes that it advertises) to use a different path.
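
    The normal failover described above can be sketched roughly as follows. This is a heavily simplified assumption: real BGP best-path selection considers several attributes before AS-path length, and the destination AS 64500 and the alternate path via AS64510/AS64511 are made-up documentation values, not real topology.

```python
def select_best(routes):
    """Pick the neighbor offering the shortest AS path, or None if no route is left.

    `routes` maps a neighbor name to the AS path learned from that neighbor.
    """
    if not routes:
        return None
    return min(routes, key=lambda neighbor: len(routes[neighbor]))


# One network's view of how to reach a hypothetical destination, AS 64500.
routes = {
    "AS3356": [3356, 64500],           # short path through AS3356
    "AS64510": [64510, 64511, 64500],  # longer alternate path
}

before = select_best(routes)

# AS3356 withdraws its route, so this network falls back on its own,
# with no central coordinator involved.
del routes["AS3356"]
after = select_best(routes)

print(before, "->", after)
```

    Each network runs this decision independently against its own table, which is the non-centralized behavior the paragraph above describes.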

    However, because the AS3356 routers were unable to update their BGP tables, they continued to advertise routes that they were no longer able to service, and, as such, the peers of AS3356 continued to propagate those incorrect routes. Since incorrect routes were being advertised by AS3356, quite a bit of traffic was being sent via paths where the endpoints were not, in fact, available.

    It wasn't until the major peers of AS3356 were set to ignore BGP updates from AS3356 that the affected routers' CPU usage came down enough to allow the offending flowspec rule(s) and incorrect BGP routing tables to be updated, allowing correct routes to be propagated.
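
    A minimal sketch of the stale-route problem and the mitigation, from one peer's point of view. It assumes an overloaded router simply never gets its withdrawal out; the Peer class, the try_withdraw helper, and the alternate path via AS64510 are hypothetical and only model the sequence of events described above.

```python
class Peer:
    """A neighboring network's view of routes to one destination."""

    def __init__(self):
        self.routes = {}      # neighbor name -> AS path learned from it
        self.ignored = set()  # neighbors whose updates are being ignored

    def learn(self, neighbor, as_path):
        if neighbor not in self.ignored:
            self.routes[neighbor] = as_path

    def withdraw(self, neighbor):
        self.routes.pop(neighbor, None)

    def ignore(self, neighbor):
        # Operator decision: stop trusting this neighbor and drop its routes.
        self.ignored.add(neighbor)
        self.routes.pop(neighbor, None)

    def best(self):
        if not self.routes:
            return None
        return min(self.routes, key=lambda n: len(self.routes[n]))


def try_withdraw(peer, neighbor, cpu_overloaded):
    """An overloaded router never gets around to sending the withdrawal."""
    if cpu_overloaded:
        return False
    peer.withdraw(neighbor)
    return True


peer = Peer()
peer.learn("AS3356", [3356, 64500])
peer.learn("AS64510", [64510, 64511, 64500])

# AS3356 can no longer forward traffic, but it is too busy to say so.
sent = try_withdraw(peer, "AS3356", cpu_overloaded=True)
stale_choice = peer.best()      # still the dead path via AS3356

# Mitigation: the peer decides on its own to ignore AS3356.
peer.ignore("AS3356")
recovered_choice = peer.best()  # traffic moves to the alternate path

print(sent, stale_choice, recovered_choice)
```

    Until the peer itself ignores AS3356, nothing removes the stale entry, so traffic keeps flowing toward a path that cannot carry it.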

    Every step along the way, both in creating the problem (incorrect flowspec rule(s)) and in mitigating/solving it (individual networks modifying their BGP tables, individual peers ignoring AS3356), was completely distributed and non-centralized.

    So no. This issue had nothing to do with too much centralization (that's a *different* issue) on the Internet.

    But you go ahead and continue to spout off your ignorant bullshit. We can all continue to get good laughs at your expense.

    • (Score: 1) by fustakrakich on Tuesday September 01 2020, @08:01PM (3 children)

      by fustakrakich (6150) on Tuesday September 01 2020, @08:01PM (#1045063) Journal

      What "expense"? What I give is always free. What you take, well, that's your issue.

      It's still a client/server setup. And the equipment should be more robust. You built a system that barely functioned when it was new. Forget to close a bracket and it tumbles like a house of cards. It's as shameful as the electrical grid collapsing when a squirrel gets into the transformer.

      --
      La politica e i criminali sono la stessa cosa..
      • (Score: 0) by Anonymous Coward on Tuesday September 01 2020, @08:26PM (2 children)

        by Anonymous Coward on Tuesday September 01 2020, @08:26PM (#1045073)

        Moron.

        You're not even worth wasting a mod point on.

        • (Score: 0, Offtopic) by fustakrakich on Tuesday September 01 2020, @08:33PM (1 child)

          by fustakrakich (6150) on Tuesday September 01 2020, @08:33PM (#1045075) Journal

          Sure I am... Fire away! You know you want to!

          --
          La politica e i criminali sono la stessa cosa..
          • (Score: 1, Touché) by Anonymous Coward on Wednesday September 02 2020, @12:10AM

            by Anonymous Coward on Wednesday September 02 2020, @12:10AM (#1045171)

            Offtopic(spurt)

            Now he probably wants a cigarette...