SoylentNews is people

posted by martyb on Monday June 24 2019, @12:37PM
We are aware of issues when trying to access the site. First noticed at approx. 0300 UTC. Our servers look okay. It appears there may be issues with upstream connectivity.

Also, Linode is planning some server reboots over the next week or so. We will try to give advance notice and keep downtime to a minimum.

Update: Everything seems to have quieted down. Many many thanks to NotSanguine for jumping in and lending his expertise to help identify and isolate where things were borked.

Indications are that a bad BGP (Border Gateway Protocol) route was published, causing traffic to/from a large fraction of the internet to attempt to go through the routers of a relatively small AS (Autonomous System).

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Funny) by NPC-131072 on Monday June 24 2019, @02:28PM (12 children)

    by NPC-131072 (7144) on Monday June 24 2019, @02:28PM (#859356) Journal

    For those interested:

    Lawnmower man who just managed to light up the cell phone of every network and sysadmin in the world with network alerts is / was likely employed by a customer or peer of Level3.

  • (Score: 5, Informative) by NotSanguine on Monday June 24 2019, @03:06PM (11 children)

    Reading through the HN link posted [ycombinator.com], it appears that this *broad* outage was the result of a BGP [wikipedia.org] mis-configuration by Allegheny Technologies (ASN 396531).

    Apparently, they began broadcasting a route for their /24 network, but someone fat-fingered the mask to be /4 instead of /24.

    For the less technical, that means they were broadcasting that a huge swath of the internet should be routed through their routers.
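To put rough numbers on the difference between a /24 and a /4, here is a minimal sketch using Python's standard `ipaddress` module. The prefix `192.0.2.0` is a documentation address (RFC 5737) chosen purely for illustration; it is an assumption and not the network involved in the incident.

```python
import ipaddress

def address_count(prefix: str) -> int:
    """Return how many IPv4 addresses a prefix covers.

    strict=False lets us reuse the same base address with a shorter mask,
    the way a mistyped mask would.
    """
    return ipaddress.ip_network(prefix, strict=False).num_addresses

# Hypothetical example prefix, not the one from the outage.
intended = address_count("192.0.2.0/24")   # 256 addresses
mistyped = address_count("192.0.2.0/4")    # 268435456 addresses (2**28)

print(f"/24 covers {intended} addresses")
print(f"/4 covers {mistyped} addresses, {mistyped // intended} times as many")
print(f"/4 is {mistyped / 2**32:.4f} of the whole IPv4 space")
```

A /4 is one sixteenth of all IPv4 addresses, which is why a mask typo of that kind would look like a claim on a huge swath of the internet.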

    Apparently, their upstream ISP (there was some mention of Verizon in the link above, but I haven't seen confirmation of this) didn't do any validation and rebroadcast the incorrect route.

    Other upstream network providers apparently did so as well, causing network traffic for a significant portion of the internet to be pushed at a small company in Pittsburgh, PA.

    Once there, it was unceremoniously dropped in the bit bucket and likely overloaded their network interfaces as well.

    --
    No, no, you're not thinking; you're just being logical. --Niels Bohr
    • (Score: 2, Informative) by NPC-131072 on Monday June 24 2019, @03:49PM (8 children)

      by NPC-131072 (7144) on Monday June 24 2019, @03:49PM (#859378) Journal

      NANOG Writeup [nanog.org]

      • (Score: 4, Informative) by NotSanguine on Monday June 24 2019, @04:11PM (7 children)

        So it wasn't specifically Allegheny's fault.

        From the post you linked [nanog.org]:

        It appears that one of the implicated ASNs, AS 33154 "DQE Communications
        LLC" is listed as customer on Noction's website:
        https://www.noction.com/clients/dqe [noction.com]

        I suspect AS 33154's [peeringdb.com] customer AS 396531 [ipinfo.io] turned up a new circuit with Verizon, but didn't have routing policies to prevent sending routes from
        33154 to 701 [peeringdb.com] and vice versa, or their router didn't have support for RFC 8212 [ietf.org].

        So a cluster-fuck by two ISPs who should know better, and a small company that didn't and relied on the "pros" to handle things properly. Lovely.

        --
        No, no, you're not thinking; you're just being logical. --Niels Bohr
        • (Score: 3, Funny) by NPC-131072 on Monday June 24 2019, @05:07PM (5 children)

          by NPC-131072 (7144) on Monday June 24 2019, @05:07PM (#859409) Journal

          Yup, working theory is that ATI Metals (Allegheny) had a new circuit connected, multihomed with DQE (existing) and Verizon (new). DQE ran a BGP optimizer, and its more specific routes then leaked from DQE via ATI to Verizon (who further propagated them).
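Why leaked more-specific routes pull traffic away from the legitimate path: routers forward on the longest matching prefix, so a /25 beats the /24 that contains it. The sketch below illustrates that rule only; the prefixes (documentation range 203.0.113.0/24) and the labels are hypothetical and are not the routes from this incident.

```python
import ipaddress

# Hypothetical routing table: (prefix, label for where traffic is sent).
# The covering /24 is the normal announcement; the two /25s stand in for
# the more specific routes produced by an optimizer and then leaked.
ROUTES = [
    (ipaddress.ip_network("203.0.113.0/24"), "legitimate origin"),
    (ipaddress.ip_network("203.0.113.0/25"), "leaked path"),
    (ipaddress.ip_network("203.0.113.128/25"), "leaked path"),
]

def best_route(routes, destination):
    """Pick the label of the longest prefix that contains the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, label) for net, label in routes if addr in net]
    if not matches:
        return None
    _, label = max(matches, key=lambda match: match[0].prefixlen)
    return label

# Before the leak only the /24 is known; afterwards the /25s win.
print(best_route(ROUTES[:1], "203.0.113.10"))
print(best_route(ROUTES, "203.0.113.10"))
```

Real BGP best-path selection has many more tie-breakers, but they only apply between routes for the same prefix; a more specific prefix is a different route entirely and wins at forwarding time.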

          Anybody expect the "fix" to this problem will "amazingly" be more centralized control of the internet?

          • (Score: 3, Insightful) by NotSanguine on Monday June 24 2019, @05:47PM (4 children)

            Anybody expect the "fix" to this problem will "amazingly" be more centralized control of the internet?

            Actually, no.

            I'd expect requiring stricter adherence to RFC 8212, as well as something akin to Cloudflare's RPKI. In addition to those steps, I'd hope to see some standards around verifying that advertised routes actually *make sense* in their context before re-advertising them.

            With the first and third items above, the major outage today would never have happened, and ATI Metals, DQE and Verizon would have dealt with this *before* any routes were advertised to the rest of the 'net.

            Once such workable solutions are in place, with some lead time, I'd expect peers to reject BGP routes that aren't in compliance, thus eliminating most inadvertent *and* malicious BGP advertisements. No centralization required.
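For readers wondering what "rejecting routes that aren't in compliance" could look like, here is a minimal sketch of RPKI-style route origin validation, following the valid / invalid / not-found outcomes described in RFC 6811. The `Roa` records, AS numbers (64500 and 64501 are reserved for documentation) and prefixes are all made-up examples, and a real validator works from signed, cryptographically checked data rather than a hand-written list.

```python
import ipaddress
from typing import NamedTuple

class Roa(NamedTuple):
    """Simplified route origin authorization (hypothetical record type)."""
    prefix: ipaddress.IPv4Network
    max_length: int   # longest prefix length the holder allows
    asn: int          # AS authorized to originate the prefix

def validate_origin(roas, announced: str, origin_asn: int) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ipaddress.ip_network(announced)
    covering = [roa for roa in roas if route.subnet_of(roa.prefix)]
    if not covering:
        return "not-found"          # nobody has made a statement about it
    for roa in covering:
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    return "invalid"                # covered, but wrong AS or too specific

# Hypothetical data: AS 64500 may originate exactly 203.0.113.0/24.
ROAS = [Roa(ipaddress.ip_network("203.0.113.0/24"), 24, 64500)]

print(validate_origin(ROAS, "203.0.113.0/24", 64500))
print(validate_origin(ROAS, "203.0.113.0/25", 64500))
print(validate_origin(ROAS, "203.0.113.0/24", 64501))
```

Note how the maximum-length check is what catches an unexpected more-specific announcement even when the origin AS is the right one.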

            There would be some corner cases that would likely still crop up, but we wouldn't be seeing this nearly as often.

            Besides, it would be quite difficult, if not impossible, to centralize something like BGP, since the whole point of the protocol is that it's decentralized.

            What's more, there's no way you'd get the IETF to even *try* to centralize BGP. Have you ever been to an IETF meeting or participated in a working group? Not gonna happen.

            --
            No, no, you're not thinking; you're just being logical. --Niels Bohr
            • (Score: 2) by NPC-131072 on Monday June 24 2019, @09:11PM (3 children)

              by NPC-131072 (7144) on Monday June 24 2019, @09:11PM (#859503) Journal

              There's a reason IANA doesn't have the root cert, isn't there? 5 RIR roots (trust anchors) would still be centralizing control over routing authority. The legislative reach argument about regional vs. national Internet Registries and balkanization ignores that any CA is by definition a centralized point of failure. WoT [wikipedia.org] to do?

              • (Score: 2) by NotSanguine on Monday June 24 2019, @09:58PM (2 children)

                Note that I said "something akin to RPKI", not RPKI.

                Cryptographic signatures can be useful *without* centralization.

                Especially since verification needs to be done *between* peers/upstream/downstream providers, with signatures being confirmed to be valid by each peer, then updated again before being forwarded to the next set of peers. Which does not require anything top-down or centralized, just verification and trust between peers.

                Why don't you write a protocol spec using RFC 7353 [ietf.org] that conforms to RFC 8212, rather than playing "gotcha" with me?

                I'm sure we'll all appreciate your hard work. I look forward to reading your Internet Draft when you're done.

                --
                No, no, you're not thinking; you're just being logical. --Niels Bohr
                • (Score: 2) by NPC-131072 on Tuesday June 25 2019, @12:33AM (1 child)

                  by NPC-131072 (7144) on Tuesday June 25 2019, @12:33AM (#859556) Journal

                  Note that I said "something akin to RPKI", not RPKI.

                  Noted.

                  Why don't you write a protocol spec using RFC 7353 [ietf.org] that conforms to RFC 8212, rather than playing "gotcha" with me?

                  Wasn't playing "gotcha" but we've gone from origin to path validation. [ietf.org] Giving RIRs (or LIRs) the technical ability to revoke certs will surely make their politicization inevitable?

                  • (Score: 2) by NotSanguine on Tuesday June 25 2019, @03:05AM

                    Wasn't playing "gotcha" but we've gone from origin to path validation. [ietf.org] Giving RIRs (or LIRs) the technical ability to revoke certs will surely make their politicization inevitable?

                    Fair enough.

                    But I haven't *gone* anywhere. I think you misunderstand me.

                    It behooves peers, as well as upstream/downstream providers, to play it straight with each other, *especially* when it comes to BGP, since they *need* each other to carry/forward their network traffic as expeditiously and efficiently as possible.

                    There's no profit in refusing to verify a BGP update signature via the public key provided by a peer. Once such a signature is verified, the receiving peer *should* verify that the routes make sense WRT routes currently being advertised by other peers (whose signatures they *also* verify). Once such a BGP update has been validated, the receiving peer needs to *re-sign* the update with its own private key, with the public key associated with it having been securely shared with *its* peers, then forward that update to its peers.

                    The next hop should do the same thing. Ad infinitum.
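The verify-then-re-sign relay described above can be sketched in a few lines. Big caveat: the comment talks about public/private key signatures, while this sketch swaps in HMAC with a per-link shared secret, only because that is available in Python's standard library. The function names, keys and update payload are all hypothetical, and the "do the routes make sense" check is left out entirely.

```python
import hashlib
import hmac

def sign(key: bytes, update: bytes) -> bytes:
    """Tag an update with a key (HMAC stand-in for a real signature)."""
    return hmac.new(key, update, hashlib.sha256).digest()

def verify(key: bytes, update: bytes, tag: bytes) -> bool:
    """Check a tag in constant time."""
    return hmac.compare_digest(sign(key, update), tag)

def relay(update: bytes, tag: bytes, upstream_key: bytes, own_key: bytes):
    """Verify the upstream tag, then re-sign for the next hop.

    Returns the new tag, or None if the update should be dropped.
    """
    if not verify(upstream_key, update, tag):
        return None
    return sign(own_key, update)

# Hypothetical three-hop chain: origin -> peer A -> peer B.
origin_key, peer_a_key = b"origin-secret", b"peer-a-secret"
update = b"announce 203.0.113.0/24 origin AS64500"

tag_from_origin = sign(origin_key, update)
tag_from_a = relay(update, tag_from_origin, origin_key, peer_a_key)

print(verify(peer_a_key, update, tag_from_a))                     # B accepts A's copy
print(relay(b"tampered", tag_from_origin, origin_key, peer_a_key))  # dropped
```

With real asymmetric signatures each hop would verify using the sender's public key and sign with its own private key, so no secret would need to be shared per link.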

                    Given that these peers have a vested interest in maintaining those relationships, it's unclear to me why they would, unless it's warranted (e.g., malicious route updates, repeated errors in route updates, etc.), revoke the public key of a peer.

                    What "political" advantage would *anyone* get by doing so? All you'll end up doing is cutting off your nose to spite your face.

                    --
                    No, no, you're not thinking; you're just being logical. --Niels Bohr
        • (Score: 1, Interesting) by Anonymous Coward on Tuesday June 25 2019, @01:58AM

          by Anonymous Coward on Tuesday June 25 2019, @01:58AM (#859575)

          So a cluster-fuck by two ISPs who should know better, and a small company that didn't and relied on the "pros" to handle things properly. Lovely.

          Additional information: DQE is the new (again) name for Duquesne Light, the incumbent electricity supplier to the Greater Pittsburgh area. The electric company has tons of right of way and easements to move cables around. An age ago I worked for their regional neighbor, who was also trying to play ISP at the time.

    • (Score: 0) by Anonymous Coward on Tuesday June 25 2019, @08:09AM (1 child)

      by Anonymous Coward on Tuesday June 25 2019, @08:09AM (#859645)

      So is this suspicious, or isn't it because Russia or China aren't involved? ;)
      https://arstechnica.com/information-technology/2018/11/strange-snafu-misroutes-domestic-us-internet-traffic-through-china-telecom/ [arstechnica.com]

      • (Score: 2) by NotSanguine on Tuesday June 25 2019, @02:09PM

        Why don't you ask these folks [atimetals.com]? They're a pretty suspicious looking bunch, I'd say.

        Alex Jones is reporting that moments after this happened, large funds transfers from both Russian and Chinese banks were made to a day-care center in Lakeland, FL. and Netcraft confirms it! Definitely something evil going on, if you ask them.

        I'd take a look at this [zdnet.com] and save it offline somewhere, as it's likely to be taken off-line pretty soon to protect sources and methods.

        Since I cannot verify your security clearances, I can neither confirm nor deny the type of milk (skim, 1%, 2% or whole) I put in my coffee. That assumes I use milk at all. Black? With cream? Half and half? Do I even drink coffee?

        I'm sorry. You'll need to go through your handler for a query like this.

        I'll answer your question by citing Secret UN Resolution NWO-666 [wikipedia.org].

        Don't contact me again!

        --
        No, no, you're not thinking; you're just being logical. --Niels Bohr