SoylentNews is people

posted by on Sunday May 28 2017, @09:11AM
from the chaos-is-always-one-crash-away dept.

Serious problems with British Airways' IT systems have led to thousands of passengers having their plans disrupted, after all flights from Heathrow and Gatwick were cancelled.

Passengers described "chaotic" scenes at the airports, with some criticising BA for a lack of information.

The airline has apologised, and told passengers not to come to the airport.

BA chief executive Alex Cruz said: "We believe the root cause was a power supply issue."

In a video statement released via Twitter, he added: "I am really sorry we don't have better news as yet, but I can assure you our teams are working as hard as they can to resolve these issues."

Mr Cruz said there was no evidence the computer problems were the result of a cyber attack.

-- submitted from IRC



This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Informative) by tonyPick on Sunday May 28 2017, @09:26AM (13 children)

    by tonyPick (1237) on Sunday May 28 2017, @09:26AM (#516692) Homepage Journal
    • (Score: 3, Insightful) by anubi on Sunday May 28 2017, @09:31AM (3 children)

      by anubi (2828) on Sunday May 28 2017, @09:31AM (#516694) Journal

      They may have well outsourced their hardware engineers too.

      --
      "Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
      • (Score: 3, Touché) by c0lo on Sunday May 28 2017, @10:35AM (2 children)

        by c0lo (156) Subscriber Badge on Sunday May 28 2017, @10:35AM (#516703) Journal

        They may have well outsourced their hardware engineers too.

        In the same country, known for its reliable power [wikipedia.org]?

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
        • (Score: 2) by tomtomtom on Sunday May 28 2017, @11:49AM (1 child)

          by tomtomtom (340) on Sunday May 28 2017, @11:49AM (#516713)

          If their site is large enough to have its own substation (which seems pretty likely), then they likely have a significant single point of failure right there. Substations do fail reasonably frequently and the effects can be surprising - for example, I have personally lost water supply for c. 1 day as a result of such a failure, and a water pumping station is reasonably simple to cold start. Even if they had a backup substation, it would likely be fed from the same part of the national grid (it is very expensive to get National Grid to install a backup spur from a separate part of the network, if it's even possible), so there is still potentially a single point of failure.

          • (Score: 2) by kaszz on Sunday May 28 2017, @01:45PM

            by kaszz (4211) on Sunday May 28 2017, @01:45PM (#516738) Journal

            The solution is usually diesel-generator backup.

            But oh.. that costs money. Can't have the 20M$ yacht then.. horror!

    • (Score: 3, Insightful) by kaszz on Sunday May 28 2017, @01:53PM

      by kaszz (4211) on Sunday May 28 2017, @01:53PM (#516741) Journal

      Will the saved costs make up for the compensations galore and part of the customers taking business elsewhere?

      I hope the people getting the outsource shaft do the "oops I didn't mention how to fix script /root/path/deep/daily_fixing.sh, the backup machine or that the transformer needs a dust cleaning regularly". Almost like a logic bomb ;)

      Makes me wonder which other airlines have outsourced recently? The "avoid these incompetent airlines" list.

    • (Score: 4, Interesting) by MrGuy on Sunday May 28 2017, @02:01PM (4 children)

      by MrGuy (1007) on Sunday May 28 2017, @02:01PM (#516742)

      United, Delta, and SWA have all had similar massive IT failures causing similar disruption over the past few years. All of them have significant onshore development and IT groups.

      https://www.washingtonpost.com/news/the-switch/wp/2016/08/08/what-you-need-to-know-about-the-massive-delta-computer-outage/ [washingtonpost.com]
      https://nypost.com/2017/01/22/united-airlines-grounds-all-flights-due-to-computer-outage/ [nypost.com]
      https://www.dallasnews.com/business/southwest-airlines/2016/07/25/2300-canceled-flights-later-southwest-airlines-recovers-technical-outage [dallasnews.com]

      Travel is a massively failure-prone industry. Most of the "under the hood" technology that powers many major airlines is either greenscreen mainframe, or code that has to integrate with greenscreen mainframe sufficiently well that it has to follow some paradigms that make the flow of data very fragile and subject to massive disruption from failures in multiple places.

      I don't love the current massive outsourcing trend either, but it's hardly obvious that full time domestic employees would have fared any better.

      • (Score: 3, Informative) by Nerdfest on Sunday May 28 2017, @02:34PM (3 children)

        by Nerdfest (80) on Sunday May 28 2017, @02:34PM (#516753)

        I've worked with a lot of these systems, more interfacing with them than coding, but still. Like many other early adopter industries, they're mainly mainframe COBOL spaghetti that's just hanging together. It doesn't take much for a change to bring the pile crashing down.

        • (Score: 2) by kaszz on Sunday May 28 2017, @02:44PM (2 children)

          by kaszz (4211) on Sunday May 28 2017, @02:44PM (#516757) Journal

          What stops them from ditching the mainframe COBOL spaghetti and starting over with more modern tools?
          (provided all manager style thinking and webmonkeys are kept at a safe distance)

          • (Score: 3, Informative) by Nerdfest on Sunday May 28 2017, @03:32PM

            by Nerdfest (80) on Sunday May 28 2017, @03:32PM (#516769)

            I've found that they tend to reject almost any improvements out of fear it will break something ... a valid fear based on the quality of the system they'd be building on. These are the Everest of technical debt. That is combined with the cost (they usually think the only option is to throw money at IBM for an all-or-nothing rewrite, rather than extracting business logic, reporting, presentation, interfaces, etc). Add to that the usual "Death by MBA" mindset that doesn't see that it would be a huge cost savings in computing power, maintenance, etc. Keep in mind lots of these guys are buying IBM mainframes that cost tens of millions of dollars, and are then paying IBM to use the processing power on them. But hey, that's an "operational cost".

          • (Score: 4, Interesting) by MrGuy on Sunday May 28 2017, @04:03PM

            by MrGuy (1007) on Sunday May 28 2017, @04:03PM (#516774)

            A few things are problematic.

            One of the most important is a negative network externality. Most travel (not just airfares) is distributed through global distribution systems [wikipedia.org], which are basically the same underlying technology and data structures. This is how different groups like travel agencies and various suppliers talk to each other. Revamping YOUR internal systems to be different from how everyone else does it would potentially cut you off from a lot of customers (even major airlines sell about 2/3rds of their tickets through third parties, with a few exceptions like Southwest in the US which self distributes). While you could in theory change your internal model and build some "backwards compatibility integrations" to keep those channels open, in practice it would put a lot of limitations on how things work. If everyone else switched to a new, better model for reservations, then it would make a lot of sense for any airline to move towards it. But there's significant cost to being first.

            Also, travel is still a fairly heavily regulated industry, and there are a lot of rules around how tickets work, how airlines need to account for unused value, etc. A lot of those regulations grew up around the current model for how airlines regulate inventory, pricing, and reservations. There's a ton of potential downside risk from compliance violations from trying to rearchitect too much.

            Finally, there's a massive inertia in the industry. Right now, the focus is on new products, new ways to communicate with customers, etc., all of which have a decent short-term ROI, and all of which can be (and are being) bolted on to the existing deep down structures. To really fix the issue at the core, a company would need to spend years blowing up all their core technology, and then making sure they didn't break anything attached to it that works right now, etc. Which also means standing still while your competitors introduce new things. That's a ton of cost. And that in an industry which doesn't have a massive return from technology - airlines make their money moving planes around, which they can already do to an acceptable degree, which means there's not a huge immediate return from a rewrite. There's a lot of cost to avoid (e.g. in MRO, crew and plane scheduling, predictive analytics, etc), but the airlines can already do a lot of this on top of the (bad) technical stack they already have. To top it off, any massive technical migration is likely to cause its own outages in the short term (since they'd be replacing a "known working" tech stack with an unproven one).

            Bottom line - while massive outages like this cost a lot of money, it's still seen as an acceptable amount of loss when compared to the investment it would take to make the underlying tech more reliable. An occasional big outage is seen as acceptable. Or, at least, sufficiently unavoidable to not justify the cost of reducing/eliminating them. Whether you believe airlines are long term rational in that assessment is left as an exercise to the reader.

    • (Score: 3, Insightful) by Whoever on Sunday May 28 2017, @03:17PM

      by Whoever (4524) on Sunday May 28 2017, @03:17PM (#516766) Journal

      I hope someone goes to the shareholders' meeting and asks if the cost of this disruption was greater than the savings through outsourcing IT.

      If you can't operate your business when IT is down, then it should be a core competency.

    • (Score: 2) by quietus on Sunday May 28 2017, @04:49PM

      by quietus (6328) on Sunday May 28 2017, @04:49PM (#516793) Journal

      I wouldn't be so dismissive of the power supply issue.

      Without going into details, I know of a couple of organizations which faced the same kind of problems, starting about one and a half years ago. At least one of those organizations had redundancy galore and didn't spare any costs, to the point of replicating their core infrastructure, purely to be used for training purposes.

      An anonymous coward on El Reg [theregister.co.uk] noted that in the past few days 2 major UK organizations had the same problems. I'd hold my breath before judging here. Maybe there will be a Nanog paper in future about what really happened.

    • (Score: 0) by Anonymous Coward on Monday May 29 2017, @02:43AM

      by Anonymous Coward on Monday May 29 2017, @02:43AM (#516980)

      This comes about a year after significant outsourcing of IT where BA replaced the UK workers to cut costs...

      Now they're really saving money on things like fuel and airplane wear-and-tear and all those previously busy employees.

  • (Score: 3, Funny) by pkrasimirov on Sunday May 28 2017, @10:08AM

    by pkrasimirov (3358) Subscriber Badge on Sunday May 28 2017, @10:08AM (#516699)

    > "our teams are working as hard as they can"
    He means they are facing problem and kindly doing the needful.

  • (Score: 2, Informative) by Pax on Sunday May 28 2017, @12:39PM (1 child)

    by Pax (5056) on Sunday May 28 2017, @12:39PM (#516722)

    EU rules you see, up to 600 euros each plus hotels and accommodation fees for those held up overnight!

    If your flight has been delayed by at least three hours or cancelled then you have the right to compensation under European law.

    Under EU Regulation 261/2004, passengers are entitled to up to €600 (£509) in compensation when their flight lands at their destination more than three hours late.

    But airlines don't always have to pay out and can avoid doing so if the delay is caused by an extraordinary circumstance, such as bad weather or crew strikes.

    Previously, airlines routinely refused to pay out for delays caused by technical faults, claiming they counted as extraordinary events.

    But in 2014 two landmark Supreme Court rulings declared that carriers should pay out when a delay was caused by a technical fault.

    baboom!
    http://www.thisismoney.co.uk/money/holidays/article-2271213/How-claim-EU-flight-delay-compensation-EC-261-2004.html [thisismoney.co.uk]
    http://europa.eu/youreurope/citizens/travel/passenger-rights/air/index_en.htm [europa.eu]
    https://en.wikipedia.org/wiki/Flight_Compensation_Regulation_261/2004 [wikipedia.org]

    • (Score: 2, Insightful) by Anonymous Coward on Sunday May 28 2017, @02:04PM

      by Anonymous Coward on Sunday May 28 2017, @02:04PM (#516743)

      That is exactly the sort of meddling from unelected foreign bureaucrats that turned Britain into the shithole it is today.

  • (Score: 1) by SemperOSS on Monday May 29 2017, @04:25PM

    by SemperOSS (5072) on Monday May 29 2017, @04:25PM (#517198)

    It is a major failure of BA that they do not seem to have regularly tested contingency plans that would have kept this a short-term nuisance instead of the IT disaster it became. Heads should roll: the CTO at least, and probably also the people responsible for the contingency plans.

    I am working as a consultant Enterprise Solutions Architect and I do not think I would have lasted this long if I did not design in contingency plans for severely critical systems like this. I know the consequences are more severe by several degrees, but how would you feel if you were in a hotel/shopping centre/industrial complex/... on fire without proper evacuation routes, procedures and plans?

    --
    I don't need a signature to draw attention to myself.
    Maybe I should add a sarcasm warning now and again?