SoylentNews is people

posted by Fnord666 on Friday July 26 2019, @06:07AM
from the have-you-tried-turning-it-off-and-back-on-again? dept.

Submitted via IRC for Bytram

Airbus A350 software bug forces airlines to turn planes off and on every 149 hours

Some models of Airbus A350 airliners still need to be hard rebooted after exactly 149 hours, despite warnings from the EU Aviation Safety Agency (EASA) first issued two years ago.

In a mandatory airworthiness directive (AD) reissued earlier this week, EASA urged operators to turn their A350s off and on again to prevent "partial or total loss of some avionics systems or functions".

The revised AD, effective from tomorrow (26 July), exempts only those new A350-941s which have had modified software pre-loaded on the production line. For all other A350-941s, operators need to completely power the airliner down before it reaches 149 hours of continuous power-on time.

Concerningly, the original 2017 AD was brought about by "in-service events where a loss of communication occurred between some avionics systems and avionics network" (sic). The impact of the failures ranged from "redundancy loss" to "complete loss on a specific function hosted on common remote data concentrator and core processing input/output modules".

In layman's English, this means that prior to 2017, at least some A350s flying passengers were suffering unexplained failures of potentially flight-critical digital systems.

Not a power of two. I wonder why 149 hours?


  • (Score: 2) by sshelton76 on Friday July 26 2019, @11:08AM (3 children)

    It was around 3 AM when I posted, so the math wasn't going to be exact, and it doesn't need to be. It's just something I encountered once that seemed relevant. With a 140 to 160 hour reboot cycle, it's going to be milliseconds accumulating in an int somewhere. The rest is how I know that :)

  • (Score: 3, Informative) by coolgopher on Friday July 26 2019, @01:07PM (2 children)

    Not ragging on you; I just got curious about the number discrepancy, and when I looked at it an 8 kHz multiplier fell out. On top of milliseconds, that of course makes it an 8 MHz clock, which is rather common for mid-range MCUs. Seems eminently plausible to me.

    Pretty sure anyone who has done any serious amount of work on embedded systems has had to fix up somebody else's timer/counter code that didn't handle wraparound correctly. And their own. Though I would certainly have expected better from the avionics industry.
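To illustrate the kind of wraparound handling mentioned above, here is a minimal Python sketch that emulates a free-running 32-bit tick counter. The function names and the 32-bit width are assumptions made for illustration; nothing here is taken from any real avionics or MCU code.

```python
MASK32 = 0xFFFFFFFF  # assumed counter width: 32 bits


def elapsed_naive(start_ticks, now_ticks):
    """Plain subtraction: goes negative once the counter has wrapped."""
    return now_ticks - start_ticks


def elapsed_wrap_safe(start_ticks, now_ticks):
    """Subtract modulo 2**32, as unsigned 32-bit hardware arithmetic would.

    Correct as long as fewer than 2**32 ticks passed between the two reads.
    """
    return (now_ticks - start_ticks) & MASK32


# The counter was read shortly before the wrap and again shortly after it.
start = 0xFFFFFF00
now = 0x00000010
print(elapsed_naive(start, now))      # negative: the classic bug
print(elapsed_wrap_safe(start, now))  # 272 ticks actually elapsed
```

The usual mistake is comparing absolute counter values (`now > deadline`) instead of comparing an elapsed difference computed modulo the counter width.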

    • (Score: 0) by Anonymous Coward on Friday July 26 2019, @09:50PM

      OK, that makes more sense. I was thinking of a tick rate of one per some standard or base-10 unit of time; any other rate didn't occur to me. But yes, 2^32 events divided by 8000 events per second is equal to 149 hours, 7 minutes, 50 seconds, and 114/125 of a second (7296 events).
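The arithmetic above can be checked with a short Python sketch. The 32-bit counter width and the 8000 ticks per second rate are the figures guessed at in this thread, not anything confirmed by Airbus or EASA; `time_to_wrap` is a hypothetical helper name.

```python
from fractions import Fraction

COUNTER_BITS = 32     # assumed counter width
TICK_RATE_HZ = 8000   # assumed tick rate, from the comment above


def time_to_wrap(bits=COUNTER_BITS, rate_hz=TICK_RATE_HZ):
    """Return (hours, minutes, seconds, leftover_ticks, leftover_fraction)
    for a counter of the given width to roll over at the given tick rate."""
    total_ticks = 2 ** bits
    whole_seconds, leftover_ticks = divmod(total_ticks, rate_hz)
    hours, remainder = divmod(whole_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds, leftover_ticks, Fraction(leftover_ticks, rate_hz)


print(time_to_wrap())          # 149 h, 7 min, 50 s, 7296 ticks (114/125 s)
print(time_to_wrap(32, 1000))  # a plain 1 kHz (millisecond) counter, for comparison
```

For comparison, a 32-bit counter ticking once per millisecond would last about 1193 hours (roughly 49.7 days), which is why a bare millisecond counter does not match the 149-hour figure without the extra factor of 8.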

      Given that, the explanation makes sense: something counts up somewhere and then causes a divide by zero after rolling over. I could have sworn C had a way to detect overflows (or maybe C++ or C#? I don't know; I'm mostly a Python guy). When I use ctypes, I always check that the greater-than or less-than relationship I'm expecting holds, and I do the same in the languages I know that check for overflow anyway. So I can't believe they would miss something so obvious. Then again, I've seen people cause all sorts of problems by not understanding the basics: in Python, / returns a float and // returns a floored integer, but the code mixes both (usually copied verbatim from the internet).
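As a rough illustration of the ctypes check described above: ctypes integer types do no overflow checking, so a value that does not fit simply wraps, much like unsigned arithmetic in C. The sketch below contrasts that with an explicit checked add. Both function names are hypothetical, and this is only one way such a check might look.

```python
import ctypes

UINT32_MAX = 0xFFFFFFFF


def wrapping_add_u32(a, b):
    """Add like a 32-bit unsigned register: silently wraps modulo 2**32."""
    return ctypes.c_uint32(a + b).value


def checked_add_u32(a, b):
    """Add two values in [0, UINT32_MAX]; raise instead of wrapping."""
    if not (0 <= a <= UINT32_MAX and 0 <= b <= UINT32_MAX):
        raise ValueError("operands must fit in 32 unsigned bits")
    total = a + b  # Python ints are arbitrary precision, so this cannot wrap
    if total > UINT32_MAX:
        raise OverflowError("32-bit counter would wrap")
    return total


print(wrapping_add_u32(UINT32_MAX, 1))  # 0: the counter rolled over silently
try:
    checked_add_u32(UINT32_MAX, 1)
except OverflowError as exc:
    print("caught:", exc)
```

Whether raising is the right response depends on the system; for a free-running tick counter, wrapping is often intended, and the fix belongs in the code that consumes the counter.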

    • (Score: 2) by sshelton76 on Saturday July 27 2019, @03:19AM

      This is exactly correct.

      I should also state that these are distinct hardware components.
      There was an ambient light sensor attached to a box that sampled the sensor and turned it into data for the bus.
      Then there was the receiving end which fetched data off the bus and took action to adjust the brightness and contrast of the sign.

      The fetch rate of the receiver was once per ms. The receiver had the divide by zero error.
      The transmitter had the rollover issue.

      So by the time anyone thought to read data off the bus, we would just see a continuously incrementing (or decrementing) counter value being broadcast, and literally no one suspected that the receiver wasn't prepared to handle a rollover.
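The transmitter/receiver interaction described above might be sketched as follows. This is purely hypothetical: the comment does not say how the receiver used the counter, so the division by the counter value, the function names, and the fallback behavior are all assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit counter on the transmitter


def transmitter_counter(start, steps):
    """Yield successive counter values as broadcast on the bus,
    wrapping from 0xFFFFFFFF back to 0 (the rollover)."""
    value = start
    for _ in range(steps):
        yield value
        value = (value + 1) & MASK32


def receiver_unsafe(accumulated_light, counter):
    """Hypothetical receiver logic: average reading per counter tick.
    Raises ZeroDivisionError when the counter has just rolled over to 0."""
    return accumulated_light // counter


def receiver_guarded(accumulated_light, counter, last_good):
    """Same calculation, but keeps the last good value across a rollover."""
    if counter == 0:
        return last_good
    return accumulated_light // counter


# On the bus, everything looks normal: just a counter ticking past its maximum.
print(list(transmitter_counter(0xFFFFFFFE, 4)))
print(receiver_guarded(1000, 0, last_good=42))  # falls back to 42 at rollover
```

Watching the bus alone shows only an ordinary incrementing counter, which matches the point above that nobody suspected the receiver.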