Submitted via IRC for Bytram
Airbus A350 software bug forces airlines to turn planes off and on every 149 hours
Some models of Airbus A350 airliners still need a hard reboot at least every 149 hours, despite warnings from the EU Aviation Safety Agency (EASA) first issued two years ago.
In a mandatory airworthiness directive (AD) reissued earlier this week, EASA urged operators to turn their A350s off and on again to prevent "partial or total loss of some avionics systems or functions".
The revised AD, effective from tomorrow (26 July), exempts only those new A350-941s which have had modified software pre-loaded on the production line. For all other A350-941s, operators need to completely power the airliner down before it reaches 149 hours of continuous power-on time.
Concerningly, the original 2017 AD was brought about by "in-service events where a loss of communication occurred between some avionics systems and avionics network" (sic). The impact of the failures ranged from "redundancy loss" to "complete loss on a specific function hosted on common remote data concentrator and core processing input/output modules".
In layman's English, this means that prior to 2017, at least some A350s flying passengers were suffering unexplained failures of potentially flight-critical digital systems.
Not a power of two. I wonder why 149 hours?
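One possible explanation, and it is only a guess because neither EASA nor Airbus has said what causes the limit: the hour count does not need to be a power of two if it comes from a power-of-two counter. A signed 32-bit tick counter running at a hypothetical 4 kHz (0.25 ms per tick) overflows after about 149.13 hours. The sketch below assumes a free-running counter and simply does that arithmetic for a few assumed widths and tick rates.

```python
# Hypothesis only: the actual cause of the A350 149-hour limit is not public.
SECONDS_PER_HOUR = 3600


def overflow_hours(bits: int, signed: bool, tick_hz: float) -> float:
    """Hours until a free-running counter of `bits` width, ticking at tick_hz, wraps."""
    # A signed counter overflows at 2^(bits-1), an unsigned one at 2^bits.
    max_count = 2 ** (bits - 1) if signed else 2 ** bits
    return max_count / tick_hz / SECONDS_PER_HOUR


# Assumed example configurations (width, signedness, tick rate in Hz).
for bits, signed, hz in [(32, True, 1000), (32, False, 1000), (32, True, 4000)]:
    kind = "signed" if signed else "unsigned"
    print(f"{bits}-bit {kind} @ {hz} Hz: {overflow_hours(bits, signed, hz):.2f} h")
```

The three lines printed are roughly 596.52 h, 1193.05 h and 149.13 h, so only the last (assumed) combination lands on the figure in the directive.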
(Score: 3, Insightful) by Rich on Friday July 26 2019, @10:51AM
I was maintaining/porting the software for a device (regulated market, too). Since its inception in the 80s, it had the issue that it couldn't run across midnight, because the time slices it scheduled its operations in were counted from midnight. To comply with regulations, such stuff gets documented, but not fixed (as with the article's Airbus reboot). Because the port would need full validation anyway, I decided that this had to be fixed. The guy who wrote the original code used floats for some internal calculations, which I simply reused when I rebased the timeslice calculation onto the system's epoch (in that case, January 1, 1904; guess the reason for porting). It worked perfectly well, passed tests with flying colours, and was shipped to customers, who were happy to get rid of the reboot routine.
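The midnight problem can be sketched like this, assuming (hypothetically) one-minute slots and integer arithmetic rather than the original floats. A slot index counted from the most recent midnight jumps backwards when the clock passes 00:00, while one counted from a fixed epoch (here the January 1, 1904 epoch mentioned above) keeps increasing. The date and slot length are arbitrary placeholders, not the device's real values.

```python
from datetime import datetime, timedelta

SLOT = timedelta(seconds=60)   # hypothetical slot length; the real one isn't stated
EPOCH = datetime(1904, 1, 1)   # the 1904 system epoch mentioned in the comment


def slot_since_midnight(t: datetime) -> int:
    """Original scheme (sketch): slot index counted from the most recent midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    return (t - midnight) // SLOT   # timedelta // timedelta gives an int


def slot_since_epoch(t: datetime) -> int:
    """Rebased scheme (sketch): slot index counted from the fixed system epoch."""
    return (t - EPOCH) // SLOT


before = datetime(1999, 3, 1, 23, 59)      # arbitrary date, one minute before midnight
after = before + timedelta(minutes=2)      # 00:01 on the next day

print(slot_since_midnight(before), slot_since_midnight(after))  # 1439 1 (goes backwards)
print(slot_since_epoch(after) - slot_since_epoch(before))       # 2 (still monotonic)
```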
A few months later, reports came in that the timeline display of the schedule was getting weird, mostly with scheduled actions off by one slot.
I investigated and found that the float code I had kept using had started to lose precision, because the 2^23 time slices a single float could represent exactly covered just the span between January 1904 and a few months after we shipped...
I should've known better: a number outgrowing its numeric representation was exactly the kind of bug that blew up the Ariane 5 rocket on its first launch a few years earlier.
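The precision cliff is easy to reproduce. This is a minimal sketch assuming the slot index ends up stored in an IEEE-754 single-precision float (the original code isn't shown). Python floats are doubles, so the hypothetical helper to_float32 round-trips a value through the struct module's "f" format to emulate single precision. Every integer up to 2^24 survives intact; above that, odd slot numbers are rounded to a neighbouring even one, which shows up as an off-by-one slot.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (a double) to IEEE-754 single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]


LIMIT = 2 ** 24  # 16,777,216: every integer up to here is exact in float32

for slot in range(LIMIT - 2, LIMIT + 4):
    stored = to_float32(float(slot))
    print(slot, int(stored), "ok" if stored == slot else "ROUNDED")

# 16777217 is stored as 16777216 and 16777219 as 16777220 (round to nearest even).
```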
(Score: 3, Insightful) by Rich on Friday July 26 2019, @10:57AM
... especially if they amount to more than 50 years.
It's 2^24 for IEEE-754 single precision, of course: 1 sign bit, 8 exponent bits and 23 stored mantissa bits, plus an implicit leading 1 that gives 24 bits of precision.
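To check the field widths, here is a short sketch that unpacks a float32's bit pattern with the standard struct module; float32_fields is a hypothetical helper name. The 23 stored fraction bits plus the implicit leading 1 are what give the 2^24 limit.

```python
import struct


def float32_fields(x: float):
    """Split x (rounded to IEEE-754 single precision) into (sign, exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 stored fraction bits (leading 1 is implicit)
    return sign, exponent, fraction


print(float32_fields(2.0 ** 24))  # (0, 151, 0): 1.0 * 2^24, biased exponent 24 + 127
print(float32_fields(1.5))        # (0, 127, 4194304): only the top fraction bit set
print(float32_fields(-2.0))       # (1, 128, 0)
```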