SoylentNews is people

posted by CoolHand on Thursday May 07 2015, @07:03AM
from the not-as-promoted-as-y2k-bug dept.

A surprisingly simple bug afflicts computers controlling planes, spacecraft and more – they get confused by big numbers. As Chris Baraniuk discovers, the glitch has led to explosions, missing space probes and more.

Tuesday, 4 June 1996 will forever be remembered as a dark day for the European Space Agency (Esa). The first flight of the crewless Ariane 5 rocket, carrying with it four very expensive scientific satellites, ended after 39 seconds in an unholy ball of smoke and fire. It's estimated that the explosion resulted in a loss of $370m (£240m).

What happened? It wasn't a mechanical failure or an act of sabotage. No, the launch ended in disaster thanks to a simple software bug. A computer getting its maths wrong – essentially getting overwhelmed by a number bigger than it expected.

How is it possible that computers get befuddled by numbers in this way? It turns out such errors are responsible for a series of disasters and mishaps in recent years, destroying rockets, making space probes go missing, and sending missiles off-target. So what are these bugs, and why do they happen?

Imagine trying to represent a value of, say, 105,350 miles on an odometer that has a maximum value of 99,999. The counter would "roll over" to 00,000 and then count up to 5,350, the remaining value. This is the same species of inaccuracy that doomed the 1996 Ariane 5 launch. More technically, it's called "integer overflow": a number grows too big to fit in the fixed amount of storage a computer has set aside for it, and the result can make a system malfunction.
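The odometer analogy maps directly onto fixed-width integers. Below is a minimal Python sketch, assuming a 5-digit odometer and a 16-bit signed register purely for illustration; `odometer_reading` and `wrap_int16` are hypothetical helpers, not part of any real system described here. Python's own integers never overflow, so the limited width is simulated with modular arithmetic:

```python
def odometer_reading(miles: int, digits: int = 5) -> int:
    """Reading shown by an odometer with `digits` wheels. Only the low-order
    digits survive, so anything beyond the maximum wraps back round to zero."""
    return miles % (10 ** digits)


def wrap_int16(value: int) -> int:
    """What a 16-bit two's-complement register would hold after storing `value`
    (Python ints are arbitrary-precision, so the wraparound is simulated)."""
    value &= 0xFFFF                      # keep only the low 16 bits
    return value - 0x1_0000 if value >= 0x8000 else value


print(odometer_reading(105_350))   # 5350, displayed as 05,350
print(wrap_int16(32_767 + 1))      # -32768: one past the maximum wraps to the minimum
```

The signed case is the more treacherous one: instead of dropping back to zero, the value flips to a large negative number.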

Such glitches emerge with surprising frequency. Nasa's loss of contact with the Deep Impact space probe in 2013 is suspected to have been caused by an integer limit being reached.
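The article does not say which limit was reached. One widely reported hypothesis, treated here purely as an assumption, is that the probe's clock counted tenths of a second since 1 January 2000 in an unsigned 32-bit value. If that were the case, the rollover moment can be worked out directly:

```python
from datetime import datetime, timedelta

# ASSUMPTION: an unsigned 32-bit counter of tenths of a second since
# 2000-01-01 00:00. This is a reported hypothesis, not a confirmed design.
EPOCH = datetime(2000, 1, 1)
TICKS_PER_SECOND = 10
MAX_TICKS = 2 ** 32            # the counter wraps to 0 after 2**32 - 1 ticks

rollover = EPOCH + timedelta(seconds=MAX_TICKS / TICKS_PER_SECOND)
print(rollover)                # 2013-08-11 00:38:49.600000
```

Under that assumption the counter runs out in August 2013, the same year the article gives for the lost contact.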

And just last week it was reported that Boeing 787 aircraft may suffer from a similar issue. The generator control units, which manage the electrical power produced by the plane's engines, will automatically enter a failsafe mode, cutting the aircraft's AC electrical power, if they have been left on for more than 248 days.
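The article gives only the 248-day figure. A commonly cited explanation, which is an assumption here rather than anything the article states, is a signed 32-bit counter incremented 100 times a second. The arithmetic does line up:

```python
# ASSUMPTION: a signed 32-bit counter of hundredths of a second.
INT32_MAX = 2 ** 31 - 1
TICKS_PER_SECOND = 100
SECONDS_PER_DAY = 24 * 60 * 60

# The counter overflows on the tick after INT32_MAX, i.e. after 2**31 ticks.
days_until_overflow = (INT32_MAX + 1) / TICKS_PER_SECOND / SECONDS_PER_DAY
print(f"{days_until_overflow:.2f} days")   # 248.55 days
```

An unsigned 32-bit counter at the same rate would last twice as long (about 497 days), so the "over 248 days" figure is at least consistent with a signed counter.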

 
  • (Score: 3, Informative) by Anonymous Coward on Thursday May 07 2015, @09:01AM (#179816)

    Except that the Ariane did test for overflow. However, the correct flight path of Ariane 5 was simply guaranteed to overflow the modules designed for Ariane 4. The modules then gave error messages instead of flight data values. Since that was true for all the modules (which, of course, all used the same code), the redundancy didn't help.
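The point above, that a checked conversion still fails when every redundant unit runs identical code, can be sketched in Python. The 16-bit signed target mirrors the documented Ariane 5 failure (a 64-bit floating-point value converted to a 16-bit signed integer), but the function names, the `OperandError` class and the channel count are hypothetical, chosen only for illustration:

```python
INT16_MIN, INT16_MAX = -32_768, 32_767


class OperandError(Exception):
    """Hypothetical stand-in for the error a reused module reports on overflow."""


def to_int16_checked(x: float) -> int:
    """Convert a float to a 16-bit signed integer, refusing out-of-range values
    instead of silently wrapping. (NaN also fails the range test.)"""
    if not INT16_MIN <= x <= INT16_MAX:
        raise OperandError(f"{x} does not fit in a 16-bit signed integer")
    return int(x)  # truncates toward zero


def redundant_channels(x: float, channels: int = 2) -> list:
    """Feed the same input to several identical channels. Because every channel
    runs the same code, an out-of-range input makes all of them fail together."""
    outputs = []
    for _ in range(channels):
        try:
            outputs.append(to_int16_checked(x))
        except OperandError as err:
            outputs.append(err)   # an error report instead of a data value
    return outputs


print(redundant_channels(1_200.7))    # [1200, 1200]
print(redundant_channels(40_000.0))   # both entries are OperandError instances
```

The range check turns a silent wraparound into a detected error, but it cannot turn that error back into a usable value; duplicating the same code only duplicates the failure.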

  • (Score: 3, Interesting) by PiMuNu (3823) on Thursday May 07 2015, @11:57AM (#179855)

    My colleague in the nuclear industry has a mantra: "diversity and redundancy". So the coolant water has to have a backup pump, and you need a backup air-cooling system as well. Multiple failures then require some common failure mode, like a tsunami, to break the system...

    • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @02:27PM (#179922)

      Redundancy does not help when the data really IS outside the allowed range because someone reused the code from an older-generation rocket with less power.