
SoylentNews is people

posted by CoolHand on Thursday May 07 2015, @07:03AM
from the not-as-promoted-as-y2k-bug dept.

A surprisingly simple bug afflicts computers controlling planes, spacecraft and more – they get confused by big numbers. As Chris Baraniuk discovers, the glitch has led to explosions, missing space probes and more.

Tuesday, 4 June 1996 will forever be remembered as a dark day for the European Space Agency (Esa). The first flight of the crewless Ariane 5 rocket, carrying with it four very expensive scientific satellites, ended after 39 seconds in an unholy ball of smoke and fire. It's estimated that the explosion resulted in a loss of $370m (£240m).

What happened? It wasn't a mechanical failure or an act of sabotage. No, the launch ended in disaster thanks to a simple software bug. A computer getting its maths wrong – essentially getting overwhelmed by a number bigger than it expected.

How is it possible that computers get befuddled by numbers in this way? It turns out such errors are responsible for a series of disasters and mishaps in recent years, destroying rockets, making space probes go missing, and sending missiles off-target. So what are these bugs, and why do they happen?

Imagine trying to represent a value of, say, 105,350 miles on an odometer that has a maximum value of 99,999. The counter would "roll over" to 00,000 and then count up to 5,350, the remaining value. This is the same species of inaccuracy that doomed the 1996 Ariane 5 launch. More technically, it's called "integer overflow": a number is too big to fit in the fixed amount of storage the computer system has set aside for it, and sometimes this can cause a malfunction.
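
The odometer analogy can be sketched in a few lines of Python. This is only an illustration, not the Ariane flight code: the function names `odometer_reading` and `wrap_to_int16` are made up for this example, and the signed 16-bit width is an assumption chosen to show a typical fixed-size integer. Languages also differ in what happens at the limit: some wrap around silently as shown here, others raise an error.

```python
def odometer_reading(miles: int, digits: int = 5) -> int:
    """Return what a mechanical odometer with `digits` wheels would display."""
    return miles % (10 ** digits)


def wrap_to_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range -32768..32767.

    This mimics two's-complement wraparound; the 16-bit width is an
    assumption made for illustration.
    """
    return (value + 2 ** 15) % 2 ** 16 - 2 ** 15


print(odometer_reading(105_350))  # 5350
print(wrap_to_int16(32_767))      # 32767
print(wrap_to_int16(32_768))      # -32768
```

The last line shows why overflow is dangerous: going one past the largest representable value does not give a slightly wrong answer, it gives a wildly different one.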

Such glitches emerge with surprising frequency. It's suspected that the reason why Nasa lost contact with the Deep Impact space probe in 2013 was an integer limit being reached.

And just last week it was reported that Boeing 787 aircraft may suffer from a similar issue. The control unit managing the delivery of power to the plane's engines will automatically enter a failsafe mode – and shut down the engines – if it has been left on for over 248 days.
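
The 248-day figure is consistent with a signed 32-bit counter that ticks once every hundredth of a second, although the summary above does not say so. Treat the counter width and the tick rate in this sketch as assumptions, and the name `days_until_overflow` as hypothetical. The arithmetic itself is easy to check:

```python
INT32_MAX = 2 ** 31 - 1         # largest value a signed 32-bit integer can hold
TICKS_PER_SECOND = 100          # assumed tick rate: one tick per centisecond
SECONDS_PER_DAY = 24 * 60 * 60


def days_until_overflow(max_count: int = INT32_MAX,
                        ticks_per_second: int = TICKS_PER_SECOND) -> float:
    """Days of continuous running before the counter reaches max_count."""
    return max_count / ticks_per_second / SECONDS_PER_DAY


print(round(days_until_overflow(), 2))  # 248.55
```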

 
  • (Score: 4, Informative) by Anonymous Coward on Thursday May 07 2015, @07:38AM

    by Anonymous Coward on Thursday May 07 2015, @07:38AM (#179795)

    As far as I know, the Ariane failure was not a bug. When coding for rockets, everything is checked and double-checked, including inputs. For example, if the engines can accelerate the rocket by at most N m/s^2, the readings from the speed-measuring equipment are checked against this limit, and if a device returns a much higher value, that device is marked as defective. This is not a problem, because there are three or four of everything.

    The problem was (again, as far as I know) that the code in question was written for Ariane 4, not 5. The Ariane 5 was more powerful, and could just accelerate faster. The computer received an out of range value from one of the speed measuring devices, and marked it as defective. It then received an out of range value from the next device, and marked it as defective too. Another two readings, and all four devices were marked as defective, leaving the system with no way to measure the speed.

    In short: Not a bug, everything behaved as designed. Unfortunately, somebody decided to reuse the design without adjusting the valid range.

  • (Score: 5, Insightful) by maxwell demon on Thursday May 07 2015, @08:18AM

    by maxwell demon (1608) on Thursday May 07 2015, @08:18AM (#179802) Journal

    Design bugs are bugs, too. And the design bug here was to reuse the Ariane 4 module unchanged.

    --
    The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @09:41AM

    by Anonymous Coward on Thursday May 07 2015, @09:41AM (#179825)

    There is only dual modular redundancy for the Inertial Reference System (SRI) in Ariane 5.
    See: http://esamultimedia.esa.int/docs/esa-x-1819eng.pdf [esa.int] section 2.1 page 3 for the report.

    As far as I know, dual modular redundancy (https://en.wikipedia.org/wiki/Dual_modular_redundant [wikipedia.org]) seems to be a standard practice in European space systems, except maybe where voting is needed/desired (e.g. thermal measurements...)

  • (Score: 3, Informative) by bootsy on Thursday May 07 2015, @12:15PM

    by bootsy (3440) on Thursday May 07 2015, @12:15PM (#179861)

    I had always been told this was due to the Ada language the software was written in throwing an exception, and this exception overwriting an area of memory that held the rocket direction variables. Since the whole thing was embedded, there was only a small working area of memory. If you've ever programmed in Ada you will know it is very fussy (read: type safe), and I believe it was the first language to implement exceptions, although I'm sure a reply will turn up showing an earlier example.

    • (Score: 2) by darkfeline on Thursday May 07 2015, @08:17PM

      by darkfeline (1030) on Thursday May 07 2015, @08:17PM (#180055) Homepage

      Like all revolutionary programming paradigms, exception handling was first implemented/invented in Lisp.

      https://en.wikipedia.org/wiki/Exception_handling#Exception_handling_in_software [wikipedia.org]

      --
      Join the SDF Public Access UNIX System today!
      • (Score: 2) by bootsy on Friday May 08 2015, @08:21AM

        by bootsy (3440) on Friday May 08 2015, @08:21AM (#180238)

        Thanks for the link. I love this quote about Ada from it; it is so relevant.

        "...a plethora of features and notational conventions, many of them unnecessary and some of them, like exception handling, even dangerous. [...] Do not allow this language in its present state to be used in applications where reliability is critical[...]. The next rocket to go astray as a result of a programming language error may not be an exploratory space rocket on a harmless trip to Venus: It may be a nuclear warhead exploding over one of our own cities."