SoylentNews is people

posted by CoolHand on Thursday May 07 2015, @07:03AM
from the not-as-promoted-as-y2k-bug dept.

A surprisingly simple bug afflicts computers controlling planes, spacecraft and more – they get confused by big numbers. As Chris Baraniuk discovers, the glitch has led to explosions, missing space probes and more.

Tuesday, 4 June 1996 will forever be remembered as a dark day for the European Space Agency (Esa). The first flight of the crewless Ariane 5 rocket, carrying with it four very expensive scientific satellites, ended after 39 seconds in an unholy ball of smoke and fire. It's estimated that the explosion resulted in a loss of $370m (£240m).

What happened? It wasn't a mechanical failure or an act of sabotage. No, the launch ended in disaster thanks to a simple software bug. A computer getting its maths wrong – essentially getting overwhelmed by a number bigger than it expected.

How is it possible that computers get befuddled by numbers in this way? It turns out such errors are answerable for a series of disasters and mishaps in recent years, destroying rockets, making space probes go missing, and sending missiles off-target. So what are these bugs, and why do they happen?

Imagine trying to represent a value of, say, 105,350 miles on an odometer that has a maximum value of 99,999. The counter would "roll over" to 00,000 and then count up to 5,350, the remaining value. This is the same species of inaccuracy that doomed the 1996 Ariane 5 launch. More technically, it's called "integer overflow", essentially meaning that numbers are too big to be stored in a computer system, and sometimes this can cause malfunction.

Such glitches emerge with surprising frequency. It's suspected that the reason why Nasa lost contact with the Deep Impact space probe in 2013 was an integer limit being reached.

And just last week it was reported that Boeing 787 aircraft may suffer from a similar issue. The control unit managing the delivery of power to the plane's engines will automatically enter a failsafe mode – and shut down the engines – if it has been left on for over 248 days.

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Insightful) by bradley13 on Thursday May 07 2015, @11:53AM

    by bradley13 (3053) on Thursday May 07 2015, @11:53AM (#179853) Homepage Journal

    The workings of integers in CPUs are unchanged since the earliest days; except in very specialized cases, they are not going to change.

    The main factor at work here is programmer competence (remember our discussion of a few days ago? [soylentnews.org]). I teach students about the behavior of integers no later than the second week of the very first programming course. This comes up again from time to time (along with other practical "gotchas").

    Any programmer who creates an integer variable and doesn't consider "what is the largest value that this integer will ever have" has made a fundamental mistake. If the programmer never asks that question, then ideas like having the hardware throw an exception will not help, because the same programmer won't think to build in exception handling either. If the integer value is something that counts upwards forever, then the obvious question is "when is forever finished?". A counter that will overflow in 20 years - one may deliberately decide to accept that risk, as long as it is documented (lots of software will still be used in 20 years). A counter that overflows after 248 days? That's maybe not so smart.

    This is also an obvious area where a bit of whitebox testing would catch the problem: look inside the program and find weak spots, like counters that might overflow. Lots of places think blackbox testing is all that is required - in lots of cases, that's true enough. But for anything really important, you need both.

    tl;dr - You can't fix stupid.

    --
    Everyone is somebody else's weirdo.
  • (Score: 2) by c0lo on Thursday May 07 2015, @12:50PM

    by c0lo (156) Subscriber Badge on Thursday May 07 2015, @12:50PM (#179870) Journal

    If the integer value is something that counts upwards forever, then the obvious question is "when is forever finished?"

    Huh! Elementary!

    for(uint i=maxVal-1; i>=0; i--) {
       // do something
    }

    -----
    (a bug that ate my soul for 2 days in my early years in software).

    --
    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
    • (Score: 3, Funny) by PiMuNu on Thursday May 07 2015, @01:25PM

      by PiMuNu (3823) on Thursday May 07 2015, @01:25PM (#179886)

      > (a bug that ate my soul for 2 days in my early years in software).

      if (condition);
              do_something();

      • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @01:39PM

        by Anonymous Coward on Thursday May 07 2015, @01:39PM (#179894)
        retry
  • (Score: 2) by Jesus_666 on Thursday May 07 2015, @06:48PM

    by Jesus_666 (3044) on Thursday May 07 2015, @06:48PM (#180020)
    Then again, a counter that overflows after 248 days when you can assume that any sane person will shut down the system every ten days or so is probably not that terrible a design decision. An airplane that is kept active for 248 days non-stop without any kind of major maintenance is probably going to fall out of the sky long before that counter overflows. So in this case forever can be reasonably assumed to be finished long before overflow becomes an issue.

    Of course you still want the system to fail gracefully just in case you ever make that counter faster.
  • (Score: 2) by tangomargarine on Thursday May 07 2015, @07:33PM

    by tangomargarine (667) on Thursday May 07 2015, @07:33PM (#180041)

    BigInteger [oracle.com]

    So from the above Ariane story, if they lose all detectors the vehicle self-destructs? Hrm...what could possibly go wrong...

    --
    "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"