
SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.


  • (Score: 3, Interesting) by meustrus on Thursday February 16 2017, @05:34PM

    by meustrus (4961) on Thursday February 16 2017, @05:34PM (#467889)

    In my Computer Science classes, we were taught that "undefined behavior" means anything can happen. It could do what you want, or it could throw an exception. Or it could spin the disk drive to unsafe speeds until the disc flies out of the computer and kills the operator. You just don't know.

    This was in the context of Java documentation, where "undefined behavior" means "avoid this like the plague". Or at least it should, if the Java developers didn't have some fixation on overburdening their interfaces. The classic example is Iterator: some implementations support removing elements during traversal and some don't, so the behavior of Iterator#remove on the interface is undefined. It's safe to use only if you know that you are using a specific implementation that supports it.
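
    For reference, here is a minimal sketch of what `Iterator.remove()` does when the implementation does not support it. The Java API documentation describes `remove()` as an optional operation that throws `UnsupportedOperationException` when unsupported, so the failure is at least predictable at runtime. The class and method names below are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; not taken from the discussion.
class OptionalRemoveDemo {
    // Advances the iterator once and tries to remove that element.
    // Returns true if remove() worked, false if the implementation
    // rejected it with UnsupportedOperationException.
    static boolean tryRemoveFirst(List<String> list) {
        Iterator<String> it = list.iterator();
        it.next();
        try {
            it.remove();
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> mutable = new ArrayList<>(Arrays.asList("a", "b"));
        List<String> readOnly = Collections.unmodifiableList(Arrays.asList("a", "b"));

        System.out.println(tryRemoveFirst(mutable));  // true
        System.out.println(tryRemoveFirst(readOnly)); // false
    }
}
```

    The compiler accepts both calls; whether removal is supported is only discovered when the code runs.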

    But that gets to the real problem, which is bad language design. What we call "undefined behavior" is about hidden state. Neither the compiler nor the runtime knows whether this implementation of Iterator is going to work; that detail is left up to the programmer to get right. And that's bullshit. Sure, if I was programming machine code for fixed hardware like Mel [utah.edu] then that would be acceptable. Difficult, sure, but within the capabilities of a rock star to get it right. But Java is not machine code for fixed hardware. Java will optimize your dead code, hide the memory model from you, and periodically interrupt your program to collect the garbage that it won't let you clean up yourself. More people can make useful programs with Java. But nobody can completely understand all the details of what will happen when their program is run.

    It looks to me like `undef` and `poison` in LLVM IR at least are safe within the bounds of their limited scope. They are not "anything can happen". The bounds of what could happen based on those keywords are knowable before runtime. They speak of a bygone era when the programmer could understand the exact procedure that will happen when their code runs, and they provide options to model when an operation is safe vs unsafe. That makes sense, because the people who like that level of control can only find it these days writing the compilers.

    So no, a defined undefined behavior is not unsafe. It's not the same thing as truly undefined behavior where anything can happen. That kind of unsafe undefined behavior comes from languages which give you strict contracts they can't enforce, then tell you in code comments that the contract might be a lie.

    --
    If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
  • (Score: 2) by DannyB on Thursday February 16 2017, @08:15PM

    by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @08:15PM (#467935) Journal
    Since you're talking about Java, iterators and Iterator#remove, I'll point out that you can implicitly use iterators in a for() loop without realizing you're using an iterator. The iterator may no longer be able to traverse the collection if you call a remove() method on the collection.

    List<President> presidents =  . . . ;
    for( final President president : presidents ) {
       if( president.isAnIdiot()  &&  president.getFaceColor().equals( Colors.ORANGE ) ) {
          presidents.remove( president );  // make idiots un-presidented
       }
    }

    Depending on the collection implementation, after the remove() call, the implicit iterator created by the for() may now be unable to continue traversing the collection.
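
    A runnable sketch of the same situation, assuming an `ArrayList` of integers in place of the elided list of presidents (the class, methods, and data are hypothetical). With `ArrayList`, removing through the collection inside the loop makes the implicit iterator throw `ConcurrentModificationException` on its next step, while removing through the iterator itself, or through `removeIf`, is supported.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

// Hypothetical demo class; the data stands in for the elided list above.
class RemoveWhileIterating {
    static List<Integer> sample() {
        return new ArrayList<>(Arrays.asList(1, 2, 3, 4, 5, 6));
    }

    // Removes even numbers through the collection inside a for-each loop.
    // Returns true if the implicit iterator detected the modification.
    static boolean removeViaCollectionFails() {
        List<Integer> numbers = sample();
        try {
            for (Integer n : numbers) {
                if (n % 2 == 0) {
                    numbers.remove(n); // remove(Object): bypasses the iterator
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes even numbers through the iterator, which ArrayList supports.
    static List<Integer> removeViaIterator() {
        List<Integer> numbers = sample();
        for (Iterator<Integer> it = numbers.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) {
                it.remove();
            }
        }
        return numbers;
    }

    // Same result using Collection.removeIf (Java 8 and later).
    static List<Integer> removeViaRemoveIf() {
        List<Integer> numbers = sample();
        numbers.removeIf(n -> n % 2 == 0);
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(removeViaCollectionFails()); // true
        System.out.println(removeViaIterator());        // [1, 3, 5]
        System.out.println(removeViaRemoveIf());        // [1, 3, 5]
    }
}
```

    The exception is a best-effort check that `ArrayList` documents as fail-fast behavior; other collection implementations may behave differently, which is the point being made above.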
    --
    To transfer files: right-click on file, pick Copy. Unplug mouse, plug mouse into other computer. Right-click, paste.
    • (Score: 2) by NCommander on Thursday February 16 2017, @10:14PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @10:14PM (#467990) Homepage Journal

      Far too many languages let you do this and leave it to the runtime to decide whether to crash or not. As far as I know, Rust is the only language off the top of my head that specifically checks whether such an operation is safe at compile time, and dies with a compiler error if you would invalidate the iterator while you're in it.

      --
      Still always moving
  • (Score: 2) by Wootery on Friday February 17 2017, @09:32AM

    by Wootery (2341) on Friday February 17 2017, @09:32AM (#468151)

    But Java isn't like C. I'm pretty sure Java has no real 'undefined behaviour' (in the C sense), and that this StackOverflow answer is accurate. [stackexchange.com]
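
    As a rough illustration of that difference, the sketch below runs three operations that are classic undefined behaviour in C (signed integer overflow, reading past the end of an array, integer division by zero). In Java each has a result fixed by the language specification: wraparound for the first, a specific exception for the other two. The class and method names are hypothetical.

```java
// Hypothetical demo class; not taken from the linked answer.
class DefinedInJava {
    // int arithmetic wraps around in two's complement.
    static int addOne(int x) {
        return x + 1;
    }

    // Every array access is bounds-checked at runtime.
    static String readPastEnd() {
        int[] values = new int[2];
        try {
            return String.valueOf(values[2]);
        } catch (ArrayIndexOutOfBoundsException e) {
            return "ArrayIndexOutOfBoundsException";
        }
    }

    // Integer division by zero throws instead of doing something arbitrary.
    static String divide(int numerator, int denominator) {
        try {
            return String.valueOf(numerator / denominator);
        } catch (ArithmeticException e) {
            return "ArithmeticException";
        }
    }

    public static void main(String[] args) {
        System.out.println(addOne(Integer.MAX_VALUE) == Integer.MIN_VALUE); // true
        System.out.println(readPastEnd());                                  // ArrayIndexOutOfBoundsException
        System.out.println(divide(1, 0));                                   // ArithmeticException
    }
}
```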