
SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.



 
This discussion has been archived. No new comments can be posted.
  • (Score: 3, Insightful) by Anonymous Coward on Thursday February 16 2017, @04:23PM

    by Anonymous Coward on Thursday February 16 2017, @04:23PM (#467849)

    NEVER assume you are more clever than the language designers. Unless you know specifically what a piece of code does, you should always program with the mindset that it causes the user's computer to explode with Lovecraft tentacles.

    Undefined behavior is, as its name suggests, undefined. It could cause manageable pointer corruptions, but it can just as easily corrupt memory or trip exceptions. What happens can change between different same-architecture hardware, compiler versions, and OS releases. Not to mention that it makes porting extremely difficult.

    If you really have to optimize by assuming certain behavior for a given hardware architecture, then always wrap that code in macro conditionals or template wizardry.
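One way to read the "macro conditionals" advice is sketched below: the architecture-specific fast path is compiled only when a predefined macro confirms the assumption, and a portable fallback covers everything else. The helper name `read_le32` is hypothetical, and the `__BYTE_ORDER__` / `__ORDER_LITTLE_ENDIAN__` macros are the ones GCC and Clang predefine; other compilers may not provide them, in which case the fallback is used.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: read a 32-bit little-endian value from a byte buffer.
// Precondition: `bytes` points at 4 or more readable bytes.
std::uint32_t read_le32(const unsigned char* bytes) {
    assert(bytes != nullptr);
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    // Fast path, compiled only when the host is known to be little-endian.
    // memcpy sidesteps the alignment and aliasing UB of a pointer cast.
    std::uint32_t value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
#else
    // Portable fallback: assemble the value byte by byte.
    return static_cast<std::uint32_t>(bytes[0])
         | (static_cast<std::uint32_t>(bytes[1]) << 8)
         | (static_cast<std::uint32_t>(bytes[2]) << 16)
         | (static_cast<std::uint32_t>(bytes[3]) << 24);
#endif
}
```

Both branches are meant to return the same result, so the conditional only selects how the value is computed, never what it is.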

  • (Score: 2) by NCommander on Thursday February 16 2017, @08:39PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @08:39PM (#467947) Homepage Journal

    A program should *never* depend on undefined behavior at all. I once saw a program that used negative array aliasing (aka array[-3] to do "clever" magic) that blew up sky-high when the compiler was swapped out. The single exception to this is a closed platform where you'll never have updates or changed code; i.e., a burned ROM for a cartridge game system (which used things like undefined op-codes to do magic), but only in cases where you're pushing hardware to the edge. For 99.9% of people, just say no.

    What's worse is that, at least in the case of C++, a lot of commonly used things in the STL are undefined behavior. The most common one I know of is iterating over a vector while modifying that vector to push or pop items. The specification says that when a vector is changed, iterators pointing into it can be invalidated (all of them, if the vector reallocates). In practice, depending on how the STL is implemented, it will work just fine or might crash out with a very hard to debug error. This is because a vector might have to realloc() itself into a larger memory block and change the underlying pointer; sometimes the iterator is pointed at a stale copy of the array, sometimes it dangles. Since you have no enforcement of dangling pointers in the language, well, boom.
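For reference, a minimal sketch of two common ways to modify a std::vector while walking it without touching an invalidated iterator. The function names `remove_evens` and `duplicate_odds` are made up for illustration; the only library facts relied on are that `erase()` returns an iterator to the element after the erased one, and that indices (unlike iterators) stay meaningful across a reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Erasing while iterating: continue from the iterator that erase() returns
// instead of incrementing the one that was just invalidated.
void remove_evens(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0) {
            it = v.erase(it);  // valid iterator to the next element
        } else {
            ++it;
        }
    }
}

// Appending while iterating: walk by index, because push_back() may
// reallocate and invalidate every iterator into the vector.
void duplicate_odds(std::vector<int>& v) {
    const std::size_t original_size = v.size();
    for (std::size_t i = 0; i < original_size; ++i) {
        if (v[i] % 2 != 0) {
            const int value = v[i];  // copy out before the vector may move
            v.push_back(value);
        }
    }
}
```

Starting from {1, 2, 3, 4}, calling `remove_evens` leaves {1, 3}, and a following `duplicate_odds` leaves {1, 3, 1, 3}.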

    --
    Still always moving
    • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @08:53PM

      by Anonymous Coward on Thursday February 16 2017, @08:53PM (#467955)

      Um, negative indices are defined behavior in C and C++. Accessing memory outside of the array is what is undefined. If I had a pointer that pointed to the 4th element of an array, this is perfectly defined: p[-3].
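A small sketch of that point, with made-up names (`three_before`, `first_via_fourth`): subscripting a pointer with a negative index is ordinary pointer arithmetic, and it is well defined as long as the result still lands inside the same array.

```cpp
#include <cassert>

// Returns the element three slots before `p`.
// Precondition: `p` points at index 3 or later of an int array.
int three_before(const int* p) {
    assert(p != nullptr);
    return p[-3];  // equivalent to *(p - 3)
}

int first_via_fourth() {
    int values[5] = {10, 20, 30, 40, 50};
    int* fourth = &values[3];       // points at the 4th element
    return three_before(fourth);    // reads values[0]
}
```

Calling `three_before(&values[2])` instead would compute an address before the start of the array, which is the undefined case.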

      • (Score: 2) by NCommander on Thursday February 16 2017, @09:14PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @09:14PM (#467965) Homepage Journal

        I didn't describe it well. Basically it was something like this.

        int location_one;
        int location_two;
        int location_three;
        int location_end;

        printf("%d", location_end[-2]);

        As far as I could tell, the entire point of it was to avoid having to do update calculations (i.e., have an array, and a macro with the length of the array). The location_end pointer was shared with other code modules (due to being a flat memory model/no protection). I never understood the point of it, but in a lot of ways, that wasn't even the most WTFy thing I've seen in that codebase. Then again, a lot of microcontroller code is serious WTF.

        --
        Still always moving
        • (Score: 2) by fnj on Friday February 17 2017, @02:18PM

          by fnj (1654) on Friday February 17 2017, @02:18PM (#468213)

          That call to printf will fail no matter what index you use; even 0. location_end is an int, not an int*. In fact the expression won't even compile.

          gcc says "error: subscripted value is neither array nor pointer nor vector"

    • (Score: 2) by lgw on Thursday February 16 2017, @08:59PM

      by lgw (2836) on Thursday February 16 2017, @08:59PM (#467958)

      Sure, it's undefined, but I think all the compiler vendors actually do the same thing - check for reallocation in debug, and let it blow up with debug off.

      It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

      • (Score: 2) by c0lo on Thursday February 16 2017, @11:07PM

        by c0lo (156) Subscriber Badge on Thursday February 16 2017, @11:07PM (#468005) Journal

        It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

        C++/STL does have the safe-but-slow option: at(size_type) [cppreference.com].
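A short sketch of the difference, using a hypothetical helper `at_rejects`: `std::vector::at` checks the index and throws `std::out_of_range` when it is past the end, whereas `operator[]` with the same index is undefined behavior. Note that this covers index-based access only; it does not detect invalidated iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: reports whether v.at(index) rejects the index.
bool at_rejects(const std::vector<int>& v, std::size_t index) {
    try {
        (void)v.at(index);  // bounds-checked access
        return false;
    } catch (const std::out_of_range&) {
        return true;        // index >= v.size()
    }
}
```

With a three-element vector, `at_rejects(v, 2)` is false and `at_rejects(v, 3)` is true.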

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford