
SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.
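As an illustration of a guarantee "enforced by dynamic checks in the emitted code", here is a minimal sketch in C of the kind of check a safe front end might place ahead of an integer addition. The name `checked_add` is hypothetical and is not taken from Swift, Rust, or LLVM; the only point is that the check runs before the operation, so the lower-level UB (signed overflow) is never executed.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from any real compiler runtime): adds a and b
 * only when the result fits in an int. Returns true and stores the sum in
 * *out on success; returns false and leaves *out untouched on overflow. */
bool checked_add(int a, int b, int *out)
{
    /* Compare against the limits before adding, so the overflowing
     * addition (undefined behavior in C) is never evaluated. */
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller that gets `false` back can trap, raise an error, or panic at the higher level; what it cannot do is observe the undefined behavior underneath.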



 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by fnj on Friday February 17 2017, @02:11PM

    by fnj (1654) on Friday February 17 2017, @02:11PM (#468209)

    "If you want your language to work on different substrates, then you need some implementation-defined behaviour."

    It's not clear what you mean. The C specification (just to pick one example) chose to leave the sizes of short, int, and long loosely defined (sizeof(char) is always 1, but the number of bits in a char is only required to be at least 8). They didn't have to. The Free Pascal specification says that sizeof Byte and ShortInt are exactly 1, SmallInt and Word are exactly 2, Integer is either 2 or 4 depending on mode, LongInt and LongWord are exactly 4.

    Even C99 formalized typedefs (in stdint.h) for int8_t (1), int16_t (2), int32_t (4), int64_t (8), along with unsigned counterparts and the least/fast variants. The widths of the exact-width types are not implementation-defined. They are standard-defined wherever the types are provided. The programmer can choose to use them or not.
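A minimal sketch of that contrast in C, assuming a C11 compiler (for `static_assert`) and an implementation that provides the exact-width types. The macro `BITS_OF` and the two helper functions are hypothetical names introduced only for this example.

```c
#include <assert.h>  /* static_assert (C11) */
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* int8_t, int16_t, int32_t, int64_t */

/* Hypothetical helper: width of a type in bits, from its size in bytes. */
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

/* Exact-width types: fixed by the standard wherever they exist. */
static_assert(BITS_OF(int8_t) == 8, "int8_t is exactly 8 bits");
static_assert(BITS_OF(int16_t) == 16, "int16_t is exactly 16 bits");
static_assert(BITS_OF(int32_t) == 32, "int32_t is exactly 32 bits");
static_assert(BITS_OF(int64_t) == 64, "int64_t is exactly 64 bits");

/* Classic types: the standard only sets minimum ranges, so these
 * results differ from one implementation to another. */
size_t int_bits(void)  { return BITS_OF(int); }
size_t long_bits(void) { return BITS_OF(long); }
```

The static assertions hold on any conforming implementation that defines these types, whereas `int_bits()` and `long_bits()` only promise at least 16 and at least 32 respectively.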

  • (Score: 2) by TheRaven on Friday February 17 2017, @05:52PM

    by TheRaven (270) on Friday February 17 2017, @05:52PM (#468274) Journal
    How big is a pointer in Pascal? C has supported 16-bit, 32-bit, and 64-bit pointers that are represented purely as integers, as 36-bit values including a segment id, as fat pointers including a base and a range, and so on. In some languages, such as Java, these details are not exposed through the abstract machine, so the fact that they are implementation-defined is hidden from programmers, but the more you want to expose, the harder that becomes.
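A small sketch in C of how much of this the language exposes. It assumes the implementation provides `uintptr_t`, which the standard makes optional; the function names are hypothetical and exist only for this example.

```c
#include <stddef.h>  /* size_t */
#include <stdint.h>  /* uintptr_t (optional in the C standard) */

/* Size of an object pointer in bytes; implementation-defined. */
size_t pointer_size_bytes(void)
{
    return sizeof(void *);
}

/* Converts a pointer to an integer and back. The standard guarantees only
 * that the round trip compares equal to the original pointer; what the
 * intermediate integer looks like is up to the implementation. */
int round_trips(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    void *back = (void *)bits;
    return back == p;
}
```

Portable code can rely on the round trip but not on the integer's value or on `pointer_size_bytes()` returning any particular number, which is the kind of detail that Java's abstract machine never shows at all.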
    --
    sudo mod me up