
SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.
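As a rough illustration of a "dynamic check in the emitted code", the sketch below guards a signed addition so that the overflowing (undefined) case is never executed. The name `checked_add` is hypothetical and not taken from Swift, Rust, or LLVM; it only shows the general shape of such a check, written in C.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: adds a and b only when the result fits in an int.
   Returns false (and leaves *out untouched) when the addition would
   overflow, so the undefined signed-overflow case is never reached. */
static bool checked_add(int a, int b, int *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

A caller tests the return value before using `*out`. A safe language would typically trap or raise an error at the point where this sketch returns false.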



  • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @06:24PM

    by Anonymous Coward on Thursday February 16 2017, @06:24PM (#467905)

    I think you completely misunderstand the meaning of "undefined" and "implementation defined".
    Just because it is not documented does not turn "implementation defined" into "undefined".
    "implementation defined" means it has a specific, reproducible behaviour. So if you exhaustively test that your code behaves correctly with a certain implementation, you can know you are fine. "implementation defined" also usually is attached to a RESULT, which means the absolute worst case is that you cannot know what the result will be, but you do know there is a result and the surrounding code will work (e.g. if you clamp the result into 0 - 1 range you know it will be in that range afterwards).
    "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.
    In particular, there is also NO guarantee that the code BEFORE whatever triggers the undefined behaviour will be executed, do what it was meant to do or anything like that.
    C code like this:

    char c[10];
    int a = 12;
    int valid = a sizeof(c);
    char *dummy = c + a;
    return valid;

    Is undefined behaviour, and the compiler would be allowed to just replace it by "return true" for example. The fact that the out-of-bounds address is never used, that it has nothing to do with the calculation of "valid" etc. does not matter.
    If it was "implementation defined" anything might happen if you e.g. tried to dereference dummy, but merely calculating c + a would not matter if you never used the result (or worst case, if allowed, it might crash right there. But it cannot result in everything working perfectly except that later in the code 1+2 evaluates to 5).
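One way to sidestep the problem described in this comment is to compare the index against the array length first and do the pointer arithmetic only afterwards, so that no out-of-bounds pointer is ever formed. The sketch below assumes that approach. The helper names `index_in_bounds` and `element_or_null` are hypothetical and do not come from the comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when idx names an element of a len-element array. */
static bool index_in_bounds(size_t idx, size_t len) {
    return idx < len;
}

/* Hypothetical helper: returns a pointer to buf[idx], or NULL when idx is
   out of range. The addition happens only after the check, so a pointer
   past the end of the array is never computed. */
static char *element_or_null(char *buf, size_t len, size_t idx) {
    assert(buf != NULL);
    if (!index_in_bounds(idx, len)) {
        return NULL;
    }
    return buf + idx;
}
```

With `char c[10]`, the call `element_or_null(c, sizeof c, 12)` returns NULL instead of computing `c + 12`. C does permit forming the one-past-the-end pointer `c + 10`, though not dereferencing it. Anything beyond that is undefined.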

  • (Score: 2) by DannyB on Thursday February 16 2017, @07:19PM

    by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @07:19PM (#467919) Journal

    I understand exactly what you describe as undefined and implementation defined behavior. I have understood it for decades, across different languages and compilers.

    I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

    I think a language specification that leaves anything implementation defined is also a bad idea. Almost as bad as undefined.

    I hope that is sufficiently clear.

    --
    People today are educated enough to repeat what they are taught but not to question what they are taught.
    • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @08:59PM

      by Anonymous Coward on Thursday February 16 2017, @08:59PM (#467959)

      Well, have fun with your toy languages. Any real language will have corners that are undefined, implementation-specific, or unspecified. It's just the nature of the beast.

      • (Score: 2) by DannyB on Friday February 17 2017, @02:05PM

        by DannyB (5839) Subscriber Badge on Friday February 17 2017, @02:05PM (#468205) Journal

        No language is perfect, or everyone would be using it. But some languages have sharp edges where they should not.

        --
        People today are educated enough to repeat what they are taught but not to question what they are taught.
    • (Score: 2) by TheRaven on Friday February 17 2017, @12:20PM

      by TheRaven (270) on Friday February 17 2017, @12:20PM (#468179) Journal

      I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

      It's also very hard if you want good or deterministic performance. To give a simple example, using a pointer after it has been free'd is undefined behaviour in C. If it were not, then the compiler would be required to do something specific in the case of a use-after-free. This would require that it check before each dereference that a pointer is still valid. You basically need garbage collection.

      The same is true for out-of-bounds accesses. By making them undefined, the compiler is free to assume that all accesses are in bounds and so doesn't need to do any checking. Again, this gives much better code.

      I think a language specification that leaves anything implementation defined is also a bad idea

      The same applies. For example, in C the size of long is implementation defined. When C was created, typically char was 1 byte, int and short were 2 bytes, long was 4 bytes. Now, typically long is 8 bytes. If you want your language to work on different substrates, then you need some implementation-defined behaviour.

      --
      sudo mod me up
      • (Score: 2) by fnj on Friday February 17 2017, @02:11PM

        by fnj (1654) on Friday February 17 2017, @02:11PM (#468209)

        If you want your language to work on different substrates, then you need some implementation-defined behaviour.

        It's not clear what you mean. The C specification (just to pick one example) chose to make sizeof char, short, int, and long loosely defined. They didn't have to. The Free Pascal specification says that sizeof Byte and ShortInt are exactly 1, SmallInt and Word are exactly 2, Integer is either 2 or 4 depending on mode, LongInt and LongWord are exactly 4.

        Even C99 formalized typedefs (in stdint.h) for int8_t (1), int16_t (2), int32_t (4), int64_t (8) and permutations of each for unsigned and other variations. Those are not implementation-defined. They are standard-defined. The programmer can choose to use them or not.
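A small sketch of the difference between the exact-width typedefs and plain `long`. The helper names are hypothetical. Two caveats apply: C99 makes the exact-width types optional (an implementation must define them only if it has matching integer types), and the byte counts quoted above assume 8-bit bytes, so the sketch checks widths in bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts a sizeof result into a width in bits. */
static size_t width_in_bits(size_t size_in_bytes) {
    return size_in_bytes * CHAR_BIT;
}

/* Hypothetical helper: the exact-width types have a fixed width wherever
   they exist, while long is only guaranteed a minimum of 32 bits. */
static void check_integer_widths(void) {
    assert(width_in_bits(sizeof(int32_t)) == 32);
    assert(width_in_bits(sizeof(int64_t)) == 64);
    assert(width_in_bits(sizeof(long)) >= 32);
}
```

The first two assertions hold on any implementation that provides the types. The last is as much as portable code can assume about `long`.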

        • (Score: 2) by TheRaven on Friday February 17 2017, @05:52PM

          by TheRaven (270) on Friday February 17 2017, @05:52PM (#468274) Journal
          How big is a pointer in Pascal? C has supported 16-bit, 32-bit, and 64-bit pointers that are represented purely as integers, as 36-bit values including a segment id, as fat pointers including a base and a range, and so on. In some languages, such as Java, these details are not exposed through the abstract machine and so the fact that it's implementation defined is hidden from programmers, but the more that you want to expose, the harder it is.
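To make the point about pointer representation concrete, here is a minimal sketch of what portable C can and cannot assume. The helper names are hypothetical. `uintptr_t` is itself an optional type. Where it exists, the standard guarantees only that a valid `void *` converted to it and back compares equal to the original, and says nothing about what the integer looks like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: size of an object pointer on this implementation.
   The value is implementation-defined. */
static size_t pointer_size_bytes(void) {
    return sizeof(void *);
}

/* Hypothetical helper: returns the integer form of p. The assert restates
   the only portable guarantee, that the round trip gives back the same
   pointer. The meaning of the bits is left to the implementation. */
static uintptr_t pointer_bits(void *p) {
    uintptr_t bits = (uintptr_t)p;
    assert((void *)bits == p);
    return bits;
}
```

Code that goes further, for instance by assuming the integer is a flat address that can be incremented freely, is relying on one particular implementation.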
          --
          sudo mod me up
  • (Score: 2) by c0lo on Thursday February 16 2017, @11:00PM

    by c0lo (156) Subscriber Badge on Thursday February 16 2017, @11:00PM (#468002) Journal

    "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.

    See also nasal demons [catb.org]

    --
    https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 2) by fnj on Friday February 17 2017, @01:58PM

    by fnj (1654) on Friday February 17 2017, @01:58PM (#468200)

    You've got something missing between a and sizeof, perhaps a < or >

    Fix it.