posted by cmn32480 on Thursday February 16, @03:36PM   Printer-friendly
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.


Original Submission


The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • Defined Behavior is more often needed (Score: 2) by DannyB (5839)

    by DannyB (5839) on Thursday February 16, @04:22PM (#467848)

    In order to write a program, you need defined behavior. Every "Hello World" program ever written assumes that the language and underlying system provide certain defined behavior guarantees that under normal operating conditions will result in the famous greeting.

    When the programmer writes even a simple assignment, such as x := y; it is assumed that the behavior is defined.

    Now, I can see a case for pushing potentially optimized operations up into the language so that they are additional tools in the hands of a programmer who knows how to use them. Most of the time I want an addition operation to spectacularly fail with an exception if it overflows. But there may be times where I don't care what happens in the event of overflow because I can guarantee before the addition is done that overflow simply cannot occur. The simple example is that the operands are already restricted to a smaller range making overflow impossible in the data type that the addition will use. (eg, adding two bytes that are widened to ints) And depending on the purpose I may not even care about any overflow bits. Maybe wanting "mod 256" arithmetic widened to ints when the addition is performed.

    I agree with the title that undefined behavior does not mean that programming is unsafe. But most of the time you don't want undefined behavior. Therefore, if you're using operations that have weird, undefined or surprising behavior, those functions or operations ought to have unusual names. The well known functions or operators such as '+' should have no surprising or undefined behaviors.

    Another approach might be to have compiler switches or annotations that can be used locally on certain statements to indicate to the compiler that on the next line I simply don't care about what happens for integer overflow. If the compiler is able to use that information to do a more optimized addition operation on a certain instruction set, then great. If not, then fine. And even if the compiler ignores the annotation and simply compiles the addition with all of the checking and guard code around it, that is acceptable. It merely indicates the lower quality of the compiler. Yet the compiler still ensures correctness.

    As for making an ordinary common operator have undefined behavior, I think that is a stupid idea. It simply means that generations of programmers, for decades of time, will have to invent and re-invent their own defenses around what should be a simple common operation. Or, they will simply ignore the problem completely. And we end up with obscure bugs, even security vulnerabilities hidden in code that are due to the combination of the programmer, the particular machine instruction set, and how the compiler, or this version of the compiler (!) chose to emit code for that operation.

    • Use gcc -ftrapv (Score: 2) by Pino P (4721)

      by Pino P (4721) Subscriber Badge on Thursday February 16, @04:56PM (#467865)

      Most of the time I want an addition operation to spectacularly fail with an exception if it overflows. But there may be times where I don't care what happens in the event of overflow because I can guarantee before the addition is done that overflow simply cannot occur. The simple example is that the operands are already restricted to a smaller range making overflow impossible in the data type that the addition will use. (eg, adding two bytes that are widened to ints) And depending on the purpose I may not even care about any overflow bits. Maybe wanting "mod 256" arithmetic widened to ints when the addition is performed.

      I just searched for gcc trap add overflow on Google, and the second result states that -ftrapv in GCC is supposed to enable behavior similar to what you describe. But it was broken until 2014 when GCC 4.8.4 fixed a serious bug.

      Using -ftrapv in GCC 4.8.4 or later enables the following rules:

      • Results of arithmetic on unsigned integers are reduced modulo 2^N, where N is the number of bits. The C standard requires this modulo behavior.
      • Arithmetic on signed integers is performed with overflow trapping. The C standard treats this as undefined behavior; the -ftrapv option turns it into an abort.
    • Re:Defined Behavior is more often needed (Score: 0) by Anonymous Coward

      by Anonymous Coward on Thursday February 16, @05:03PM (#467869)

      You are confusing implementation-specific behaviors with undefined. They are NOT the same. Printing "hello world" is not undefined.

      • Re:Defined Behavior is more often needed (Score: 2) by DannyB (5839)

        by DannyB (5839) on Thursday February 16, @05:23PM (#467885)

        I used Hello World to point out that programmers have expectations of well defined behaviors being defined to achieve the desired result. Common operations should not have undefined behaviors. If it is useful to do so, then introduce an annotation or differently named operator which has undefined behaviors for a potential gain in performance.

        Implementation specific behaviors come in two flavors that I can think of.

        1. The specification says that standing on one foot, jumping three times while shouting Foo has an implementation defined behavior.

        2. The specification says that standing on one foot, jumping three times while shouting Bar has an undefined behavior.

        In case 1, the implementer typically documents the behavior. (or not! making it effectively undefined)

        In case 2, the implementer may or may not document it, but the programmer cannot depend on the behavior because it is undefined. The implementation could change in a subsequent release. Of course an implementation change could happen in case 1, but is usually more public. The spec says it's implementation specific, and programmers ask, so what does my implementation do?

        I would say implementation specific specifications are almost as bad as specifications that define something as undefined.

        I am of the opinion that portability across compilers, let alone operating systems is something that language specifications should strive for. Predictability. Repeatability. Programmers should be able to rely on the language and its compilers to always do one thing. Compiler vendors, or better the language specification, could include optional annotation directives that allow possible optimizations, some of which may rely on undefined edge case behavior.

        • Re:Defined Behavior is more often needed (Score: 0) by Anonymous Coward

          by Anonymous Coward on Thursday February 16, @06:24PM (#467905)

          I think you completely misunderstand the meaning of "undefined" and "implementation defined".
          Just because it is not documented does not turn "implementation defined" into "undefined".
          "implementation defined" means it has a specific, reproducible behaviour. So if you exhaustively test that your code behaves correctly with a certain implementation, you can know you are fine. "implementation defined" also usually is attached to a RESULT, which means the absolute worst case is that you cannot know what the result will be, but you do know there is a result and the surrounding code will work (e.g. if you clamp the result into 0 - 1 range you know it will be in that range afterwards).
          "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.
          In particular, there is also NO guarantee that the code BEFORE whatever triggers the undefined behaviour will be executed, do what it was meant to do, or anything like that.
          C code like this:

          char c[10];
          int a = 12;
          int valid = a sizeof(c);
          char *dummy = c + a;
          return valid;

          This is undefined behaviour, and the compiler would be allowed to just replace it with "return true", for example. The fact that the out-of-bounds address is never used, that it has nothing to do with the calculation of "valid", etc. does not matter.
          If it was "implementation defined", anything might happen if you e.g. tried to dereference dummy, but merely calculating c + a would not matter if you never used the result (or, worst case, if allowed, it might crash right there. But it cannot result in everything working perfectly except that later in the code 1+2 evaluates to 5).

          • Re:Defined Behavior is more often needed (Score: 2) by DannyB (5839)

            by DannyB (5839) on Thursday February 16, @07:19PM (#467919)

            I understand exactly what you describe as undefined and implementation defined behavior. I have understood it for decades, across different languages and compilers.

            I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

            I think a language specification that leaves anything implementation defined is also a bad idea. Almost as bad as undefined.

            I hope that is sufficiently clear.

            • Re:Defined Behavior is more often needed (Score: 0) by Anonymous Coward

              by Anonymous Coward on Thursday February 16, @08:59PM (#467959)

              Well, have fun with your toy languages. Any real language will have corners that are undefined, implementation-specific, or unspecified. It's just the nature of the beast.

              • Re:Defined Behavior is more often needed (Score: 2) by DannyB (5839)

                by DannyB (5839) on Friday February 17, @02:05PM (#468205)

                No language is perfect, or everyone would be using it. But some languages have sharp edges where they should not.

            • Re:Defined Behavior is more often needed (Score: 2) by TheRaven (270)

              by TheRaven (270) on Friday February 17, @12:20PM (#468179)

              I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

              It's also very hard if you want good or deterministic performance. To give a simple example, using a pointer after it has been freed is undefined behaviour in C. If it were not, the compiler would be required to do something specific in the case of a use-after-free. This would require a check before each dereference that the pointer is still valid. You basically need garbage collection.

              The same is true for out-of-bounds accesses. By making them undefined, the compiler is free to assume that all accesses are in bounds and so doesn't need to do any checking. Again, this gives much better code.

              I think a language specification that leaves anything implementation defined is also a bad idea

              The same applies. For example, in C the size of long is implementation defined. When C was created, typically char was 1 byte, int and short were 2 bytes, and long was 4 bytes. Now, typically long is 8 bytes. If you want your language to work on different substrates, then you need some implementation-defined behaviour.

              --
              sudo mod me up
              • Re:Defined Behavior is more often needed (Score: 2) by fnj (1654)

                by fnj (1654) on Friday February 17, @02:11PM (#468209)

                If you want your language to work on different substrates, then you need some implementation-defined behaviour.

                It's not clear what you mean. The C specification (just to pick one example) chose to make sizeof char, short, int, and long loosely defined. They didn't have to. The Free Pascal specification says that sizeof Byte and ShortInt are exactly 1, SmallInt and Word are exactly 2, Integer is either 2 or 4 depending on mode, LongInt and LongWord are exactly 4.

                Even C99 formalized typedefs (in stdint.h) for int8_t (1), int16_t (2), int32_t (4), int64_t (8) and permutations of each for unsigned and other variations. Those are not implementation-defined. They are standard-defined. The programmer can choose to use them or not.

                • Re:Defined Behavior is more often needed (Score: 2) by TheRaven (270)

                  by TheRaven (270) on Friday February 17, @05:52PM (#468274)
                  How big is a pointer in Pascal? C has supported 16-bit, 32-bit, and 64-bit pointers that are represented purely as integers, as 36-bit values including a segment id, as fat pointers including a base and a range, and so on. In some languages, such as Java, these details are not exposed through the abstract machine and so the fact that it's implementation defined is hidden from programmers, but the more that you want to expose, the harder it is.
                  --
                  sudo mod me up
          • Re:Defined Behavior is more often needed (Score: 2) by c0lo (156)

            by c0lo (156) on Thursday February 16, @11:00PM (#468002)

            "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.

            See also nasal demons

          • Re:Defined Behavior is more often needed (Score: 2) by fnj (1654)

            by fnj (1654) on Friday February 17, @01:58PM (#468200)

            You've got something missing between a and sizeof, perhaps a < or >

            Fix it.

  • Assumptions are bad (Score: 3, Insightful) by Anonymous Coward

    by Anonymous Coward on Thursday February 16, @04:23PM (#467849)

    NEVER assume you are more clever than the language designers. Unless you know specifically what a piece of code does, you should always program with the mindset that it causes the user's computer to explode with Lovecraft tentacles.

    Undefined behavior is, as its name suggests, undefined. It could cause manageable pointer corruption, but it can just as easily corrupt memory or trip exceptions. What happens can change between different same-architecture hardware, compiler versions, and OS releases. Not to mention it makes porting extremely difficult.

    If you really have to optimize by assuming certain behavior for a given hardware architecture, then always wrap that code in macro conditionals or template wizardry.

    • Re:Assumptions are bad (Score: 2) by NCommander (2)

      by NCommander (2) Subscriber Badge on Thursday February 16, @08:39PM (#467947)

      A program should *never* depend on undefined behavior at all. I saw a program that once used negative array aliasing (aka array[-3] to do "clever" magic) that blew up sky-high when the compiler was swapped out. The single exception to this is a closed platform where you'll never have updates or changed code; i.e., a burned ROM for a cartridge game system (which used things like undefined op-codes to do magic), but only in cases where you're pushing hardware to the edge. For 99.9% of people, just say no.

      What's worse is that at least in the case of C++, a lot of common operations are undefined behavior all over the STL. The most common one I know of is iterating over a vector while modifying that vector to push or pop items. The specification says that when a vector is changed, any and all iterators pointing to it are invalidated. In practice, depending on how the STL is implemented, it will either work just fine or crash out with a very hard-to-debug error. This is because a vector might have to realloc() itself into a larger memory block and change the underlying pointer; sometimes the iterator points at a stale copy of the array, sometimes it dangles. Since you have no enforcement of dangling pointers in the language, well, boom.

      --
      Still always moving
      • Re:Assumptions are bad (Score: 0) by Anonymous Coward

        by Anonymous Coward on Thursday February 16, @08:53PM (#467955)

        Um, negative indices are defined behavior in C and C++. Accessing memory outside of the array is what is undefined. If I had a pointer that pointed to the 4th element of an array, this is perfectly defined: p[-3].

        • Re:Assumptions are bad (Score: 2) by NCommander (2)

          by NCommander (2) Subscriber Badge on Thursday February 16, @09:14PM (#467965)

          I didn't describe it well. Basically it was something like this.

          int location_one;
          int location_two;
          int location_three;
          int location_end;

          printf("%d", location_end[-2]);

          As far as I could tell, the entire point of it was to avoid having to do update calculations (i.e., have an array, and a macro with the length of the array). The location_end pointer was shared across other code modules (due to being a flat memory model with no protection). I never understood the point of it, but in a lot of ways, that wasn't even the most WTF-y thing I've seen in that codebase. Then again, a lot of microcontroller code is serious WTF.

          --
          Still always moving
          • Re:Assumptions are bad (Score: 2) by fnj (1654)

            by fnj (1654) on Friday February 17, @02:18PM (#468213)

            That call to printf will fail no matter what index you use; even 0. location_end is an int, not an int*. In fact the expression won't even compile.

            gcc says "error: subscripted value is neither array nor pointer nor vector"

      • Re:Assumptions are bad (Score: 2) by lgw (2836)

        by lgw (2836) on Thursday February 16, @08:59PM (#467958)

        Sure, it's undefined, but I think all the compiler vendors actually do the same thing - check for reallocation in debug, and let it blow up with debug off.

        It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

        • Re:Assumptions are bad (Score: 2) by c0lo (156)

          by c0lo (156) on Thursday February 16, @11:07PM (#468005)

          It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

          C++/STL does have the slow-but-safe option: at(size_type).

  • Just looking at it.. (Score: 2) by tibman (134)

    by tibman (134) Subscriber Badge on Thursday February 16, @04:48PM (#467861)

    Just looking at it there seems to be an assumption that undefined values == undefined behavior when that is not the case. The compiler is preventing undefined behavior by introducing undefined values and non-signaling NaNs. I guess that is the point of what they are saying? But equating the two seems wrong to me. Permitting undefined behavior is still unsafe. Like throwing @ signs around bugs in PHP or try/catches around random errors in java/c#. All kinds of side-effects and dealing with the resulting UB is not fun. Relying on the compiler to fix undefined behavior seems like a bad idea? If you overflow a number then it should blow-up in your face and not invent some "safe" value to continue. Seems like an area where there are a lot of opinions though.

    --
    SN won't survive on lurkers alone. Write comments.
    • Re:undefined values != undefined behavior (Score: 2) by meustrus (4961)

      by meustrus (4961) on Thursday February 16, @05:04PM (#467870)

      I thought the same thing looking at `undef`, but when it gets to `poison` that's where undefined behavior comes in. It's still not the "external undefined behavior" we are supposed to avoid, however.

  • Oh GOD (Score: 0) by Anonymous Coward

    by Anonymous Coward on Thursday February 16, @05:06PM (#467873)

    Programmers really are getting dumber and dumber. Undefined behavior IS dangerous and stupid, and is NOT a design decision.

    • Re:Oh GOD (Score: 2) by DannyB (5839)

      by DannyB (5839) on Thursday February 16, @05:29PM (#467888)

      Sometimes undefined behavior is a decision made in the language specification. IMO that is a bad idea for portability. I think a language specification that leaves things "implementation defined" is almost as bad as one that leaves them undefined.

      • Re:Oh GOD (Score: 0) by Anonymous Coward

        by Anonymous Coward on Thursday February 16, @08:49PM (#467951)

        No sh*t. I don't know how any of what you said disputes what I said. At least with implementation defined, the vendor is required to document the behavior. If a language standard marks something as undefined behavior, you should NOT do it.

      • Re:Oh GOD (Score: 2) by NCommander (2)

        by NCommander (2) Subscriber Badge on Thursday February 16, @08:50PM (#467953)

        In C (and IMHO, some low-level languages), there's an exception to "implementation defined is bad" that actually makes things easier. My canonical "go to" example of this that actually makes sense is the volatile keyword (it's also a fun trivia question, since most programmers can't describe it).

        Specifically, the C standard uses a reference model to describe how specific operations work: loads, stores, etc. This generally matches real life in the case of a flat memory model such as protected/long mode x86, but if you're dealing with a non-linear or segmented memory model, pointers actually become very complex because you have different types and selectors. It's perfectly legal for a C implementation to map a segmented memory model to a flat one at compile time so that pointer arithmetic works as you'd expect. The canonical example of this is real mode x86, which requires a fat pointer. However, fat pointers are expensive to use and calculate, and most of the time the compiler can use a near or far pointer safely. As such, the specific behavior of how loads/stores are done in C is actually implementation defined, which is why it's legal for the optimizer to eliminate variables and such.

        Going back to the example, Volatile is used to mark a position of memory that may change outside the operation of the program. More specifically, the standard says that the C compiler can't use the model described in the specification, and must use the variable "as defined" by the programmer. As such, you need it in any place that talks to memory-mapped registers, global variables in multithreaded applications, and in interrupt service handlers.

        --
        Still always moving
        • Re:Oh GOD (Score: 2) by lgw (2836)

          by lgw (2836) on Thursday February 16, @09:11PM (#467963)

          Is the correct behavior for volatile for multi-threaded code in the C standard now? People were using it as if it would work in C, C++, Java, and C#, but none of the language standards required that - changing outside of control of the program is different from changing outside of control of the thread. All the major compilers did the expected thing (except very early Java IIRC), and I've heard that all the standards now require that behavior, except C. But maybe I missed it.

          • Re:Oh GOD (Score: 3, Interesting) by NCommander (2)

            by NCommander (2) Subscriber Badge on Thursday February 16, @09:33PM (#467971)

            I haven't seen the ANSI C specification in many years, so I don't know if the literal definition has changed but I doubt it. More specifically, C doesn't handle threads at all in a language specification level. In the specification, violate means that the pointer has to be treated exactly as coded, and not interpreted by the compiler. Specifically, it means it can't assume the contents of a pointer is the same across an operation when optimizing.

            Take the following code block:

            void some_function(int *random_pointer) {
                *random_pointer = 7;
                printf("%d\n", random_pointer);
            }

            i.e., without violate, the compiler is free to do the following:

            void some_function(int *random_pointer) {
                *random_pointer = 7;
                printf("%d\n", 7); // but random_pointer could have changed if a context switch happened this moment
            }

            Which saves a load instruction. Compilers tend to do this aggressively even at -O0 because on some architectures loads can cause a cache miss or context change (i.e., a selector change in real mode). Declaring random_pointer violate forces the compiler to always do the load. The easiest way to think of it is whether a variable can change without a context switch. If I have a memory-mapped register, then that block can change at any time.

            From a processor point of view, a thread may or may not be a separate context. Userland threads (i.e., original LinuxThreads) and coroutines would be the same application, since changes always happen within the application; a coroutine always runs within the parent's context, and never in its own; think longjmp/setjmp. However, most OSes handle threading at the kernel level, and as such it's possible that two separate contexts within an application are running at the same time if the kernel schedules both simultaneously. It's also possible on some machines to do threading without the kernel; protected mode supports the TSS mechanism, which does hardware-level threading; OS/2 used this, and I suspect early Windows did as well.

            --
            Still always moving
            • Re:Oh GOD (Score: 2) by TheRaven (270)

              by TheRaven (270) on Friday February 17, @06:07PM (#468279)

              More specifically, C doesn't handle threads at all in a language specification level

              It must be traumatic for you to be waking up in 2017 after six or more years asleep. I hope that Trump, Brexit, and so on are not too much of a shock. Once you've recovered from that, I think that you might be interested to know that in 2011 there was a new version of the C specification released (and well supported by compilers). This version includes threads, atomic operations, and a memory model for synchronisation.

              Your example is missing a * to dereference random_pointer in the printf argument, but is otherwise correct. Note, however, that the compiler is still allowed to reorder accesses to different volatile variables relative to each other, which makes it unsuitable for most uses in multithreaded programming (but fine for its intended purpose of communicating with memory-mapped I/O devices).

              --
              sudo mod me up
              • Re:Oh GOD (Score: 2) by NCommander (2)

                by NCommander (2) Subscriber Badge on Saturday February 18, @02:27AM (#468465)

                That teaches me to test-compile my code before posting. The point stands, though.

                If I understand your specific case correctly, you can actually declare the value of a variable to also be violate, i.e.:

                int violate * violate x.

                That should force it to dereference and then load in that order, and tell the optimizer to GTFO. There are also pragmas to that effect. That being said, when I do multithreading in C, it's mutexes and locks all the way down if I have any choice. Violate only gets used for MMIO.

                --
                Still always moving
                • Re:Oh GOD (Score: 2) by TheRaven (270)

                  by TheRaven (270) on Saturday February 18, @11:40AM (#468551)

                  int violate * violate x.
                  That should force it to dereference and then load in that order, and tell the optimizer to GTFO

                  Ignoring your highly amusing autocorrect problem, that only works when there is a direct dependency between the objects (i.e. there is no way to reorder the load of x after the load of *x, because you must load x to be able to load *x). This will also result in redundant loads of x, which is probably not what you wanted. I was talking about cases like this:

                  volatile int x;
                  volatile int y;
                  printf("%d\n", x);
                  printf("%d\n", y);

                  The compiler is entirely free to first load y, and then load x, store the results of both on the stack, and then issue the printf calls. This would not be violating the C memory model. The same is not true of this code:

                  _Atomic(int) x;
                  _Atomic(int) y;
                  printf("%d\n", x);
                  printf("%d\n", y);

                  In this example, the load of x and y are both sequentially consistent and so any reordering that would violate that guarantee is not permitted. The compiler must both load x before y and must emit enough barrier instructions to ensure that there is no global ordering of memory operations that would appear as if the load of y happened first.

                  --
                  sudo mod me up
          • Re:Oh GOD (Score: 2) by TheRaven (270)

            by TheRaven (270) on Friday February 17, @12:22PM (#468180)

            Is the correct behavior for volatile for multi-threaded code in the C standard now?

            If you are using volatile for sharing between threads in C, then you're doing it wrong. Volatile exists for memory-mapped device I/O and nothing else. You want _Atomic.

            --
            sudo mod me up
    • Undefined behavior can be useful (Score: 1) by curril (5717)

      by curril (5717) on Friday February 17, @01:04AM (#468027)

      For example, suppose in a given language the evaluation order of a function's arguments is undefined. This gives the compiler plenty of opportunities to optimize how to most efficiently evaluate the arguments. Now if the language allows function arguments to have side effects, then programmers who don't take care can get some weird bugs depending on how the arguments are evaluated. But in a language like Haskell where they don't have side effects, then it doesn't matter and leaving the evaluation order undefined is the better choice.

  • Undefined behavior is an optimization trap (Score: 3, Interesting) by Immerman (3985)

    by Immerman (3985) on Thursday February 16, @05:17PM (#467883)

    I recall reading an article a while back that pointed out that even relatively benign undefined behavior can become a serious problem when it encounters the compiler's optimization engine, especially at more aggressive optimization levels. "Undefined" means the compiler can make very wrong assumptions, and may end up reordering or completely eliminating critical sections of code, creating "invisible errors" that are completely impossible to identify from the source code, except by noticing that there is an "undefined behavior" leak somewhere nearby.

    Here's one such paper. https://dspace.mit.edu/openaccess-disseminate/1721.1/86980
    One of the examples they list is:
    Thing* danger = GetPointerOrNull();
    int alert = danger->data; // undefined behavior if danger is null
    if(!danger)
            DoCleanup();

    In which case the compiler may eliminate the null pointer cleanup entirely, since dereferencing "danger" allows it to assume that at that point the pointer is definitely not null.
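    A defensive rewrite tests the pointer before touching it, so the compiler has nothing to exploit; here is a self-contained sketch (Thing, get_pointer_or_null, and the field name data are hypothetical stand-ins for the poster's GetPointerOrNull() code):

    ```c
    #include <stdio.h>
    #include <stddef.h>

    typedef struct { int data; } Thing;

    /* Hypothetical stand-in: returns a valid object or NULL. */
    static Thing *get_pointer_or_null(int ok) {
        static Thing t = { 42 };
        return ok ? &t : NULL;
    }

    int main(void) {
        Thing *danger = get_pointer_or_null(0);
        int alert = 0;
        if (!danger) {
            puts("cleanup");        /* the null check runs first...          */
        } else {
            alert = danger->data;   /* ...and the dereference happens only
                                       on the proven-non-null path           */
        }
        printf("alert=%d\n", alert);
        return 0;
    }
    ```

    Because the dereference is now dominated by the null check, the compiler can no longer assume the pointer is non-null and delete the cleanup branch.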

    Basically, modern compilers infer a lot of non-explicit information from the code, and even relatively safe undefined code can imply extremely false information.

    • Re:Undefined behavior is an optimization trap (Score: 2) by meustrus (4961)

      by meustrus (4961) on Thursday February 16, @05:38PM (#467891)

      If you want to do that, write it a level lower than C. Undefined behavior in C is machine-dependent, so if you are optimizing for a particular machine's behavior you should really just be writing machine code for it to begin with.

      Or you stop trying to optimize yourself, write code that describes your intent, and let the compiler optimize it.

      • Re:Undefined behavior is an optimization trap (Score: 2) by Immerman (3985)

        by Immerman (3985) on Thursday February 16, @09:43PM (#467975)

        If you write any lower than C, my understanding is you're probably not going to be getting much compiler optimization anyway.

        And the point is that undefined behavior is not only machine-dependent but also compiler- (and compiler-settings-) dependent, and its effects can spill across considerable distances within your code. As in this example, critical code that clearly should run can be optimized out entirely. In a more complicated scenario, there could be pages of code between the undefined memory access and the if statement it causes to be "erased".

        Also, why exactly would you want to do such a thing intentionally? Accessing the target of a null pointer can potentially cause all sorts of problems.

        • Re:Undefined behavior is an optimization trap (Score: 2) by NCommander (2)

          by NCommander (2) Subscriber Badge on Thursday February 16, @10:56PM (#467999)

          C is about as close to the metal as you can get short of assembly; it's a reasonably good abstraction of the operation of any Turing-complete machine. When you get down to it, C is basically just load, store, and math operations with labels and stack management. On some architectures it's entirely possible to IPL and service interrupts without needing any assembly code (notably, you can do this on ARM, short of setting up the MMU).

          --
          Still always moving
          • Re:Undefined behavior is an optimization trap (Score: 2) by Immerman (3985)

            by Immerman (3985) on Friday February 17, @04:20PM (#468251)

            Indeed. My understanding is that it was designed from the beginning to be almost as efficient as Assembly, even with the piss-poor compiler optimizations of the time.

            I think you mis-characterize the simplicity of C though - the language itself is quite sophisticated, even if it lacks the expansive standard libraries included with more modern languages.

            • Re:Undefined behavior is an optimization trap (Score: 2) by NCommander (2)

              by NCommander (2) Subscriber Badge on Friday February 17, @05:09PM (#468264)

              I was actually referring to the core language semantics itself, and what you have if you have no external libraries :). I've done a fair bit of bare-metal programming with no libc.

              As far as using it for general-purpose programming goes, well, there's a reason C continues to truck on after so many years: simple, (relatively) easy to understand, and in its own way elegant. I've mostly migrated over to using Rust for most of my static needs, but I don't mind C as a programming language. C++, on the other hand ...

              --
              Still always moving
              • Re:Undefined behavior is an optimization trap (Score: 2) by Immerman (3985)

                by Immerman (3985) on Saturday February 18, @07:36PM (#468691)

                Huh, and I'm actually quite a fan of C++.

                So how is Rust doing these days? I've considered learning it, but my time is limited and Rust has a reputation for changing the language fairly frequently. That gives me high hopes for its long-term potential, but not much interest in using it for non-trivial projects at this point.

                • Re:Undefined behavior is an optimization trap (Score: 2) by NCommander (2)

                  by NCommander (2) Subscriber Badge on Sunday February 19, @01:04AM (#468795)

                  Rust as a language was stabilized when 1.0 came out about six months ago. My biggest issue in learning it is that a lot of resources, like Stack Overflow questions, refer to older versions of Rust.

                  Conversely, a few crates require features that haven't landed yet, and require you to do a bit of magic to get them working on stable (serde, a serialization/deserialization framework, is one of these). That said, I started my current project on Rust 1.12 and it hasn't once broken across three compiler upgrades.

                  My problem with C++ is that the language is so stupidly complex, and it creates a lot of pain and headaches for everyone besides the compiler writer. I'm too tired to write up my full C++ rant, but the C++ FQA goes into a lot of detail on the low-level technical issues I've run into. Having to port and debug a large chunk of the Ubuntu archive on ARM instilled in me a serious hatred of the language overall.

                  --
                  Still always moving
    • Re:Undefined behavior is an optimization trap (Score: 0) by Anonymous Coward

      by Anonymous Coward on Thursday February 16, @06:03PM (#467900)

      Was the article named "what every C programmer should know about undefined behaviour?"

  • Define "Undefined" (Score: 3, Interesting) by meustrus (4961)

    by meustrus (4961) on Thursday February 16, @05:34PM (#467889)

    In my Computer Science classes, we were taught that "undefined behavior" means anything can happen. It could do what you want, or it could throw an exception. Or it could spin the disk drive to unsafe speeds until the disc flies out of the computer and kills the operator. You just don't know.

    This was in the context of Java documentation, where "undefined behavior" means "avoid this like the plague". Or at least it should, if the Java developers didn't have some fixation on overburdening their interfaces. The classic example is Iterator: some implementations support deleting elements during traversal and some don't, so the behavior of Iterator#remove at the interface level is undefined. It's safe to use only if you know you are working with a specific implementation that supports it.

    But that gets to the real problem, which is bad language design. What we call "undefined behavior" is about hidden state. Neither the compiler nor the runtime knows whether this implementation of Iterator is going to work; that detail is left up to the programmer to get right. And that's bullshit. Sure, if I were programming machine code for fixed hardware like Mel, that would be acceptable. Difficult, sure, but within the capabilities of a rock star to get it right. But Java is not machine code for fixed hardware. Java will optimize your dead code, hide the memory model from you, and periodically interrupt your program to collect the garbage that it won't let you clean up yourself. More people can make useful programs with Java. But nobody can completely understand all the details of what will happen when their program is run.

    It looks to me like `undef` and `poison` in LLVM IR at least are safe within their limited scope. They are not "anything can happen": the bounds of what could happen based on those keywords are knowable before runtime. They hark back to a bygone era when the programmer could understand the exact procedure that would happen when their code ran, and they provide options to model when an operation is safe versus unsafe. That makes sense, because the people who like that level of control can only find it these days writing compilers.

    So no, a defined undefined behavior is not unsafe. It's not the same thing as truly undefined behavior where anything can happen. That kind of unsafe undefined behavior comes from languages which give you strict contracts they can't enforce, then tell you in code comments that the contract might be a lie.

    • Re:Define "Undefined" (Score: 2) by DannyB (5839)

      by DannyB (5839) on Thursday February 16, @08:15PM (#467935)
      Since you're talking about Java, iterators, and Iterator#remove, I'll point out that you can implicitly use an iterator in a for() loop without realizing you're using one.  The iterator may no longer be able to traverse the collection if you call a remove() method on the collection mid-loop.

      List<President> presidents =  . . . ;
      for( final President president : presidents ) {
         if( president.isAnIdiot()  &&  president.getFaceColor().equals( Colors.ORANGE ) ) {
            presidents.remove( president );  // make idiots un-presidented
         }
      }

      Depending on the collection implementation, after the remove() call the implicit iterator created by the for() loop may no longer be able to continue traversing the collection (with an ArrayList, for instance, the next step of the loop typically throws ConcurrentModificationException).
      • Re:Define "Undefined" (Score: 2) by NCommander (2)

        by NCommander (2) Subscriber Badge on Thursday February 16, @10:14PM (#467990)

        Far too many languages let you do this and leave it to the runtime to decide whether to crash. As far as I know, Rust is the only language off the top of my head that specifically checks at compile time whether such an operation is safe, and dies with a compiler error if you would invalidate the iterator while you're inside it.

        --
        Still always moving
    • Re:Define "Undefined" (Score: 2) by Wootery (2341)

      by Wootery (2341) on Friday February 17, @09:32AM (#468151)

      But Java isn't like C. I'm pretty sure Java has no real 'undefined behaviour' (in the C sense), and that this StackOverflow answer is accurate.

  • Embrace chaos! (Score: -1, Troll) by Anonymous Coward

    by Anonymous Coward on Thursday February 16, @05:42PM (#467892)

    I program in Trump++

    • Re:Embrace chaos! (Score: 0) by Anonymous Coward

      by Anonymous Coward on Thursday February 16, @05:46PM (#467894)

      I agree with your sentiment, but trump is actually a highly ordered pile of shit.

      • Re:Embrace chaos! (Score: 2) by DannyB (5839)

        by DannyB (5839) on Thursday February 16, @08:19PM (#467938)

        It's the most highly ordered. I promise. Trust me! I have more entropy than anyone else. And believe me, I know about entropy!

        Entropy definition:
        2. lack of order or predictability; gradual decline into disorder.

      • Re:Embrace chaos! (Score: 2) by c0lo (156)

        by c0lo (156) on Thursday February 16, @11:20PM (#468007)

        I agree with your sentiment, but trump is actually a highly ordered pile of shit.

        I have to disagree. For sure, it is not a pile.

  • Confusing two things (Score: 3, Insightful) by maxwell demon (1608)

    by maxwell demon (1608) Subscriber Badge on Thursday February 16, @07:21PM (#467921)

    A language having constructs with undefined behaviour doesn't make that language inherently unsafe. Code that triggers undefined behaviour is inherently unsafe.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • Re:Confusing two things (Score: 2) by DannyB (5839)

      by DannyB (5839) on Thursday February 16, @08:25PM (#467941)

      Absolutely true.

      However, how unsafe I would consider a language is directly related to how often and how easy it is to use those constructs with undefined behavior.

      • Comment Below Threshold

        Re:Confusing two things (Score: -1, Troll) by Anonymous Coward

        by Anonymous Coward on Thursday February 16, @09:05PM (#467960)

        Sure, if you're an idiot that needs a slow, safe language to hold your hand.

      • Re:Confusing two things (Score: 2) by bob_super (1357)

        by bob_super (1357) on Friday February 17, @01:07AM (#468030)

        C is highly unsafe, and so is ASM.

        • Re:Confusing two things (Score: 2) by DannyB (5839)

          by DannyB (5839) on Friday February 17, @02:15PM (#468210)

          Yep. That is what makes C and ASM great systems languages for an OS, for microcontrollers, or for device drivers - and such a bad choice for application programming.