
posted by cmn32480 on Thursday February 16 2017, @03:36PM   Printer-friendly
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.


Original Submission

Related Stories

An 18-part Series on Building a Swift HTTP Framework

Software engineer Dave DeLong has written an 18-part series on building an HTTP framework in Swift. Apple's Swift programming language is a general-purpose, open source, compiled programming language intended to replace Objective-C. It is licensed under the Apache 2.0 license. In his series, Dave covers an Intro to HTTP, Basic Structures, Request Bodies, Loading Requests, Testing and Mocking, Chaining Loaders, Dynamically Modifying Requests, Request Options, Resetting, Cancellation, Throttling, Retrying, Basic Authentication, OAuth Setup, OAuth, and Composite Loaders.

Over the course of this series, we've started with a simple idea and taken it to some pretty fascinating places. The idea we started with is that a network layer can be abstracted out to the idea of "I send this request, and eventually I get a response".

I started working on this approach after reading Rob Napier's blog post on protocols on protocols. In it, he makes the point that we seem to misunderstand the seminal "Protocol Oriented Programming" idea introduced by Dave Abrahams ("Crusty") at WWDC 2015. We especially miss the point when it comes to networking, and Rob's subsequent posts go into this idea further.

One of the things I hope you've realized throughout this blog post series is that nowhere in this series did I ever talk about Codable. Nothing in this series is generic (with the minor exception of making it easy to specify a request body). There is no mention of deserialization or JSON or decoding responses or anything. This is extremely deliberate.

  • (Score: 2) by DannyB on Thursday February 16 2017, @04:22PM

    by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @04:22PM (#467848) Journal

In order to write a program, you need defined behavior. Every "Hello World" program ever written assumes that the language and underlying system provide certain defined behavior guarantees that, under normal operating conditions, will result in the famous greeting.

    When the programmer writes even a simple assignment, such as x := y; it is assumed that the behavior is defined.

Now, I can see a case for pushing potentially optimized operations up into the language so that they are additional tools in the hands of a programmer who knows how to use them. Most of the time I want an addition operation to fail spectacularly with an exception if it overflows. But there may be times where I don't care what happens in the event of overflow, because I can guarantee before the addition is done that overflow simply cannot occur. The simple example is that the operands are already restricted to a smaller range, making overflow impossible in the data type that the addition will use (e.g., adding two bytes that are widened to ints). And depending on the purpose, I may not even care about any overflow bits; maybe I want "mod 256" arithmetic on values widened to ints when the addition is performed.

    I agree with the title that undefined behavior does not mean that programming is unsafe. But most of the time you don't want undefined behavior. Therefore, if you're using operations that have weird, undefined or surprising behavior, those functions or operations ought to have unusual names. The well known functions or operators such as '+' should have no surprising or undefined behaviors.

    Another approach might be to have compiler switches or annotations that can be used locally on certain statements to indicate to the compiler that on the next line I simply don't care about what happens for integer overflow. If the compiler is able to use that information to do a more optimized addition operation on a certain instruction set, then great. If not, then fine. And even if the compiler ignores the annotation and simply compiles the addition with all of the checking and guard code around it, that is acceptable. It merely indicates the lower quality of the compiler. Yet the compiler still ensures correctness.

    As for making an ordinary common operator have undefined behavior, I think that is a stupid idea. It simply means that generations of programmers, for decades of time, will have to invent and re-invent their own defenses around what should be a simple common operation. Or, they will simply ignore the problem completely. And we end up with obscure bugs, even security vulnerabilities hidden in code that are due to the combination of the programmer, the particular machine instruction set, and how the compiler, or this version of the compiler (!) chose to emit code for that operation.

    --
    The lower I set my standards the more accomplishments I have.
    • (Score: 2) by Pino P on Thursday February 16 2017, @04:56PM

      by Pino P (4721) on Thursday February 16 2017, @04:56PM (#467865) Journal

      Most of the time I want an addition operation to spectacularly fail with an exception if it overflows. But there may be times where I don't care what happens in the event of overflow because I can guarantee before the addition is done that overflow simply cannot occur. The simple example is that the operands are already restricted to a smaller range making overflow impossible in the data type that the addition will use. (eg, adding two bytes that are widened to ints) And depending on the purpose I may not even care about any overflow bits. Maybe wanting "mod 256" arithmetic widened to ints when the addition is performed.

      I just searched for gcc trap add overflow on Google, and the second result [robertelder.org] states that -ftrapv in GCC is supposed to enable behavior similar to what you describe. But it was broken until 2014 when GCC 4.8.4 fixed a serious bug [gnu.org].

Using -ftrapv in GCC 4.8.4 or later enables the following rules:

      • Results of arithmetic on unsigned integers are reduced modulo 2^number of bits. The C standard requires this modulo behavior.
      • Arithmetic on signed integers is performed with overflow trapping. The C standard treats this as undefined behavior; the -ftrapv option turns it into an abort.
    • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @05:03PM

      by Anonymous Coward on Thursday February 16 2017, @05:03PM (#467869)

      You are confusing implementation-specific behaviors with undefined. They are NOT the same. Printing "hello world" is not undefined.

      • (Score: 2) by DannyB on Thursday February 16 2017, @05:23PM

        by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @05:23PM (#467885) Journal

        I used Hello World to point out that programmers have expectations of well defined behaviors being defined to achieve the desired result. Common operations should not have undefined behaviors. If it is useful to do so, then introduce an annotation or differently named operator which has undefined behaviors for a potential gain in performance.

        Implementation specific behaviors come in two flavors that I can think of.

        1. The specification says that standing on one foot, jumping three times while shouting Foo has an implementation defined behavior.

        2. The specification says that standing on one foot, jumping three times while shouting Bar has an undefined behavior.

        In case 1, the implementer typically documents the behavior. (or not! making it effectively undefined)

        In case 2, the implementer may or may not document it, but the programmer cannot depend on the behavior because it is undefined. The implementation could change in a subsequent release. Of course an implementation change could happen in case 1, but is usually more public. The spec says it's implementation specific, and programmers ask, so what does my implementation do?

        I would say implementation specific specifications are almost as bad as specifications that define something as undefined.

        I am of the opinion that portability across compilers, let alone operating systems is something that language specifications should strive for. Predictability. Repeatability. Programmers should be able to rely on the language and its compilers to always do one thing. Compiler vendors, or better the language specification, could include optional annotation directives that allow possible optimizations, some of which may rely on undefined edge case behavior.

        --
        The lower I set my standards the more accomplishments I have.
        • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @06:24PM

          by Anonymous Coward on Thursday February 16 2017, @06:24PM (#467905)

          I think you completely misunderstand the meaning of "undefined" and "implementation defined".
          Just because it is not documented does not turn "implementation defined" into "undefined".
          "implementation defined" means it has a specific, reproducible behaviour. So if you exhaustively test that your code behaves correctly with a certain implementation, you can know you are fine. "implementation defined" also usually is attached to a RESULT, which means the absolute worst case is that you cannot know what the result will be, but you do know there is a result and the surrounding code will work (e.g. if you clamp the result into 0 - 1 range you know it will be in that range afterwards).
          "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.
In particular, there is also NO guarantee that the code BEFORE whatever triggers the undefined behaviour will be executed, do what it was meant to do, or anything like that.
          C code like this:

          char c[10];
          int a = 12;
          int valid = a sizeof(c);
          char *dummy = c + a;
          return valid;

          Is undefined behaviour, and the compiler would be allowed to just replace it by "return true" for example. The fact that the out-of-bounds address is never used, that it has nothing to do with the calculation of "valid" etc. does not matter.
If it was "implementation defined", anything might happen if you e.g. tried to dereference dummy, but merely calculating c + a would not matter if you never used the result (or, worst case, if allowed, it might crash right there. But it cannot result in everything working perfectly except that later in the code 1+2 evaluates to 5).

          • (Score: 2) by DannyB on Thursday February 16 2017, @07:19PM

            by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @07:19PM (#467919) Journal

            I understand exactly what you describe as undefined and implementation defined behavior. I have understood it for decades, across different languages and compilers.

            I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

            I think a language specification that leaves anything implementation defined is also a bad idea. Almost as bad as undefined.

            I hope that is sufficiently clear.

            --
            The lower I set my standards the more accomplishments I have.
            • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @08:59PM

              by Anonymous Coward on Thursday February 16 2017, @08:59PM (#467959)

              Well, have fun with your toy languages. Any real language will have corners that are undefined, implementation-specific, or unspecified. It's just the nature of the beast.

              • (Score: 2) by DannyB on Friday February 17 2017, @02:05PM

                by DannyB (5839) Subscriber Badge on Friday February 17 2017, @02:05PM (#468205) Journal

                No language is perfect, or everyone would be using it. But some languages have sharp edges where they should not.

                --
                The lower I set my standards the more accomplishments I have.
            • (Score: 2) by TheRaven on Friday February 17 2017, @12:20PM

              by TheRaven (270) on Friday February 17 2017, @12:20PM (#468179) Journal

              I think a language specification that leaves anything undefined is a bad idea. That is an opinion.

It's also very hard if you want good or deterministic performance. To give a simple example, using a pointer after it has been free'd is undefined behaviour in C. If this were not the case, then the compiler would be required to do something specific in the case of a use-after-free. This would require that it check, before each dereference, that the pointer is still valid. You basically need garbage collection.

              The same is true for out-of-bounds accesses. By making them undefined, the compiler is free to assume that all accesses are in bounds and so doesn't need to do any checking. Again, this gives much better code.

              I think a language specification that leaves anything implementation defined is also a bad idea

The same applies. For example, in C the size of long is implementation defined. When C was created, char was typically 1 byte, int and short were 2 bytes, and long was 4 bytes. Now, long is typically 8 bytes. If you want your language to work on different substrates, then you need some implementation-defined behaviour.

              --
              sudo mod me up
              • (Score: 2) by fnj on Friday February 17 2017, @02:11PM

                by fnj (1654) on Friday February 17 2017, @02:11PM (#468209)

                If you want your language to work on different substrates, then you need some implementation-defined behaviour.

                It's not clear what you mean. The C specification (just to pick one example) chose to make sizeof char, short, int, and long loosely defined. They didn't have to. The Free Pascal specification says that sizeof Byte and ShortInt are exactly 1, SmallInt and Word are exactly 2, Integer is either 2 or 4 depending on mode, LongInt and LongWord are exactly 4.

                Even C99 formalized typedefs (in stdint.h) for int8_t (1), int16_t (2), int32_t (4), int64_t (8) and permutations of each for unsigned and other variations. Those are not implementation-defined. They are standard-defined. The programmer can choose to use them or not.

                • (Score: 2) by TheRaven on Friday February 17 2017, @05:52PM

                  by TheRaven (270) on Friday February 17 2017, @05:52PM (#468274) Journal
                  How big is a pointer in Pascal? C has supported 16-bit, 32-bit, and 64-bit pointers that are represented purely as integers, as 36-bit values including a segment id, as fat pointers including a base and a range, and so on. In some languages, such as Java, these details are not exposed through the abstract machine and so the fact that it's implementation defined is hidden from programmers, but the more that you want to expose, the harder it is.
                  --
                  sudo mod me up
          • (Score: 2) by c0lo on Thursday February 16 2017, @11:00PM

            by c0lo (156) Subscriber Badge on Thursday February 16 2017, @11:00PM (#468002) Journal

            "Undefined" is a completely different thing. There are NO guarantees about undefined behaviour. Your program may crash, abort, start deleting random files, that's all perfectly valid behaviour.

            See also nasal demons [catb.org]

            --
            https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 2) by fnj on Friday February 17 2017, @01:58PM

            by fnj (1654) on Friday February 17 2017, @01:58PM (#468200)

            You've got something missing between a and sizeof, perhaps a < or >

            Fix it.

  • (Score: 3, Insightful) by Anonymous Coward on Thursday February 16 2017, @04:23PM

    by Anonymous Coward on Thursday February 16 2017, @04:23PM (#467849)

NEVER assume you are more clever than the language designers. Unless you know specifically what a piece of code does, you should always program with the mindset that it causes the user's computer to explode with Lovecraft tentacles.

Undefined behavior is, as its name suggests, undefined. It could cause manageable pointer corruption, but it can just as easily corrupt memory or trip exceptions. What happens during one can change between different same-architecture hardware, compiler versions, and OS releases. Not to mention it makes porting extremely difficult.

    If you really have to optimize by assuming certain behavior for a given hardware architecture, then always wrap that code in macro conditionals or template wizardry.

    • (Score: 2) by NCommander on Thursday February 16 2017, @08:39PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @08:39PM (#467947) Homepage Journal

A program should *never* depend on undefined behavior at all. I saw a program that once used negative array aliasing (aka array[-3] to do "clever" magic) that blew up sky-high when the compiler was swapped out. The single exception to this is a closed platform where you'll never have updates or changed code; i.e., a burned ROM for a cartridge game system (which used things like undefined op-codes to do magic), but only in cases where you're pushing hardware to the edge. For 99.9% of people, just say no.

What's worse is that, at least in the case of C++, a lot of things all over the STL are undefined behavior, and they happen a lot. The most common one I know of is iterating over a vector while modifying that vector to push or pop items. The specification says that when a vector is changed, any and all iterators pointing to it are invalidated. In practice, depending on how the STL is implemented, it will either work just fine or crash out with a very hard to debug error. This is because a vector might have to realloc() itself into a larger memory block and change the underlying pointer; sometimes the iterator is pointed at a stale copy of the array, sometimes it dangles. Since you have no enforcement of dangling pointers in the language, well, boom.

      --
      Still always moving
      • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @08:53PM

        by Anonymous Coward on Thursday February 16 2017, @08:53PM (#467955)

Um, negative indices are defined behavior in C and C++. Accessing memory outside of the array is what is undefined. If I had a pointer that pointed to the 4th element of an array, this is perfectly defined: p[-3].

        • (Score: 2) by NCommander on Thursday February 16 2017, @09:14PM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @09:14PM (#467965) Homepage Journal

          I didn't describe it well. Basically it was something like this.

          int location_one;
          int location_two;
          int location_three;
          int location_end;

          printf("%d", location_end[-2]);

As far as I could tell, the entire point of it was to avoid having to do update calculations (i.e., have an array, and a macro with the length of the array). The location_end pointer was shared with other code modules (due to being a flat memory model/no protection). I never understood the point of it, but in a lot of ways, that wasn't even the most WTF-y thing I've seen in that codebase. Then again, a lot of microcontroller code is serious WTF.

          --
          Still always moving
          • (Score: 2) by fnj on Friday February 17 2017, @02:18PM

            by fnj (1654) on Friday February 17 2017, @02:18PM (#468213)

            That call to printf will fail no matter what index you use; even 0. location_end is an int, not an int*. In fact the expression won't even compile.

            gcc says "error: subscripted value is neither array nor pointer nor vector"

      • (Score: 2) by lgw on Thursday February 16 2017, @08:59PM

        by lgw (2836) on Thursday February 16 2017, @08:59PM (#467958)

        Sure, it's undefined, but I think all the compiler vendors actually do the same thing - check for reallocation in debug, and let it blow up with debug off.

        It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

        • (Score: 2) by c0lo on Thursday February 16 2017, @11:07PM

          by c0lo (156) Subscriber Badge on Thursday February 16 2017, @11:07PM (#468005) Journal

          It's odd though, and always bugged me, that there's not a "slow but safe" choice in this case. Offering both the safe way and the fast way makes sense for fundamental library actions, as with index-based element access.

C++/STL does have the safe but slow option: at(size_type) [cppreference.com].

          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 2) by tibman on Thursday February 16 2017, @04:48PM

    by tibman (134) Subscriber Badge on Thursday February 16 2017, @04:48PM (#467861)

Just looking at it, there seems to be an assumption that undefined values == undefined behavior, when that is not the case. The compiler is preventing undefined behavior by introducing undefined values and non-signaling NaNs. I guess that is the point of what they are saying? But equating the two seems wrong to me. Permitting undefined behavior is still unsafe. Like throwing @ signs around bugs in PHP, or try/catches around random errors in Java/C#. All kinds of side-effects, and dealing with the resulting UB is not fun. Relying on the compiler to fix undefined behavior seems like a bad idea? If you overflow a number then it should blow up in your face and not invent some "safe" value to continue. Seems like an area where there are a lot of opinions though.

    --
    SN won't survive on lurkers alone. Write comments.
    • (Score: 2) by meustrus on Thursday February 16 2017, @05:04PM

      by meustrus (4961) on Thursday February 16 2017, @05:04PM (#467870)

      I thought the same thing looking at `undef`, but when it gets to `poison` that's where undefined behavior comes in. It's still not the "external undefined behavior" we are supposed to avoid, however.

      --
      If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
  • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @05:06PM

    by Anonymous Coward on Thursday February 16 2017, @05:06PM (#467873)

    Programmers really are getting dumber and dumber. Undefined behavior IS dangerous and stupid, and is NOT a design decision.

    • (Score: 2) by DannyB on Thursday February 16 2017, @05:29PM

      by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @05:29PM (#467888) Journal

Sometimes undefined behavior is a decision made in the language specification. IMO that is a bad idea for portability. I think a language specification that leaves things "implementation defined" is almost as bad as one that leaves them undefined.

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @08:49PM

        by Anonymous Coward on Thursday February 16 2017, @08:49PM (#467951)

        No sh*t. I don't know how any of what you said disputes what I said. At least with implementation defined, the vendor is required to document the behavior. If a language standard marks something as undefined behavior, you should NOT do it.

      • (Score: 2) by NCommander on Thursday February 16 2017, @08:50PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @08:50PM (#467953) Homepage Journal

In C (and IMHO, some low level languages), there's an exception to implementation defined that actually makes things easier. My canonical "go to" example for this, one that actually makes sense, is the volatile keyword (it's also a fun trivia question, since most programmers can't describe it).

Specifically, the C standard uses a reference model to describe how specific operations work: loads, stores, etc. This is generally true to real life in the case of a flat memory model such as protected/long mode x86, but if you're dealing with a non-linear or segmented memory model, pointers actually become very complex because you have different types and selectors. It's perfectly legal for a C compiler to map a segmented memory model to a flat one at compile time so that pointer arithmetic works as you'd expect. The canonical example of this is real mode x86, which requires a fat pointer. However, fat pointers are expensive to use and calculate, and most of the time the compiler can use a near or far pointer safely. As such, the specific behavior of how loads/stores are done in C is actually implementation defined, which is why it's legal for the optimizer to eliminate variables and such.

Going back to the example, volatile is used to mark a location in memory that may change outside the operation of the program. More specifically, the standard says that the C compiler can't use the model described in the specification, and must use the variable "as defined" by the programmer. As such, you need it in any place that talks to memory-mapped registers, for global variables in multithreaded applications, and in interrupt service handlers.

        --
        Still always moving
        • (Score: 2) by lgw on Thursday February 16 2017, @09:11PM

          by lgw (2836) on Thursday February 16 2017, @09:11PM (#467963)

          Is the correct behavior for volatile for multi-threaded code in the C standard now? People were using it as if it would work in C, C++, Java, and C#, but none of the language standards required that - changing outside of control of the program is different from changing outside of control of the thread. All the major compilers did the expected thing (except very early Java IIRC), and I've heard that all the standards now require that behavior, except C. But maybe I missed it.

          • (Score: 3, Interesting) by NCommander on Thursday February 16 2017, @09:33PM

            by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @09:33PM (#467971) Homepage Journal

            I haven't seen the ANSI C specification in many years, so I don't know if the literal definition has changed but I doubt it. More specifically, C doesn't handle threads at all in a language specification level. In the specification, violate means that the pointer has to be treated exactly as coded, and not interpreted by the compiler. Specifically, it means it can't assume the contents of a pointer is the same across an operation when optimizing.

            Take the following code block:

            void some_function(int *random_pointer) {
    *random_pointer = 7;
                printf("%d\n", random_pointer);
            }

            i.e., without violate, the compiler is free to do the following:

            void some_function(int *random_pointer) {
    *random_pointer = 7;
                printf("%d\n", 7); // but random_pointer could have changed if a context switch happened this moment
            }

Which saves a load instruction. Compilers tend to do this aggressively even on O0 because on some architectures, loads can cause a cache miss or a context change (i.e., a selector change in real mode). Declaring random_pointer violate forces the compiler to always do the load. The easiest way to think of it is whether a variable can change without a context switch. If I have a memory-mapped register, then that block can change at any time.

From a processor point of view, a thread may or may not be a separate context. Userland threads (i.e., original LinuxThreads) and coroutines would be the same application context, since changes always happen within the application; a coroutine always runs within the parent's context, and never in its own; think longjmp/setjmp. However, most OSes handle threading at the kernel level, and as such it's possible that two separate contexts within an application are running at the same time if the kernel schedules both at once. It's also possible on some machines to do threading without the kernel; protected mode supports the TSS system, which does hardware-level threading; OS/2 used this, and I suspect early Windows did as well.

            --
            Still always moving
            • (Score: 2) by TheRaven on Friday February 17 2017, @06:07PM

              by TheRaven (270) on Friday February 17 2017, @06:07PM (#468279) Journal

              More specifically, C doesn't handle threads at all in a language specification level

              It must be traumatic for you to be waking up in 2017 after six or more years asleep. I hope that Trump, Brexit, and so on are not too much of a shock. Once you've recovered from that, I think that you might be interested to know that in 2011 there was a new version of the C specification released (and well supported by compilers). This version includes threads, atomic operations, and a memory model for synchronisation.

              Your example is missing a * to dereference random_pointer in the printf argument, but is otherwise correct. Note, however, that the compiler is still allowed to reorder accesses to different volatile variables relative to each other, which makes it unsuitable for most uses in multithreaded programming (but fine for its intended purpose of communicating with memory-mapped I/O devices).

              --
              sudo mod me up
              • (Score: 2) by NCommander on Saturday February 18 2017, @02:27AM

                by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Saturday February 18 2017, @02:27AM (#468465) Homepage Journal

                Teaches me to test-compile my code before posting it. The point stands, though.

                If I understand your specific case correctly, you can actually declare the value of a variable to also be violate, i.e.:

                int violate * violate x.

                That should force it to deference and then load in that order, and tell the optimizer to GTFO. There are also pragmas to that effect. That being said, when I do multithreaded with C, it's mutexes and locks all the way down if I have any choice. Violate only gets used for MMIO.

                --
                Still always moving
                • (Score: 2) by TheRaven on Saturday February 18 2017, @11:40AM

                  by TheRaven (270) on Saturday February 18 2017, @11:40AM (#468551) Journal

                  int violate * violate x.
                  That should force it to deference and then load in that order, and tell the optimizer to GTFO

                  Ignoring your highly amusing autocorrect problem, that only works when there is a direct dependency between the objects (i.e. there is no way to reorder the load of x after the load of *x, because you must load x to be able to load *x). This will also result in redundant loads of x, which is probably not what you wanted. I was talking about cases like this:

                  volatile int x;
                  volatile int y;
                  printf("%d\n", x);
                  printf("%d\n", y);

                  The compiler is entirely free to first load y, and then load x, store the results of both on the stack, and then issue the printf calls. This would not be violating the C memory model. The same is not true of this code:

                  _Atomic(int) x;
                  _Atomic(int) y;
                  printf("%d\n", x);
                  printf("%d\n", y);

                  In this example, the load of x and y are both sequentially consistent and so any reordering that would violate that guarantee is not permitted. The compiler must both load x before y and must emit enough barrier instructions to ensure that there is no global ordering of memory operations that would appear as if the load of y happened first.

                  --
                  sudo mod me up
          • (Score: 2) by TheRaven on Friday February 17 2017, @12:22PM

            by TheRaven (270) on Friday February 17 2017, @12:22PM (#468180) Journal

            Is the correct behavior for volatile for multi-threaded code in the C standard now?

            If you are using volatile for sharing between threads in C, then you're doing it wrong. Volatile exists for memory-mapped device I/O and nothing else. You want _Atomic.

            --
            sudo mod me up
    • (Score: 1) by curril on Friday February 17 2017, @01:04AM

      by curril (5717) on Friday February 17 2017, @01:04AM (#468027)

      For example, suppose in a given language the evaluation order of a function's arguments is undefined. This gives the compiler plenty of opportunities to optimize how to most efficiently evaluate the arguments. Now if the language allows function arguments to have side effects, then programmers who don't take care can get some weird bugs depending on how the arguments are evaluated. But in a language like Haskell where they don't have side effects, then it doesn't matter and leaving the evaluation order undefined is the better choice.

  • (Score: 3, Interesting) by Immerman on Thursday February 16 2017, @05:17PM

    by Immerman (3985) on Thursday February 16 2017, @05:17PM (#467883)

    I recall reading an article a while back that pointed out that even relatively benign undefined behavior can become a serious problem when it encounters the compiler's optimization engine, especially at more aggressive optimization levels. "Undefined" means the compiler can make very wrong assumptions, and may end up reordering or completely eliminating critical sections of code, creating "invisible errors" that are completely impossible to identify from the source code, except by noticing that there is an "undefined behavior" leak somewhere nearby.

    Here's one such paper. https://dspace.mit.edu/openaccess-disseminate/1721.1/86980 [mit.edu]
    One of the examples they list is:
    Thing* danger = GetPointerOrNull();
    alert = danger->data; // undefined behavior
    if(!danger)
            DoCleanup();

    In which case the compiler may eliminate the null pointer cleanup entirely, since dereferencing "danger" allows it to assume that at that point the pointer is definitely not null.

    Basically, modern compilers infer a lot of non-explicit information from the code, and even relatively safe undefined code can imply extremely false information.

    • (Score: 2) by meustrus on Thursday February 16 2017, @05:38PM

      by meustrus (4961) on Thursday February 16 2017, @05:38PM (#467891)

      If you want to do that, write it a level lower than C. Undefined behavior in C is machine-dependent, so if you are optimizing for a particular machine's behavior you should really just be writing machine code for it to begin with.

      Or you stop trying to optimize yourself, write code that describes your intent, and let the compiler optimize it.

      --
      If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
      • (Score: 2) by Immerman on Thursday February 16 2017, @09:43PM

        by Immerman (3985) on Thursday February 16 2017, @09:43PM (#467975)

        If you write any lower than C, my understanding is you're probably not going to be getting much compiler optimization anyway.

        And the point is that undefined behavior is not only machine-dependent, but also compiler- (and compiler-settings-) dependent, and can spill across considerable distances within your code. As in this example, critical code that clearly should run can be optimized out entirely. And in a more complicated scenario, there could be pages of code between that undefined memory access and the if statement it causes to be "erased".

        Also, why exactly would you want to do such a thing intentionally? Accessing the target of a null pointer can potentially cause all sorts of problems.

        • (Score: 2) by NCommander on Thursday February 16 2017, @10:56PM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @10:56PM (#467999) Homepage Journal

          C is about as close to the metal as you can get short of assembly; it's a reasonably good abstraction of the function of any Turing-complete machine; when you get down to it, C is basically just load, store, and math operations with labels and stack management. On some architectures it's entirely possible to IPL, service interrupts, and get running without any assembly code (notably, you can do this on ARM, short of setting up the MMU).

          --
          Still always moving
          • (Score: 2) by Immerman on Friday February 17 2017, @04:20PM

            by Immerman (3985) on Friday February 17 2017, @04:20PM (#468251)

            Indeed. My understanding is that it was designed from the beginning to be almost as efficient as Assembly, even with the piss-poor compiler optimizations of the time.

            I think you mis-characterize the simplicity of C though - the language itself is quite sophisticated, even if it lacks the expansive standard libraries included with more modern languages.

            • (Score: 2) by NCommander on Friday February 17 2017, @05:09PM

              by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday February 17 2017, @05:09PM (#468264) Homepage Journal

              I was actually referring to the core language semantics itself, and what you have if you have no external libraries :). I've done a fair bit of bare metal programming with no libc.

              As far as using it for general purpose programming, well, there's a reason C continues to truck on after so many years. Simple, (relatively) easy to understand, and in its own way elegant. I've mostly migrated over to using Rust for most of my static needs, but I don't mind C as a programming language. C++ on the other hand ...

              --
              Still always moving
              • (Score: 2) by Immerman on Saturday February 18 2017, @07:36PM

                by Immerman (3985) on Saturday February 18 2017, @07:36PM (#468691)

                Huh, and I'm actually quite a fan of C++.

                So how is Rust doing these days? I've considered learning it, but my time is limited and Rust has a reputation for changing the language fairly frequently. Which gives me high hopes for it's long-term potential, but not much interest in using it for non-trivial projects at this point.

                • (Score: 2) by NCommander on Sunday February 19 2017, @01:04AM

                  by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Sunday February 19 2017, @01:04AM (#468795) Homepage Journal

                  Rust as a language was stabilized when 1.0 came out about six months ago. My biggest issue in learning it is that a lot of resources, like Stack Overflow questions, refer to older versions of Rust.

                  Conversely, a few crates require features that haven't landed yet and require you to do a bit of magic to get them working on stable (serde, a serialization/deserialization framework, is one of these). That being said, I started my current project on Rust 1.12 and it hasn't once broken across three compiler upgrades.

                  My problem with C++ is that the language is so stupidly complex, and it creates a lot of pain and headaches for everyone besides the compiler writers. I'm too tired to write up my full C++ rant, but the C++ FQA [yosefk.com] goes into a lot of detail on the low-level technical issues I've run into. Having to port and debug a large chunk of the Ubuntu archive on ARM drove a serious hatred of the language into me.

                  --
                  Still always moving
    • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @06:03PM

      by Anonymous Coward on Thursday February 16 2017, @06:03PM (#467900)

      Was the article named "what every C programmer should know about undefined behaviour?"

  • (Score: 3, Interesting) by meustrus on Thursday February 16 2017, @05:34PM

    by meustrus (4961) on Thursday February 16 2017, @05:34PM (#467889)

    In my Computer Science classes, we were taught that "undefined behavior" means anything can happen. It could do what you want, or it could throw an exception. Or it could spin the disk drive to unsafe speeds until the disc flies out of the computer and kills the operator. You just don't know.

    This was in the context of Java documentation, where "undefined behavior" means "avoid this like the plague". Or at least it should, if the Java developers didn't have some fixation on overburdening their interfaces. The classic example is Iterator: some implementations support deleting elements during traversal, and some don't, so the behavior of Iterator#remove on the interface is undefined. It's safe to use it only if you know that you are using a specific implementation that supports it.

    But that gets to the real problem, which is bad language design. What we call "undefined behavior" is about hidden state. Neither the compiler nor the runtime knows whether this implementation of Iterator is going to work; that detail is left up to the programmer to get right. And that's bullshit. Sure, if I were programming machine code for fixed hardware like Mel [utah.edu] then that would be acceptable. Difficult, sure, but within the capabilities of a rock star to get it right. But Java is not machine code for fixed hardware. Java will optimize away your dead code, hide the memory model from you, and periodically interrupt your program to collect the garbage that it won't let you clean up yourself. More people can make useful programs with Java. But nobody can completely understand all the details of what will happen when their program is run.

    It looks to me like `undef` and `poison` in LLVM IR at least are safe within the bounds of their limited scope. They are not "anything can happen". The bounds of what could happen based on those keywords are knowable before runtime. They speak of a bygone era when the programmer could understand the exact procedure that would happen when their code ran, and they provide options to model when an operation is safe vs. unsafe. That makes sense, because the people who like that level of control can only find it these days writing the compilers.

    So no, a defined undefined behavior is not unsafe. It's not the same thing as truly undefined behavior where anything can happen. That kind of unsafe undefined behavior comes from languages which give you strict contracts they can't enforce, then tell you in code comments that the contract might be a lie.

    --
    If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
    • (Score: 2) by DannyB on Thursday February 16 2017, @08:15PM

      by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @08:15PM (#467935) Journal
      Since you're talking about Java, iterators, and Iterator#remove, I'll point out that you can implicitly use an iterator in a for() loop without realizing you're using one.  The iterator may no longer be able to traverse the collection if you call a remove() method on the collection.

      List<President> presidents =  . . . ;
      for( final President president : presidents ) {
         if( president.isAnIdiot()  &&  president.getFaceColor().equals( Colors.ORANGE ) ) {
            presidents.remove( president );  // make idiots un-presidented
         }
      }

      Depending on the collection implementation, after the remove() call, the implicit iterator created by the for() may now be unable to continue traversing the collection.
      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 2) by NCommander on Thursday February 16 2017, @10:14PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @10:14PM (#467990) Homepage Journal

        Far too many languages let you do this and leave it to the runtime to decide whether to crash. As far as I know, Rust is the only language off the top of my head that specifically checks at compile time whether such an operation is safe, and dies with a compiler error if you would invalidate the iterator while you're inside it.

        --
        Still always moving
    • (Score: 2) by Wootery on Friday February 17 2017, @09:32AM

      by Wootery (2341) on Friday February 17 2017, @09:32AM (#468151)

      But Java isn't like C. I'm pretty sure Java has no real 'undefined behaviour' (in the C sense), and that this StackOverflow answer is accurate. [stackexchange.com]

  • (Score: -1, Troll) by Anonymous Coward on Thursday February 16 2017, @05:42PM

    by Anonymous Coward on Thursday February 16 2017, @05:42PM (#467892)

    I program in Trump++

    • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @05:46PM

      by Anonymous Coward on Thursday February 16 2017, @05:46PM (#467894)

      I agree with your sentiment, but trump is actually a highly ordered pile of shit.

      • (Score: 2) by DannyB on Thursday February 16 2017, @08:19PM

        by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @08:19PM (#467938) Journal

        It's the most highly ordered. I promise. Trust me! I have more entropy than anyone else. And believe me, I know about entropy!

        Entropy definition:
        2. lack of order or predictability; gradual decline into disorder.

        --
        The lower I set my standards the more accomplishments I have.
      • (Score: 2) by c0lo on Thursday February 16 2017, @11:20PM

        by c0lo (156) Subscriber Badge on Thursday February 16 2017, @11:20PM (#468007) Journal

        I agree with your sentiment, but trump is actually a highly ordered pile of shit.

        I have to disagree. For sure, it is not a pile [wikipedia.org].

        --
        https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
  • (Score: 3, Insightful) by maxwell demon on Thursday February 16 2017, @07:21PM

    by maxwell demon (1608) on Thursday February 16 2017, @07:21PM (#467921) Journal

    A language having constructs with undefined behaviour doesn't make that language inherently unsafe. Code that triggers undefined behaviour is inherently unsafe.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by DannyB on Thursday February 16 2017, @08:25PM

      by DannyB (5839) Subscriber Badge on Thursday February 16 2017, @08:25PM (#467941) Journal

      Absolutely true.

      However, how unsafe I would consider a language is directly related to how often and how easy it is to use those constructs with undefined behavior.

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: -1, Troll) by Anonymous Coward on Thursday February 16 2017, @09:05PM

        by Anonymous Coward on Thursday February 16 2017, @09:05PM (#467960)

        Sure, if you're an idiot that needs a slow, safe language to hold your hand.

      • (Score: 2) by bob_super on Friday February 17 2017, @01:07AM

        by bob_super (1357) on Friday February 17 2017, @01:07AM (#468030)

        C is highly unsafe, and so is ASM.

        • (Score: 2) by DannyB on Friday February 17 2017, @02:15PM

          by DannyB (5839) Subscriber Badge on Friday February 17 2017, @02:15PM (#468210) Journal

          Yep. That is what makes C and ASM great systems languages for an OS, for microcontrollers, or for device drivers. But such bad choices as application programming languages.

          --
          The lower I set my standards the more accomplishments I have.