SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.
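
As a small illustration of pushing the responsibility for avoiding UB up one level, here is a minimal sketch of a checked addition in C. Signed integer overflow is undefined in C, so the caller-side check makes sure the overflowing addition is never executed. The helper name checked_add is an assumption made for this example, not something from the article.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: add two ints without ever executing a signed
   overflow, which is undefined behavior in C.
   Returns false (and leaves *sum untouched) if the result would not fit. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;   /* safe: the range check above rules out overflow */
    return true;
}
```

A safe language can emit a check like this automatically; in C it is up to the programmer or a verification tool to make sure it is there.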



 
  • (Score: 2) by NCommander on Thursday February 16 2017, @08:50PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @08:50PM (#467953) Homepage Journal

    In C (and, IMHO, some other low-level languages), there's an exception to implementation-defined behavior that actually makes things easier. My canonical "go to" example that actually makes sense is the volatile keyword (it's also a fun trivia question, since most programmers can't describe it).

    Specifically, the C standard uses a reference model to describe how specific operations (loads, stores, etc.) work. This is generally true to real life with a flat memory model such as protected/long mode x86, but if you're dealing with a non-linear or segmented memory model, pointers become very complex because you have different pointer types and selectors. It's perfectly legal for a C implementation to map a segmented memory model to a flat one at compile time so that pointer arithmetic works as you'd expect. The canonical example of this is real mode x86, which requires a fat pointer. However, fat pointers are expensive to use and calculate, and most of the time the compiler can safely use a near or far pointer instead. As such, exactly how loads and stores are done in C is implementation defined, which is why it's legal for the optimizer to eliminate variables and such.

    Going back to the example, Volatile is used to mark a position of memory that may change outside the operation of the program. More specifically, the standard says that the C compiler can't use the model described in the specification, and must use the variable "as defined" by the programmer. As such, you need it in any place that talks to memory-mapped registers, global variables in multithreaded applications, and in interrupt service handlers.
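
A minimal sketch of the memory-mapped register case described above. The register here is simulated by an ordinary variable so the code can run anywhere; on real hardware the address would come from the device datasheet. The names fake_status_reg, STATUS_READY and wait_until_ready are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical device: a status register whose bit 0 means "ready".
   An ordinary variable stands in for the hardware register here. */
static volatile uint32_t fake_status_reg = 0;

#define STATUS_READY 0x1u

/* Poll the register up to max_polls times. Because the pointee is
   volatile, every iteration performs a real load; the compiler may not
   hoist the read out of the loop. Returns 1 if ready was seen, else 0. */
int wait_until_ready(volatile uint32_t *status, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (*status & STATUS_READY)
            return 1;
    }
    return 0;
}
```

Without the volatile qualifier, an optimizer could legally read the status once and either return immediately or spin on a stale value.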

    --
    Still always moving
  • (Score: 2) by lgw on Thursday February 16 2017, @09:11PM

    by lgw (2836) on Thursday February 16 2017, @09:11PM (#467963)

    Is the correct behavior for volatile for multi-threaded code in the C standard now? People were using it as if it would work in C, C++, Java, and C#, but none of the language standards required that - changing outside of control of the program is different from changing outside of control of the thread. All the major compilers did the expected thing (except very early Java IIRC), and I've heard that all the standards now require that behavior, except C. But maybe I missed it.

    • (Score: 3, Interesting) by NCommander on Thursday February 16 2017, @09:33PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @09:33PM (#467971) Homepage Journal

      I haven't seen the ANSI C specification in many years, so I don't know if the literal definition has changed but I doubt it. More specifically, C doesn't handle threads at all in a language specification level. In the specification, violate means that the pointer has to be treated exactly as coded, and not interpreted by the compiler. Specifically, it means it can't assume the contents of a pointer is the same across an operation when optimizing.

      Take the following code block:

      void some_function(int *random_pointer) {
          *random_pointer = 7;
          printf("%d\n", random_pointer);
      }

      i.e., without violate, the compiler is free to do the following:

      void some_function(int *random_pointer) {
          *random_pointer = 7;
          printf("%d\n", 7); // but random_pointer could have changed if a context switch happened this moment
      }

      Which saves a load instruction. Compilers tend to do this aggressively even on O0 because on some architectures, loads can cause a cache miss or context change (i.e., selector change on real mode). Declaring random_pointer violate forces the compiler to always do the load. The easiest way to think of it is whether a variable can change without a context switch. If I have a memory mapped register, then that block can change at any time.

      From a processor point of view, a thread may or may not be a separate context. Userland threads (i.e., the original linuxthreads) and coroutines would be the same application, since changes always happen within an application; a coroutine always runs within the parent's context, and never in its own; think longjmp/setjmp. However, most OSes handle threading at the kernel level, and as such it's possible that two separate contexts within an application are running at the same time if the kernel schedules both at once. It's also possible on some machines to do threading without the kernel; protected mode supports the TSS system, which does hardware-level threading; OS/2 used this, and I suspect early Windows did as well.

      --
      Still always moving
      • (Score: 2) by TheRaven on Friday February 17 2017, @06:07PM

        by TheRaven (270) on Friday February 17 2017, @06:07PM (#468279) Journal

        More specifically, C doesn't handle threads at all in a language specification level

        It must be traumatic for you to be waking up in 2017 after six or more years asleep. I hope that Trump, Brexit, and so on are not too much of a shock. Once you've recovered from that, I think that you might be interested to know that in 2011 there was a new version of the C specification released (and well supported by compilers). This version includes threads, atomic operations, and a memory model for synchronisation.

        Your example is missing a * to dereference random_pointer in the printf argument, but is otherwise correct. Note, however, that the compiler is still allowed to reorder accesses to different volatile variables relative to each other, which makes it unsuitable for most uses in multithreaded programming (but fine for its intended purpose of communicating with memory-mapped I/O devices).

        --
        sudo mod me up
        • (Score: 2) by NCommander on Saturday February 18 2017, @02:27AM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Saturday February 18 2017, @02:27AM (#468465) Homepage Journal

          Teaches me not to test compile my code before posting it. The point stands though.

          If I understand your specific case correctly, you can actually declare the value of a variable to also be violate, i.e.:

          int violate * violate x.

          That should force it to dereference and then load in that order, and tell the optimizer to GTFO. There are also pragmas to that effect. That being said, when I do multithreading in C, it's mutexes and locks all the way down if I have any choice. Violate only gets used for MMIO.

          --
          Still always moving
          • (Score: 2) by TheRaven on Saturday February 18 2017, @11:40AM

            by TheRaven (270) on Saturday February 18 2017, @11:40AM (#468551) Journal

            int violate * violate x.
            That should force it to dereference and then load in that order, and tell the optimizer to GTFO

            Ignoring your highly amusing autocorrect problem, that only works when there is a direct dependency between the objects (i.e. there is no way to reorder the load of x after the load of *x, because you must load x to be able to load *x). This will also result in redundant loads of x, which is probably not what you wanted. I was talking about cases like this:

            volatile int x;
            volatile int y;
            printf("%d\n", x);
            printf("%d\n", y);

            The compiler is entirely free to first load y, and then load x, store the results of both on the stack, and then issue the printf calls. This would not be violating the C memory model. The same is not true of this code:

            _Atomic(int) x;
            _Atomic(int) y;
            printf("%d\n", x);
            printf("%d\n", y);

            In this example, the loads of x and y are both sequentially consistent and so any reordering that would violate that guarantee is not permitted. The compiler must both load x before y and must emit enough barrier instructions to ensure that there is no global ordering of memory operations that would appear as if the load of y happened first.
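
To show why that ordering guarantee matters in practice, here is a minimal sketch of the usual "publish data, then set a flag" pattern using C11 <stdatomic.h>. The names payload, ready, publish and try_consume are assumptions made for this example; the two functions are meant to be called from different threads, although nothing in the sketch creates threads itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;          /* plain data, published via the flag below */
static atomic_bool ready;    /* zero-initialized, i.e. false */

/* Writer side: fill in the data, then set the flag.
   atomic_store is sequentially consistent by default, so the write to
   payload cannot be reordered after the flag becomes visible. */
void publish(int value)
{
    payload = value;
    atomic_store(&ready, true);
}

/* Reader side: only touch payload after observing the flag.
   Returns true and fills *out if a value has been published. */
bool try_consume(int *out)
{
    if (!atomic_load(&ready))
        return false;
    *out = payload;
    return true;
}
```

If ready were merely volatile, the compiler and the CPU would both be free to make the flag visible before the payload, and a reader on another core could see ready set with stale data.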

            --
            sudo mod me up
    • (Score: 2) by TheRaven on Friday February 17 2017, @12:22PM

      by TheRaven (270) on Friday February 17 2017, @12:22PM (#468180) Journal

      Is the correct behavior for volatile for multi-threaded code in the C standard now?

      If you are using volatile for sharing between threads in C, then you're doing it wrong. Volatile exists for memory-mapped device I/O and nothing else. You want _Atomic.
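
For the simplest shared-state case, a counter bumped by several threads, a sketch with _Atomic might look like the following. The names hits, record_hit and hits_so_far are hypothetical and chosen only for this example.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; static storage is zero-initialized. */
static _Atomic(int) hits;

/* Safe to call from several threads at once: atomic_fetch_add is an
   indivisible read-modify-write, so no increments are lost.
   Returns the counter value after this increment. */
int record_hit(void)
{
    return atomic_fetch_add(&hits, 1) + 1;
}

/* Sequentially consistent read of the current count. */
int hits_so_far(void)
{
    return atomic_load(&hits);
}
```

A volatile int incremented with hits++ would still be a separate load and store, so two threads could both read the same old value and one update would be lost.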

      --
      sudo mod me up