
SoylentNews is people

posted by cmn32480 on Thursday February 16 2017, @03:36PM
from the for-all-you-code-writing-types-out-there dept.

John Regehr, Professor of Computer Science, University of Utah, writes:

Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.

Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.
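The "dynamic checks in the emitted code" half of that argument can be sketched in C itself. The helper below is hypothetical and is not code from Swift, Rust, or the article. It shows roughly what a bounds check inserted before an indexing operation does: the bad index is caught at a higher level, so the lower-level read that would be undefined behavior never executes. Rust would panic on an out-of-range index; this sketch returns -1 instead to keep it simple.

```c
#include <stddef.h>

/* Hypothetical helper (not from the article): the kind of bounds check a
 * safe language inserts before indexing. The out-of-range case is rejected
 * here, so the read that would be undefined behavior in C never runs. */
static int checked_get(const int *arr, size_t len, size_t i, int *out)
{
    if (i >= len)
        return -1;      /* reject instead of reading past the end */
    *out = arr[i];      /* reached only when i < len */
    return 0;
}
```

For example, with `int data[4] = {10, 20, 30, 40};`, `checked_get(data, 4, 2, &v)` stores 30 in `v`, while `checked_get(data, 4, 7, &v)` returns -1 and leaves `v` untouched.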


This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Interesting) by Immerman on Thursday February 16 2017, @05:17PM

    by Immerman (3985) on Thursday February 16 2017, @05:17PM (#467883)

    I recall reading an article a while back that pointed out that even relatively benign undefined behavior can become a serious problem when it encounters the compiler's optimization engine, especially at more aggressive optimization levels. "Undefined" means the compiler can make very wrong assumptions, and may end up reordering or completely eliminating critical sections of code, creating "invisible errors" that are nearly impossible to identify from the source code unless you notice that there is an "undefined behavior" leak somewhere nearby.

    Here's one such paper. https://dspace.mit.edu/openaccess-disseminate/1721.1/86980 [mit.edu]
    One of the examples they list is:
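    Thing* danger = GetPointerOrNull();
    alert = danger->data;   // undefined behavior if danger is null
    if (!danger)            // so the compiler may assume danger != null here
            DoCleanup();    // and delete this call entirely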

    In which case the compiler may eliminate the null pointer cleanup entirely, since dereferencing "danger" allows it to assume that at that point the pointer is definitely not null.
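Reordering the snippet so the null test comes before the dereference removes the undefined behavior, and with it the compiler's license to drop the check. A minimal C sketch, assuming `Thing` is a struct with an `int data` field; `Thing` and `DoCleanup` follow the names in the comment but their bodies are hypothetical, `ReadAlert` is a hypothetical wrapper, and `GetPointerOrNull()` is replaced by a parameter.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the comment's Thing and DoCleanup(). */
typedef struct Thing { int data; } Thing;

static int cleanups = 0;
static void DoCleanup(void) { cleanups++; }

/* Test first, dereference second: a null pointer is never read, so the
 * compiler gains no license to assume danger != NULL and must keep the check. */
static int ReadAlert(const Thing *danger, int *alert)
{
    if (!danger) {
        DoCleanup();
        return -1;
    }
    *alert = danger->data;
    return 0;
}
```

Calling `ReadAlert(NULL, &alert)` now runs `DoCleanup()` and returns -1 instead of reading through a null pointer.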

    Basically, modern compilers infer a lot of non-explicit information from the code, and even seemingly harmless undefined behavior can lead them to infer things that are flatly false.

  • (Score: 2) by meustrus on Thursday February 16 2017, @05:38PM

    by meustrus (4961) on Thursday February 16 2017, @05:38PM (#467891)

    If you want to do that, write it a level lower than C. Undefined behavior in C is machine-dependent, so if you are optimizing for a particular machine's behavior you should really just be writing machine code for it to begin with.

    Or you stop trying to optimize yourself, write code that describes your intent, and let the compiler optimize it.
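To make the "describe your intent" advice concrete: a common attempt to lean on two's-complement wraparound is the test `x + 1 < x`, but signed overflow is undefined in standard C, so the optimizer may assume it never happens no matter what the hardware would do. A minimal sketch with a hypothetical `increment_saturating` function that states the intent directly instead:

```c
#include <limits.h>

/* Relying on wraparound, e.g.
 *     if (x + 1 < x) ...
 * is undefined for signed int, so an optimizer may treat the test as
 * always false and delete it. Comparing against INT_MAX before adding
 * never overflows, so it is well defined on any conforming implementation. */
static int increment_saturating(int x)
{
    if (x == INT_MAX)
        return INT_MAX;     /* the addition would overflow: clamp instead */
    return x + 1;
}
```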

    --
    If there isn't at least one reference or primary source, it's not +1 Informative. Maybe the underused +1 Interesting?
    • (Score: 2) by Immerman on Thursday February 16 2017, @09:43PM

      by Immerman (3985) on Thursday February 16 2017, @09:43PM (#467975)

      If you write any lower than C, my understanding is you're probably not going to be getting much compiler optimization anyway.

      And the point is that undefined behavior is not only machine-dependent, but also compiler (and compiler settings) dependent, and can spill across considerable distances within your code. As in this example, critical code that clearly should be run can be optimized out entirely. And in a more complicated scenario, there could potentially be pages of code between that undefined memory access and the if statement it causes to be "erased".

      Also, why exactly would you want to do such a thing intentionally? Accessing the target of a null pointer can potentially cause all sorts of problems.

      • (Score: 2) by NCommander on Thursday February 16 2017, @10:56PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Thursday February 16 2017, @10:56PM (#467999) Homepage Journal

        C is about as close to the metal as you can get short of assembly; it's a reasonably good abstraction of any Turing-complete machine. When you get down to it, C is basically just load, store and math operations with labels and stack management. On some architectures it's entirely possible to IPL and get interrupt servicing running without any assembly code (notably, you can do this on ARM, apart from setting up the MMU).

        --
        Still always moving
        • (Score: 2) by Immerman on Friday February 17 2017, @04:20PM

          by Immerman (3985) on Friday February 17 2017, @04:20PM (#468251)

          Indeed. My understanding is that it was designed from the beginning to be almost as efficient as Assembly, even with the piss-poor compiler optimizations of the time.

          I think you mischaracterize the simplicity of C though - the language itself is quite sophisticated, even if it lacks the expansive standard libraries included with more modern languages.

          • (Score: 2) by NCommander on Friday February 17 2017, @05:09PM

            by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Friday February 17 2017, @05:09PM (#468264) Homepage Journal

            I was actually referring to the core language semantics themselves, and what you have when you have no external libraries :). I've done a fair bit of bare metal programming with no libc.

            As far as using it for general-purpose programming, well, there's a reason C continues to truck on after so many years: it's simple, (relatively) easy to understand, and elegant in its own way. I've migrated to Rust for most of my static needs, but I don't mind C as a programming language. C++ on the other hand ...

            --
            Still always moving
            • (Score: 2) by Immerman on Saturday February 18 2017, @07:36PM

              by Immerman (3985) on Saturday February 18 2017, @07:36PM (#468691)

              Huh, and I'm actually quite a fan of C++.

              So how is Rust doing these days? I've considered learning it, but my time is limited and Rust has a reputation for changing the language fairly frequently. That gives me high hopes for its long-term potential, but not much interest in using it for non-trivial projects at this point.

              • (Score: 2) by NCommander on Sunday February 19 2017, @01:04AM

                by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Sunday February 19 2017, @01:04AM (#468795) Homepage Journal

                Rust as a language was stabilized when 1.0 came out in May 2015. My biggest issue in learning it is that a lot of resources, like Stack Overflow questions, refer to older versions of Rust.

                Conversely, a few crates require features that haven't landed yet and need a bit of magic to get working on stable (serde, a serialization/deserialization framework, is one of these). That being said, I started my current project on Rust 1.12 and it hasn't broken once across three compiler upgrades.

                My problem with C++ is that the language is so stupidly complex, and it creates a lot of pain and headaches for anyone besides the language coder. I'm too tired to write up my full C++ rant, but the C++ FQA [yosefk.com] goes into a lot of detail on the low-level technical issues I've run into. Having to actively port and debug a large chunk of the Ubuntu archive on ARM drove a serious hatred of the language into me.

                --
                Still always moving
  • (Score: 0) by Anonymous Coward on Thursday February 16 2017, @06:03PM

    by Anonymous Coward on Thursday February 16 2017, @06:03PM (#467900)

    Was the article "What Every C Programmer Should Know About Undefined Behavior"?