date 2016-02-17
from the for-all-you-code-writing-types-out-there dept.
John Regehr, Professor of Computer Science, University of Utah, writes:
Undefined behavior (UB) in C and C++ is a clear and present danger to developers, especially when they are writing code that will execute near a trust boundary. A less well-known kind of undefined behavior exists in the intermediate representation (IR) for most optimizing, ahead-of-time compilers. For example, LLVM IR has undef and poison in addition to true explodes-in-your-face C-style UB. When people become aware of this, a typical reaction is: "Ugh, why? LLVM IR is just as bad as C!" This piece explains why that is not the correct reaction.
Undefined behavior is the result of a design decision: the refusal to systematically trap program errors at one particular level of a system. The responsibility for avoiding these errors is delegated to a higher level of abstraction. For example, it is obvious that a safe programming language can be compiled to machine code, and it is also obvious that the unsafety of machine code in no way compromises the high-level guarantees made by the language implementation. Swift and Rust are compiled to LLVM IR; some of their safety guarantees are enforced by dynamic checks in the emitted code, other guarantees are made through type checking and have no representation at the LLVM level. Either way, UB at the LLVM level is not a problem for, and cannot be detected by, code in the safe subsets of Swift and Rust. Even C can be used safely if some tool in the development environment ensures that it will not execute UB. The L4.verified project does exactly this.
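As a rough illustration of a guarantee "enforced by dynamic checks in the emitted code", here is a minimal Rust sketch. The function name `element_at` is made up for this example and is not from the article; it only shows that an out-of-range access in safe code ends in a defined result rather than UB.

```rust
/// Checked lookup: returns None instead of reading out of bounds.
fn element_at(values: &[i32], index: usize) -> Option<i32> {
    values.get(index).copied()
}

fn main() {
    let values = [10, 20, 30];
    println!("{:?}", element_at(&values, 1)); // Some(20)
    println!("{:?}", element_at(&values, 7)); // None
    // Indexing a slice out of bounds with `[]` panics instead: a defined
    // trap, because a bounds check runs before the load.
}
```

Whatever the lower level would do with the unchecked load never becomes visible here, since the check runs first.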
Defined Behavior is more often needed (Score:2)
In order to write a program, you need defined behavior. Every "Hello World" program ever written assumes that the language and underlying system provide certain defined behavior guarantees that, under normal operating conditions, will result in the famous greeting.
When the programmer writes even a simple assignment such as x := y; the assumption is that its behavior is defined.
Now, I can see a case for pushing potentially optimized operations up into the language so that they are additional tools in the hands of a programmer who knows how to use them. Most of the time I want an addition operation to fail spectacularly with an exception if it overflows. But there may be times when I don't care what happens in the event of overflow, because I can guarantee before the addition is done that overflow simply cannot occur. The simple example is operands that are already restricted to a smaller range, making overflow impossible in the data type that the addition will use (e.g., adding two bytes that are widened to ints). And depending on the purpose, I may not even care about any overflow bits; maybe I want "mod 256" arithmetic, widened to ints when the addition is performed.
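The two byte cases can be sketched in Rust like this. The function names `add_widened` and `add_mod_256` are made up for illustration; the point is only that widening first makes overflow impossible, and masking afterwards gives the "mod 256" result.

```rust
/// Widen two bytes to i32 before adding. The largest possible sum is
/// 255 + 255 = 510, so the i32 addition can never overflow.
fn add_widened(a: u8, b: u8) -> i32 {
    i32::from(a) + i32::from(b)
}

/// "mod 256" arithmetic done in ints: add the widened values, then
/// keep only the low 8 bits and discard the overflow bits.
fn add_mod_256(a: u8, b: u8) -> i32 {
    add_widened(a, b) & 0xFF
}

fn main() {
    println!("{}", add_widened(200, 100)); // 300
    println!("{}", add_mod_256(200, 100)); // 44
}
```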
I agree with the title that undefined behavior does not mean that programming is unsafe. But most of the time you don't want undefined behavior. Therefore, if you're using operations that have weird, undefined or surprising behavior, those functions or operations ought to have unusual names. Well-known functions or operators such as '+' should have no surprising or undefined behaviors.
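For what it's worth, Rust's standard integer types already follow roughly this naming idea: the unusual overflow behaviors live behind explicitly named methods. A small sketch, where the `AddResults` struct and `add_every_way` function are hypothetical names invented for this example:

```rust
/// Results of adding two bytes with each explicitly named behavior.
#[derive(Debug, PartialEq)]
struct AddResults {
    checked: Option<u8>, // None when the sum does not fit in a u8
    wrapping: u8,        // sum modulo 256
    saturating: u8,      // sum clamped to u8::MAX
    overflowed: bool,    // whether the true sum exceeded u8::MAX
}

fn add_every_way(a: u8, b: u8) -> AddResults {
    let (_, overflowed) = a.overflowing_add(b);
    AddResults {
        checked: a.checked_add(b),
        wrapping: a.wrapping_add(b),
        saturating: a.saturating_add(b),
        overflowed,
    }
}

fn main() {
    println!("{:?}", add_every_way(250, 10));
    println!("{:?}", add_every_way(1, 2));
}
```

None of these is undefined on overflow; each name says which defined behavior the caller asked for.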
Another approach might be to have compiler switches or annotations that can be used locally on certain statements to indicate to the compiler that on the next line I simply don't care about what happens for integer overflow. If the compiler is able to use that information to do a more optimized addition operation on a certain instruction set, then great. If not, then fine. And even if the compiler ignores the annotation and simply compiles the addition with all of the checking and guard code around it, that is acceptable. It merely indicates the lower quality of the compiler. Yet the compiler still ensures correctness.
As for making an ordinary common operator have undefined behavior, I think that is a stupid idea. It simply means that generations of programmers, for decades, will have to invent and re-invent their own defenses around what should be a simple common operation. Or they will simply ignore the problem completely. And we end up with obscure bugs, even security vulnerabilities, hidden in code, caused by the combination of the programmer, the particular machine instruction set, and how the compiler, or this version of the compiler (!), chose to emit code for that operation.
Assumptions are bad (Score:0)
NEVER assume you are more clever than the language designers. Unless you know specifically what a piece of code does, you should always program with the mindset that it causes the user's computer to explode with Lovecraft tentacles.
Undefined behavior is, as its name suggests, undefined. It could cause manageable pointer corruptions, but it can just as easily corrupt memory or trip exceptions. What happens can change between different hardware of the same architecture, compiler versions, and OS releases. Not to mention that it makes porting extremely difficult.
If you really have to optimize by assuming certain behavior for a given hardware architecture, then always wrap that code in macro conditionals or template wizardry.
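A sketch of that kind of wrapping, using Rust's `cfg` attributes as a stand-in for C macro conditionals (an assumption of this example, since the comment names no language). The function `read_u32_le` is hypothetical: the architecture-specific shortcut is compiled only where its assumption holds, and every other target gets the portable version.

```rust
/// On little-endian targets the native byte order already matches,
/// so the bytes can be reinterpreted directly.
#[cfg(target_endian = "little")]
fn read_u32_le(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

/// Portable fallback for every other target: decode explicitly as
/// little-endian regardless of the native byte order.
#[cfg(not(target_endian = "little"))]
fn read_u32_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

fn main() {
    let value = read_u32_le([0x78, 0x56, 0x34, 0x12]);
    println!("{:#x}", value); // 0x12345678
}
```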
Just looking at it.. (Score:2)
Just looking at it, there seems to be an assumption that undefined values == undefined behavior, when that is not the case. The compiler is preventing undefined behavior by introducing undefined values and non-signaling NaNs. I guess that is the point of what they are saying? But equating the two seems wrong to me. Permitting undefined behavior is still unsafe. It's like throwing @ signs around bugs in PHP or try/catches around random errors in Java/C#. There are all kinds of side effects, and dealing with the resulting UB is not fun. Relying on the compiler to fix undefined behavior seems like a bad idea? If you overflow a number, then it should blow up in your face and not invent some "safe" value to continue. Seems like an area where there are a lot of opinions, though.
Oh GOD (Score:0)
Programmers really are getting dumber and dumber. Undefined behavior IS dangerous and stupid, and is NOT a design decision.
