The "jump threading" compiler optimization (aka -fthread-jumps in GCC) turns conditional branches into unconditional ones on certain paths, at the expense of code size. For hardware with branch prediction, speculative execution, and prefetching, this can greatly improve performance. However, there is very little scientific literature or documentation on it. The Wikipedia article is very short and incomplete.
The linked article has an illustrated treatment of common code structures and how these optimizations work.
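As a rough illustration of the transformation described above, here is a hand-written sketch in C. The function names `before` and `after` are made up for this example, and the "after" version is what a compiler might produce conceptually, not the actual output of any particular compiler.

```c
/* Before: the second test of `a > 0` repeats a decision that was
 * already made by the first branch. */
int before(int a, int x)
{
    int y;

    if (a > 0)
        y = x + 1;
    else
        y = x - 1;

    y += 10;            /* shared join block */

    if (a > 0)          /* outcome is already known on each incoming path */
        y *= 2;

    return y;
}

/* After (hand-threaded): each arm of the first branch continues
 * directly to the correct successor. The shared block is duplicated,
 * which is where the code-size cost comes from, and the second
 * conditional branch disappears. */
int after(int a, int x)
{
    int y;

    if (a > 0) {
        y = x + 1;
        y += 10;        /* copy of the join block */
        y *= 2;         /* second branch known to be taken */
    } else {
        y = x - 1;
        y += 10;        /* copy of the join block */
    }

    return y;
}
```

Both functions compute the same result for any input that does not overflow; the threaded version just executes one conditional branch per call instead of two.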
(Score: 2) by Alfred on Monday November 02 2015, @08:32PM
Branch misprediction is expensive, but modern branch prediction is good enough to pay for itself in CPU cycles saved.
Even the PlayStation ONE CPU had a set of branch-likely instructions. It seems to me that would be the easiest way: just tell the chip which way to prefetch, and mispredict only once per loop, or less than half the time per conditional branch. I thought modern chips did the same, but I'm not sure; I believe the MIPS R3000 in the PS1 did.
Maybe that is what you were trying to say. I think the implementation in the article could be achieved by unrolling the first pass of a loop, or by some other code-bloating optimization. It is really just interesting graph theory.
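To make "unrolling the first pass of a loop" concrete, here is a small sketch in C. It is only an illustration of the idea, not what the article implements; `max_before` and `max_peeled` are made-up names, and returning 0 for an empty array is an arbitrary choice for the example.

```c
#include <stddef.h>

/* Before: the `have` test is only ever true-to-false once, on the
 * first iteration, yet it is evaluated on every pass. */
int max_before(const int *v, size_t n)
{
    int best = 0;
    int have = 0;

    for (size_t i = 0; i < n; i++) {
        if (!have || v[i] > best) {
            best = v[i];
            have = 1;
        }
    }
    return best;
}

/* After: the first iteration is peeled out of the loop, so the
 * remaining iterations no longer need the `have` test at all. */
int max_peeled(const int *v, size_t n)
{
    if (n == 0)
        return 0;           /* same result as max_before for n == 0 */

    int best = v[0];        /* peeled first pass */

    for (size_t i = 1; i < n; i++) {
        if (v[i] > best)
            best = v[i];
    }
    return best;
}
```

The peeled version is a bit larger, but the loop body has one less condition to evaluate and predict.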