The "jump threading" compiler optimization (aka -fthread-jumps in GCC) turns conditional branches into unconditional ones on certain paths, at the expense of code size. On hardware with branch prediction, speculative execution, and prefetching, this can greatly improve performance. However, it is poorly documented: there seems to be little or no scientific literature on it, and the Wikipedia article is very short and incomplete.
The linked article has an illustrated treatment of common code structures and how these optimizations work.
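For readers who have not seen the transformation, here is a minimal hand-written sketch in C of what jump threading does. It is not taken from the linked article or from any compiler's output: the function names (unthreaded, threaded, paths_agree) and the constant 5 are made up for illustration. The first function tests the same condition twice around a shared block; the second is what you get if each path is "threaded" past the second test, duplicating the shared block.

```c
/* Before: the second test repeats a condition whose outcome is
 * already known on each path that reaches it. */
int unthreaded(int a, int b)
{
    int r = 0;
    if (a > 5)
        r += 1;
    r += b;          /* shared block, reached from both paths */
    if (a > 5)       /* redundant conditional branch */
        r *= 2;
    return r;
}

/* After (written by hand): each path goes straight to its known
 * successor, so the second conditional branch disappears.
 * The price is that the shared block now exists twice. */
int threaded(int a, int b)
{
    int r = 0;
    if (a > 5) {
        r += 1;
        r += b;      /* copy 1 of the shared block */
        r *= 2;      /* no test needed on this path */
    } else {
        r += b;      /* copy 2 of the shared block */
    }
    return r;
}

/* Hypothetical helper: returns 1 if both versions agree for every
 * a and b in [lo, hi], 0 otherwise. */
int paths_agree(int lo, int hi)
{
    for (int a = lo; a <= hi; a++)
        for (int b = lo; b <= hi; b++)
            if (unthreaded(a, b) != threaded(a, b))
                return 0;
    return 1;
}
```

Both versions compute the same result; the threaded one executes one conditional branch per call instead of two, and carries two copies of r += b. Whether a given compiler actually performs this on a given input depends on its heuristics and optimization level.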
(Score: 3, Informative) by pe1rxq on Monday November 02 2015, @09:13AM
If you are bit banging on a CPU which does branch prediction you are probably already doing it wrong.
Bit banging is a nice trick if you have a deterministic CPU like an AVR or PIC with no OS to interfere.
But on just about anything else you should invest in a little bit of circuitry to implement the protocol the proper way.
I have done my share of real-time programming, but bit-banging on a general purpose CPU is not a typical problem.
(Score: 2) by Alfred on Monday November 02 2015, @07:51PM
The XMOS line of chips gravis mentions is deterministic. Of course, he misses the boat: a deterministic chip is not something you run a word processor or web browser on. It is great for processing audio streams, or a constant rate of data with the same function applied over and over. Gravis also mentions bit banging, which XMOS has pretty much built into the hardware.