Stories
Slash Boxes
Comments

SoylentNews is people

posted by martyb on Monday November 02 2015, @04:31AM   Printer-friendly
from the next-up:-jump-jiving dept.

The "jump threading" compiler optimization (aka -fthread-jump) turns conditional into unconditional branches on certain paths at the expense of code size. For hardware with branch prediction, speculative execution, and prefetching, this can greatly improve performance. However, there is no scientific publication or documentation at all. The Wikipedia article is very short and incomplete.

The linked article has an illustrated treatment of common code structures and how these optimizations work.


Original Submission

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Interesting) by Anonymous Coward on Monday November 02 2015, @04:14PM

    by Anonymous Coward on Monday November 02 2015, @04:14PM (#257554)

    Seems similar to a technique used for in-order processors. Since conditional branch mispredictions are expensive (a misprediction has to discard instructions and refetch the correct branch), you execute both sides of a branch and simply "select" the correct result at the end. The "if" at the end simply forwards the correct result to the subsequent code, so no re-fetch required. At worst, you have a one cycle stall as you dump the last assignment (you should hint or structure the compare such that the processor always pre-fetches the last assignment). At best, you have a "SELECT" instruction so no stall or branching to deal with (ex. PowerPC fsel or Intel CMOV).

    if (x)
        result = foo(x);
    else
        result = bar(x);

    turns into

    altresult = foo(x);
    result = bar(x);
    if (x) result = altresult;

    explaining why the following is sub-optimal is left to the student

    result = bar(x);
    if (x) result = foo(x);

    Starting Score:    0  points
    Moderation   +1  
       Interesting=1, Total=1
    Extra 'Interesting' Modifier   0  

    Total Score:   1  
  • (Score: 2) by Alfred on Monday November 02 2015, @08:32PM

    by Alfred (4006) on Monday November 02 2015, @08:32PM (#257670) Journal
    Executing both sides of a branch seems unfulfilling. You still have to select which branch gave the right result which I imagine is dependent on what is in the if() which you were avoiding in the first place. Executing both branches is a worse idea the longer either block gets. I guess it would be effective if the total work for the blocks was less than the work to evaluated the conditional, which I suspect is just about never.

    Branch misprediction is expensive but modern branch prediction is good enough to pay for itself in cpu cycles saved.

    Even the PlayStation ONE cpu had a set of branch likely instructions. It seems to me that would be the easiest way, just tell the chip which way to prefetch and only fail once per loop block or less than half per conditional branch. I thought modern chips did the same but I'm not sure, I know the mips3000 in the ps1 did.

    Maybe that is what you were trying to say. I think the implementation in the article could be achieved by unrolling the first pass of a loop or other opcode bloating optimization. It is really just interesting graph theory.