SoylentNews is people

posted by Fnord666 on Tuesday May 14 2019, @07:43PM   Printer-friendly
from the gate-twiddling dept.

I'm tired of the dominance of the out-of-order processor. They are large and wasteful, the ever-popular x86 is especially poor, and they are hard to understand. Their voodoo would be more appreciated if they pushed better at the limits of computation, but it's obvious that the problems people solve have a latent inaccessible parallelism far in excess of what an out-of-order core can extract. The future of computing should surely not rest on terawatts of power burnt to pretend a processor is simpler than it is.

There is some hope in the ideas of upstarts, like Mill Computing and Tachyum, as well as research ideas like CG-OoO. I don't know if they will ever find success. I wouldn't bet on it. Heck, the Mill might never even get far enough to have the opportunity to fail. Yet I find them exciting, and much of the offhand "sounds like Itanium" naysay is uninteresting.

This article focuses on architectures in proportion to how much creative, interesting work they've shown in public. This means much of this article comments on the Mill architecture, there is a healthy amount on CG-OoO, and the Tachyum is mentioned only in passing.

https://medium.com/@veedrac/to-reinvent-the-processor-671139a4a034

A commentary on some of the more unusual alternatives to out-of-order architectures in the works, with a focus on Mill Computing's belt machines.


Original Submission

 
  • (Score: 2) by Immerman on Wednesday May 15 2019, @02:59AM (3 children)

    by Immerman (3985) on Wednesday May 15 2019, @02:59AM (#843681)

    Of course the compiler can do branch prediction, you just need to do performance profiling first. Heck, that was something any decent coder used to do by hand as a matter of course: any performance-relevant branch should be structured to fall through to the most common option, so that the pipeline isn't disrupted by an unnecessary taken branch. In essence, performance-critical code should always be written as:

    if (usually_true) {
        /* most common code path: falls through, no taken branch */
    } else {
        /* unusual code path: the taken branch to get here will
           inherently disrupt the pipeline */
    }

    Now granted, that can't adapt on the fly to changing probability distributions, but code whose probability distributions change significantly on the fly is fairly rare.

    As for RISC, as I recall OoO execution and branch prediction thrived there as well; as the name implies, RISC was more about the instruction set than about the machinery used to execute it.

  • (Score: 2) by Immerman on Wednesday May 15 2019, @03:01AM

    by Immerman (3985) on Wednesday May 15 2019, @03:01AM (#843682)

    That should have been
    >...so that the pipeline wasn't disrupted by an unnecessary conditional jump

  • (Score: 2) by maxwell demon on Wednesday May 15 2019, @07:31AM (1 child)

    by maxwell demon (1608) on Wednesday May 15 2019, @07:31AM (#843722) Journal

    > Of course the compiler can do branch prediction, you just need to do performance profiling first.

    That's assuming you can accurately predict the data patterns that will go into the program. What if a program is used with two very different patterns? For example, I could imagine that some loops in a video encoder behave very differently depending on whether they're encoding live-action recordings or 2D cartoons.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by RS3 on Thursday May 16 2019, @07:32PM

      by RS3 (6367) on Thursday May 16 2019, @07:32PM (#844423)

      Yeah, source code and compiler optimizations are great, but CPU branch prediction is a different thing. CPU branch prediction is a special form of parallel processing: alternate circuits in the CPU pre-fetch code and data that might be needed in the branch while the main CPU pipeline is busy with something else. "Super-scalar" CPUs have been pre-fetching data and code for a long time, because CPU I/O (external memory access) is not as fast as internal CPU speeds; so while the CPU is crunching one thing and the I/O bus is available, the pre-fetch circuits grab what they can (read-ahead cache loading).

      I still have not gotten a clear answer, but I speculate the problem is this: the CPU pulls in code and data for process A, then the OS context-switches control to process B, but oops, process A's code and data are still in the internal CPU cache, and oops, B owns the CPU and can read A's code and data. The kernel fixes seem to be to flush the cache and pre-fetch queues frequently, and certainly on context switches; that helps, but doesn't cover all cases. It seems like the CPU should do that on its own, but I have to think about whether the CPU knows its context; probably doesn't matter. If the CPU knows the GDT entry from which the barrel load came, and the new context is different / protected, then flush the cache. Gotta think... later...