
SoylentNews is people

posted by Fnord666 on Tuesday May 14 2019, @07:43PM   Printer-friendly
from the gate-twiddling dept.

I'm tired of the dominance of the out-of-order processor. They are large and wasteful, the ever-popular x86 is especially poor, and they are hard to understand. Their voodoo would be more appreciated if they pushed better at the limits of computation, but it's obvious that the problems people solve have a latent inaccessible parallelism far in excess of what an out-of-order core can extract. The future of computing should surely not rest on terawatts of power burnt to pretend a processor is simpler than it is.

There is some hope in the ideas of upstarts like Mill Computing and Tachyum, as well as research ideas like CG-OoO. I don't know if they will ever find success; I wouldn't bet on it. Heck, the Mill might never even get far enough to have the opportunity to fail. Yet I find them exciting, and much of the offhand "sounds like Itanium" naysaying is uninteresting.

This article focuses on architectures in proportion to how much creative, interesting work they've shown in public. This means much of it comments on the Mill architecture; there is a healthy amount on CG-OoO; and the Tachyum is mentioned only in passing.

https://medium.com/@veedrac/to-reinvent-the-processor-671139a4a034

A commentary on some of the more unusual non-out-of-order architectures in the works, with a focus on Mill Computing's belt machines.


Original Submission

 
  • (Score: 2) by Immerman on Wednesday May 15 2019, @12:44AM (4 children)

    by Immerman (3985) on Wednesday May 15 2019, @12:44AM (#843654)

    Care to give an example?

    The time it takes a CPU to complete an instruction is pretty much written in stone - at least for any given CPU. The time required to retrieve data from RAM is more variable, especially for a parallel processor, but optimizing around some worst-case-scenario assumptions with the full-program contextual information and performance profiling is still likely to be at least competitive with what a CPU can do on the fly.

  • (Score: 0) by Anonymous Coward on Wednesday May 15 2019, @12:55AM (1 child)

    by Anonymous Coward on Wednesday May 15 2019, @12:55AM (#843658)

    You cannot "optimize around" things designed to not be predictable.

    • (Score: 2) by Immerman on Wednesday May 15 2019, @02:38AM

      by Immerman (3985) on Wednesday May 15 2019, @02:38AM (#843677)

      Except that happens entirely invisibly to the program and, unless I'm very much mistaken, should not be relevant to optimization. Putting executables in random places in memory has a negligible effect on memory access times (instruction access patterns within an executable or library are the same; only the absolute memory location changes), and none on instruction execution order.
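The point about randomized placement can be made concrete with a toy model: ASLR picks a fresh, page-aligned load base each run, but the offsets between symbols inside the image are fixed at link time (a minimal sketch; the symbol names and offsets are hypothetical):

```python
import random

# Hypothetical symbol offsets within one executable image.
offsets = {"func_a": 0x1200, "func_b": 0x1480}

def load(base):
    """Model loading the image at a randomized base address."""
    return {name: base + off for name, off in offsets.items()}

run1 = load(random.randrange(0, 1 << 40, 0x1000))  # page-aligned base
run2 = load(random.randrange(0, 1 << 40, 0x1000))  # a different run

# Absolute addresses almost certainly differ between runs, but the
# distances the access pattern covers are identical either way.
assert run1["func_b"] - run1["func_a"] == run2["func_b"] - run2["func_a"]
```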

  • (Score: 0) by Anonymous Coward on Wednesday May 15 2019, @05:00AM

    by Anonymous Coward on Wednesday May 15 2019, @05:00AM (#843698)

    Cache access can be pretty unpredictable for certain applications, and floating-point operations can vary in execution time when denormals are possible. Just those off the top of my head. If scheduling were easy, processors would likely all be modeled on the Itanium.
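The denormal point is easy to demonstrate: a subnormal (denormal) double is a positive value smaller than the smallest normal double, and on many x86 CPUs arithmetic involving such values takes a microcode-assisted slow path, so latency depends on the data. A minimal sketch:

```python
import sys

# Smallest positive *normal* double (about 2.2e-308).
smallest_normal = sys.float_info.min

# Dividing below the normal range produces a subnormal (denormal) value.
denormal = smallest_normal / 4

# A subnormal is positive yet smaller than any normal double.
assert denormal > 0.0
assert denormal < smallest_normal

# On many x86 CPUs, producing or consuming subnormals triggers a slow
# path, so the same instruction's latency varies with its operands.
```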

  • (Score: 0) by Anonymous Coward on Friday May 17 2019, @10:49AM

    by Anonymous Coward on Friday May 17 2019, @10:49AM (#844659)

    The time it takes a CPU to complete an instruction is pretty much written in stone - at least for any given CPU.

    You can actually write that and not see the problem already? How many different Intel and AMD CPU families are out there at the moment? How many ARM families? Will those cycles/times always be the same for future generations of your wonderful no-OoO CPUs?

    In the real world not many people use stuff like Gentoo and keep recompiling everything for their systems.

    See also: https://www.agner.org/optimize/instruction_tables.pdf [agner.org]

    Some instructions might take the same number of cycles across all families, but will enough of them? Merely comparing CALL and RET cycles, you'll see many families differ.

    CPUs that require and rely on "clever" compilers to extract performance from their hardware for "general computing" tend to have problems when their new generations have significantly different hardware. It's not a big problem for more specialized hardware support like AES or SHA acceleration, where >99% of the time you won't need that acceleration; but you have a problem when >99% of the time you need the compiler cleverness to get the 20% extra speed in "general computing".
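The fragility described above can be shown with a toy static scheduler: a compiler that schedules for fixed instruction latencies produces a different (and, on the wrong family, suboptimal) schedule the moment those latencies change. A minimal sketch with hypothetical latency numbers, not taken from any real CPU's tables:

```python
def schedule(ops, latency):
    """Greedy ASAP schedule. ops: list of (name, opcode, deps).
    Returns the cycle at which each op issues, given per-opcode latencies."""
    issue = {}
    for name, opcode, deps in ops:
        ready = 0
        for d in deps:
            d_opcode = next(o for n, o, _ in ops if n == d)
            # An op can issue only after each dependency's result is ready.
            ready = max(ready, issue[d] + latency[d_opcode])
        issue[name] = ready
    return issue

# A tiny dependent chain: load -> multiply -> add.
ops = [
    ("a", "load", []),
    ("b", "mul",  ["a"]),
    ("c", "add",  ["b"]),
]

fam1 = {"load": 3, "mul": 3, "add": 1}   # hypothetical CPU family 1
fam2 = {"load": 5, "mul": 5, "add": 1}   # hypothetical CPU family 2

print(schedule(ops, fam1))  # {'a': 0, 'b': 3, 'c': 6}
print(schedule(ops, fam2))  # {'a': 0, 'b': 5, 'c': 10}
```

A binary statically scheduled for family 1 leaves the family-2 pipeline stalled (or mis-packed) on every longer-latency op, which is exactly why such designs need recompilation per family to keep their edge.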