
posted by martyb on Saturday October 31 2015, @03:33AM
from the A|Journey|Through|the|CPU|Pipeline dept.

It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our careers.

What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, an 18-stage pipeline, or even a "deep" 31-stage pipeline?

Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside.

As a programmer, you will find it useful to learn what happens inside the box. This is especially true if you work on tasks like program optimization: if you don't know what is going on inside the CPU, how can you optimize for it?
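
To make the pipeline question concrete, here is a minimal sketch (the function names are illustrative, not from the article): a sum with a single accumulator serializes on the latency of each add, while independent accumulators let the adds overlap inside the pipeline.

    #include <cstddef>
    #include <vector>

    // One accumulator: every add depends on the previous result, so the
    // loop runs at the latency of a floating-point add.
    double sum_serial(const std::vector<double>& v) {
        double s = 0.0;
        for (double x : v) s += x;
        return s;
    }

    // Four independent accumulators: the adds overlap in flight, so the
    // loop approaches the CPU's add throughput instead.
    double sum_pipelined(const std::vector<double>& v) {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        std::size_t i = 0, n = v.size();
        for (; i + 4 <= n; i += 4) {
            s0 += v[i];     s1 += v[i + 1];
            s2 += v[i + 2]; s3 += v[i + 3];
        }
        for (; i < n; ++i) s0 += v[i];
        return (s0 + s1) + (s2 + s3);
    }

(Compilers often will not perform this transformation on their own, since floating-point addition is not associative; that is exactly the kind of detail that knowing the pipeline makes visible.)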

A primer for those with a less formal background.


Original Submission

 
  • (Score: 2) by Rich (945) on Saturday October 31 2015, @05:52AM (#256810) Journal

    Modern superscalar CPUs seem to embody most of the knowledge a solid assembly programmer is able to put into an average project. Case in point: I wrote a dynamic recompiler that translates an exotic MIPS-like CPU to x86 so we could run our tests in real time on the build rig instead of resorting to a cluster of actual hardware. Once the basic issues were sorted (e.g. streamlining the memory mapping), I found that further efforts to make the compiled code more efficient had diminishing returns.
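
    For the curious, a minimal sketch of the dispatch loop such a recompiler needs (hypothetical names, not our actual code; translate_block() stands in for the real x86 emitter). Each guest basic block is translated once, cached, and from then on executed directly:

        #include <cstdint>
        #include <cstdio>
        #include <unordered_map>

        // A translated block is host (x86) code that executes some guest
        // instructions and returns the next guest program counter.
        using HostBlock = uint32_t (*)();

        const uint32_t HALT_PC = 0xFFFFFFFFu;  // sentinel: stop emulation

        // Stand-in for the real emitter: the actual recompiler decodes
        // guest instructions here and emits x86 machine code into a buffer.
        HostBlock translate_block(uint32_t guest_pc) {
            std::printf("translating guest block at 0x%08x\n",
                        static_cast<unsigned>(guest_pc));
            return []() -> uint32_t { return HALT_PC; };  // dummy "halt" block
        }

        void run(uint32_t pc) {
            std::unordered_map<uint32_t, HostBlock> cache;  // translation cache
            while (pc != HALT_PC) {
                auto it = cache.find(pc);
                if (it == cache.end())
                    it = cache.emplace(pc, translate_block(pc)).first;
                pc = it->second();  // jump into the translated code
            }
        }

        int main() { run(0); }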

    We didn't have the time to investigate deeply, but I assume that when you have a wide pipeline with a lot of execution units, there is almost always an idle slot left in which to stick whatever inefficient instructions come along. Much the same goes for array bounds checking, by the way. The check itself will usually be parallelized (without any hazards affecting the rest of the pipeline), and the branch predictor soon learns that the failure branch will never be taken (and if it ever is, one 30-cycle stall is the least of your problems).
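
    A tiny illustration of the bounds-checking point (hypothetical names): with vector::at() the check is a compare plus a branch that is essentially never taken, so the predictor learns it almost immediately and the loads proceed unhindered.

        #include <cstddef>
        #include <vector>

        // Sum v at the given indices with checked accesses.  The compare
        // inside at() runs in parallel with the load on a superscalar core,
        // and the out-of-bounds branch is never taken here, so after a few
        // iterations the predictor treats the check as free.  On an actual
        // violation, at() throws std::out_of_range.
        double checked_sum(const std::vector<double>& v,
                           const std::vector<std::size_t>& idx) {
            double s = 0.0;
            for (std::size_t i : idx)
                s += v.at(i);  // bounds-checked access
            return s;
        }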

    It might be that compiler backend writers want deep knowledge here, not only to squeeze out the last few percent but also to avoid running into flaws that certain CPU implementations might have. But that is sufficiently advanced to appear like magic even to solid assembly coders. One thing to remember, however, is that going out of cache to main RAM costs dearly. Funnily enough, that perverts the C++ template attitude which says that everything must be done without overhead: it will be more efficient to have a single routine that processes an "overhead" dynamic element size (parallelized) and goes through handlers (branches predicted) than to have multiple specializations that each handle the data in a "perfect" way, because the latter thrashes the caches (the i-caches at least).
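
    To illustrate that trade-off (hypothetical names): the templated copy below gets one perfectly specialized loop per element size, each its own blob of machine code, while the generic version pays for a dynamic size but exists exactly once and stays hot in the i-cache.

        #include <cstddef>
        #include <cstring>

        // Template route: one specialization per element size N.  Each
        // instantiation is a separate copy of the loop; with many sizes in
        // play, the copies compete for instruction cache.
        template <std::size_t N>
        void copy_elems_fixed(char* dst, const char* src, std::size_t count) {
            for (std::size_t i = 0; i < count; ++i)
                std::memcpy(dst + i * N, src + i * N, N);  // N fixed at compile time
        }

        // Generic route: one routine handles every element size.  The extra
        // parameter looks like overhead, but there is exactly one copy of
        // this code, so it stays resident across all call sites.
        void copy_elems(char* dst, const char* src, std::size_t count,
                        std::size_t elem_size) {
            for (std::size_t i = 0; i < count; ++i)
                std::memcpy(dst + i * elem_size, src + i * elem_size, elem_size);
        }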
