
posted by martyb on Saturday October 31 2015, @03:33AM
from the A|Journey|Through|the|CPU|Pipeline dept.

It is good for programmers to understand what goes on inside a processor. The CPU is at the heart of our careers.

What goes on inside the CPU? How long does it take for one instruction to run? What does it mean when a new CPU has a 12-stage pipeline, or 18-stage pipeline, or even a "deep" 31-stage pipeline?

Programs generally treat the CPU as a black box. Instructions go into the box in order, instructions come out of the box in order, and some processing magic happens inside.

As a programmer, it is useful to learn what happens inside the box. This is especially true if you will be working on tasks like program optimization. If you don't know what is going on inside the CPU, how can you optimize for it?
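For a taste of what this means in practice, here is a small C sketch (an illustration for this summary, not taken from the linked article): the same reduction written as one long dependency chain and as two independent chains that a pipelined, superscalar CPU can overlap. Exact timings vary by CPU and compiler; build with something like cc -O2, without -ffast-math.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)   /* 16M floats; the size is an arbitrary choice */

    /* One dependency chain: each addition must wait for the previous one. */
    static double serial_sum(const float *a) {
        double s = 0.0;
        for (long i = 0; i < N; i++)
            s += a[i];
        return s;
    }

    /* Two independent chains: the CPU can overlap the additions. */
    static double paired_sum(const float *a) {
        double s0 = 0.0, s1 = 0.0;
        for (long i = 0; i < N; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        return s0 + s1;
    }

    int main(void) {
        float *a = calloc(N, sizeof *a);   /* all zeros; the values don't matter */
        if (!a) return 1;
        clock_t t0 = clock();
        double r1 = serial_sum(a);
        clock_t t1 = clock();
        double r2 = paired_sum(a);
        clock_t t2 = clock();
        printf("serial: %.3fs (sum=%g)\n", (double)(t1 - t0) / CLOCKS_PER_SEC, r1);
        printf("paired: %.3fs (sum=%g)\n", (double)(t2 - t1) / CLOCKS_PER_SEC, r2);
        free(a);
        return 0;
    }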

A primer for those with a less formal background.


Original Submission

 
  • (Score: 2) by bzipitidoo (4388) on Saturday October 31 2015, @03:07PM (#256900) Journal

    I like optimizing, it's fun, and I'm pretty good at assembler, but I have to agree with the original poster that in most cases it's not the best use of your time to optimize at the assembly language level. That only pays dividends in heavily used code that has already been optimized in every other way. I've had experiences in which I spent time optimizing assembly code, only to have it all made moot when a new generation of CPUs came out that totally changed the optimization rules.

    The switch of x86 from directly executing CISC instructions to decoding them into RISC-like micro-ops internally was just such a seismic shift. Before, you'd look for creative ways to employ every available opcode to eliminate as many instructions as possible; after, you'd avoid the complicated instructions that the CPU designers had sidelined, which was easy since they were effectively obsolete anyway. No one needs packed decimal capability in hardware; there are so many other, better ways to handle numbers. The packed decimal instructions were only interesting in those rare cases where they could save a few lines of assembler doing something totally unrelated to packed decimal.

    CPUs have had bugs before, too. As I recall, there was a minor screwup in the early Pentium 4 in which the branch prediction was backwards. For those CPUs only, refactoring the exit conditions of your loops to avoid the bad prediction yielded quite a bit of speed improvement. Another embarrassment in more than one older CPU was that the native division instruction was so slow that division was often, sometimes always(!), faster done with shifts and subtracts. Even multiplication, which is easier, had corner cases in which a shift is just plain faster.
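    For illustration, the shift tricks look something like this in C (a sketch of the idea only; modern compilers perform these rewrites automatically, so today it matters mainly in hand-written assembler):

        #include <assert.h>
        #include <stdint.h>

        /* Division by a power of two is a right shift (unsigned only;
           signed division rounds toward zero and needs a correction). */
        static uint32_t div8(uint32_t x) {
            return x >> 3;                  /* same as x / 8 */
        }

        /* Multiplication by a small constant as shifts and adds. */
        static uint32_t mul10(uint32_t x) {
            return (x << 3) + (x << 1);     /* x*10 = x*8 + x*2 */
        }

        int main(void) {
            for (uint32_t x = 0; x < 100000; x++) {
                assert(div8(x) == x / 8);
                assert(mul10(x) == x * 10);
            }
            return 0;
        }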

    First, the problem should be worth solving. Then design and algorithmic improvements should be sought. Mostly that means replacing very bad choices with a better-balanced choice that may not be the very best algorithm but is good enough, while being easy to implement or readily available in a library that is well-tested, stable, and free of major bugs. The goal is not to make slow code faster; it is to replace bad code (code so slow and/or resource-intensive that it is impractical to run, or that has a time bomb that will cause failure after a few days or weeks of operation, or other major problems) with code that is just good enough to work. A simple example from SQL: break apart a complicated SQL statement that joins 3 or more tables into simpler separate statements that use temporary storage for intermediate results. A triple join is okay when the tables are very small, only a few hundred rows, but it scales horribly and may become unusably slow at just a few thousand rows, to say nothing of 100,000+ rows. That hand optimization depends on the database engine too: whether it can optimize multiple joins well, or not.
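    A hypothetical sketch of that split, using SQLite's C API as a convenient stand-in (the schema, table names, and query are invented for illustration; whether the split actually helps depends on the engine, as noted above):

        #include <sqlite3.h>
        #include <stdio.h>

        int main(void) {
            sqlite3 *db;
            if (sqlite3_open("example.db", &db) != SQLITE_OK) return 1;

            /* Instead of one triple join:
               SELECT o.id, c.name, p.title
               FROM orders o
               JOIN customers c ON o.customer_id = c.id
               JOIN products  p ON o.product_id  = p.id
               WHERE c.region = 'EU';
               ...stage the intermediate result explicitly: */
            const char *staged =
                "CREATE TEMP TABLE eu_orders AS "
                "  SELECT o.id, o.product_id, c.name "
                "  FROM orders o JOIN customers c ON o.customer_id = c.id "
                "  WHERE c.region = 'EU'; "
                "SELECT e.id, e.name, p.title "
                "FROM eu_orders e JOIN products p ON e.product_id = p.id;";

            char *err = NULL;
            if (sqlite3_exec(db, staged, NULL, NULL, &err) != SQLITE_OK) {
                fprintf(stderr, "SQL error: %s\n", err);
                sqlite3_free(err);
            }
            sqlite3_close(db);
            return 0;
        }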

    One bad programmer I worked with briefly had a weird bit of paranoia about initialization. He'd move initialization code into loops, claiming that computers were so fast it didn't matter that a data structure was being initialized 60,000 times instead of once, and it made him feel better about the reliability of the code. Well, it did matter. Once I figured this out about him, I knew that every time I saw a sudden performance drop in any part of the software, it was time to look at the loops to see what he'd messed up now.
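    In C, that anti-pattern and the fix look roughly like this (a contrived sketch; the lookup table stands in for whatever invariant setup was being repeated):

        #include <stdio.h>

        #define FRAMES 60000
        #define TABLE  4096

        static int process_frame(const int *table, int frame) {
            return table[frame % TABLE];
        }

        /* Anti-pattern: invariant setup redone on every iteration. */
        static long slow_version(void) {
            long total = 0;
            for (int f = 0; f < FRAMES; f++) {
                int table[TABLE];
                for (int i = 0; i < TABLE; i++)
                    table[i] = i * i;       /* rebuilt 60,000 times */
                total += process_frame(table, f);
            }
            return total;
        }

        /* Fix: initialize once, before the loop. */
        static long fast_version(void) {
            int table[TABLE];
            for (int i = 0; i < TABLE; i++)
                table[i] = i * i;
            long total = 0;
            for (int f = 0; f < FRAMES; f++)
                total += process_frame(table, f);
            return total;
        }

        int main(void) {
            printf("%ld %ld\n", slow_version(), fast_version());
            return 0;
        }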

    So, assembler optimization is very low on my list of areas to seek speed improvements. Redoing and undoing bad programming tends to pay much bigger dividends. There is a lot of low-hanging fruit out there, a lot of bad code.
