http://www.righto.com/2023/03/8086-multiplication-microcode.html
While programmers today take multiplication for granted, most microprocessors in the 1970s could only add and subtract — multiplication required a slow and tedious loop implemented in assembly code. One of the nice features of the Intel 8086 processor (1978) was that it provided machine instructions for multiplication,2 able to multiply 8-bit or 16-bit numbers with a single instruction. Internally, the 8086 still performed a loop, but the loop was implemented in microcode: faster and transparent to the programmer. Even so, multiplication was a slow operation, about 24 to 30 times slower than addition.
In this blog post, I explain the multiplication process inside the 8086, analyze the microcode that it used, and discuss the hardware circuitry that helped it out.3 My analysis is based on reverse-engineering the 8086 from die photos. The die photo below shows the chip under a microscope. I've labeled the key functional blocks; the ones that are important to this post are darker. At the left, the ALU (Arithmetic/Logic Unit) performs the arithmetic operations at the heart of multiplication: addition and shifts. Multiplication also uses a few other hardware features: the X register, the F1 flag, and a loop counter.
(Score: 3, Interesting) by stormwyrm on Saturday March 25, @07:14AM
It seems they did almost the whole thing in microcode, which had performance advantages over programming a shift and add multiplier in x86 code only because loading instructions from memory is inherently slower than microcode. A real hardware multiplier circuit in the ALU was likely unfeasible with integrated circuit technology as it existed in the late 1970s (I think a full 16-bit multiplier would need more transistors than the entire x86 processor itself). I used to optimise multiplications by small constants by adding shifted copies of the multiplier based on the 1-bits in the multiplicand: this could be much faster for reasonably small constant values, e.g. one would multiply AX by 3 by doing:
This works because 3 = 2 (one shift) + 1 (no shift). Or by 6 = (0b110), so one shift (×2) and two shifts (×4):
You'd need to watch out for addition carries and overflow with the shifts here though, and the original 8086 instruction set didn't include a shift by more than one constant bit (you'd need to load the shift count into CL). The speedup was especially pronounced when there were only a few 1-bits in the multiplicand. I reserved the expensive MUL/IMUL instruction for multiplications by non-constants or multiplying by larger numbers.
Numquam ponenda est pluralitas sine necessitate.