posted by janrinok on Monday April 10 2023, @12:22PM

https://www.righto.com/2020/06/die-shrink-how-intel-scaled-down-8086.html

The revolutionary Intel 8086 microprocessor was introduced 42 years ago this month, so I've been studying its die. I came across two 8086 dies with different sizes, which reveal details of how a die shrink works. The concept of a die shrink is that as technology improved, a manufacturer could shrink the silicon die, reducing costs and improving performance. But there's more to it than simply scaling down the whole die. Although the internal circuitry can be directly scaled down, external-facing features can't shrink as easily. For instance, the bonding pads need a minimum size so wires can be attached, and the power-distribution traces must be large enough for the current. The result is that Intel scaled the interior of the 8086 without change, but the circuitry and pads around the edge of the chip were redesigned.

[...] The photo above shows the two 8086 dies at the same scale. The two chips have identical layouts in the interior, although they may look different at first. The chip on the right has many dark lines in the middle that don't appear on the left, but this is an artifact. These lines are the polysilicon layer, underneath the metal; the die on the left has the same wiring, but it is very faint. I think the newer chip has a thinner metal layer, making the polysilicon more visible.

The magnified photo below shows the same circuitry on the two dies. There is an exact correspondence between components in the two images, showing the circuitry was reduced in size, not redesigned. (These photos show the metal layer on top of the chip; some polysilicon is visible in the right photo.)

I have decided to combine this part of the 8086 story with the following one because, as the author points out, there is significant overlap with an earlier part, which explained the multiplication code. [JR]

https://www.righto.com/2023/04/reverse-engineering-8086-divide-microcode.html

While programmers today take division for granted, most microprocessors in the 1970s could only add and subtract — division required a slow and tedious loop implemented in assembly code. One of the nice features of the Intel 8086 processor (1978) was that it provided machine instructions for integer multiplication and division. Internally, the 8086 still performed a loop, but the loop was implemented in microcode: faster and transparent to the programmer. Even so, division was a slow operation, about 50 times slower than addition.

I recently examined multiplication in the 8086, and now it's time to look at the division microcode. (There's a lot of overlap with the multiplication post, so apologies for any déjà vu.)
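For readers who haven't written a software divide, here is a minimal C sketch of the kind of one-bit-at-a-time shift-and-subtract loop the article describes: the sort of loop the 8086 runs in microcode and that 1970s programmers otherwise coded by hand. It is illustrative only (the function name and the 16-bit-by-16-bit simplification are mine; the real DIV instruction divides a 32-bit dividend in DX:AX by a 16-bit operand), not Intel's actual microcode.

    #include <stdint.h>
    #include <stdio.h>

    /* Restoring shift-and-subtract division, one quotient bit per pass.
     * Illustrative sketch only; assumes divisor != 0 and ignores overflow. */
    static void divide16(uint16_t dividend, uint16_t divisor,
                         uint16_t *quotient, uint16_t *remainder)
    {
        uint32_t r = 0;            /* extra headroom for the shifted remainder */
        uint16_t q = 0;
        for (int i = 15; i >= 0; i--) {
            r = (r << 1) | ((dividend >> i) & 1);  /* bring in the next dividend bit */
            if (r >= divisor) {                    /* trial subtraction succeeds... */
                r -= divisor;
                q |= (uint16_t)(1u << i);          /* ...so this quotient bit is 1 */
            }
        }
        *quotient = q;
        *remainder = (uint16_t)r;
    }

    int main(void)
    {
        uint16_t q, r;
        divide16(50000, 7, &q, &r);
        printf("50000 / 7 = %u remainder %u\n", (unsigned)q, (unsigned)r); /* 7142 r 6 */
        return 0;
    }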


Original Submission

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
  • (Score: 2) by bzipitidoo on Monday April 10 2023, @01:18PM (2 children)

    by bzipitidoo (4388) on Monday April 10 2023, @01:18PM (#1300743) Journal

    My understanding of the 8086 division is that it was so slow, it was often faster to do the division by hand, so to speak: program in the shifts and subtracts yourself. Much faster to use a shift to divide by a power of 2. The microcode didn't have any detection of faster cases, and would just grind through a divide by 2 the same as a division by any other number.

    I don't know when or if the algorithm was upgraded to what the Pentium has for floating-point division (and what they screwed up with that infamous FDIV bug), but the method is a neat one: one of the iterative methods, like Newton's method; I don't recall which one. But they carefully optimized the method, providing starting points that guaranteed reaching an answer at the limit of precision with just 4 passes through the loop. The bug was that a very few of those starting points had been corrupted, which in those cases produced answers that were close, but not exact.

    • (Score: 4, Insightful) by owl on Monday April 10 2023, @03:21PM

      by owl (15206) on Monday April 10 2023, @03:21PM (#1300757)

      My understanding of the 8086 division is that it was so slow,

      Yes, the 8086's built-in division instructions took a lot of cycles to complete. But they had the advantage that if one was writing 8086 code, one did not have to find, or write from scratch, a general-purpose division routine. The CPU already 'knew how' to divide, which, compared to the 8086's competitors of the era (6502, 6800, Z80, TI 99/4A), was a benefit.

      it often was faster to do division by hand, so to speak.

      Yes, although the answer was often "it depends". For a general-purpose 8- or 16-bit divide of any number by any number (excluding divide by zero, obviously), using the CPU instruction was not always slower, unless one's alternative 'by hand' version used a much more advanced algorithm than the one the 8086 used.

      For division by known amounts (e.g., one is always dividing by 12), then yes: doing the division by hand, usually with some shifts and adds or subtracts, was often significantly faster. But then one does not have a "general purpose division" routine, one has a "special purpose divide by 12" routine, so it's not quite an apples-to-apples comparison either.
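      As a concrete illustration of a "special purpose divide by 12" routine, here is a C sketch of one common constant-divisor trick: multiply by a scaled reciprocal and shift, which modern compilers emit automatically. This is offered only as an example of the special-case idea, not as the exact shift-and-add sequence an 8086 hand-coder would have used.

        #include <assert.h>
        #include <stdint.h>

        /* "Divide by 12" without a divide: 43691 = ceil(2^19 / 12), and the
         * rounding error is small enough that (x * 43691) >> 19 equals x / 12
         * for every 16-bit dividend.  Sketch of the special-case idea only,
         * not a claim about how 8086-era code actually did it. */
        int main(void)
        {
            for (uint32_t x = 0; x <= 0xFFFF; x++)
                assert(x / 12 == (x * 43691u) >> 19);
            return 0;
        }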

      The microcode didn't have any detection of faster cases, and would just grind through a divide by 2 same as division by any other number.

      It did not, which is why divisions by powers of 2, which are simply shifts, were much faster to code by hand as shifts. Had the microcode checked for division by a power of two and substituted shifts for the full divide algorithm, the CPU would have been much faster at processing its divide instructions for those divisors. And since dividing by 2 and 4 is common, it would have been a valuable optimization.
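      The power-of-two special case in a couple of lines of C, assuming unsigned values: the quotient is a right shift and the remainder is a mask, which is why hand-coded shifts beat the general microcode loop for these divisors. (The constants here are just for illustration; signed division needs a little more care.)

        #include <assert.h>
        #include <stdint.h>

        /* For unsigned values, dividing by a power of two is just a shift and
         * the remainder is just a mask -- the special case the microcode never
         * checked for. */
        int main(void)
        {
            uint16_t x = 50001;
            assert(x / 4 == x >> 2);       /* quotient of a divide by 4 */
            assert(x % 4 == (x & 3));      /* remainder of a divide by 4 */
            return 0;
        }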

    • (Score: 2) by stormwyrm on Tuesday April 11 2023, @03:52AM

      by stormwyrm (717) on Tuesday April 11 2023, @03:52AM (#1300875) Journal

      The 8086 basically implemented a general-purpose division algorithm in microcode, which would only be faster than implementing it in ordinary code because microcode fetches are faster than memory fetches. As I mentioned in the previous article about the x86's multiplication microcode, if you had a constant multiplication or division, it could be a lot faster to decompose the operation into shifts and adds/subtractions. Back when I was still seriously doing x86 assembly I never did figure out how to do it cleanly for divisions, but I soon figured out that you could multiply by shifting one of the factors to the left and keeping a running total for each 1 bit in the other factor. Multiplying by 160, for example (used frequently to get the address of a character in the IBM text-mode screen buffer, since it was 80 columns and each character on screen is two bytes: one for the ASCII value and one for the colour/mode such as bold, underline, etc.), could be done by shifting to the left five and seven times (since 160 is 10100000 in binary, with bits 5 and 7 being 1) and adding the results together. Each shift took two cycles and an add took three, so it took only around 27 cycles to do such a multiplication, as opposed to 120+ cycles for the MUL instruction. If you tried to implement a fully general multiplication algorithm for an arbitrary, non-constant number, though, you would likely not be able to improve on the MUL instruction's performance.
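      Here is that 160 example written out as a quick C check: since 160 = 10100000 in binary, x * 160 is x shifted left by 7 plus x shifted left by 5, which is exactly the row stride of the 80-column, 2-bytes-per-cell text buffer. (C rather than the original assembly, purely for illustration.)

        #include <assert.h>
        #include <stdint.h>

        /* 160 = 10100000 in binary (bits 5 and 7 set), so multiplying by 160
         * is two shifts and an add: x*160 == (x << 7) + (x << 5).
         * That is the row stride of the 80x25 text buffer (80 cells * 2 bytes). */
        int main(void)
        {
            for (uint32_t row = 0; row < 25; row++)
                assert(row * 160 == (row << 7) + (row << 5));
            return 0;
        }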

      To add special-case microcode to speed up things like multiplications and divisions by powers of two was likely infeasible for the technology as it existed in the 1970s. Remember it was 45 years ago: it has been roughly 30 iterations of Moore's Law since then. A full 16-bit Wallace-tree multiplier would likely have had a transistor count comparable to all the rest of the microprocessor put together; impractical for the technology of the time. Floating-point arithmetic was right out: note that the 8087 had considerably more transistors than the 8086, and floating point is godawful complicated. Shirriff has another article series about that on the same blog. It wasn't until the end of the 1980s that technology had advanced enough to make integrating an FPU feasible, as the 80486 and Motorola's 68040 did.

      I think the Newton-Raphson division technique was pioneered in Intel's i860 RISC architecture (also introduced in 1989, same as the 80486). It had no divide instruction, floating point or integer: to do division of either sort, they recommended floating-point Newton-Raphson iteration, which could be done very quickly with the i860's very fast floating-point core. I believe they then attempted to implement the same algorithm in the microcode of the early Pentium but messed it up.
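      For the curious, a C sketch of the Newton-Raphson reciprocal iteration described above: refine an estimate r of 1/d with r = r*(2 - d*r), whose error roughly squares on each pass, then multiply by the numerator. Real hardware seeds r from a lookup table (the corrupted table entries being the bug the commenters mention); this toy version instead normalizes the divisor and uses a fixed seed, assumes positive operands, and is not a description of any shipped chip.

        #include <math.h>
        #include <stdio.h>

        /* Newton-Raphson reciprocal iteration: r = r * (2 - d * r) roughly
         * doubles the number of correct bits each pass.  This sketch normalizes
         * d into [0.5, 1) with frexp() and uses a fixed seed instead of a
         * lookup table.  Assumes n, d > 0; illustration only. */
        static double nr_divide(double n, double d)
        {
            int e;
            double m = frexp(d, &e);     /* d = m * 2^e with m in [0.5, 1) */
            double r = 1.5;              /* crude first guess at 1/m, which lies in (1, 2] */
            for (int i = 0; i < 6; i++)  /* a handful of passes reach double precision */
                r = r * (2.0 - m * r);
            return n * ldexp(r, -e);     /* n/d = n * (1/m) * 2^-e */
        }

        int main(void)
        {
            printf("%.12f\n", nr_divide(355.0, 113.0));   /* prints ~3.141592920354 */
            return 0;
        }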

      --
      Numquam ponenda est pluralitas sine necessitate.