posted by janrinok on Sunday February 05, @06:29PM

https://www.righto.com/2023/01/understanding-x86s-decimal-adjust-after.html

I've been looking at the DAA machine instruction on x86 processors, a special instruction for binary-coded decimal arithmetic. Intel's manuals document each instruction in detail, but the DAA description doesn't make much sense. I ran an extensive assembly-language test of DAA on a real machine to determine exactly how the instruction behaves. In this blog post, I explain how the instruction works, in case anyone wants a better understanding.

The DAA (Decimal Adjust AL after Addition) instruction is designed for use with packed BCD (Binary-Coded Decimal) numbers. The idea behind BCD is to store decimal numbers in groups of four bits, with each group encoding a digit 0-9 in binary. You can fit two decimal digits in a byte; this format is called packed BCD. For instance, the decimal number 23 would be stored as hex 0x23 (which turns out to be decimal 35).
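
As an illustration (a sketch, not from the article), packing and unpacking this format in C takes only a shift and a mask; the helper names here are made up:

    #include <stdio.h>
    #include <stdint.h>

    /* Pack a two-digit decimal value (0-99) into packed BCD:
       tens digit in the high nibble, units digit in the low nibble. */
    static uint8_t to_packed_bcd(uint8_t n)
    {
        return (uint8_t)(((n / 10) << 4) | (n % 10));
    }

    /* Recover the decimal value from a packed BCD byte. */
    static uint8_t from_packed_bcd(uint8_t bcd)
    {
        return (uint8_t)((bcd >> 4) * 10 + (bcd & 0x0F));
    }

    int main(void)
    {
        printf("0x%02X\n", to_packed_bcd(23));  /* 0x23, i.e. decimal 35 as a plain binary value */
        printf("%u\n", from_packed_bcd(0x23));  /* 23 */
        return 0;
    }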

The 8086 doesn't implement BCD addition directly. Instead, you use regular binary addition and then DAA fixes the result. For instance, suppose you're adding decimal 23 and 45. In BCD these are 0x23 and 0x45 with the binary sum 0x68, so everything seems straightforward. But, there's a problem with carries. For instance, suppose you add decimal 26 and 45 in BCD. Now, 0x26 + 0x45 = 0x6b, which doesn't match the desired answer of 0x71. The problem is that a 4-bit value has a carry at 16, while a decimal digit has a carry at 10. The solution is to add a correction factor of the difference, 6, to get the correct BCD result: 0x6b + 6 = 0x71.

Thus, if a sum has a digit greater than 9, it needs to be corrected by adding 6. However, there's another problem. Consider adding decimal 28 and decimal 49 in BCD: 0x28 + 0x49 = 0x71. Although this looks like a valid BCD result, it is 6 short of the correct answer, 77, and needs a correction factor. The problem is the carry out of the low digit caused the value to wrap around. The solution is for the processor to track the carry out of the low digit, and add a correction if a carry happens. This flag is usually called a half-carry, although Intel calls it the Auxiliary Carry Flag.
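
The half-carry can be computed from the two operands alone, as in this small C check (again my own sketch, with a made-up helper name), using the sums above:

    #include <stdio.h>
    #include <stdint.h>

    /* The auxiliary (half) carry is set when the low nibbles alone
       overflow past 0xF during the binary addition. */
    static int aux_carry(uint8_t a, uint8_t b)
    {
        return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
    }

    int main(void)
    {
        printf("%d\n", aux_carry(0x26, 0x45)); /* 0: 6 + 5 = 0xB, no wrap, but digit > 9 */
        printf("%d\n", aux_carry(0x28, 0x49)); /* 1: 8 + 9 = 0x11, low digit wrapped */
        return 0;
    }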

For a packed BCD value, a similar correction must be done for the upper digit. This is accomplished by the DAA (Decimal Adjust AL after Addition) instruction. Thus, to add a packed BCD value, you perform an ADD instruction followed by a DAA instruction.
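
Putting the two corrections together, here is a rough C model of an ADD followed by DAA. It is only a sketch of the behaviour described above, not the article's code; the linked article investigates hardware corner cases that this ignores.

    #include <stdio.h>
    #include <stdint.h>

    /* Add two packed BCD bytes: a plain binary ADD, then the DAA-style fixups.
       Low digit: add 0x06 if it exceeds 9 or a half-carry occurred.
       High digit: add 0x60 if it exceeds 9 or a carry left the byte. */
    static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry_out)
    {
        unsigned sum = a + b;                       /* the ordinary binary ADD */
        int af = ((a & 0x0F) + (b & 0x0F)) > 0x0F;  /* half-carry out of the low digit */
        int cf = sum > 0xFF;                        /* carry out of the whole byte */

        if ((sum & 0x0F) > 0x09 || af)
            sum += 0x06;                            /* correct the low digit */
        if ((sum & 0xF0) > 0x90 || cf) {
            sum += 0x60;                            /* correct the high digit */
            cf = 1;
        }
        if (carry_out)
            *carry_out = cf;
        return (uint8_t)sum;
    }

    int main(void)
    {
        int c = 0;
        printf("0x%02X\n", bcd_add(0x26, 0x45, &c));  /* 0x71: 26 + 45 = 71 */
        printf("0x%02X\n", bcd_add(0x28, 0x49, &c));  /* 0x77: 28 + 49 = 77 */
        uint8_t s = bcd_add(0x99, 0x01, &c);
        printf("0x%02X carry=%d\n", s, c);            /* 0x00 carry=1: 99 + 1 = 100 */
        return 0;
    }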

But read the link to see how it was done and why it was necessary.


Original Submission

  • (Score: 3, Interesting) by janrinok on Sunday February 05, @06:37PM

    by janrinok (52) Subscriber Badge on Sunday February 05, @06:37PM (#1290381) Journal
    As usual I found the article interesting, but this time I also enjoyed the background history. I didn't realise that BCD can be traced back all the way to the mid 1960s. Thank you.
  • (Score: 3, Insightful) by gawdonblue on Sunday February 05, @08:33PM

    by gawdonblue (412) on Sunday February 05, @08:33PM (#1290396)

    Signed-packed-decimal where the + or - could also be stored in one of those nibbles, possibly either the leading or trailing one.

    Those were the days: Print out the 600-page stackdump, find where the control register was pointing, have a look at that op-code and A and B values, determine if any other registers were involved and what their value(s) were.

    Back then spirits were brave, the stakes were high, men were real men, women were real women and small furry creatures from Alpha Centauri were real small furry creatures from Alpha Centauri.

  • (Score: 4, Insightful) by Mojibake Tengu on Sunday February 05, @08:39PM

    by Mojibake Tengu (8598) on Sunday February 05, @08:39PM (#1290397) Journal

    Alongside the DAA (Decimal Adjust after Addition) instruction, opcode 0x27, there is also a complementary instruction, DAS (Decimal Adjust after Subtraction), opcode 0x2F.

    Both instructions are invalid in 64-bit long mode, so they are totally obsolete for any recent software.

    Their original purpose was to ease porting of then-existing software from mainframes, where financial arithmetic on huge numbers (remember the Italian lira?) was routinely done in BCD/EBCDIC, to the new 16-bit Intel PC platform using the packed BCD format. Useless as it stands.

    --
    The edge of 太玄 cannot be defined, for it is beyond every aspect of design
  • (Score: 3, Insightful) by bzipitidoo on Monday February 06, @02:32AM (5 children)

    by bzipitidoo (4388) Subscriber Badge on Monday February 06, @02:32AM (#1290416) Journal

    In previous stories about the x86, I held up the existence and implementation of packed decimal math as one of the reasons why the x86 is such a bad architecture. The implementation is crummy, and I learned what I'm sure is the reason why: to avoid a patent.

    However, it is worse that packed decimal arithmetic is in there at all. It's useless cruft. It's there because business and finance people who don't get that math is math had too much influence over architecture design. It's throwing them a bone, and has nothing to do with good design.

    • (Score: 5, Insightful) by KilroySmith on Monday February 06, @03:52AM (4 children)

      by KilroySmith (2113) on Monday February 06, @03:52AM (#1290428)

      I'm gonna assume that you probably weren't even born when the 8086 was designed, much less aware of the engineering and marketing tradeoffs of trying to capture marketshare in 1976. It was the first generation of 16-bit microprocessors, and stood at a proud 29,000 transistors - a wee bit smaller than the 6,000,000,000 in a modern CPU. "Useless cruft" wasn't much tolerated, because there simply wasn't space for it. BCD was likely included because Intel felt it would help them win business from the Z8000, M68000, and NSC32016 microprocessors, maybe even to take some business from the older 16-bit minicomputers out there. "The implementation is crummy" - everything was crummy in 1976, for Pete's sake. The shortest instructions took 2 clock cycles (in addition to the 4 clock cycles necessary to read the instruction from RAM); the longest instructions were 20 or more cycles long. None of this superscalar nonsense for us, no sir.

      Oh, that IBM had chosen the 68008 for the IBM PC, rather than the 8086. Probably wouldn't make any difference now - everything has an instruction decode stage now - but would have made my early engineering career a whole lot simpler.

      • (Score: 4, Interesting) by bzipitidoo on Monday February 06, @04:19PM (3 children)

        by bzipitidoo (4388) Subscriber Badge on Monday February 06, @04:19PM (#1290475) Journal

        Assume that I'm ignorant? Not a nice assumption. As to my age, I'll leave you to guess. And, come on, that's an ugly and incorrect blanket criticism to say everything was crummy in 1976. Not relevant to the discussion of DAA.

        Despite the severe constraints on resources, they nevertheless used some of that valuable space and transistor count for packed decimal math. That's too much tolerance of useless cruft.

        Consider the way they implemented it. So, you can't have a flag for decimal mode like the 6502, because that infringes on a patent. So, why not have a Packed Decimal Add (PDA) instruction, instead of this DAA instruction? Instead of ADD; DAA pairs to do packed decimal addition, it could have been one instruction, PDA. So, why go with DAA? I got an idea why. It could be to avoid another patent, but I think not. I guess the designers did realize packed decimal math instructions were a waste, and chose this ADD;DAA method because that was the minimum way to add decimal math to the architecture. It was the way that increased the transistor count the least, and the heck with the method being slow. They were counting on these instructions being rarely or never used, so that it wouldn't matter that the design was so awful. But their existence gave marketing what they wanted, a feature they could extol to sell the processor to business people.

        • (Score: 3, Interesting) by RS3 on Monday February 06, @06:49PM (2 children)

          by RS3 (6367) on Monday February 06, @06:49PM (#1290492)

          I may be way off base, and I don't know enough to say whether Intel's BCD math was good or bad, but IIRC many small processors were driving 7-segment displays, and BCD math greatly simplified the external circuitry between the uP and display.

          • (Score: 3, Insightful) by bzipitidoo on Tuesday February 07, @12:44AM (1 child)

            by bzipitidoo (4388) Subscriber Badge on Tuesday February 07, @12:44AM (#1290544) Journal

            My guess on this one is to think about where you find 7 segment displays: pocket calculators, digital watches, early electronic cash registers, gas pumps, and pretty much any embedded use case that requires a numeric display. For all those cases, an 8086 is massive overkill. These things use only a few bytes of memory, so what need is there to address an entire megabyte, as the 8086 can?

            Also, converting from the 4 bits of a packed decimal digit to the 7 segments to light up is relatively easy. Takes somewhere around a dozen logic gates. The harder part to handle can be the type of display. An LED display is easy. An LCD display, however, has to be pulsed. For portable devices, LED displays lost out to LCD because LCD takes much less power.
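
            For illustration, the whole decode fits in a small lookup table (a sketch in C; the segment bit assignment is just one common choice, bit 0 = segment a through bit 6 = segment g):

                /* Segment patterns for digits 0-9, one bit per segment (a..g). */
                static const unsigned char bcd_to_7seg[10] = {
                    0x3F, 0x06, 0x5B, 0x4F, 0x66,   /* 0 1 2 3 4 */
                    0x6D, 0x7D, 0x07, 0x7F, 0x6F    /* 5 6 7 8 9 */
                };

                /* Decode one packed BCD byte into two segment patterns,
                   high digit first. Digits are assumed to be 0-9. */
                static void decode_packed_bcd(unsigned char bcd, unsigned char out[2])
                {
                    out[0] = bcd_to_7seg[(bcd >> 4) & 0x0F];
                    out[1] = bcd_to_7seg[bcd & 0x0F];
                }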

            • (Score: 2) by RS3 on Tuesday February 07, @01:40AM

              by RS3 (6367) on Tuesday February 07, @01:40AM (#1290548)

              Yes, I kind of agree to all. But you may be hyperfocusing on the display aspect. That CPU might be doing a ton of actual CPU stuff.

              If all you're doing is a calculator or something simple, you might not use an 8086/8088, maybe an 8085 (or other 8-bit of the day).

              Also, remember, the 8085, etc., were the new kids on the block- small, microprocessors, doing all kinds of simple functions that nowadays we might use a PIC or Arduino (talk about overkill!). They opened up huge new worlds of system designs that couldn't be done easily / cheaply before microprocessors existed.

              But if you needed to do much more calculation / computation, you might need more CPU power than 8-bit CPUs. For instance, many I/C (instrumentation / controls) systems use 12-bit A/D, which gets messier with an 8-bit CPU. It's a cost tradeoff between hardware costs and software engineering / programmer costs. Also the additional CPU cycles needed to do 12-bit math on 8-bit hardware might have bogged some systems down to the point of not being feasible.

              Also remember the 8088 was the 8-bit I/O version of the 8086, again, optimized for simple 8-bit peripheral chips.

              There are many TTL and CMOS chips to do the binary to 7-segment decode. The idea was to minimize chip count- a big cost savings, esp. if you have some kind of specialized machine (laboratory, test equipment, etc.) that has many 7-segment displays.

              Actually LCD is a very different thing- the display itself is usually a special / custom thing, and there are fairly large chips that decode / address / drive the display.

              All that said, LED displays are usually scan / strobed too- addressed in a matrix.
