SoylentNews is people

posted by NCommander on Tuesday August 30 2016, @12:14PM
from the int-21h-is-how-cool-kids-did-it dept.

I've made no secret that I'd like to bring original content to SoylentNews, and I recently polled the community on their feelings about crowdfunding articles. The overall response was somewhat lukewarm, mostly over how the money would be divided and how authors would be paid. Taking that into account, I decided to write a series of articles for SN in an attempt to drive more subscriptions and readers to the site, and to scratch a personal itch about doing a retro-computing project. The question then became: what to write?

As part of a conversation on IRC, I wondered what a modern-day keylogger would have looked like running on DOS. In the world of 2016, it's no secret that various three-letter agencies engage in mass surveillance and cyberwarfare, and a keylogger would be part of any basic set of attack tools. The question is what such a tool would have looked like if it had been written during the 1980s. Back then, the world was a very different place from both a networking and a programming perspective.

For example, in 1988 (the year I was born), the IBM PC/XT and AT would have been relatively common fixtures, and the PS/2 had only recently been released. Most of the personal computing market ran some version of DOS, and networking (which was rare) frequently took the form of Token Ring or ARCNet equipment. Further up the stack, TCP/IP competed with IPX, NetBIOS, and several other protocols for dominance. From the programming side, coding for DOS is very different than coding for any modern platform, as you had to deal with Intel's segmented architecture and interact directly with both the BIOS and the hardware. As such, it's an interesting look at how technology has evolved since.

Now obviously, I don't want to release a ready-made attack tool to be abused by the masses, especially since DOS is still frequently used in embedded and industrial roles. As such, I'm going to target a non-IP-based protocol for logging, both to explore these technologies and to make the result as useless as possible as a real attack tool. To the extent possible, I will try to keep everything accessible to non-programmers, but this isn't intended as a tutorial for real mode programming. As such, I'm not going to go super in-depth in places, but will try to link relevant information. If anyone is confused, post a comment, and I'll answer questions or edit these articles as they go live.

More past the break ...

Looking At Our Target

Back in 1984, IBM released the Personal Computer/AT which can be seen as the common ancestor of all modern PCs. Clone manufacturers copied the basic hardware and software interfaces which made the AT, and created the concept of PC-compatible software. Due to the sheer proliferation of both the AT and its clones, these interfaces became a de-facto standard which continues to this very day. As such, well-written software for the AT can generally be run on modern PCs with a minimum of hassle, and it is completely possible to run ancient versions of DOS and OS/2 on modern hardware due to backwards compatibility.

A typical business PC of the era likely looked something like this:

  • An Intel 8086 or 80286 processor running at 4-6 MHz
  • 256 kilobytes to 1 megabyte of RAM
  • 5-20 MiB HDD + 5.25" floppy disk drive
  • Operating System: DOS 3.x or OS/2 1.x
  • Network: Token Ring connected to a NetWare server, or OS/2 LAN Manager
  • Cost: ~$6000 USD in 1987

To put that in perspective, many of today's microcontrollers have on-par or better specifications than the original PC/AT. From a programming perspective, even taking into account resource limitations, coding for the PC/AT is drastically different from many modern systems due to the segmented memory model used by the 8086 and 80286. Before we dive into the nitty-gritty of a basic 'Hello World' program, we need to take a closer look at the programming model and memory architecture used by the 8086 which was a 16-bit processor.

Real Mode Programming

If the AT is the common ancestor of all PC-compatibles, then the Intel 8086 is the processor equivalent. The 8086 was a 16-bit processor that operated at a top clock speed of 10 MHz, had a 20-bit address bus that supported up to 1 megabyte of RAM, and provided fourteen registers. Registers are essentially very fast storage locations physically located within the processor that are used to perform various operations. Four registers (AX, BX, CX, and DX) are general purpose, meaning they can be used for any operation. Eight (described below) are dedicated to working with segments and offsets, and the final two are the processor's current instruction pointer (IP) and state (FLAGS).

An important point in understanding the differences between modern programming environments and those used by early PCs deals with the difference between 16-bit and 32/64-bit programming. At the most fundamental level, the number of bits a processor has refers to the size of the numbers (or integers) it works with internally. As such, the largest possible unsigned number a 16-bit processor can directly work with is 2 to the power of 16 (minus 1), or 65,535. As the name suggests, 32-bit processors work with larger numbers, with the maximum being 4,294,967,295. Thus, a 16-bit processor can only reference up to 64 KiB of memory at a given time, while a 32-bit processor can reference up to 4 GiB, and a 64-bit processor can reference up to 16 exbibytes of memory directly.
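For readers who want to check the arithmetic, here is a minimal sketch in Python (not the language used elsewhere in this series; the function names are made up for illustration). It assumes only that an n-bit unsigned value tops out at 2^n - 1 and that an n-bit address can name 2^n distinct bytes.

```python
def max_unsigned(bits):
    """Largest unsigned integer that fits in `bits` bits: 2**bits - 1."""
    return (1 << bits) - 1


def addressable_bytes(bits):
    """Number of distinct byte addresses a `bits`-wide address can name."""
    return 1 << bits


if __name__ == "__main__":
    # 16-bit registers, the 8086's 20-bit bus, then 32-bit and 64-bit.
    for bits in (16, 20, 32, 64):
        print(f"{bits:2d}-bit: max value {max_unsigned(bits):,}, "
              f"{addressable_bytes(bits):,} addressable bytes")
```

Note that 20 bits is the interesting case here: it is wider than any single 8086 register, which is why the addressing scheme described next was needed.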

At this point, you may be asking yourselves, "if a 16-bit processor could only work with 64 KiB of RAM directly, how did the 8086 support up to 1 megabyte?" The answer comes from the segmented memory model. Instead of directly referencing a location in RAM, addresses were divided into two 16-bit parts, the selector (or segment) and the offset. Segments are 64 kilobyte sections of RAM. They could generally be considered the computing equivalent of a postal code, telling the processor where to look for data. The offset then told the processor where exactly within that segment the data it wanted was located. On the 8086, the selector was shifted left by four bits (that is, multiplied by 16) and the offset was then added to it, producing a 20-bit address that could reach 1 megabyte of memory. Segments and offsets are held by the processor in special registers; in short, you had the following:

  • Segments
    • CS: Code segment - Application code
    • DS: Data segment - Application data
    • SS: Stack segment - Stack (or working space) location
    • ES: Extra segment - Programmer defined 'spare' segment
  • Offsets
    • SI - Source Index
    • DI - Destination Index
    • BP - Base pointer
    • SP - Stack pointer

As such, memory addresses on the 8086 were written in the form of segment:offset. For example, the memory address 0xFFFFF could be written as F000:FFFF. As a consequence, multiple segment:offset pairs could refer to the same byte of memory; the addresses F555:AAAF, F000:FFFF, and F800:7FFF all refer to the same byte. The segmentation model also had important performance and operational characteristics to consider.
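The translation from segment:offset to a physical address can be sketched in a few lines of Python. This is an illustration only; the helper names physical_address and parse_pair are invented for this example. It assumes the 8086 behaviour of shifting the segment left by four bits, adding the offset, and keeping 20 bits.

```python
def physical_address(segment, offset):
    """Translate a real-mode segment:offset pair into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left 4 bits (multiply by 16), add the offset, and
    # keep only 20 bits, since the 8086 has just 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF


def parse_pair(text):
    """Parse a string such as 'F000:FFFF' into (segment, offset)."""
    seg, off = text.split(":")
    return int(seg, 16), int(off, 16)


if __name__ == "__main__":
    for pair in ("F555:AAAF", "F000:FFFF", "F800:7FFF"):
        print(pair, "->", hex(physical_address(*parse_pair(pair))))
```

All three pairs from the text come out as the same address. The final mask also shows the wrap-around on the 8086: FFFF:0010 lands back on address 0 rather than just past the 1 megabyte mark.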

The most important was that since data could be within the same segment or in a different segment, you had two different types of pointers to work with. Near pointers (which are just the 16-bit offset) deal with data within the same segment, and are very fast as no state information has to be changed to reference them. Far pointers pointed to data in a different segment and required multiple operations to work with, as you not only had to load and store the two 16-bit components, you also had to change the segment registers to the correct values. In practice, that meant far pointers were extremely costly in terms of execution time. The performance hit was bad enough that it eventually led to one of the greatest (or worst) backward compatibility hacks of all time: the A20 gate, something which I could write a whole article on.
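To make the near/far distinction concrete, here is a rough Python model (the FarPointer type and helper names are hypothetical, chosen for this sketch). A near pointer is a bare 16-bit offset that only means something relative to whatever segment is currently loaded, while a far pointer carries its own segment. The sketch assumes the usual in-memory layout of a far pointer: offset in the low 16 bits, segment in the high 16 bits.

```python
from typing import NamedTuple


class FarPointer(NamedTuple):
    segment: int
    offset: int


def resolve_near(current_segment, offset):
    """A near pointer is resolved against the segment already loaded."""
    return ((current_segment << 4) + offset) & 0xFFFFF


def resolve_far(ptr):
    """A far pointer brings its own segment, so a segment register must be
    reloaded before the access (the costly step on real hardware)."""
    return ((ptr.segment << 4) + ptr.offset) & 0xFFFFF


def pack_far(ptr):
    """Pack into the 32-bit form: offset low word, segment high word."""
    return (ptr.segment << 16) | ptr.offset


def unpack_far(value):
    """Split a packed 32-bit far pointer back into segment and offset."""
    return FarPointer((value >> 16) & 0xFFFF, value & 0xFFFF)
```

For example, resolve_near(0x1000, 0x0200) and resolve_far(FarPointer(0x1000, 0x0200)) give the same address, but only the far form can reach data outside the current 64 KiB segment.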

The segmented memory model also meant that any high-level programming language had to incorporate lower-level programming details. For example, while C compilers were available for the 8086 (in the form of Microsoft C), the C programming language had to be modified to work with the memory model. Instead of just having the standard C pointer types, you had to deal with near and far pointers, and with the layout of data and code within segments, to make the whole thing work. This meant that coding for pre-80386 processors required code specifically written for the 8086 and the 80286.

Furthermore, most of the functionality provided by the BIOS and DOS was only available in the form of interrupts. Interrupts are special signals used to tell the processor that something needs immediate attention; for example, typing a key on a keyboard generates an IRQ 1 interrupt to let DOS and applications know something happened. Interrupts can be generated in software (the 'int' instruction) or hardware. As interrupt handling can generally only be done in raw assembly, many DOS apps of the era were written (in whole or in part) in Intel assembly. This brings us to our next topic: the DOS programming model.
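As a rough illustration of how the processor finds an interrupt handler in real mode, the sketch below models the interrupt vector table in Python. It assumes the standard real-mode layout: the table starts at physical address 0, and each of the 256 entries is four bytes, a 16-bit offset followed by a 16-bit segment, both little-endian. The function names are invented for this example, and the bytearray merely stands in for the first kilobyte of RAM.

```python
import struct

IVT_ENTRY_SIZE = 4  # 2-byte offset followed by 2-byte segment


def ivt_entry_address(interrupt):
    """Physical address of the table entry for a given interrupt number."""
    if not 0 <= interrupt <= 0xFF:
        raise ValueError("interrupt number must be 0-255")
    return interrupt * IVT_ENTRY_SIZE


def read_vector(memory, interrupt):
    """Return the handler's (segment, offset) stored for this interrupt."""
    offset, segment = struct.unpack_from("<HH", memory,
                                         ivt_entry_address(interrupt))
    return segment, offset


def write_vector(memory, interrupt, segment, offset):
    """Point an interrupt at a new handler (what 'hooking' boils down to)."""
    struct.pack_into("<HH", memory, ivt_entry_address(interrupt),
                     offset, segment)


if __name__ == "__main__":
    ram = bytearray(1024)                    # stand-in for the first 1 KiB
    write_vector(ram, 0x21, 0x1234, 0x5678)  # made-up handler address
    print(hex(ivt_entry_address(0x21)), read_vector(ram, 0x21))
```

On the PC, the keyboard's IRQ 1 arrives as interrupt 0x09, so its entry would sit at address 0x24 under this layout.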

Disassembling 'Hello World'

Before digging more into the subject, let's look at the traditional 'Hello World' program written for DOS. All code posted here is assembled with NASM.

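; Hello.asm - Hello World
; Build as a COM file: nasm -f bin hello.asm -o hello.com

section .text
org 0x100               ; COM files are loaded at offset 0x100

_entry:
 mov ah, 9              ; DOS function 9: write string to console
 mov dx, str_hello      ; DS:DX -> '$'-terminated string
 int 0x21               ; call DOS
 ret                    ; return to DOS

section .data
str_hello: db "Hello World",'$'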

Pretty, right? Even those familiar with 32-bit x86 assembly programming may not be able to understand at first glance what this does. To prevent this from getting too long, I'm going to gloss over the specifics of how DOS loads programs, and simply explain what this does. For non-programmers, this may be confusing, but I'll try to explain it below.

The first part of the file has the code segment (marked 'section .text' in NASM) and our program's entry point. With COM files such as this, execution begins at the top of the file. As such, the first instruction is where we enter the program. We immediately execute two 'mov' instructions to load values into the top half of AX (AH), and a near pointer to our string into DX. Ignore the 9 for now; we'll get to it in a moment. Afterwards, we trip an interrupt, with the number in hex (0x21) after it being the interrupt we want to trip. DOS's functions are exposed as interrupts on 0x20 to 0x2F; 0x21 is roughly equivalent to stdio in C. 0x21 uses the value in AH to determine which subfunction we want, in this case 9, to write to the console. DOS expects DS:DX to point to a string terminated by $; it does not use null-terminated strings like you may expect. After we return from the interrupt, we simply exit the program by executing ret.
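To show what the '$' terminator means in practice, here is a small Python model of what the write-string call does with its arguments. This is only a sketch of the behaviour described above, not DOS code: the function name, the segment value 0x1000, and the offset 0x0108 are all made up for the example, and a bytearray stands in for the machine's memory.

```python
def int21_print_string(memory, ds, dx):
    """Model of INT 21h with AH=9: return the text that would be printed,
    i.e. the bytes starting at DS:DX up to (not including) the first '$'."""
    start = ((ds << 4) + dx) & 0xFFFFF
    end = memory.index(b"$", start)  # ValueError if no terminator is found
    return bytes(memory[start:end]).decode("cp437")


if __name__ == "__main__":
    ram = bytearray(1 << 20)                 # 1 MiB of simulated memory
    ds, dx = 0x1000, 0x0108                  # hypothetical load location
    text = b"Hello World$"
    base = (ds << 4) + dx
    ram[base:base + len(text)] = text
    print(int21_print_string(ram, ds, dx))
```

One consequence of this design is that a string printed this way cannot itself contain a dollar sign, and a missing terminator means the call keeps reading whatever follows in memory.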

Under DOS, there is no standard library with nicely named functions to help you out of the box (though many compilers, such as Watcom C, did ship with one). Instead, you have to load values into registers, and call the correct interrupt to make anything happen. Fortunately, lists of known interrupts are available to make the process less painful. Furthermore, DOS only provides filesystem and network operations. For anything else, you need to talk to the BIOS or hardware directly. The best way to think of DOS from a programming perspective is as an extension of the basic input/output functionality that IBM provided in ROM, rather than as a full operating system.

We'll dig more into the specifics in future articles, but the takeaway here is that if you want to do anything in DOS, interrupts and reference tables are the only way to do so.


As an introductory article, this one looked at the basics of how 16-bit real mode programming works and at the DOS programming model. While something of a dry read, it's a necessary foundation for understanding the basic building blocks of what is to come. In the next article, we'll look more at the DOS API and terminate-and-stay-resident programs, as well as hooking interrupts.

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by Thexalon on Tuesday August 30 2016, @01:49PM

    by Thexalon (636) on Tuesday August 30 2016, @01:49PM (#395264)

    8086? Luxury! Try the 8088, the original chip on the IBM PC, where they were bragging about being able to address 1 whole megabyte of memory using a weird system based on "segments" where the actual physical address was determined by both a "segment" and an "address" register, e.g. DS Undocumented DOS: A Programmer's Guide to Reserved MS-DOS Functions and Data Structures , which can show you all about how to do things like figure out how all the memory blocks are allocated, how FAT is laid out on disk, and how the APIs all work at a very low level. Knowing this stuff had some real benefits: For example, my dad and I were able to recover a bricked Windows 3.1 system by calculating out the location of the FAT filesystem structures on disk, and then discovering that somebody had managed to delete Config.sys (which still mattered, a lot, on Windows 3.1).

    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
  • (Score: 2) by Thexalon on Tuesday August 30 2016, @01:51PM

    by Thexalon (636) on Tuesday August 30 2016, @01:51PM (#395267)

    Dang it, formatting!

    When talking about addressing, I was referring to DS << 4 + DX, which apparently we need HTML special chars to make work on Soylent.

    The only thing that stops a bad guy with a compiler is a good guy with a compiler.
  • (Score: 2) by NCommander on Tuesday August 30 2016, @02:11PM

    by NCommander (2) on Tuesday August 30 2016, @02:11PM (#395279)

    That's essentially how UNDELETE worked. DOS used a special character in the FAT to mark a file as deleted. Thus if you walk the FAT, and things haven't been overwritten, undeletion is possible. Format basically worked the same way, which is why UNFORMAT was a thing.

    As for using the 8086, well, the 8088 in the XT is not fully forward compatible. DOS-compatible software for the XT will work on AT-compatible systems, but anything that talked to the BIOS (aka almost everything) would generally break on the XT->AT jump. I haven't actually decided if I want to try and make this run on a real AT (via emulation), but it might be a nifty challenge, and then do a follow-up showing it running on bare metal on some i7 running DOS 6.22 or something. I think I have an i7 with an NE2000-compatible NIC which should at least in theory work. Funny enough, the i7 was the first processor that simply said "eh, fuck it", and locks A20 to on, which means DOS 3.3 won't run on it due to lacking the wrap-around. Later DOS versions should be fine though.

    Still always moving
    • (Score: 2) by sjames on Tuesday August 30 2016, @08:42PM

      by sjames (2882) on Tuesday August 30 2016, @08:42PM (#395427)

      The AT had an 80286. It also had a few odd workarounds to increase compatibility with the 8088 like an AND gate on the A20 address line so a few programs that depended on addresses wrapping at the 1MB mark would work. Extended memory worked by enabling the A20 line and using dirty segment tricks to access beyond 1MB while still in real mode.

      • (Score: 2) by NCommander on Tuesday August 30 2016, @09:14PM

        by NCommander (2) on Tuesday August 30 2016, @09:14PM (#395438)

        I must have been dead last night when I wrote the story and comments. For some reason I thought the AT had a max of 1 MiB, but Wikipedia says it had a max of 16 MiB. I think somewhere between my notes and backslash (admin console) AT and XT got crossed. It's hard to tell based on Google what a typical late 80s AT would have looked like, though having 1-2 MiB of RAM is probably in the realm of reasonable.

        Actually looking at the Wikipedia page, an AT with enough RAM could probably have run Windows 3.1 Standard and DOS 5 if you put enough RAM into it. Maybe I'll show off 80286 protected mode if I can think of a reason to enter it; you can bounce back to real mode via the triple-fault check.

        Still always moving
        • (Score: 2) by sjames on Tuesday August 30 2016, @09:43PM

          by sjames (2882) on Tuesday August 30 2016, @09:43PM (#395454)

          IIRC Win 3.1 was a problem on the AT due to the crippled (compared to 386) protected mode. It was also dog slow in protected mode (and so the real mode segment trick to access extended memory).

          In practice, protected mode was avoided as much as possible until the '386 got it right, including the ability to return to real mode without a reset or triple fault.

        • (Score: 2) by dry on Wednesday August 31 2016, @05:51AM

          by dry (223) on Wednesday August 31 2016, @05:51AM (#395589)

          I think that the AT (286) could actually address a GB of virtual memory in protected mode. 32 bit OS/2 limited itself to a GB of address space (512MBs user) so the 16 bit API could address all memory.

          • (Score: 2) by NCommander on Thursday September 01 2016, @01:57AM

            by NCommander (2) on Thursday September 01 2016, @01:57AM (#395985)

            The 80286 is not a 32-bit processor; it could address a maximum of 24 bits of memory, for 16 MiB in total. Intel jumped to 32-bit with the 80386.

            Still always moving
            • (Score: 2) by NCommander on Thursday September 01 2016, @03:02AM

              by NCommander (2) on Thursday September 01 2016, @03:02AM (#396002)

              Sorry, I should have said it's not a 32-bit clean processor, as it didn't have a 32-bit address bus.

              Still always moving
            • (Score: 2) by dry on Thursday September 01 2016, @03:31AM

              by dry (223) on Thursday September 01 2016, @03:31AM (#396011)

              Address 24 bits of physical memory. In protected mode, the segment selector was more versatile allowing 1GB of virtual memory with each task seeing 16MB max. Play with the GDT and it is possible for one process to access the full GB though not very practical on the 286
              See eg []
              To quote the relevant section under Memory Addressing in 80286

              Protected Virtual Addressing Mode (PVAM) - In this we have 1 GByte of virtual memory and 16 Mbyte of physical memory. The address is 24 bit. To enter PVAM mode, Processor Status Word (PSW) is loaded by the instruction LPSW.

              • (Score: 2) by NCommander on Thursday September 01 2016, @04:16AM

                by NCommander (2) on Thursday September 01 2016, @04:16AM (#396028)

                Looks like you're technically correct. The best kind of correct.

                The (un)fortunate truth is that outside of the Intel programming manuals, which are a last resort due to how blasted dry they are, segmented protected mode is basically undocumented. It's close to unheard of that a period-specific 80286 would have even hit the 16 MiB limit of RAM, and not a single online resource I've seen talking about LGDT actually talks about setting up true segments. They basically set a ring 0/3 segment, and call it good. The LDT gets a footnote at best.

                Since the MMU is enabled in protected mode, and W^X is also a thing, you can't call into real mode code assuming it would work in standard protected mode, since DOS expected that any address could be RWX even if you limited the GDT to have a ring 0 segment where DOS would expect it. I'd love a chance to play with segmented protected mode in this article, but I can't think of a real world way it could work; on very low memory systems, you don't have anything beyond conventional memory and thus the issue is moot. Newer systems might have RAM above 1 MiB, *but* entering protected mode would break standard DOS applications (and EMM386) unless I did some complicated magic to shunt down to real mode, plus making sure it played nice with anything using a DOS extender.

                Still always moving
  • (Score: 2) by LoRdTAW on Tuesday August 30 2016, @02:21PM

    by LoRdTAW (3755) on Tuesday August 30 2016, @02:21PM (#395284)

    They were both i86 and could run the same code. The big difference was that the 8088 had only an 8-bit data bus multiplexed on the address bus. Since it could only transfer one byte per I/O read, it was half as fast as the 8086 with its 16-bit memory bus. This made the part cheaper to produce, but a big bottleneck memory-wise. It was also I/O signal compatible with the older 8-bit 8085.

  • (Score: 2, Informative) by jimtheowl on Tuesday August 30 2016, @02:27PM

    by jimtheowl (5929) on Tuesday August 30 2016, @02:27PM (#395290)

    The 8088 is just a cheaper model of the 8086 (8 bit external data bus instead of 16 bit), not a different design.

    The 8086 precedes it.

    They did the same thing with later chips, such as the 80386/80386SX