I've made no secret that I'd like to bring original content to SoylentNews, and I recently polled the community on their feelings about crowdfunding articles. The overall response was somewhat lukewarm, mostly over concerns about dividing the money and paying authors. Taking that into account, I decided to write a series of articles for SN in an attempt to drive more subscriptions and readers to the site, and to scratch a personal itch for a retro-computing project. The question then became: what to write?
As part of a conversation on IRC, part of me wondered what a modern-day keylogger would have looked like running on DOS. In the world of 2016, it's no secret that various three-letter agencies engage in mass surveillance and cyberwarfare, and a keylogger would be part of any basic set of attack tools. The question is what a potential attack tool would have looked like had it been written during the 1980s. Back then, the world was a very different place from both a networking and a programming perspective.
For example, in 1988 (the year I was born), the IBM PC/XT and AT would have been relatively common fixtures, and the PS/2 had only recently been released. Most of the personal computing market ran some version of DOS, and networking (which was rare) frequently took the form of Token Ring or ARCNET equipment. Further up the stack, TCP/IP competed with IPX, NetBIOS, and several other protocols for dominance. From the programming side, coding for DOS is very different from coding for any modern platform, as you had to deal with Intel's segmented architecture and interact directly with both the BIOS and the hardware. As such, it's an interesting look at how technology has evolved since.
Now obviously, I don't want to release a ready-made attack tool to be abused by the masses, especially since DOS is still frequently used in embedded and industrial roles. As such, I'm going to target a non-IP-based protocol for logging, both to explore these technologies and to simultaneously make the tool as useless as possible. To the extent possible, I will try to keep everything accessible to non-programmers, but this isn't intended as a tutorial for real-mode programming. I'm not going to go super in-depth in places, but I will try to link to relevant information. If anyone is confused, post a comment, and I'll answer questions or edit these articles as they go live.
More past the break ...
Back in 1984, IBM released the Personal Computer/AT, which can be seen as the common ancestor of all modern PCs. Clone manufacturers copied the basic hardware and software interfaces that made up the AT, creating the concept of PC-compatible software. Due to the sheer proliferation of both the AT and its clones, these interfaces became a de facto standard which continues to this very day. As such, well-written software for the AT can generally be run on modern PCs with a minimum of hassle, and it is entirely possible to run ancient versions of DOS and OS/2 on modern hardware thanks to backwards compatibility.
A typical business PC of the era likely looked something like this:
To put that in perspective, many of today's microcontrollers have on-par or better specifications than the original PC/AT. From a programming perspective, even taking resource limitations into account, coding for the PC/AT is drastically different from coding for many modern systems due to the segmented memory model used by the 8086 and 80286. Before we dive into the nitty-gritty of a basic 'Hello World' program, we need to take a closer look at the programming model and memory architecture of the 8086, which was a 16-bit processor.
If the AT is the common ancestor of all PC-compatibles, then the Intel 8086 is its processor equivalent. The 8086 was a 16-bit processor that operated at a top clock speed of 10 MHz, had a 20-bit address bus that supported up to 1 megabyte of RAM, and provided fourteen registers. Registers are essentially very fast storage locations physically located within the processor that are used to perform various operations. Four registers (AX, BX, CX, and DX) are general purpose, meaning they can be used for almost any operation. Eight (described below) are dedicated to working with segments and offsets, and the final two are the processor's current instruction pointer (IP) and state (FLAGS).
An important point in understanding the differences between modern programming environments and those used by early PCs is the difference between 16-bit and 32/64-bit programming. At the most fundamental level, the number of bits a processor has refers to the size of the numbers (or integers) it works with internally. As such, the largest unsigned number a 16-bit processor can directly work with is 2 to the power of 16, minus 1, or 65,535. As the name suggests, 32-bit processors work with larger numbers, the maximum being 4,294,967,295. Thus, a 16-bit processor can only reference up to 64 KiB of memory at a given time, while a 32-bit processor can reference up to 4 GiB, and a 64-bit processor can reference up to 16 exbibytes of memory directly.
At this point, you may be asking yourself, "if a 16-bit processor can only work with 64 KiB of RAM directly, how did the 8086 support up to 1 megabyte?" The answer comes from the segmented memory model. Instead of directly referencing a location in RAM, addresses were divided into two 16-bit parts: the segment and the offset. A segment is a 64-kilobyte region of RAM; it can generally be considered the computing equivalent of a postal code, telling the processor roughly where to look for data. The offset then tells the processor where exactly within that segment the data it wants is located. On the 8086, the segment value was shifted left by four bits (multiplied by 16) and the offset added to it, yielding 20 bits (or 1 megabyte) of addressable memory. Segments and offsets are referenced by the processor in special registers; in short, you had the following:
As such, memory addresses on the 8086 were written in the form segment:offset. For example, the memory address 0x000FFFFF could be written as F000:FFFF. As a consequence, multiple segment:offset pairs can refer to the same byte of memory: the addresses F555:AAAF, F000:FFFF, and F800:7FFF all refer to the same physical location. The segmentation model also had important performance and operational characteristics to consider.
The most important was that since data could be located either within the same segment or in a different one, you had two different types of pointers to work with. Near pointers (just the 16-bit offset) deal with data within the current segment and are very fast, as no processor state has to be changed to use them. Far pointers point to data in a different segment and require multiple operations to work with: you not only have to load and store the two 16-bit components, you also have to change a segment register to the correct value. In practice, that meant far pointers were extremely costly in terms of execution time. The quirks of segment arithmetic eventually led to one of the greatest (or worst) backward-compatibility hacks of all time: the A20 gate, something I could write a whole article on by itself.
The segmented memory model also meant that any high-level programming language had to incorporate lower-level details into itself. For example, while C compilers were available for the 8086 (in the form of Microsoft C), the C programming language had to be extended to work with the memory model. Instead of just having the standard C pointer types, you had to deal with near and far pointers, and with the layout of data and code within segments, to make the whole thing work. This meant that code for pre-80386 processors had to be written specifically for the 8086 and the 80286.
Furthermore, most of the functionality provided by the BIOS and DOS was only available in the form of interrupts. Interrupts are special signals to the processor that something needs immediate attention; for example, typing a key on the keyboard generates an IRQ 1 interrupt to let DOS and applications know something happened. Interrupts can be generated in software (the 'int' instruction) or in hardware. As interrupt handling can generally only be done in raw assembly, many DOS applications of the era were written (in whole or in part) in Intel assembly. This brings us to our next topic: the DOS programming model.
Before digging more into the subject, let's look at the traditional 'Hello World' program written for DOS. All code posted here is assembled with NASM:
; Hello.asm - Hello World
; Assemble with: nasm -f bin hello.asm -o hello.com

org 0x100            ; DOS loads COM files at offset 0x100

section .text

_entry:
    mov ah, 9            ; DOS subfunction 9: write string to console
    mov dx, str_hello    ; near pointer to the string
    int 0x21             ; call DOS
    ret                  ; return to DOS

section .data

str_hello: db "Hello World",'$'
Pretty, right? Even those familiar with 32-bit x86 assembly programming may not be able to tell at first glance what this does. To keep this from getting too long, I'm going to gloss over the specifics of how DOS loads programs and simply explain what the code does. For non-programmers this may be confusing, but I'll try to explain it below.
The first part of the file contains the code segment (marked 'section .text' in NASM) and our program's entry point. With COM files such as this one, execution begins at the top of the file; as such, _entry is where we enter the program. We immediately execute two 'mov' instructions to load a value into the top half of AX (AH), and a near pointer to our string into DX. Ignore the 9 for now; we'll get to it in a moment. Afterwards, we trip an interrupt, with the number in hex (0x21) after it being the interrupt we want to trip. DOS's functions are exposed as interrupts 0x20 through 0x2F; 0x21 is roughly equivalent to stdio in C. Interrupt 0x21 uses the value in AH to determine which subfunction we want: in this case, 9, write string to console. DOS expects DX to hold a pointer to a string terminated with '$'; it does not use the null-terminated strings you might expect. After we return from the interrupt, we simply exit the program by calling ret.
Under DOS, there is no standard library with nicely named functions to help you out of the box (though many compilers, such as Watcom C, did ship with one). Instead, you have to load values into registers and call the correct interrupt to make anything happen. Fortunately, lists of known interrupts are available to make the process less painful. Furthermore, DOS only provides filesystem and network operations; for anything else, you need to talk to the BIOS or the hardware directly. The best way to think of DOS from a programming perspective is as an extension of the basic input/output functionality IBM provided in ROM, rather than as a full operating system.
We'll dig more into the specifics in future articles, but the takeaway here is that if you want to do anything in DOS, interrupts and reference tables are the only way to do it.
As an introductory article, we looked at the basics of how 16-bit real-mode programming works and at the DOS programming model. While something of a dry read, it's a necessary foundation for understanding the building blocks of what is to come. In the next article, we'll look more at the DOS API and terminate-and-stay-resident programs, as well as hooking interrupts.
Look into https://en.wikipedia.org/wiki/Borland_Turbo_C. [wikipedia.org] It came out about then and gave MS C a real run for its money. The libraries were good and the documentation was actually helpful. I bought in at V1.5; V2.0 Professional added Turbo Assembler and Turbo Debugger; I learned a *lot* about how C works, just from stepping code I'd written and compiled in TDB's CPU pane and watching the stack-frame mechanism in action. The debugger in the IDE didn't have that assembly-level detail but it made chasing bugs fun. Versions up to V3.1 have an IDE that'll run in a real-mode system, after that it's Windows-dependent.
Another toolbox was Spontaneous Assembly, SpontAsm. My copy of the V2 manual is dated 1989,1990, so it came out at the late end of your window. It consisted of C-like libraries in assembly, for assembly, again with good documentation; working with it made X86 assembly comprehensible for me. Warning: the textmode windowing code in it can be slo-o-ow, especially on a 12MHz '286, which is what I had then (but I was probably abusing it with constant refreshes in the modem program I wrote).
I actually had tried to use OpenWatcom to get a period-like assembler for writing this, but I think the x86_16 support bitrotted out of it: when I did mov dx, offset str_hello, I got a garbage pointer (the assembler set dx to the top of CS). Combined with the fact that WASM16 is essentially undocumented, I gave up in frustration and switched to NASM, since I at least know what I'm doing with that.
I knew Borland C had been released as freeware, but I didn't realize the older 16-bit versions were as well, and it appears Turbo Assembler was too. I may have to install it on FreeDOS and go really period-specific. I need an actual linker to build an EXE so I don't have to deal with relocating things in conventional RAM; writing COMs as a TSR is generally a "bad idea" because they load at 0x100, and unless you relocate it yourself, you can get clobbered really, really easily. If I load high into the UMA, or at least to the top of conventional RAM, it makes it that much easier to survive shitty software. Of course, I might allow myself to pretend I'm on an 80286 with >1 MiB of RAM, do 80286 protected mode magic, and just leave a thunk in conventional memory to kick me to and from it. In that case I need to patch the memory map calls from the BIOS and hide myself in EMS (by marking that region of memory as unavailable).
The tricky bit is that going from protected mode back into real mode on the 80286 requires a triple fault and catching the reset vector. It wasn't until the 80386 that a quick way to go from protected mode back to real mode existed.
> writing COMs as a TSR is generally a "bad idea" because they load to 0x100 and unless you relocate it yourself, you can get clobbered really really easily.
Umm, it's been a while, but IIRC you load it as a normal *.com (64k code, 64k data max); it gets loaded into normal memory just like any other program, hooking into the interrupt table as required, and the TSR process fiddles the pointers so that subsequent programs load above it. TSRs are how various drivers get added, and it's best if they're loaded first thing (from config.sys and autoexec.bat), because they effectively lock up the memory below them by fiddling with the pointers, including the memory footprint of any program running at the time the TSR is loaded.
I currently run BC3.1's textmode IDE in DOSbox (haven't really touched the Borland C in a long while, but it works, and I run old-DOS-OrCAD v3 a lot for schematics so I know that environment is robust); TC2 should run there just fine so you've got normal tooling available for development.
TASM also has a remote capability, in case that proves helpful for your purposes: install its hooks on the target machine, then step/run/examine (across a serial link) using a more comfy console. IIRC it's not quite as flexible as the gdb/gvd combo I use in Linux C, but it can step code down at the assembly level as well as C source level.
> TASM also has a remote capability
Oops, no, that's TDB.
I'm working from very old Microsoft documentation describing how TSRs work in practice, but the damn shit is hard to follow without a copy of the Intel System Reference at hand to explain the 16-bit magic. My understanding is that when you call TSR and give it the paragraphs you want to save, it just leaves them where they are in conventional memory; no rebase is implied. I was going to spend this weekend with a debugger and NASM working out the exact behavior of TSRs on DOS. I've done x86_16 programming before in firmware, but not much in DOS. I've found plenty of code examples on how to write a TSR, but most of them lack the linker invocation. Given the lack of .org 0x100 in most of the example source I've seen, I assumed they were being linked as EXEs and then rebased by the EXE loader.
A key to TSRs is to write PIC code. That is, code that uses only relative accesses so that it runs OK at any arbitrary address. Then you can just grab a bit of RAM and copy it in.
If you haven't seen it, get Ralf Brown's Interrupt List [cmu.edu]. It has a lot more than a list of interrupts.
But NCommander, have you looked at bitsavers.org?
Look in the file listing about a page down (Recent.txt).
It has a treasure trove of documentation in it, going from the Pentium Pro era back to Univac (1940s!!!!).
They are adding stuff on a weekly if not daily basis, and they could use all the help they can get, in case any other Soylentils have vintage software or hardware documentation, or firmware/OSes dating back 20 years or more.
Hope that helps!
Watch out, Borland C has some serious bugs. Stay under 64k of data memory, and you should be safe. Go over that, and you're in trouble.
A minor bug in version 2.0 was that x>>=1 was translated into assembler incorrectly. The compilation would abort and the IDE gave the programmer a dump of assembler code. It was simple enough to work around by writing x = x >> 1 instead, but still, it doesn't give one a feeling of confidence in the compiler.
The killer bug was the inability to handle segmented memory correctly. On programs that reserved more than 64K for data, the compiler would generate code that reused the same 64k segment instead of separate segments, so two different variables would end up trying to use the same address. While the x>>=1 problem was fixed sometime between version 2.0 and 4.5, the mismanagement of segmented memory was still present in 4.5 and, I think, 5.1. I learned of this problem when I was trying to figure out why a program I'd written just wasn't working. In the debugger, I put watches on everything and saw an element of an array change at the same time as a loop counter was incremented. I switched to gcc in Linux and the problem went away.
I was planning on doing this in pure assembly, no C. I'd like to think Turbo Assembler is relatively bug-free; writing an x86_16 assembler isn't exactly rocket science.
TSRs need to be written in assembly anyway because you're operating as an ISR; it's a bad idea to try to write those in C unless you're very sure of what your compiler is doing and the code it's emitting, because you have to push state, disable interrupts, do X, pop state, then iret back into DOS. If I actually want to do anything besides load and store stuff, I'm going to have to override the return vector before I iret. My plan was to put a mini-binary in the UMA with the ISR, override the return vector on the stack, and iret into a call I control.
That saves me a lot of headache, as DOS interrupts are not reentrant; I can't safely call one within the ISR unless I'm sure I'm not in DOS (InDOS=0). That way I'm not fighting DOS to do things like operate the network controller. Or in other words, I'm going to make DOS multitask :)
> I gave up in frustration and switched to NASM

Having used both, you are better off with NASM (which I think has a MASM compatibility mode). I think I have the docs for the Watcom one kicking around here somewhere in book form (I bought it ages ago). If I remember right, the 32-bit assembler pretty much spits out 16-bit code; you have to go out of your way to get it to spit out 32-bit code, and its compatibility with MASM is crappy. Just stay away from the other registers and it should encode correctly. It will fault out pretty quickly if you get something wrong :) MASM would be the one to get if you want to stick to period-style coding; it had much better support and everyone used it. NASM is at least semi-supported these days, so you can get help easily. If I were doing any ASM work with DOS, NASM is probably the one I would pick.
I wrote a few keyboard hooks myself, usually with Turbo C and Watcom C. They made it pretty dead easy to do: basically just write the hook, call the right DOS functions with your function pointer, make sure you call the old hook at the end, and you were usually good.
I recommend this book: https://www.amazon.com/dp/0201403994 [amazon.com]. One of the better ones if you want to understand the PC/DOS architecture. There is supposedly a 4th edition, so you could probably get that.
I personally am trying to do some Win 3.1 reverse engineering. I have not found a good write-up on how the Win16 NE format works; Win32 PE is very well documented. Very few of the free disassemblers work with it. IDA Pro supposedly does, but the free ones do not. The Win16 platform is interesting, as it depends heavily on DOS but in many cases basically sucks the brains out of it. Win9x was even more brain-sucking, but at the bottom it still depended on DOS for a few things.
Well, I wanted to use free tools if at all possible so others can take the code and play with it. 16-bit MASM hasn't been part of Visual C++ in decades, and I believe it only persists in the Windows DDK. I'm using FreeDOS in VirtualBox to test my stuff, FTPing the files across (annoying, but workable).
Part of me is tempted to see if I could get one of the older LAN Manager implementations to talk to it; I think Samba still has support for LANMAN, though I don't have a Linux install on this laptop due to technical reasons (my old laptop committed suicide on me last week, and my current contract work is all Windows-specific).
Oh, I agree with you. Although MASM can be handy to have around for those old asm scripts still crawling around on the net. Not sure it was ever included with the VC kit (maybe the DDK, like you said); I always had to get it standalone, and it was not a cheap package for someone on a college budget. By the time they made it 'free' I had little use for it. But yeah, NASM is the better choice these days. Not too sure how good that download is without the docs, which were a large part of the package: three VERY thick, very well written books that described how to use it. A couple of months ago I threw out some copies as we were shutting down an office that had been around since 1992 and there was a lot of *old* useless software lying around.
Also, I would look into what some of the IDEs can help you with. I would bet there is a plugin for Eclipse, and I know there is probably one for Notepad++. If you want to stick strictly to DOS, the Watcom vi editor is one of the best there is for that sort of thing. It is probably more tied into the Watcom build stack, but you may be able to bend it to use NASM, as it is very configurable through scripts.
Pretty sure LANMAN should still work on just about any windows box NT4 and up. If you are using something like win10 you may have to fiddle a couple of settings to get it to join the workgroup correctly. Think it is just a couple of checkboxes in the network settings and setting the workgroup name under the machinename. But it should be basically baked in.
DR-DOS is what I used for years, as that is what came with my computer and I didn't have 80 bucks to buy a copy of MS-DOS. For something that basic it mostly worked anyway; only one or two programs did not work, so it was 'good enough'.
I used to use a program called HELPPC. http://stanislavs.org/helppc/ [stanislavs.org] I personally like the format better than the Ralf Brown stuff, though the Ralf Brown stuff seems to be a bit more extensive and more up to date; I have been using it for some of my Win3.x spelunking. I should get ahold of my old boss. He was a whizz at that Win 3.1 stuff and could point out what I need.
If you are using a VM for DOS programs you may want to look into some of the TSRs that help with the CPU usage (dosidle, winwait, etc). FreeDOS may already have built that in but it is worth checking into.
... I would bet there is a plugin for eclipse...
Using Eclipse to write pure ASM programs (unless it is Java bytecode asm) is quite blasphemous. If I remember correctly, WinAsm would be a reasonable choice if you really wanted to use an IDE for i686 ASM.
I'm using NASM and Notepad++, since at the moment I'm running primarily in Windows due to unrelated work stuff. I initially test code in DOSBox for sanity before popping it over via FTP to FreeDOS, as I have yet to get the LANMAN client set up.
Getting LANMAN to work on XP wasn't too hard. Getting it to work on Win7 was very hard, and getting it to work on Win 10 is next to impossible; the problem is that the authentication is just too insecure. Even OS/2 ships with a Samba server/client now so it can communicate with Windows.