SoylentNews is people

Meta
posted by NCommander on Tuesday September 20 2016, @01:00PM
from the now-you-can-be-1337-by-knowing-what-a-far-call-is dept.

The Retro-Malware series is an experiment in original content for SoylentNews, written in the hope of motivating people to subscribe to the site and help grow our resources. The previous article talked a bit about the programming environment imposed by DOS and 16-bit Intel segmented programming; it should be read before this one.

Before we get into this installment, I do want to apologize for the delay in getting this article up. A semi-unexpected cross-country drive combined with a distinct lack of surviving programming documentation made writing this article take far longer than expected. Picking up from where we were before, today we're going to look into Terminate-and-Stay Resident programming, interrupt chaining, and get our first taste of how DOS handles conventional memory. Full annotated code and binaries are available here in the retromalware git repo.

In This Article

  • What Are TSRs
  • Interrupt Handlers And Chaining
  • Calling Conventions
  • Walking through an example TSR
  • Help Wanted

As usual, check past the break for more. In addition, if you are a licensed ham operator or have ham radio equipment, I could use your help; check the details at the end of this article.


What Are TSRs?

For anyone who used DOS regularly, TSRs (short for Terminate and Stay Resident) were likely a source of both fun and frustration. Originally appearing in DOS 2.0, TSRs, as the name suggests, are programs that exit but leave some part of their code around in memory. TSRs are primarily used to provide device drivers, extended APIs, or hooks that other applications can take advantage of. At the same time, they also could be used (as we will be doing) to install invisible hooks to modify, change, or log system behaviors. In that sense, they can be considered broadly equivalent to extensions on classic Mac OS. The BIOS could be considered a special type of TSR as it's always available in memory to provide services to the operating system and applications.

From a technical perspective, a TSR is any application that executes int 21h with the right options. Ralf Brown's Interrupt List has this to say on DOS's API for TSRs:

DOS 2+ - TERMINATE AND STAY RESIDENT

AH = 31h
AL = return code
DX = number of paragraphs to keep resident

Return:
Never

Notes: The value in DX only affects the memory block containing the PSP; additional
memory allocated via AH=48h is not affected. The minimum number of paragraphs
which will remain resident is 11h for DOS 2.x and 06h for DOS 3.0+. Most TSRs can
save some memory by releasing their environment block before terminating
(see #01378 at AH=26h,AH=49h). Any open files remain open, so one should
close any files which will not be used before going resident; to access a file
which is left open from the TSR, one must switch PSP segments first (see AH=50h)

Well, for most people, I suspect that is as clear as mud. Let me try and explain it a bit better. Essentially, when a program flags to DOS that it wants to TSR, DOS simply leaves the amount of memory given in DX alone, and marks those paragraphs (units of 16 bytes) as 'in use' so neither it nor any other well-behaved program will attempt to use them. No relocation or copying is done as part of this process; the memory is simply marked dead, and left 'as is' as we'll see below.

This is problematic for a number of reasons. As I mentioned in the previous article, when in real mode, Intel processors can only access up to 1 MiB of memory, and it's the area where all applications, drivers and device address space need to squeeze into. Of this, only 640 kiB are normally available to applications (which is known as conventional memory). If a TSR is too large, or too many are loaded, it is very easy to run out of RAM to do anything useful with the machine. To make matters worse, DOS provides absolutely no mechanism to manage or uninstall TSRs. Once an application is resident, it's staying there unless it is specifically designed to unhook itself and free its own memory, and there's no 'official' way of doing even that in the DOS APIs.

This quickly led to an era where you might need a specific boot floppy for a given application so as to have its TSRs available (such as mouse or network drivers), and nothing else, so that there would be enough conventional memory left to fit everything in. While several third-party efforts, such as TesSeRact, tried to standardize TSR installation/removal, none of them became a true de-facto standard. Furthermore, it is very possible for a TSR removal to leave memory in a fragmented state which could break other applications. An entire cottage industry of memory optimizers quickly sprang up which could load TSRs into upper memory.

At this point, you may be wondering "If TSRs are so miserable, why use them?". The answer, unfortunately, is that it is the only way on DOS to provide any sort of extended functionality. DOS has no concept of shared libraries or multitasking; it was TSR or bust. This brings us to our next topic: interrupt handling.

Interrupt Handlers

While I touched on interrupts in the previous article, I didn't go into too much detail. Interrupts, simply put, are special signals sent to the processor to tell it to stop what it's doing and do something else immediately. These interrupts can be generated by either hardware or software. Interrupts essentially operate like this:

  • Processor is doing work.
  • Interrupt occurs
  • Processor saves location and jumps to interrupt handler
  • Interrupt handler runs
  • Interrupt handler finishes, and returns to the original task

When an interrupt occurs, the processor looks at the Interrupt Vector Table (IVT), located at the very bottom of memory (0000:0000), to determine where it needs to jump to handle that interrupt. Each entry is a four-byte segment:offset pointer, so the entry for interrupt n sits at address n*4. The function that handles an interrupt is known as an Interrupt Service Routine (ISR). Assuming there is a valid handler address in the IVT, the processor pushes FLAGS, CS, and IP onto the stack and then jumps to the address stored in that entry, much like a far call. A 'bare bones' interrupt handler looks something like this:

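previous_hook_offset: dw 0
previous_hook_segment: dw 0

; Message printed by the hook below. DOS function 09h expects a
; '$'-terminated string; the text itself is only a placeholder.
hello_world_str: db "Hello from the TSR!", 13, 10, "$"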

hook:
	; When we come into an interrupt, the processor has only
	; saved FLAGS, the code segment and the instruction pointer
	; for us. It's the responsibility of the handler to
	; preserve everything else it touches.

	; This is saved on the application's local stack, which is fine
	; for now (FreeDOS does the same thing internally) as long as
	; we're not putting any large items on it. We'll look at setting
	; up a local stack later.

	pushf ; Save flags (iret restores them anyway; this is extra insurance)
	pusha ; push all general registers to the stack (needs an 80186 or later)

	; Setup segments
	push ds
	push es

	; For interrupt handlers, CS=DS normally, and SS either points at:
	; the application stack (aka, whatever was running before we were)
	; or at a local stack setup by the TSR.
	; On x86, it's not possible to directly copy from one segment register
	; to another, so we'll use AX as a scratch:
	mov ax, cs
	mov ds, ax
	mov es, ax

	; Let's add a "hello world" hook:

	; NOTE: Normally it's a bad idea to call DOS interrupts in a TSR
	; because DOS itself is not re-entrant. However, as in this example,
	; we've hooked the unused 0x66, which DOS does not call out of the box,
	; which means we'll never be in this ISR while we're in DOS.
	; If this were real code, we would have to check the INDOS flag for sanity.
	mov ah, 9
	mov dx, hello_world_str
	int 0x21

	; DOS compatibility "quirk?". On DOSBox (which I initially tested this on)
	; there's a default entry in the IVT for all interrupts in F000:xxxx.
	; Documentation suggests that this is also the default behavior
	; for MS-DOS though I can't confirm it.
	;
	; FreeDOS, on the other hand, leaves unused INTs initialized to 0000:0000
	; so blindly far calling it causes a fault. So we need to check if the
	; segment is 0000, and skip chaining if that's the case

	cmp word [previous_hook_segment], 0x0000
	je skip_chain

	; Chain to other TSRs
	pushf ; pushf is required because iret expects to pop flags
	call far [previous_hook_offset]

	skip_chain:

	; We're done, restore to previous state
	pop es
	pop ds
	popa
	popf

	; To return from an interrupt, we use the special iret instruction
	iret

Quite a bit of code for not doing much. As the code comments explain, the interrupt handler has to preserve any information in the registers it wants to use. For this example, we just save everything with a pushf instruction followed by a pusha instruction, which puts the FLAGS register followed by all the general purpose registers (AX, CX, DX, BX, SP, BP, SI, DI) on the stack. Preserving flags across an ISR is extremely important since FLAGS is where things like comparison results are stored; if FLAGS comes back corrupted, it's completely possible that an application evaluates an "if" statement the wrong way, which becomes a source of hard-to-impossible-to-find bugs. (The processor saves FLAGS itself when it takes the interrupt and iret restores it, so the explicit pushf/popf pair here is extra insurance rather than a requirement.)
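To make the vector table lookup more concrete, here is a minimal sketch in C that models it against a plain byte array standing in for the first kilobyte of RAM. The names `far_ptr`, `ivt_entry_address` and `ivt_read` are made up for this illustration; the only facts assumed are that each vector is four bytes and stores the offset before the segment.

```c
#include <stdint.h>

/* A real-mode far pointer as stored in the IVT: offset first, then segment. */
struct far_ptr {
    uint16_t offset;
    uint16_t segment;
};

/* Each vector occupies 4 bytes, so vector n lives at linear address n * 4. */
uint32_t ivt_entry_address(uint8_t vector)
{
    return (uint32_t)vector * 4u;
}

/* Read a vector from a byte array standing in for the first 1 KiB of RAM.
 * Bytes are assembled by hand so the result does not depend on the host's
 * endianness. */
struct far_ptr ivt_read(const uint8_t *low_memory, uint8_t vector)
{
    uint32_t base = ivt_entry_address(vector);
    struct far_ptr p;
    p.offset  = (uint16_t)(low_memory[base]     | (low_memory[base + 1] << 8));
    p.segment = (uint16_t)(low_memory[base + 2] | (low_memory[base + 3] << 8));
    return p;
}
```

For the int 0x66 vector used in the listing, `ivt_entry_address(0x66)` works out to 0x198, and a slot holding all zeroes reads back as 0000:0000, which is the case the handler checks for before chaining.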

ISRs are somewhat notorious in that they appear deceptively easy to code, and absolutely disastrous if you get it wrong. One of the major things to be aware of is that it's possible for an interrupt to be interrupted. For example, if your interrupt handler is running with interrupts enabled, and someone taps on the keyboard, the keyboard handler will preempt you. Depending on what you're doing, this might not be a problem, or it could "lock up" the computer. ISRs can turn interrupts on and off with the sti/cli instructions, but an all-too-common bug is forgetting to turn interrupts back on. Raymond Chen, a developer at Microsoft, wrote an entire chapter in his book "The Old New Thing" dedicated to the things that stupid applications do that Windows had to patch around, such as forgetting how to handle interrupts.

The second consequence is that ISRs should be reentrant. For those who are not hugely familiar with computer programming, reentrancy is the ability for a subroutine to be interrupted, then called again safely. For example, if you're listening to keyboard events, it's possible that two events can come at the same time and the second event preempts the first one. Bad Things(tm) happen if you have non-reentrant ISRs. The only reason this is a 'should' vs. a 'must' is that DOS itself is not reentrant; as the comment explains, you can't safely call a DOS interrupt from an ISR. DOS provides a special global flag known as INDOS to let callers know whether it's safe to call into DOS; the check was excluded above for brevity and because we used an unused interrupt.
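The idea behind a busy flag such as INDOS can be sketched in a few lines of C. This is only a model of the pattern, not of the DOS flag itself: `handler_busy`, `guarded_handler` and `second_interrupt` are hypothetical names, and the "second interrupt" is simulated by calling a function in the middle of the work.

```c
/* Nonzero while the handler body is running. In a real ISR this would have
 * to be tested and set with interrupts disabled. */
volatile int handler_busy = 0;
int handler_runs = 0;

/* Hypothetical handler body. `nested` is called in the middle of the work to
 * stand in for a second interrupt arriving. Returns 1 if the work ran, or 0
 * if the call was refused because the handler was already active. */
int guarded_handler(void (*nested)(void))
{
    if (handler_busy)
        return 0;          /* not safe to re-enter: back off */
    handler_busy = 1;
    handler_runs++;
    if (nested)
        nested();          /* simulated interrupt during the work */
    handler_busy = 0;
    return 1;
}

/* Simulated second interrupt: tries to enter the same handler again. */
int nested_result = -1;
void second_interrupt(void)
{
    nested_result = guarded_handler(0);
}
```

Calling `guarded_handler(second_interrupt)` runs the outer work once while the nested attempt is turned away, which is roughly the position a TSR is in when it sees that DOS is already busy.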

The final common pitfall for DOS-based ISRs is that multiple TSRs can hook the same interrupt. For example, App A and App B can both decide they want the same interrupt. Depending on the application, it may chain interrupts down, or it may claim an interrupt entirely for itself. This can lead to infuriatingly complicated issues to debug if the other TSR is not well-behaved. Microsoft and IBM eventually provided built-in TSR multiplexing in DOS in the form of int 2Fh, but the API is extremely difficult to use and failed to solve many of the inherent issues.
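Chaining is easier to picture as a linked list. The sketch below is a C model of a single interrupt vector being hooked several times; the struct and function names are invented for illustration, and a NULL pointer plays the role of an empty 0000:0000 entry.

```c
#include <stddef.h>

#define MAX_TRACE 8

/* One installed hook: an id for tracing plus the handler it displaced. */
struct hook {
    int id;
    struct hook *previous;   /* NULL plays the role of 0000:0000 */
};

struct hook *current_vector = NULL;   /* stands in for one IVT slot */
int trace[MAX_TRACE];                 /* ids in the order the hooks ran */
int trace_len = 0;

/* Install: remember whatever was in the vector, then take it over. */
void install_hook(struct hook *h, int id)
{
    h->id = id;
    h->previous = current_vector;
    current_vector = h;
}

/* Raise the interrupt: the newest hook runs first, then each one chains
 * to the handler it replaced until the chain runs out. */
void raise_interrupt(void)
{
    struct hook *h;
    trace_len = 0;
    for (h = current_vector; h != NULL && trace_len < MAX_TRACE; h = h->previous)
        trace[trace_len++] = h->id;
}
```

A hook that "claims the interrupt entirely for itself" corresponds to one that never follows `previous`, which silently cuts every earlier TSR out of the chain.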

The Stack and Calling Conventions

Let's take a momentary digression from TSRs to look at how functions work and how they interact with the stack. From an instruction perspective, Intel processors provide a "call" opcode which pushes the return address (the address of the instruction after the call) to the stack, and then unconditionally jumps to a given location. It doesn't, however, define how arguments are passed or how the stack is managed. As such, developers have created conventions to specify how arguments are passed from one function to another and who cleans up the stack afterwards.

For non-programmers, the stack can be considered to be a "working space" where a program can store local variables and temporary information such as result values. In contrast to the heap, stacks are relatively small, and are essentially localized to a given function. For historical reasons, the stack grows 'down' from upper memory addresses to lower memory addresses. The stack pointer SP always points to the top of the stack. When information is pushed to the stack with a "push" operation, SP is first decremented by the size of the value, and the value is then stored in memory at the location SP now points to. In contrast, popping an item from the stack reads the value back and simply increments SP, allowing new information to overwrite the old.
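As a rough illustration of push and pop, here is a small C model of a 16-bit stack. The 64-byte size and the names `stack16`, `push16` and `pop16` are arbitrary choices for this sketch, and there are no overflow checks.

```c
#include <stdint.h>

#define STACK_BYTES 64u

/* A tiny stack segment. SP starts just past the end because the stack is
 * empty and grows towards lower addresses. */
struct stack16 {
    uint8_t  mem[STACK_BYTES];
    uint16_t sp;
};

void stack_init(struct stack16 *s)
{
    s->sp = STACK_BYTES;
}

/* push: SP moves down by 2 first, then the word is stored at the new SP. */
void push16(struct stack16 *s, uint16_t value)
{
    s->sp -= 2;
    s->mem[s->sp]     = (uint8_t)(value & 0xFF);
    s->mem[s->sp + 1] = (uint8_t)(value >> 8);
}

/* pop: read the word at SP, then SP moves back up by 2. The bytes stay in
 * memory until a later push overwrites them. */
uint16_t pop16(struct stack16 *s)
{
    uint16_t value = (uint16_t)(s->mem[s->sp] | (s->mem[s->sp + 1] << 8));
    s->sp += 2;
    return value;
}
```

Note that popping never erases anything; the old value is still sitting below SP, which is why stale data on the stack can show up in surprising places.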

For example, let's assume we have a C function with the following definition:

// By default, most C compilers use the CDECL calling convention on x86
int example(int a, int b) {
	// We'll do stuff here
	return a+b;
}

Unlike most architectures, x86 has accumulated multiple calling conventions. Of these, the most common are stdcall (used primarily by the Windows API), and cdecl (C Declaration). For the code I write, I'm sticking to the cdecl convention for my own sanity. cdecl is what's known as a "caller-cleanup" convention, which means the calling function is responsible for removing the arguments from the stack after the call returns. Here's what the calling code looks like in assembly:

example_call:
	mov ax, 4	; a
	mov bx, 5	; b

	; Under CDECL, arguments are pushed right to left,
	; so the last argument (b) goes on first
	push bx
	push ax

	; Under CDECL, C names are decorated with a leading _,
	; so example becomes _example
	call _example

	; Now we need to clean the stack up (two 2-byte arguments)
	add sp, 4

	; The return value (9) comes back in ax.
	; Don't count on the scratch registers (such as
	; bx, cx, and dx) surviving the call; the callee
	; is free to overwrite them

Fairly straightforward, right? Let's look at how this function might be implemented so we can discuss the base pointer (BP) as well. Here's what _example looks like:

_example:
	; Setup stack frame
	push bp
	mov bp, sp

	; The stack now has the following layout
	; [bp+0] caller's saved bp
	; [bp+2] return address
	; [bp+4] int a
	; [bp+6] int b

	; Move values from the stack to registers
	mov ax, [bp+4]
	mov bx, [bp+6]
	add ax, bx

	pop bp
	ret

BP, or the base pointer, can be considered a reference point for where each function's part of the stack begins. When we enter a function, BP is set to the base of the stack for that function (hence the name), and when we leave, the caller's BP is restored. These reference points are known as stack frames, and since every function saves the old BP and then copies SP (which always points to the top of the stack) to BP, you can always tell where you are relative to other functions. Debuggers, for example, walk the stack to determine where they currently are by following the chain of saved BP values.
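To see where the frame offsets come from, the following C sketch replays the call to example step by step on a simulated stack. It assumes the cdecl rule that the last argument is pushed first, so that a ends up at [bp+4] and b at [bp+6]; the return address 0x0123 and all of the names are made up for the illustration.

```c
#include <stdint.h>

#define STACK_BYTES 32u

uint8_t  stack_mem[STACK_BYTES];
uint16_t sp = STACK_BYTES;  /* empty stack: SP sits just past the end */
uint16_t bp = 0;
uint16_t seen_a, seen_b;    /* what the callee found at [bp+4] and [bp+6] */

void push_word(uint16_t v)
{
    sp -= 2;
    stack_mem[sp]     = (uint8_t)(v & 0xFF);
    stack_mem[sp + 1] = (uint8_t)(v >> 8);
}

uint16_t read_word(uint16_t addr)
{
    return (uint16_t)(stack_mem[addr] | (stack_mem[addr + 1] << 8));
}

/* Replays the caller and callee for example(a, b). */
uint16_t simulate_example_call(uint16_t a, uint16_t b)
{
    uint16_t result;

    /* Caller side */
    push_word(b);                 /* last argument first */
    push_word(a);
    push_word(0x0123);            /* call: pushes a (made-up) return address */

    /* Callee side */
    push_word(bp);                /* push bp */
    bp = sp;                      /* mov bp, sp */
    seen_a = read_word((uint16_t)(bp + 4));
    seen_b = read_word((uint16_t)(bp + 6));
    result = (uint16_t)(seen_a + seen_b);
    bp = read_word(sp);           /* pop bp */
    sp += 2;
    sp += 2;                      /* ret: discards the return address */

    /* Caller side again */
    sp += 4;                      /* add sp, 4 */
    return result;
}
```

After the call, SP and BP are back where they started, which is the whole point of the convention: whoever set something up on the stack is also the one who takes it down.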

Near and Far Calls

Before we leave the topic of calling conventions, the final point to bring up is near and far calls. In the previous article, I discussed that 16-bit processors can only reference up to 64 kilobytes of memory directly at any given time. As such, if you need to reference code or data outside that 64k window, you need to change the segment so it's pointing in the right location.

For functions, a call to code within the same segment is known as a near call. Near calls are equivalent to normal function calls on most other architectures. Far calls, in contrast, include the target segment, and load CS as part of the function call. Far calls are made with the "call far" instruction, and require the called function to return with the "retf" instruction. Far calls have a fairly high performance hit due to the segment change, and thus should be limited as much as possible.
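The segment:offset arithmetic behind far pointers is short enough to write out. This C sketch (function names are illustrative) computes the linear address a real-mode pointer refers to, and shows that different segment:offset pairs can name the same byte.

```c
#include <stdint.h>

/* Real-mode address translation: the segment is shifted left by four bits
 * (multiplied by 16) and the offset is added. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Two far pointers refer to the same byte if they resolve to the same
 * linear address, even when their segment and offset parts differ. */
int same_location(uint16_t seg_a, uint16_t off_a,
                  uint16_t seg_b, uint16_t off_b)
{
    return linear_address(seg_a, off_a) == linear_address(seg_b, off_b);
}
```

For example, 0C9C:0103 resolves to linear address 0xCAC3, and so does 0CAC:0003. A near call only ever changes the offset half, which is why it cannot reach code in another segment.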

In the previous interrupt handler example code, we saw that we had to do a far call to chain to previous TSRs. The reason for this is that interrupt service handling is essentially a special case of a far call; the processor has to change to the ISR's segment in memory. When we chain to another interrupt handler, we have to do the same thing. If you're still confused, the following example will clear things up.

TSRs In Action

So now that we have the basis of TSRs in our heads, let's look at how they're managed and installed by the operating system. To do that, we need an actual DOS installation. While TSRs do work in DOSBox, DOSBox has some unusual quirks with its environment that make it not 100% accurate to actual DOS (for example, all interrupts have an installed default handler; FreeDOS at least does not do this).

Installing FreeDOS

Fortunately, FreeDOS exists, a (mostly) compatible free software re-implementation of DOS 5. Installation is pretty much identical to what DOS 5 would have been like if it had shipped on a CD instead of floppy disks.

[Screenshot: FreeDOS CD boot menu]

The CD is bootable, and starting it up in VirtualBox brings up this boot menu.

[Screenshot: FreeDOS installer offering to run FDISK]

The installer offers to start FDISK to create a boot partition. Users of MS-DOS FDISK should find this more or less identical to the standard FDISK.COM.

[Screenshots: FreeDOS installation]

After that, DOS installation takes a few minutes, and then promptly crashes. For reasons I can't figure out, the included JemmEx memory extender refuses to work under VirtualBox. Fortunately, EMM386 is happy to do the job, and after a quick reboot, I get dumped to C:\

After configuring WatTCP, and firing up the built-in FTP server, I can copy my TSRs over without issue. Of course, given that DOS uses VESA graphics, I can't copy and paste. Fortunately for my sanity, FreeDOS (and MS-DOS) support redirecting the terminal with the CTTY command. After a little bit of fiddling with VirtualBox's settings, I get this:

[Screenshot: FreeDOS console redirected with CTTY]

Copy and paste for the win. Anyway, now that we have a decent testing environment, let's get into the practical aspect of this.

DOS Memory Layout

After doing a clean reboot of the system, FreeDOS reports the following as its memory usage:

C:\>mem

Memory Type       Total       Used     Free
--------------- ---------  -------- --------
Conventional         639K       50K     589K
Upper                 36K       31K       5K
Reserved             349K      349K       0K
Extended (XMS)    31,680K    5,626K  26,054K
---------------- --------  -------- --------
Total memory      32,704K    6,056K  26,648K

Total under 1 MB     675K       81K     594K

Total Expanded (EMS) 31M (32,571,392 bytes)
Free Expanded (EMS)  25M (26,705,920 bytes)

Largest executable program size   589K (602,672 bytes)
Largest free upper memory block     4K ( 4,096 bytes)
FreeDOS is resident in the high memory area.
C:\>

Lots of numbers, right? We'll do a more in-depth article about the types of memory, but let's do a brief primer here so that the output can be understood. Let's break these down step by step.

Conventional

Conventional memory is what applications in DOS generally have available and refers to the lower 640k of the 1 MiB address space. Anything operating in real mode has to fit in this memory area. FreeDOS reports a total of 639k rather than 640k, most likely because the BIOS claims the top kilobyte for its extended data area. The processor's interrupt table at 0x0000 and the small part of COMMAND.COM that has to stay resident at all times also live in conventional memory and count against it. On this specific system, I have a few TSRs already installed to provide network services which is why a 50k block of conventional memory is already used.

Upper/Reserved

Above the 640k line is what's referred to as the "upper memory area", or UMA, which is reserved for hardware and firmware. The upper memory area holds things like the monochrome and VGA memory buffers, as well as option ROMs and the BIOS shadow map. Normally, this region of memory shouldn't be used by applications, but due to the fact that conventional memory can get very crowded, on most systems there are small but usable sections of memory in these areas, known as upper memory blocks (UMBs). A memory manager can determine which blocks are safe to use, and load applications or data into these chunks, a process known as "loading high". When we get into hiding our TSR, use of upper memory will become very important.

Extended Memory

Extended memory is memory that exists above 1M+64k (that 64k is special, see below) and cannot be directly accessed from real mode. Because neither DOS nor the BIOS can operate in 32-bit/protected mode, and because the 80286 processor could not easily switch from protected mode back to real mode, accessing memory above the 1 MiB barrier required various amounts of trickery. Extended memory can extend from 1 MiB up to 4 GiB (which is the architectural limit of 32-bit processors). Accessing extended memory either requires entering protected mode, tricking the processor into unreal mode (which on the 80286 required the LOADALL instruction to put the processor in an invalid state), or using a BIOS service which did one of the previous two options to exchange blocks with conventional memory.

High Memory Area (not shown)

One important line to look at is "FreeDOS is resident in the high memory area." I've stated multiple times that 1 MiB is the limit of what Intel processors can address in real mode. As it turns out, this is only a partial truth. Remember that addressing in real mode is done in the form of segment:offset. So what happens if I load a segment value of FFFF? Well, it turns out we can address almost 64 kilobytes of additional RAM (65,520 bytes, from FFFF:0010 to FFFF:FFFF) beyond the 1 MiB barrier. This is known as the high memory area.

Due to many quirks related to the abomination known as A20 (which will get an entire section in the next article), the high memory area requires special rules and methods to access. The short version is that unless you have a memory manager, or are willing to manipulate the A20 line directly (which is dangerous), the HMA is not usable by general applications. We'll look more at this in a future article.
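The memory regions from this primer, and the effect of the A20 line on addresses formed from segment FFFF, can be summarized in a short C sketch. The boundaries used are the conventional ones (640 KiB, 1 MiB, and the end of the HMA at 0x10FFEF); the enum and function names are invented for this example, and the A20 behaviour is reduced to a simple wrap at 1 MiB.

```c
#include <stdint.h>

enum region { CONVENTIONAL, UPPER, HIGH_MEMORY_AREA, EXTENDED };

/* Form a real-mode address. With A20 disabled, address bit 20 is forced
 * low, so anything past 1 MiB wraps around to the bottom of memory. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset, int a20_enabled)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    if (!a20_enabled)
        linear &= 0xFFFFFu;
    return linear;
}

/* Classify a linear address into the regions described above. */
enum region classify(uint32_t linear)
{
    if (linear < 0xA0000u)   return CONVENTIONAL;      /* below 640 KiB */
    if (linear < 0x100000u)  return UPPER;             /* 640 KiB to 1 MiB */
    if (linear <= 0x10FFEFu) return HIGH_MEMORY_AREA;  /* FFFF:0010 to FFFF:FFFF */
    return EXTENDED;
}
```

With A20 enabled, FFFF:0010 lands exactly on the 1 MiB mark; with it disabled, the same pointer wraps to address 0, right on top of the interrupt vector table.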

TSR Loading

So with that all out of the way, let's look at how a TSR is loaded. In the GitHub repository, there's an example TSR known as tsr_example which, when loaded, prints out the segment registers and the segment:offset of the next hook in memory. It's combined with a "callhook" program that simply runs int 0x66 to invoke it. So let's load it and see what happens:

C:\> tsr_demo
DOS loaded the COM with this:
CS: 0C9C
DS: 0C9C
SS: 0C9C
 

When our TSR is loaded, it reads and dumps out the segment registers, showing DOS loaded us at 0C9C. For COM files (or any executable that is 'tiny'), CS=DS=SS. When DOS loads a COM executable, the entire file is copied into memory at offset 0100 (just past the 256-byte Program Segment Prefix), CS/DS/SS are all set to the PSP's segment, and DOS transfers control to CS:0100 to begin execution. If we check our memory usage, we can see that free memory has dropped:

C:\>mem

Memory Type         Total     Used     Free
---------------- -------- -------- --------
Conventional         639K     115K     524K

NOTE: It shouldn't be using roughly 65 kiB of RAM per run; the binary is only 324 bytes! I think I'm calculating the paragraphs-to-preserve number wrong, but I didn't get a chance to fix it by the time this article went up. If someone wants to look at the code, check tsr_examine.asm; the TSR int call is at the very bottom of the file and based off example code I found elsewhere.
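For reference, the arithmetic that normally produces the DX value looks something like the C sketch below. It is not a diagnosis of the code in the repository; it only assumes that the resident block starts at the 256-byte PSP (as the interrupt list note says) and that the byte count is rounded up to whole 16-byte paragraphs. The function name is hypothetical.

```c
#include <stdint.h>

/* Size of one DOS paragraph in bytes. */
#define PARAGRAPH_BYTES 16u
/* A COM program's memory block begins with a 256-byte PSP. */
#define PSP_BYTES 256u

/* Hypothetical helper: paragraphs needed to keep `resident_bytes` of
 * program image resident, counting the PSP and rounding up. */
uint16_t paragraphs_to_keep(uint32_t resident_bytes)
{
    uint32_t total = PSP_BYTES + resident_bytes;
    return (uint16_t)((total + PARAGRAPH_BYTES - 1u) / PARAGRAPH_BYTES);
}
```

For a 324-byte image this gives 37 paragraphs (25h), or 592 bytes, which is the order of magnitude one would expect to see rather than tens of kilobytes.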

If we run callhook, we can determine that our TSR in fact installed successfully, and the previous hook is at 0000:0000 (which is skipped over).

C:\>callhook
CS: 0C9C
DS: 0C9C
SS: 1CB6
Previous hook is at 0000:0000

Note that SS is different. When a TSR is invoked (in this case by doing int 0x66), it inherits the running state of whatever application was running at the time. It's the responsibility of the TSR to put the stack back the way it found it when it exits, else you'll cause random corruption in userspace applications.

Now let's see what happens if we load our TSR multiple times:

C:\>tsr_demo
DOS loaded the COM with this:
CS: 1CB6
DS: 1CB6
SS: 1CB6
C:\>tsr_demo
DOS loaded the COM with this:
CS: 2CD0
DS: 2CD0
SS: 2CD0
C:\>tsr_demo
DOS loaded the COM with this:
CS: 3CEA
DS: 3CEA
SS: 3CEA

With each load, we're loading higher in memory. DOS does not automatically rebase or relocate TSRs; they stay at whatever memory segment they were in when they terminated. As DOS automatically loads COM files as low as possible, each run is loaded at the next "available" section of RAM. Calling mem shows that our available conventional memory has dropped:

C:\>mem

Memory Type         Total     Used     Free
---------------- -------- -------- --------
Conventional         639K     308K     331K
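The segment values printed by each load can be turned into a byte count with one line of arithmetic, since consecutive segment numbers are 16 bytes apart. The helper below is a small C sketch of that calculation (the name is made up); note that the spacing between load addresses also includes whatever DOS places between the programs, so it is an upper bound on what each copy keeps resident.

```c
#include <stdint.h>

/* Distance in bytes between two segment values. Each step of one in a
 * segment register moves the base address by one 16-byte paragraph. */
uint32_t bytes_between_segments(uint16_t lower, uint16_t higher)
{
    return (uint32_t)(higher - lower) * 16u;
}
```

Plugging in the load addresses above (0C9C, 1CB6, 2CD0, 3CEA) gives 101Ah paragraphs, or 65,952 bytes (about 64.4 KiB), between every pair of loads. That lines up with the used conventional memory reported by mem climbing from 115K to 308K over three more loads.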

So what now happens if we run callhook?

C:\>callhook
CS: 3CEA
DS: 3CEA
SS: 4D04
Previous hook is at 2CD0:0103
CS: 2CD0
DS: 2CD0
SS: 4D04
Previous hook is at 1CB6:0103
CS: 1CB6
DS: 1CB6
SS: 4D04
Previous hook is at 0C9C:0103
CS: 0C9C
DS: 0C9C
SS: 4D04
Previous hook is at 0000:0000

We chain through each copy of the TSR, easily visible by CS/DS changing as we walk down the chain until we reach the 'stop' at 0000:0000. By now, I think we have a fairly good grasp on how TSRs work in practice, what DOS gives us, and how interrupts work in more depth. This article is already past the 4k word mark, so I'm going to cut this off here before the editors stage a revolution. Let me close with the fact that I need some help from the community.

Help Wanted

As I mentioned in Part 1, for getting the keylogged data out of the system, I'm interested in using a non-TCP/IP based protocol. Up until the mid-90s, IPX and NetBIOS-only networks were still relatively common, and it wasn't until the domination of the 'modern' internet that TCP/IP became ubiquitous. After considerable amounts of research, I've decided that fitting in the theme of 'unusual yet neat', I'd like to extract the data out using AX.25 and ham radio equipment. The other alternative I may do is using IPX, as I found the original DOOM source code actually has a complete IPX driver in it. As of right now, I'm somewhat torn between doing this with AX.25 or IPX. The thing is though, I'm going to need some help to make an AX.25-based keylogger a reality.

The use of standard radio would allow the keylogger to work on air-gapped computers and show how a potential exfiltration of data might have been done in environments predating TCP/IP. It would be fairly easy to modify a standard PC to hide a 2m or 70cm transmitter within the case and connect to it via I/O lines in an early form of the NSA's current Tailored Access Operations. It would also mean that the keylogger itself would be fairly useless for use in real-life which aids the goal of preventing proliferation of attack tools.

The problem is right now, I have a serious lack of equipment. While I'm a licensed ham in the United States (KD2JRT/Technician), the only equipment I have are two Baofeng UV-82s. What I need is to figure out a decent way to handle getting data transmitted. I know it's at least theoretically possible to build a cable to hook the Baofengs up to a computer's mic/sound in, and use a software TNC (Terminal Node Controller) to do AX.25. By doing so, I could simply connect the TNC to VirtualBox's serial port emulation, and “blamo”, AX.25 for DOS.

What I need from the community is two-fold:

  • Experience with doing AX.25 data in real life
  • Help building the necessary cables *or* loaning radio equipment with hardware TNCs on it

I'm currently in New York City for the foreseeable future. I could potentially build cables for my Baofengs myself but I don't currently have a soldering iron, and my living situation makes it rather difficult to do electronics work here. Depending on pricing, I can probably cover shipping and handling, or compensate out-of-pocket work done by a community member. If you're interested in helping, post a comment or send me an email (mcasadevall@soylentnews.org), and I'll be in touch.

Finally, if you've enjoyed this article, please consider subscribing or gifting a subscription. No account is required, as you can anonymously gift subscriptions to my alt-account, mcasadevall (6). I'm hoping we can raise enough money to fully pay off the stakeholders of the site, and perhaps get a small budget together to let me dedicate more time to content like this, or buy equipment to explore more obscure pieces of hardware (i.e., digging into doing some INIT coding on classic Mac OS, or something of that nature). I'd like to give thanks to all those subscribed, including jimtheowl after the previous article.

And with that, 73 de NCommander!


  • (Score: 2) by turgid (4318) on Tuesday September 20 2016, @01:15PM (#404213)

    About a million years ago when I was 16 I got a summer holiday job working on a DOS TSR (written in 8086 of course) and this was enough to convince me that commercial computers, the people that designed them and the software written for them were completely insane and I decided not to have anything more to do with it. Then I went to university and discovered Unix for myself.

    • (Score: 2) by NCommander on Tuesday September 20 2016, @01:41PM

      by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @01:41PM (#404221) Homepage Journal

      Honestly, not a lot has changed under the hood. Even "big metal" machines of the era, such as a VAX or perhaps a DEC Alpha, had code paths similar to this. For example, the Alpha had a two ring security mode, supervisor and user, which could only be switched via interrupts, which required ISRs like this. Anything that has to be available to all applications either has to be mapped into memory via a dynamic loader or exist as part of kernel space. Any operating system that keeps a part of itself resident is essentially an extension of the TSR programming model; UEFI in firmware does something very similar for services and loads, although it supports rebasing.

      What makes the 8086 'special' in this regard is that since it's segmented, you have to change CS when you do a context switch, and the only sane way to do that is via a far call or interrupt handling. Interrupts are preferred unless you know exactly where something is in memory at any given time. In contrast, the original Mac was single-tasking, but supported small desk accessories that could run side by side with an application. These DAs were basically application code linked into the kernel (in Mac terms, they existed in a DRVR resource in the resource fork), and operated essentially identically to TSRs, including having to more or less be coded in 680x0 assembly.

      --
      Still always moving
    • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @02:03PM

      by Anonymous Coward on Tuesday September 20 2016, @02:03PM (#404228)

      Stopgap measures such as bank-switched memory, overlay compilers, and cooperative multitasking are commonly "rediscovered" on new hardware platforms on the way to preemptive multitasking operating systems with page-switched virtual memory backed by hardware protection. The IBM PC was not only not an exception, it was the paragon.

      Yup, it's a lot easier to sit back and wait until an immature technology stabilizes. You could say the same about IoT and cloud today.

      • (Score: 3, Informative) by NCommander on Tuesday September 20 2016, @02:19PM

        by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @02:19PM (#404241) Homepage Journal

        Bank switching (or EMS in PC terms) more or less seems dead as a technology. Most recent microcontrollers I know of seem to have standardized on a Cortex-M profile part or a MIPS processor, both of which have a 32-bit bus.

        Cooperative multitasking on the other hand continues to live in embedded spaces and other random areas mostly due to the fact that it can ensure real-time performance since a process doesn't have to worry about being pre-empted at a bad time, nor does it have to yield unexpectedly. It also means the scheduler can be both simple and predictable, and thus context switches are much faster.

        --
        Still always moving
        • (Score: 2) by LoRdTAW on Tuesday September 20 2016, @02:53PM

          by LoRdTAW (3755) on Tuesday September 20 2016, @02:53PM (#404259) Journal

          Bank switching was very useful on the old 8-bit systems, which were limited to 16 bits of address. With the flat 32-bit memory spaces of x86 and ARM, it's not really useful, as you have 4 GB or more of address space.

          The more interesting bank switching setup I've seen was in a JK laser with a single 256k EPROM bank switched through an old Altera PLD with a 6809 CPU. Just 8k of RAM. Most of the extra space was for messages in multiple languages.

          • (Score: 1, Funny) by Anonymous Coward on Tuesday September 20 2016, @03:42PM

            by Anonymous Coward on Tuesday September 20 2016, @03:42PM (#404280)

            google "Intel PAE". This stuff is like Polio and Smallpox. Just when you think the world has seen its last of 'em...

            • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @01:36AM

              by Anonymous Coward on Wednesday September 21 2016, @01:36AM (#404613)

              Required for virtualization on the RPi 2/3 at least and probably any 32 bit arm processor.

              I imagine something similar is used for aarch64.

              As to 36-bit PAE on Intel: the fucked up part is that they wasted all those other address bits. (36-bit PAE used a 48-bit address space, and even x86_64 wastes everything past the 52nd or 53rd address bit for things other than addressing more memory!) The marketing names of addressing ranges rarely tell you their actual capabilities or utility. That said, PAE on Intel provided an important interim solution before the jump to 64-bit, during an era when the only systems using it would have been multinode NUMA boxes with a maximum of 4-8 gig to a CPU cluster of 1-4 chips. 440FX/450GX boards usually maxed out at 1 gigabyte, whether with 72-pin SIMMs or 168-pin DIMMs. The AMI BIOS available at the time only had digits for up to 8 gigs of RAM, and outside of clusters the standard RAM boards were only designed for 1 gig for the 440FX (sharing that limitation with the later BX) or 8 gig for the 450GX, which practically speaking never had near that. The stock memory boards for revision 1 systems (which have a LOT of trace wires added for early design flaws) only had 1 gigabyte of commonly available slots on them (assuming 32 meg unregistered ECC/parity SIMMs), with a maximum of 2-4 gigs if they supported addressing modes for registered or denser memory. While I am sure the 450 supported registered memory, I have no idea what the real supported memory for that generation was. The 440FX in comparison was limited to 8x128 if you could find 128M EDO DIMMs.

  • (Score: 2) by number6 on Tuesday September 20 2016, @02:01PM

    by number6 (1831) on Tuesday September 20 2016, @02:01PM (#404225) Journal

    Are we allowed to post images anywhere on this site, or is this article an exception?

    I have a couple of articles I'd like to post to my journal which require images to be viewed for a complete understanding of the subject.
    It would be nice if I could include image thumbs in such articles.

    • (Score: 3, Informative) by NCommander on Tuesday September 20 2016, @02:12PM

      by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @02:12PM (#404236) Homepage Journal

      Images are limited to actual articles. For submissions, if there's a link in the article, an editor may or may not inline it like above; we generally don't do so though as a rule of thumb. We could theoretically enable it for journal content though the site CSS probably will have to be hit with a hammer to prevent a large image from breaking the layout.

      --
      Still always moving
      • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @03:25PM

        by Anonymous Coward on Tuesday September 20 2016, @03:25PM (#404273)

        .whatever img { max-width: 100%; } is going to be enough for everybody. Hopefully.

        • (Score: 2) by JNCF on Tuesday September 20 2016, @05:51PM

          by JNCF (4317) on Tuesday September 20 2016, @05:51PM (#404364) Journal

          .whatever img { max-width: 100%; }

          This would have the funny effect of making images get smaller the more you zoom in. The left and right sidebars are sized in pixels (not real screen pixels, just an arbitrary unit), and thus get smaller as you zoom out. This leaves more room for the main content to grow, making its width in pixels larger, making a 100% width image in the main content larger. Zoom in and the image gets smaller and harder to see.

      • (Score: 2) by JNCF on Tuesday September 20 2016, @05:38PM

        by JNCF (4317) on Tuesday September 20 2016, @05:38PM (#404352) Journal

        If an image from a submitted article was embedded would SoylentNews host the image as with the original content above, or would an externally hosted image be embedded?

        • (Score: 4, Informative) by NCommander on Tuesday September 20 2016, @05:48PM

          by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @05:48PM (#404359) Homepage Journal

          It would very much be a case-by-case thing, depending on the license of the image in question, etc.

          Rehosting stuff opens a legal can of worms, and fair use is much harder to justify when you post something verbatim. We can quote excerpts under fair use without issue and summarize. I don't really want to have to deal with people sending us DMCA requests.

          --
          Still always moving
          • (Score: 2) by JNCF on Tuesday September 20 2016, @05:58PM

            by JNCF (4317) on Tuesday September 20 2016, @05:58PM (#404370) Journal

            I totally feel you. I also see potential concerns about third parties knowing when a page is visited based on requests for embedded images. Not that my opinion matters, but I would personally prefer images deemed unhostable to simply be linked to. Do you think you would ever put an externally hosted image above the fold, and thus on the main page?

    • (Score: 4, Informative) by janrinok on Tuesday September 20 2016, @02:24PM

      by janrinok (52) Subscriber Badge on Tuesday September 20 2016, @02:24PM (#404244) Journal

      We do not encourage images in submissions because they eat into data caps for those that have them. However, having a link inside a story is fine as long as it is obvious that the link leads to an image i.e. [Image [img.sur.ly]]. That places control back into the hand of the reader who can elect to view or ignore the image. Some of the images that go with the stories published are several megabytes in size, hence another reason why we provide a link to the original source.

      I'm not sure how much of an issue data caps are for those who follow this site on their mobile devices. Our intention has always been to keep the site simple with the minimum of data transfer. We could easily go all AJAX'y or fill each story with images, but I'm fairly sure that is not what the community wants.

      However, this original content story needs images to make the material understandable, and NCommander has always been a law unto himself. :-)

      • (Score: 3, Informative) by takyon on Tuesday September 20 2016, @03:52PM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Tuesday September 20 2016, @03:52PM (#404286) Journal

        However, this original content story needs images to make the material understandable, and NCommander has always been a law unto himself. :-)

        Also, these images are highly compressible PNGs ranging from about 8 to 39 KB.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by JNCF on Tuesday September 20 2016, @07:21PM

        by JNCF (4317) on Tuesday September 20 2016, @07:21PM (#404422) Journal

        However, this original content story needs images to make the material understandable,

        Now that you mention it, those images are all of text in terminals. There are ways to turn terminal output into markup with inline styles that will display fine in most browsers, and even the browsers that botched part of the styling would still show something that was pretty readable. Colors would be fucked up in some, text width in others. There's no reason this particular content couldn't have been conveyed with straight markup. Not complaining about the decision to embed images, just thinking about alternatives.

  • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @03:02PM

    by Anonymous Coward on Tuesday September 20 2016, @03:02PM (#404263)

    Ah, the glorious days of keyrus.com…

    • (Score: 2) by NCommander on Tuesday September 20 2016, @03:04PM

      by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @03:04PM (#404265) Homepage Journal

      I assume you're referring to KeyRus key mapping for DOS [wikipedia.org]

      --
      Still always moving
      • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @03:29PM

        by Anonymous Coward on Tuesday September 20 2016, @03:29PM (#404274)

        Indeed. I used DOSBox for something some time ago and suddenly remembered that I needed it to be able to use Cyrillic characters. It was still googleable at the time, thankfully.

  • (Score: 3, Informative) by bzipitidoo on Tuesday September 20 2016, @03:21PM

    by bzipitidoo (4388) on Tuesday September 20 2016, @03:21PM (#404270) Journal

    DOS was such a "git r done" environment. Documentation? When I wrote an app for DOS in assembly language, I couldn't find much documentation on how to allocate memory. There were a few interrupts that sounded like they might be what I was looking for, but they didn't quite fit. I finally concluded that the program could just use memory without asking or notifying DOS in any way. Had to be careful not to trample on anything actually being used, so to be super safe, might restrict the program to only 384K. There had to be something, some way to know what the limit was, but I never found it. As I recall, a ".COM" executable program (stands for "command", not to be confused with .com for "commercial" top level domain, that's totally different) was confined to one 64k segment, while a ".EXE" program had the full 640K minus whatever DOS and TSRs were taking up. Worked out to about 620K at best.

    It was an art to maximize free memory below 640K, "low memory". You'd have DOS load itself into "high memory" (the area between 640K and 1M), then experiment to find the best order to "loadhigh" TSR things like a mouse driver, sound card driver, ANSI (if you wanted it), EMM386, and so on. (I'm talking here about computers that had more than 1M RAM, typically 386s and later, and not only 1M RAM as many 286s had.) Some of that high memory wouldn't be available because it was mapped to the graphics, and the missing parts were most inconveniently in the middle. The first exercise was a bin packing kind of problem, figuring out how to pack the most TSRs of differing sizes in each free area, which were of course different sizes themselves. Fairly standard memory allocation issues that are a core function of modern OSes. But DOS is primitive, and has very little to no memory management. Most TSRs needed more memory to initialize than to run; that's why the order was so important. Have the biggest pigs load first when lots of high memory was still free, then cram in the smaller TSRs last. If you got everything to fit, you'd have about 620K free in low memory. Things that didn't fit would be automatically loaded into low memory. A bad job of loading high could easily result in less than 580K low memory free, and then you'd have problems with many applications unable to run, demanding at least 600K.

    I have kept my last computer, a late 90s Pentium II, that had a working DOS with compatible sound and graphics hardware, and 5.25" and 3.5" floppy drives.

    • (Score: 3, Informative) by NCommander on Tuesday September 20 2016, @03:49PM

      by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @03:49PM (#404284) Homepage Journal

      Memory allocation on DOS was a very strange beast, and I'm honestly surprised there weren't more compatibility issues because of it; I wouldn't be surprised if a lot of apps did what you just said. Google has very little documentation on what you're supposed to do, and most C environments provided malloc() or an interface to the XMS driver.

      Basically, what you're supposed to do depends on whether you need a conventional memory allocation or an XMS allocation. For conventional allocations, you need to call int 21h, AH=48h, which took the number of paragraphs you wanted as a block (in BX), and returned a segment base pointer to said block (in AX). You could then take that pointer, put it in ES, and use that memory to your heart's content. XMS could also do conventional allocations for you since conventional memory was aliased to handle 0.
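
      The arithmetic behind that call can be sketched in a few lines of Python (a model of the numbers only, not DOS code). A paragraph is 16 bytes, so a byte count is rounded up to whole paragraphs, and a returned segment maps to a linear address of segment * 16. The function names and the segment value 0x1234 are made up for illustration.

```python
PARAGRAPH = 16  # bytes per real-mode paragraph


def bytes_to_paragraphs(nbytes):
    """Round a byte count up to whole 16-byte paragraphs (the unit BX carries)."""
    return (nbytes + PARAGRAPH - 1) // PARAGRAPH


def linear_address(segment, offset):
    """Real-mode translation: segment * 16 + offset, wrapped to 20 bits as on an 8086."""
    return ((segment << 4) + offset) & 0xFFFFF


# A 1000-byte buffer has to be requested as 63 paragraphs (1008 bytes).
paragraphs = bytes_to_paragraphs(1000)

# If DOS handed back segment 0x1234 (hypothetical), ES:0000 is linear 0x12340.
base = linear_address(0x1234, 0x0000)
```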

      --
      Still always moving
    • (Score: 2) by maxwell demon on Tuesday September 20 2016, @10:08PM

      by maxwell demon (1608) Subscriber Badge on Tuesday September 20 2016, @10:08PM (#404515) Journal

      Speaking of DOS documentation, Ralf Brown's Interrupt List [ctyme.com] is an invaluable source of information.

      Actually I'm amazed that it is still available.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2, Informative) by Kharos on Tuesday September 20 2016, @04:42PM

    by Kharos (6354) on Tuesday September 20 2016, @04:42PM (#404312)

    I wrote some TSRs on MSDOS 3.2 back when I was a teenager (the memories...).

    1. Iirc (need to double check) you do not need to push the flags at the beginning. The system pushed the flags and the return address before calling your handler, and the iret at the end of the handler will restore the address and the flags. So pushing/popping them again is redundant.

    2. Do not push all registers with pusha; it uses a lot of stack (16 bytes) and may overflow (actually underflow, as SP counts down) the stack of the application you are interrupting if this application was compiled with a very tight stack. Instead, push only what you are going to modify. You will only call int 21 ah=9 in your handler, so you should only need to modify AX, DX and DS (no need to set up ES for that call). So just push/pop these.

    3. Tailcall the previous handler instead of doing the push-flags-far-call dance. In other words, your end routine should be: Pop the registers (see 2); if previous_hook_offset is 0x0, do an iret; otherwise, do a far jump (not call!) to previous_hook_offset. Not only is this faster and looks cleaner in the code, it also (again) minimizes stack usage, reducing the chance that you kill the stack of the interrupted application.
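
    A rough tally of points 2 and 3, as a Python sketch rather than anything measured on real hardware: every 16-bit push costs 2 bytes, an interrupt pushes FLAGS, CS and IP, and a pushf plus far call pushes the same three words again. The assumption here is that the article's handler still has its pusha frame on the stack when it calls the previous handler; the constant names are illustrative.

```python
WORD = 2                    # bytes per 16-bit push

INT_FRAME = 3 * WORD        # FLAGS, CS, IP pushed by the interrupt itself
PUSHA_FRAME = 8 * WORD      # AX, CX, DX, BX, SP, BP, SI, DI
MINIMAL_FRAME = 3 * WORD    # only AX, DX, DS, as suggested in point 2
FAR_CALL_FRAME = 3 * WORD   # pushf + far call: FLAGS, CS, IP again

# Bytes of the interrupted program's stack in use when the previous handler starts.
chained_by_call = INT_FRAME + PUSHA_FRAME + FAR_CALL_FRAME
chained_by_jump = INT_FRAME  # registers are popped before the far jump

# Register-save cost inside this handler alone, before any nested int 21h call.
saved_by_minimal_push = PUSHA_FRAME - MINIMAL_FRAME
```

    Under these assumptions the far jump hands the previous handler a stack that looks exactly as if the interrupt had gone to it directly.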

    • (Score: 2, Informative) by Kharos on Tuesday September 20 2016, @04:59PM

      by Kharos (6354) on Tuesday September 20 2016, @04:59PM (#404325)

      Sorry for the double post. Here is my limited understanding about the TSR int call from tsr_example.asc:

              mov dx, begin   ; begin = 0x0100
              sub dx, entry   ; entry > 0x100, so this sub underflows and the result will be big. did you mean to switch around entry and begin?
              shr dx, 4       ; divide by 16 to get the number of paragraphs, rounded down
              add dx, 17      ; ok, now you have definitively lost me :(
              mov ax, 0x3100
              int 0x21

      It is a very long time since I did this, but my first guess would have been

              mov dx, entry   ; we need everything before entry
              add dx, 15      ; add 15 to make sure that
              shr dx, 4       ; the divide-by-16 always rounds up
              mov ax, 0x3100
              int 0x21

      But I don't remember the details, e.g. if the memory is including the 0x100 bytes reserved in front, etc.
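
      To make the rounding concrete, here is a small Python model of that guess (not the article's code). It assumes a .COM layout where offsets are measured from the start of the PSP, so `entry` already includes the 0x100 bytes in front; the value 0x0185 for `entry` is hypothetical.

```python
def resident_paragraphs(entry_offset):
    """Paragraphs to keep for int 21h AH=31h: everything below entry_offset, rounded up."""
    return ((entry_offset + 15) & 0xFFFF) >> 4


entry = 0x0185                      # hypothetical offset of the non-resident init code
keep = resident_paragraphs(entry)   # 0x0194 >> 4 = 0x19 paragraphs
kept_bytes = keep * 16              # always at least `entry` bytes
```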

      Also, you need to debug your program if it does not work. My code never worked without debugging, either. If you don't have anything else available, learn how to use debug.exe.

      • (Score: 3, Interesting) by NCommander on Tuesday September 20 2016, @05:22PM

        by NCommander (2) Subscriber Badge <mcasadevall@soylentnews.org> on Tuesday September 20 2016, @05:22PM (#404340) Homepage Journal

        Replying to both your posts at once:

        1. I'm going to have to pull the Intel reference manual to check the behavior of the flags register on interrupt entry. My memory tells me (as does example code within The Old New Thing) that flags is not automatically preserved on entry into an interrupt handler, though the counterexample is that I then shouldn't need to pushf before chaining. I'll pull out the dusty tomes of lore to check later tonight. FreeDOS [github.com] does an explicit pushf upon entering a kernel handler (though it clobbers the segment return).

        2. Also correct. You only need to push/pop what you use. Great for saving memory, horrid for readability. What I should do is set up a temporary stack somewhere in resident memory and push/pop to that. The 100% absolutely correct way to do it is to move ax to a known static location, copy CS=SS, then point SP at the right offset. It was excluded for brevity. The example code I have here actually does need to push all 8 GP registers since I use all of them in calculating offsets and converting them to hex (with the possible exception of SI). That being said, I'd rather be safe and give up four bytes instead of accidentally pushing something I don't need.

        3. It's a valid approach, but sometimes you want to do work after calling other TSRs, in which case you need to do a far call vs. a far jump to return. The canonical example of this is if you were to hook a 21h service, and change the data after DOS is done with it (i.e., changing the returned values, etc). Of course, you need a check to make sure you're not recursing on yourself when you do so.

        I do know how to use DEBUG, but the bug wasn't noticed until the article was mostly drafted and ready for posting, and I've been having a lack of RL time so I just noted it and went to other things with the hope I'd fix it before it was posted. I just cheesed out at the last moment. Bad NCommander; no biscuit.

        --
        Still always moving
        • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @09:11PM

          by Anonymous Coward on Tuesday September 20 2016, @09:11PM (#404485)

          If you decide to go the LeanPub route, better make sure the code runs. People who actually pay for the goods can be unforgiving.

  • (Score: 0) by Anonymous Coward on Tuesday September 20 2016, @06:09PM

    by Anonymous Coward on Tuesday September 20 2016, @06:09PM (#404376)

    No, not a Hamming window, DSP nerds. For your Ham/AX.25, do you just need a modem? You know, data over audio signal paths stuff. You would need to do some hardware adaptation, but not much, as the phone lines are 600 Ohm balanced (I think...) And learn about and do AT commands...

  • (Score: 2) by iWantToKeepAnon on Tuesday September 20 2016, @09:34PM

    by iWantToKeepAnon (686) Subscriber Badge on Tuesday September 20 2016, @09:34PM (#404506) Homepage Journal

    mov dx, begin
    sub dx, entry

    So "begin" is a small number (0x100) and "entry" is bigger than that; so does this reserve negative memory? Memory values are unsigned so this reserves more than you expect and that's why "It shouldn't be using 50 kiB of RAM per run; the binary is only 324 bytes!"?
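
    The wraparound can be modelled in a few lines of Python by masking to 16 bits, following the full mov/sub/shr/add sequence quoted earlier in the thread. This is only a sketch: 0x0185 stands in for the real value of `entry`, which isn't given here, so the exact figure will differ from what was observed.

```python
MASK16 = 0xFFFF  # DX is a 16-bit register


def paragraphs_as_posted(begin, entry):
    """Mirror the mov/sub/shr/add sequence using 16-bit unsigned arithmetic."""
    dx = (begin - entry) & MASK16   # sub dx, entry: wraps instead of going negative
    dx >>= 4                        # shr dx, 4
    dx = (dx + 17) & MASK16         # add dx, 17
    return dx


begin, entry = 0x0100, 0x0185       # entry is a hypothetical value
requested = paragraphs_as_posted(begin, entry)
requested_bytes = requested * 16    # tens of kilobytes rather than a few hundred bytes
```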

    --
    "Happy families are all alike; every unhappy family is unhappy in its own way." -- Anna Karenina by Leo Tolstoy
  • (Score: 0) by Anonymous Coward on Wednesday September 21 2016, @12:48AM

    by Anonymous Coward on Wednesday September 21 2016, @12:48AM (#404597)

    Go with IPX. While AX.25 would be neat, it was not something you would normally find attached to a network, so a bit implausible for exfiltration.

    There are open source ODI (and I think a few other formats) drivers available for rtl8139/8169/eepro100 if you look around. (I would provide links, but it has been a few months and countless bookmarks since I have done so.) Besides that, you should only need one or two Novell files (assuming FreeDOS doesn't have a reimplementation) to fire up IPX on DOS. If you want to be really serious about it, there are NCP server/client apps for Linux you could use to emulate the fileserver you are sneaking data to/from. I don't think it was in the released Doom source code, but if you wanted to be extra hardcore, while you have the IPX networking set up, try firing up Doom with triplehead support. It allowed a left/center/right networked mode in place of 4 player multiplayer. The support was dropped by Doom 1.8, and I believe the code was not included in the source release that happened a few years later. Truly an impressive feature for its time. (I don't think it was the first example of such, but outside sim software it was probably the first mainstream game/application to include free networked multiview for a DOS based system.) I only managed to try it once with a 486 and Pentium (or 386sx and 486dx, which I definitely and slowly played coop on!), but it had syncing issues due to the speed of the systems, and I think it had syncing issues over the course of a session, which I believe was why it was eventually removed.