SoylentNews is people

posted by NCommander on Monday March 06 2017, @05:00PM
from the adventures-in-data-recovery-and-30-year-old-bugs dept.

Two of my favorite hobbies are retrocomputing projects and the slightly more arcane field of software archeology: the process of restoring and understanding lost or damaged pieces of history. Quite a while ago, I came across the excellent OS/2 Museum, run by Michal Necasek, which helps catalog many of the more obscure bits of PC history, including a series of articles about Xenix, Microsoft’s version of SVR2 UNIX.

What caught my attention were two articles talking about Xenix 386 2.2.3c, a virtually undocumented release that isn’t mentioned in much, if any, of the Santa Cruz Operation's (SCO, but see footnote) surviving literature of the time. Michal documented [1], [2] his efforts at the time, and ultimately concluded that the media was badly corrupted. Not knowing when to give up, I decided to give it a try and see if anything could be salvaged. As of this writing, working with Michal, we’ve managed to restore approximately 98% of the product as it would have existed at the time.

Xenix 386 booted with uname

I’m going to write up the rather long and interesting quest of rebuilding this piece of history. I apologize in advance about the images in this article, but we only recently got serial functionality working again, and even then, early boot and installation has to be done over the console.

* - SCO in this case refers to the original Santa Cruz Operation, and not the later SCO Group who bought the name and started the SCO/Linux lawsuits.

Historical Background

From a historical perspective, Xenix is interesting as it was one of the first (if not the first) operating systems to take advantage of Protected Mode on the iAPX 80286 without being hamstrung by lack of backwards compatibility. I’ve talked about the 286 before on SoylentNews, but to summarize, the 80286 was the first x86 processor with Protected Mode. However, it didn’t support paging, and the switch from real mode (8086 compatibility) to protected mode was one-way; there was no official way to return to real mode without restarting the processor, and neither DOS nor the BIOS could operate in Protected Mode. To my knowledge, Xenix was the only operating system to adopt the view that a system would enter protected mode and never return to 16-bit compatibility. As such, its implementation of protected mode is somewhat different from what most people are familiar with.

Instead, the 80286 was intended to allow running legacy DOS applications in real mode, while people would upgrade to new protected mode operating systems and software. The much loathed real mode segmentation system was revamped as well due to the new 32-bit register size, and it was now possible to have segments up to 16 MiB (a tremendous amount of memory at the time) in size, allowing applications to operate with a de-facto flat memory model.

Correction: Wow, I went wrong here. The 80286's protected mode allowed segments to reside in a 24-bit address space, but limited them to 16 bits (64 KiB) in length. The 80386 changed the rules to allow larger segments by using additional fields in the LDT and GDT descriptors to extend the base and limit and to add a size modifier.
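To put numbers on the correction, here is a rough Python sketch of the arithmetic. The function names are made up for illustration, and the field widths (a 16-bit limit on the 286; a 20-bit limit plus a 4 KiB "size modifier" on the 386) are the standard descriptor layout as I understand it, not something taken from the Xenix sources.

```python
def address_space_bytes(address_bits: int) -> int:
    """Total bytes addressable with the given number of address bits."""
    return 1 << address_bits


def max_segment_bytes_286() -> int:
    """Largest 80286 segment: a 16-bit limit field counted in bytes."""
    return 1 << 16


def max_segment_bytes_386(granularity_4k: bool) -> int:
    """Largest 80386 segment: a 20-bit limit field, counted either in
    bytes or, when the size modifier is set, in 4 KiB pages."""
    units = 1 << 20
    return units * (4096 if granularity_4k else 1)


if __name__ == "__main__":
    mib = 1024 * 1024
    print("286 address space:", address_space_bytes(24) // mib, "MiB")
    print("286 max segment:  ", max_segment_bytes_286() // 1024, "KiB")
    print("386 max segment:  ", max_segment_bytes_386(True) // mib, "MiB")
```

So the 16 MiB figure is the size of the 286's whole physical address space, not of a single segment; a flat memory model only becomes practical once the 386 lets one segment span the entire address space.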

Additionally, Xenix was one of the most polished and featureful UNIX systems of its time. It was the originator of virtual terminals and, out of the box, supported both UUCP networking and RS-232 serial-based MicNet, along with bridging between the two. MicNet appears to have been Microsoft's answer to AppleTalk as a very low cost networking solution, and allowed multiple systems to appear as one single UUCP node on the bang path. We'll explore both these features in later articles.

For software installation, Xenix's "custom" utility provided full-featured package management, installation and removal, and even allowed per-file selection, relatively on par with modern Linux package management. Besides the stock operating system, Xenix had official add-on packages for international support, K&R based C compilers for DOS and Xenix, and a text processing system based on AT&T's troff. Third party solutions provided STREAMS and TCP/IP support before these features were added in Xenix 2.3.

System administration utilities were, for the most part, interactive and easy to use, allowing for quick setup of networking, printers, and user administration, and the system could dual boot with DOS. Combined with the visual shell, it's likely one of the best experiences you could get on a UNIX system of the era, and in many ways it still holds up today, nearly 30 years later. Microsoft was pushing Xenix heavily, and for a time, it was intended as the true replacement for the 16-bit DOS. However, fate intervened.

In 1984, following the breakup of the Bell System (https://en.wikipedia.org/wiki/Breakup_of_the_Bell_System) into the baby bells, AT&T decided to enter the computer market and directly sell UNIX System V. Microsoft decided that they didn’t want to compete against AT&T, and began to collaborate with IBM to create what would become known as OS/2. In 1987, Microsoft transferred ownership of Xenix to the Santa Cruz Operation, and SCO began porting the operating system to take advantage of the 80386, creating Xenix 386.

The most commonly known release of Xenix 386 is 2.3, which was supported alongside the earlier Xenix 286 2.2 releases; SCO’s Support Level Supplements covered both. The SLS index only shows a single update for “Xenix 386 2.2.1-2.2.3”, for UUCP, but an examination of that update shows that this appears to be a mislabeling, as the binaries it contains target the 286.

So, what exactly is this unusual 2.2.3c release then? To find that out, I needed to get the thing running.

Stumbling Towards Boot

The images floating around on the internet come in two forms: a set of TeleDisk TD0 images, and a group of raw 720 kilobyte images, suitable for use in a VM (or with dd). Much later in our recovery effort, we determined that the TD0 images were the originals, and the raw images were later created from these.

Initially though, I just wanted to get it to start. The image files contained six operating system (known as N1-6) disks, “Basic Utilities” (B1-2) disks, “Extended Utilities” (X1-5) disks, three International Supplement disks, and a single games disk. An initial examination of the disks showed that N1 and N2 had a Xenix filesystem, and the rest were simply raw tar archives that I could extract with GNU tar (with some warnings). The vast majority of data looked intact, so I grabbed QEMU, and popped N1 in and booted it up.
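For anyone wanting to repeat that first triage, here is a minimal Python sketch of how one might tell a raw tar floppy from one holding a filesystem. It assumes the classic tar header layout (an octal checksum stored at byte offset 148, computed with the checksum field itself treated as eight spaces) and the usual 80 x 2 x 9 geometry for a 720 KB disk; the function name is hypothetical, and this is not the procedure I actually used, which was simply pointing GNU tar at the images.

```python
BLOCK = 512
# cylinders * heads * sectors per track * bytes per sector
FLOPPY_720K = 80 * 2 * 9 * BLOCK


def looks_like_tar_header(block: bytes) -> bool:
    """Return True if the first 512 bytes carry a valid tar header checksum."""
    if len(block) < BLOCK:
        return False
    block = block[:BLOCK]
    if not any(block):
        return False  # an all-zero block is padding, not a header
    field = block[148:156].strip(b" \0")
    try:
        stored = int(field, 8)
    except ValueError:
        return False
    # The checksum is the sum of all header bytes, with the 8-byte
    # checksum field counted as if it were filled with spaces.
    computed = sum(block[:148]) + 8 * ord(" ") + sum(block[156:])
    return stored == computed


def classify_image(path: str) -> str:
    """Very rough classification of a floppy image file."""
    with open(path, "rb") as fh:
        first = fh.read(BLOCK)
    return "tar archive" if looks_like_tar_header(first) else "not tar (filesystem?)"
```

A disk that fails the check is not necessarily a Xenix filesystem, of course; it only means the first block is not a tar header.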

N1 Boot

Unfortunately, the system would hang almost immediately afterwards. Some testing revealed that the same issue existed on Bochs. PCjs got a bit further, but the kernel panicked nearly immediately. Somewhat surprisingly to me, though, VirtualBox not only booted, it got to the first step of the installer.

Language Selection

Some time later, I did discover the failure here, but I’ll save that story for another article :). */evil*

With the first hurdle passed, it wasn’t long before another problem reared its ugly head (more later). Unfortunately, shortly after that, the system would hang trying to partition /dev/hd0.

Partitioning Hang

Some trial and error showed that if I started the system up without any IDE drives, I could successfully get through to the partitioning screen. As I knew Michal had gotten farther in his resurrection attempt, I dropped him an email, began to dig into both the boot hang and the IDE driver, and set up a debugging build of VirtualBox. As we exchanged emails, I learned Michal had not only found the IDE issue, he had also managed to extract a full set of debugging symbols and offsets, and he passed along some tips on using the VirtualBox debugger.

I’ll let him explain in his own words:

Hi Michael,

Here’s my analysis. The wd1010 driver in this version of Xenix is just plain wrong, and they were just lucky that it worked.

The problem is unquestionably with the INITIALIZE PARAMETERS command. The command is automatically executed by the _wdio routine if it finds that it hasn’t been done yet. All the code is in _wdio. It writes all the registers except for the command register. Then it potentially executes a loop which writes the command register and immediately reads the status register. If the error bit is set, the command is written again and the loop repeats until the error register is not set.

What happens in VirtualBox is that reading the status register clears the interrupt triggered by INITIALIZE PARAMETERS. That is the correct behavior, because reading the status register is *supposed* to clear the interrupt. Now at this point the CPU runs with interrupts enabled, but the disk interrupt is masked because the driver executed _spl5 further up the call stack in _wdstrategy. The interrupt is cleared from the device and from the controller, and the OS never receives it.

But the OS relies on the interrupt. It’s supposed to execute _wdintr, notice that INITIALIZE PARAMETERS was executed, set up a RECALIBRATE command into _wdjob and call _wdio again to continue with I/O. Once the interrupt from RECALIBRATE is processed, _wdjob is set up with a read or a write command, _wdio is executed, and the actual I/O happens.

Because the interrupt is cleared too soon, the state machine breaks down and the OS just sits there totally idle because it has nothing to do.

It appears that in old drives, INITIALIZE PARAMETERS [took] some non-negligible time to execute and reading the status register right after writing the command did not clear the interrupt because the command hadn’t yet set it. But then it is wrong to read the status register because if the command is going to fail, it’s probably going to take some time to fail, too.

This would be solved by making INITIALIZE PARAMETERS take a millisecond or two to complete. It is probably much easier to patch Xenix to do what it should have been doing all along, i.e. reading the alternate status register (3F6h instead of 1F7h) which does not clear interrupts.

A 30-Year Old Bug

For those less versed in ATA/IDE interfaces, let me translate this into more basic English. On x86 compatible machines, access to the hard drive is controlled via a dedicated hard-drive controller and managed via the port I/O interface on the processor (using in/out opcodes). ATA commands are written to the controller's registers. In this case, Xenix is sending the INITIALIZE PARAMETERS command, which tells the drive what geometry (heads and sectors per track) to use for addressing.

The designers of the ATA specification designed it in such a way that I/O operations can be asynchronous; the CPU sends a command, and then goes to do something else. When the hard drive is ready for more, it raises an interrupt, telling the processor to send another command. This interrupt is cleared by reading from the primary status register at 0x1F7. This behavior is by design and has been a part of the ATA specification since day one. In some cases, however, one may simply want to poll the drive to know its status without changing the interrupt state. For this purpose, an alternate status register at 0x3F6 is provided.

Xenix uses lazy initialization; that is to say, a device isn’t initialized until it’s used. The wd driver is never executed until something accesses /dev/hd0, which is why the hang occurs at partitioning and not during IPL. When fdisk starts, the wd driver attempts to initialize the drive, and immediately reads the status register to check for any possible error codes. Afterwards, it waits for the IDE controller to generate an interrupt letting it know the drive is ready. But by reading the status register, Xenix has already cleared the interrupt it would get from the INITIALIZE PARAMETERS command, so it waits forever for an interrupt that never arrives. As such, the hang is caused by a legitimate bug in Xenix's IDE implementation and can occur on real hardware.
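The race is easier to see in a toy model. The Python sketch below is purely illustrative: FakeDrive and driver_sees_interrupt are invented names, the drive completes commands instantly (as an emulator effectively does), and none of this is the actual Xenix code. The register addresses in the comments are the ones from Michal's email (1F7h and 3F6h).

```python
class FakeDrive:
    """Toy drive that finishes every command the moment it is written."""

    def __init__(self) -> None:
        self.irq_pending = False
        self.error = False

    def write_command(self, command: str) -> None:
        # An instantaneous drive raises its interrupt right away.
        self.irq_pending = True

    def read_status(self) -> bool:
        """Primary status (1F7h): returns the error bit, clears the interrupt."""
        self.irq_pending = False
        return self.error

    def read_alt_status(self) -> bool:
        """Alternate status (3F6h): returns the error bit, interrupt untouched."""
        return self.error


def driver_sees_interrupt(drive: FakeDrive, use_alt_status: bool) -> bool:
    """Mimic the driver's issue-then-poll loop and report whether an
    interrupt is still pending once the driver unmasks interrupts."""
    poll = drive.read_alt_status if use_alt_status else drive.read_status
    drive.write_command("INITIALIZE PARAMETERS")
    while poll():  # retry for as long as the error bit is set
        drive.write_command("INITIALIZE PARAMETERS")
    # The disk interrupt is masked during the loop, so this is the first
    # moment the interrupt handler could actually run.
    return drive.irq_pending


if __name__ == "__main__":
    print("original behaviour:", driver_sees_interrupt(FakeDrive(), False))
    print("alternate status:  ", driver_sees_interrupt(FakeDrive(), True))
```

With the primary status register the interrupt is gone before the handler ever gets a chance to see it; with the alternate status register it survives, and the state machine can move on to RECALIBRATE.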

It’s hard to say if this was actually a problem in 1987. However, older releases of Xenix were known to be incredibly picky about the hardware they would work on, and prevailing logic on USENET was that older releases of Xenix would flat out break on any processor faster than 50 MHz, partially due to bugs like this. Xenix 2.3 (which was released not long after this version) rewrote the wd driver to not suffer from this race condition, so it likely was as much a problem then as it is now. As Michal noted, it’s possible to read the status register without clearing the interrupt, and get the behavior Xenix wants. One quick hex edit later, and I now get this.

Disk Geometry Select Partition Finishing up

Success! Due to the fact that it uses CHS (Cylinder, Head, and Sector) addressing and bypasses the BIOS, Xenix tops out at a maximum drive size of 504 MiB. After a few basic questions, I’m prompted to remove N2, and reboot.
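As a quick sanity check on that figure, here is the arithmetic as a small Python sketch. The 1024 x 16 x 63 geometry is the commonly cited combination that yields 504 MiB; that Xenix's limit comes from exactly these field widths is my assumption, not something I have confirmed in the driver.

```python
def chs_capacity_bytes(cylinders: int, heads: int, sectors: int,
                       sector_size: int = 512) -> int:
    """Capacity of a drive addressed purely by cylinder/head/sector."""
    return cylinders * heads * sectors * sector_size


if __name__ == "__main__":
    limit = chs_capacity_bytes(1024, 16, 63)
    print(limit, "bytes =", limit // (1024 * 1024), "MiB")
```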

Reboot

N1 goes back in as per the instructions, I cross my fingers, push Enter and …

Dreaded Z Hang

It hangs. Crud.

In our next installment, we'll go into trying to manually start the operating system when the only commands we have are tar, mount, dd, and sh, along with the Xenix manifest files, and thereby crash head first into Xenix's copy protection.

 
  • (Score: 0, Disagree) by Anonymous Coward on Monday March 06 2017, @06:39PM (15 children)

    by Anonymous Coward on Monday March 06 2017, @06:39PM (#475729)

    Not only does this seem like a massive waste of time, but your comment is in no way deserving of the "Interesting" modifier.

  • (Score: 5, Insightful) by Unixnut on Monday March 06 2017, @06:58PM (11 children)

    by Unixnut (5779) on Monday March 06 2017, @06:58PM (#475742)

    > Not only does this seem like a massive waste of time, but your comment is in no way deserving of the "Interesting" modifier.

    In what way is this a waste of time? This is computing history. No different to people who try to preserve ancient artifacts, literature, or (more recently) classic cars for example. Just because the items in this case are virtual, doesn't mean it doesn't hold value to preserve them.

    Ok, so I doubt a copy of Xenix will ever result in a multi-million pound sale at Christie's auction house, but that doesn't mean it is a waste. We are building a history of computing here, not unlike ESR's "The Jargon File", which is held online. It is nice for newer generations to see where all the systems they use day to day originated from, especially as I would not be surprised to find that computing will be part of humanity for the foreseeable future.

    I knew of Xenix, I remember seeing the disks when I was a kid (and no idea what it was at the time), and it is nice to see someone get it up and running. Plus it is cool and nerdy what they did. I mean, fiddling with Hex values to fix a 30+ year old bug, that is pretty cool.

    While I personally would not do this (just don't have the time), I love reading about this kind of stuff. Not unlike the folks who will hand assemble CPUs from transistors or ICs, either old architectures (like the PDPs) or their own ISAs. It is hard core geekery, and I second the original poster's point. We could do with more such stuff on Soylent (indeed this is what Slashdot originally had so much of, which first drew me in to the site).

    • (Score: -1, Troll) by Anonymous Coward on Monday March 06 2017, @07:11PM (8 children)

      by Anonymous Coward on Monday March 06 2017, @07:11PM (#475751)

      I disagree, so therefore I'm a troll; it is my opinion that SN does not need more posts like this, so I'm a troll.

      Mod me down, despite the fact that my comment contains as much value as the "Interesting" OP.

      • (Score: 3, Informative) by weeds on Monday March 06 2017, @07:34PM (3 children)

        by weeds (611) on Monday March 06 2017, @07:34PM (#475764) Journal

        You'll get modded down because you are a whiney AC complaining about mods instead of talking about the article.

        • (Score: -1, Troll) by Anonymous Coward on Monday March 06 2017, @07:40PM (2 children)

          by Anonymous Coward on Monday March 06 2017, @07:40PM (#475772)

          That's what's hilarious. Can you feel the pulsating beat of the Hivemind?

          • (Score: 2) by weeds on Monday March 06 2017, @07:49PM (1 child)

            by weeds (611) on Monday March 06 2017, @07:49PM (#475777) Journal

            Now you are just trolling. Insult the readers, insult the modders. bye bye.

            • (Score: -1, Troll) by Anonymous Coward on Monday March 06 2017, @07:52PM

              by Anonymous Coward on Monday March 06 2017, @07:52PM (#475778)

              It's the new way to say "I don't like your face."

      • (Score: 3, Insightful) by dry on Tuesday March 07 2017, @02:32AM

        by dry (223) on Tuesday March 07 2017, @02:32AM (#475880) Journal

        If you don't like the article, don't click it. I'm not a gamer and find the gaming articles boring. I simply stay away from them rather than posting that they're a waste of time. If you don't find this interesting, stay away. Meanwhile those of us who are interested can read.
        And for those of us who like stuff like this, the OS/2 Museum is an excellent site, covering much more than OS/2 though the OS/2 coverage is good as well.

      • (Score: 2) by NotSanguine on Tuesday March 07 2017, @08:36AM (2 children)

        Don't like the topic, don't read about it. That was easy, no?

        Is there something you'd like to see here?

        Go to the submissions page [soylentnews.org] and submit something you'd like to see and discuss.

        Let's talk about what you want to talk about, friend.

        Have a wonderful day!

        --
        No, no, you're not thinking; you're just being logical. --Niels Bohr
        • (Score: 3, Insightful) by Joe Desertrat on Tuesday March 07 2017, @10:22AM (1 child)

          by Joe Desertrat (2454) on Tuesday March 07 2017, @10:22AM (#475962)

          It's probably the same guy who complains this is supposed to be a tech site when a non-tech submission comes through.

          • (Score: 2) by NotSanguine on Tuesday March 07 2017, @01:38PM

            It's probably the same guy who complains this is supposed to be a tech site when a non-tech submission comes through.

            Perhaps you're right. Although this one is more technical than most postings here (and thank goodness, we could use more of these), so perhaps not.

            Regardless, it doesn't take much time or energy to encourage folks to submit the stuff they want to see. If this person just needs a kind word to get involved, then it's totally worth it. If he/she really is trolling, nothing much was lost and they are revealed as a poster boy for GIFT [penny-arcade.com].

            --
            No, no, you're not thinking; you're just being logical. --Niels Bohr
    • (Score: 2) by edIII on Monday March 06 2017, @07:24PM

      by edIII (791) on Monday March 06 2017, @07:24PM (#475760)

      Same here. I didn't even know of Xenix until I got up this morning and started reading SN. This is very cool stuff.

      --
      Technically, lunchtime is at any moment. It's just a wave function.
    • (Score: 1, Funny) by Anonymous Coward on Tuesday March 07 2017, @09:12AM

      by Anonymous Coward on Tuesday March 07 2017, @09:12AM (#475953)

      by Unixnut (5779) Neutral on 2017.03.07 3:58 (#475742)

      Username checks out.

      :D

  • (Score: 5, Insightful) by weeds on Monday March 06 2017, @07:32PM (2 children)

    by weeds (611) on Monday March 06 2017, @07:32PM (#475763) Journal

    At the end of the day, all hobbies are a "waste of time." There isn't any way to justify spending hours playing cards, building airplane models, or reading science fiction. It's a hobby. He likes doing it. Turns out some other people think it is interesting too. You are welcome to your opinion and certainly should voice it. That doesn't make you a troll. If you don't like the mod someone gets, make the huge and complicated leap to come off AC and mod it yourself.

    • (Score: 2) by FatPhil on Thursday March 16 2017, @08:40AM (1 child)

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Thursday March 16 2017, @08:40AM (#479702) Homepage
      My hobby's not a waste of time - it's drinking beer! https://www.ratebeer.com/user/51287/ Yes, I've managed to turn what I'd be doing anyway into my hobby, so clearly no time is wasted. It also means I have to waste less time thinking about holiday destinations - I go to where the new beer is! And less time planning my days whilst on holiday - start with a beer, continue until night, try to get 1 festival or 10-15 pubs in between.
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves