When we last left off, with the help of the excellent Michal Necasek of the OS/2 Museum, we had gotten the damaged Xenix 2.2.3c past the first hurdle of installation, and directly into a post-reboot crash, the cause of which (at the time) I suspected was another emulation failure.
Needless to say, I needed to get past this. By this point, I had been examining the raw images as best I could, and figuring out how the installer comes together. After a few experiments, I managed to determine a few basic facts about how Xenix is installed when booting from N1/N2:
So, knowing what the installer was trying to do, it was time to get down and dirty with it.
With a relatively complete understanding of the initial installation steps, I decided to create a boot floppy. By finding the strings for the initial language selection, I was able to locate where in the boot image the installer starts, and, with a hex editor, force it to pop open a dedicated shell instead. With that in place, I finally had a chance to explore the system somewhat, and I learned a few interesting details while digging through it. For example, there are references to 96 and 135 tpi media, such as the following:
# We want to make the hard disk bootable in the 96 and 135 tpi
# installations so that we don't need to re-insert N1 to re-boot
TPI stands for "tracks per inch" and is a very old way of referring to differing types of floppy disk media. Strictly speaking, 96 tpi is the track density of 80-track 5.25-inch disks and 135 tpi that of 3.5-inch disks; in practice here, the distinction is between low- (or double-) density 720 KiB media and high-density 1.44 MiB floppies. This suggests that this version of Xenix was available on multiple types of media, and the comment would help me immensely in trying to perform a manual install. As it turned out, much to my annoyance, the N2 file system was extremely lean overall. By using "echo *" as a poor man's ls, I was able to get a list of what I did and didn't have; the /bin directory was rather ... empty.
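The "echo *" trick works because the shell expands the glob itself, so no external binary is needed. A quick demonstration in a scratch directory (filenames here are just stand-ins for a very sparse /bin):

```shell
# The shell, not an external program, expands the *, so even with an
# almost-empty /bin you can still list a directory's contents:
cd "$(mktemp -d)"
touch mknod mount sh      # stand-ins for a very sparse /bin
echo *                    # prints: mknod mount sh
```
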
I also found I had /etc/mount and /etc/mknod which helped, but not much overall. Deciding to charge ahead, I ran through the normal partitioning and formatting steps, and then rebooted again with N1, and my modified N2 boot floppy. As I got my hands dirty, I also began to unpack and explore the other disks. As I mentioned before, aside from the first two disks, all the other ones were simply tar archives written as raw files. Or more specifically:
$ file *.img
Basic Utilities 1.img: tar archive
Basic Utilities 2.img: tar archive
Extended Utilities 1.img: tar archive
Each disk begins with a specific header with an empty file which identifies the disk number, product set, and machine set:
As one can plainly see, the B/X disks have a slightly different version string, and identify themselves as n86, or generic x86. Furthermore, the N disks are the only ones whose headers mark them as containing "80386" binaries. On top of that, while investigating N1 I found a master manifest file that lists every file on all the base installation disks, as well as special files and mknod numbers. Bingo. Almost all the pieces I needed.
A quick check of the manifest listings against the contents of each disk confirmed that, despite the differing version numbers, the media all belonged together; that is, these are the disks that correspond to Xenix 386 2.2.3c.
My initial experiments taught me a few things about Xenix, chief of which was that it very much didn't like having its root filesystem floppy removed. If I removed N2 from A: at any point, Bad Things™ would happen not long after. As such, if I wanted to successfully bypass the installer and extract things into a working system, I needed to figure out how to talk to the other drives.
On UNIX systems, for those less familiar with them, disk operations are handled through special files in the /dev directory, such as /dev/hd0 for the first hard drive, /dev/fd0 for the first floppy drive, and so on. In contrast to modern Linux systems using udev, these nodes exist as a set of static "dummy" files created via the mknod command. mknod takes four arguments: the file to create, whether the device is block or character based, and the blank-separated major/minor number pair that associates it with a driver in the kernel. Combined with the manifest file, it should have been trivial to create /dev/fd1, if it weren't for two simple issues.
As far as I can tell, having a read-only root filesystem is a hack that is essentially in place for two things: checking the file system, and installation. Under Xenix, when / is mounted read-only, write operations appear to succeed; for a brief moment you'll see the file in place, and can even interact with it for a time, and then it vanishes. Hindsight being 20/20, I could have simply forced / to be mounted read-write, but at the time the thought didn't occur to me.
Needless to say, this caused all manner of fun. I eventually realized I could simply mount the root partition at /mnt and create the device nodes I needed at /mnt/dev, and they would stick around. First hurdle passed!
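To make that step concrete, here is roughly what the node creation looks like; the b type and the 2/5 major/minor pair are the values the manifest gives for fd1. Since creating real device nodes needs root, the runnable part below demonstrates the same mknod command with a FIFO (type p), which needs no privileges or major/minor numbers:

```shell
# On the Xenix side (needs root, and a writable /mnt):
#   mknod /mnt/dev/fd1 b 2 5    # b = block device, major 2, minor 5
# Unprivileged demonstration of mknod itself, using a FIFO node:
dir=$(mktemp -d)
mknod "$dir/demo_fifo" p
ls -l "$dir/demo_fifo"    # a leading 'p' in the mode marks a pipe node
```
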
The floppy issue was a bit more difficult to work out. During installation, the scripts read from the /dev/rinstall device. The manifest also listed a /dev/rinstall1, which likewise generated errors, along with several other variations:
FD48 b666 bin/bin 3 ./dev/fd1 2/5
FD96 b666 bin/bin 1 ./dev/fd196ds9 2/37
FD96 b666 bin/bin 2 ./dev/fd196ds15 2/53
FD96 b666 bin/bin 1 ./dev/fd196ds18 2/61
In practice, the only node that would work correctly was /dev/(r)fd196ds9, which probably means nothing to most people. Broken down, it's a mode selection for fd1 (B:): 96 refers to tracks per inch, ds to double-sided, and 9 to sectors per track. In other words, the geometry for low/double density 3.5-inch floppies. Having divined the correct setting, tar could now read the disks:
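As a sanity check, that geometry multiplies out exactly: 80 cylinders, two sides, nine 512-byte sectors per track.

```shell
# 80 cylinders x 2 heads x 9 sectors/track x 512 bytes/sector:
echo $((80 * 2 * 9 * 512))       # prints 737280, i.e. exactly 720 KiB
echo $((80 * 2 * 9 * 512 / 1024))   # prints 720
```
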
Feeding the disks through tar, and manually executing several of the installation steps gave me a reasonable approximation of what the installed system should look like. Testing many of the utilities confirmed my original suspicion that the vast majority of the data was intact. Furthermore, I managed to extract /usr/bin/chroot from the Extended Utilities disk.
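The manual pulls boiled down to something like the sketch below. The real floppy node obviously isn't available here, so an ordinary tar file stands in for /dev/rfd196ds9 so the commands can actually run:

```shell
# On the real system this was roughly:
#   cd /mnt && tar xvf /dev/rfd196ds9
# Simulation, with a scratch tar file standing in for the device:
work=$(mktemp -d); cd "$work"
mkdir -p stage/bin && echo 'hello' > stage/bin/demo
(cd stage && tar cf ../disk.img .)    # build a fake "disk"
mkdir hd && tar xf disk.img -C hd     # unpack it onto the "hard disk"
ls hd/bin                             # prints: demo
```
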
To make a long story short, I successfully extracted all the base installation disks, and began to work out the necessary steps to boot from the hard disk's root file system. The system was extremely unstable in this state, with several utilities causing immediate kernel panics on launch (most annoyingly, vi did this, forcing me to use ed for almost all file editing). After several attempts, using N1 as a boot floppy and pointing the root argument at the hard disk, I got very close to a successful boot.
The important line to see here is *** cron started ***, which is one of the final steps listed in /etc/rc before the login prompt comes up, and a very optimistic sign for eventually getting this all working. At this point, I had also learned of the existence of the /tmp/init.* files, special shell scripts run during installation. Through these, I learned of the setperms command, which reads the master manifest files on N1 and the other disks and does final tweaking and configuration. I also learned that I needed to do a brand operation on /etc/getty to decrypt the file and install a serial number in it. With chroot in hand, and fingers crossed, I ran setperms with each manifest, rebooted, and ...
Well isn't that an interesting problem? That's the type of message you'd expect if someone detonated a fork bomb on your system.
Another examination of the installation scripts revealed the problem. During installation, three files are personalized with the "brand" utility. In the case of /etc/getty and /usr/sys/lib/libmdep.a, these files are decrypted with a secret derived from the serial number and activation key; brand is also used to write those values into the kernel binary image. This would also foreshadow the issues we ran into once we began trying to restore the media to near-mint condition.
As I found out while debugging, Xenix handles serial number validation differently depending on how it's started. Being essential boot code, the kernel by definition cannot be encrypted; instead, it carries a runtime check to make sure it has correct branding information. When started from the hard drive, the kernel reports "Invalid Serial Number" if it gets a mismatched set of keys, and subtly degrades behavior.
However, in my case, my frankensteined system was loading its kernel from the floppy drive. In this case, Xenix suppresses the "Invalid Serial Number" message, but doesn't prevent the tripwire from being activated.
The tripwire in question drastically lowers the number of processes that can be run. As it turns out, that limit is reached as soon as the system is brought up in multiuser mode. As I found out (much) later, this behavior is actually documented as a footnote in one of the Xenix 286 manuals. As such, I copied the kernel from N1 to the hard drive, personalized it with brand, and after a reboot ....
With some more fiddling, I was able to run most of the post installation scripts, and even load the package manager, though it had some corruption issues.
Right about this time, Michal got back to me, having found the reason the system hung after reboot: N2 was missing two sectors of /etc/init. I was somewhat in disbelief, so I pulled out dosformat, made a DOS-compatible disk, and copied /etc/init out of the booted system.
Sure enough ...
Ugh. So my frankensteined system was booting with half of its init binary missing. Awesome. At this point, though, I had noticed something interesting on the international supplement: specifically, an /etc/init8 binary, one with the same file size as the file on N2. When I compared them side by side...
Well, isn't that interesting! A comparison of file sizes shows they're identical in length, with similar (though not identical) modification dates. As far as I can tell, the only difference appears to be a time-stamp further into the binary. On a hunch, I compared the regions around the missing sectors, and they matched. So I simply copied the missing blocks from init8 to init, and then started a fresh new VM. After feeding it the floppies, this time, instead of the dreaded Z, I got something new.
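For the curious, the block graft itself is a one-liner with dd. The offsets below are made up for illustration (the real ones came from the sector comparison), with two scratch files standing in for init8 and init:

```shell
# Stand-ins: a 4 KiB "donor" full of A's and a "damaged" copy of B's.
work=$(mktemp -d); cd "$work"
head -c 4096 /dev/zero | tr '\0' 'A' > init8   # donor (international) copy
head -c 4096 /dev/zero | tr '\0' 'B' > init    # damaged N2 copy
# Graft two 512-byte sectors from init8 into init at the same offset;
# conv=notrunc patches in place without truncating the target file:
dd if=init8 of=init bs=512 skip=2 seek=2 count=2 conv=notrunc 2>/dev/null
```

After this, bytes 1024 through 2047 of init come from init8, and the file size is unchanged.
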
It would die shortly afterwards, but now I was on a mission to try and see if I could restore the media to working state. I already proved to myself that enough data existed to at least make a restoration attempt viable. However, to rebuild the media, I needed to characterize the existing damage and find a way to rebuild or replace the missing sectors.
Next time, we dig into the world of teledisk, data reconstruction, and our first steps towards restoring the media.
Honestly, given the era, that's actually pretty damn accurate. UNIX itself has a lot of pitfalls, which become very evident when you run software of the era: flakey filesystems, networking issues with NFS, lack of fine-grained controls, etc. If large multiuser systems with thin/dumb clients were still commonplace, VMS would likely still be considerably more relevant in our lives.
When dealing with true multiuser systems such as the minicomputers of the time, VMS would essentially run circles around the UNIXes of the era, since it had superior support for batch processing, a much more robust filesystem complete with versioning, and better development tools, with support for Ada, Fortran, PL/I, and a few others. OpenVMS eventually fell by the wayside due to being tied to the VAX platform; the AXP port came kind of late, followed by the I64 port. If there had been an Intel-compatible port in the late 80s/early 90s, VMS would be less likely to be heading towards a footnote in history. I actually have Itanium hardware here, so if I can get a hobbyist PAK, I could do a series of articles on VMS.
You're absolutely right. I remember. Robust filesystems (the file versioning was, IMHO, better than anything I see even today in a basic filesystem), strong networking support (LAT/DECNet and TCP/IP), excellent SMP and clustering support, and a sophisticated security model. DCL was kind of clunky, but it made sense and was internally consistent. All in all, the PDP (RSX) and VAX (VMS) product lines were pretty darn good.
What's more, the DEC Fortran IV and F77 compilers were good. Really good. Better than just about any others.
DEC also had a really amazing service and support infrastructure (and it cost a pretty penny too!).
The "quaint" comment was for those who don't remember anything before, say, 1985. I cut my teeth on DEC systems (ah, the joys of RSX11m and VMS 4.7!) and was amazed at the newfangled PCs with 5.25" hard drives that stored 40MB (MFM) or 80MB (RLL) after installing 300MB drives that were about 2mX1.25mX1m and weighed 60Kg.
The DEC documentation was both voluminous and dense. Olsen wasn't kidding about it being "all there."
I do get a kick out of that quote, given where things went with both Unix (thanks to Sun, SGI, IBM and SCO), OS/2, Windows NT and eventually Linux.
Within ten years of that quote, DEC was on its way out even though they still had a huge installed base. The final ignominy was the sale to Compaq in 1998.
As for surviving with an Intel port, I'm not so sure about that. The thing about DEC was its vertical integration. And even with DECStations, their stuff was always going to be priced out of the commodity PC market, especially once ISA/EISA mobos were being mass produced in Asia.
A series like this on VMS would be pretty cool.
I almost wish I hadn't thrown away my old SPARCstations; I could have done a series on SunOS 4. That would have been fun. I have been playing around with Illumos/OpenIndiana of late, but that's based on Solaris (SVR4), not BSD 4.2.
As an aside, if Novell had ported NCP/IPX to Unix, rather than relying on their bare-metal x86 assembler code base, they might have survived and even thrived. DEC failed to stay relevant too, although that was more of a corporate structure/business model issue than a platform issue, IMHO.
Well, Novell actually did pretty much what you described, except they did it with OS/2: http://www.os2museum.com/wp/netware-for-os2/. I've always been tempted to do *something* with NetWare as a retrocomputing project, but it's just not that interesting unless I go dig out some token ring or coax ethernet and go all in, and well ... *eh*. From what I know about DCL, it makes a heck of a lot more sense if you're familiar with mainframe systems of the era, since as far as I know it's essentially a minicomputer variant of JCL. The syntax is wonky only if you're expecting UNIX or DOS semantics, but it isn't horrid once you get past that initial learning curve. At least if I do OpenVMS, I can talk about DECnet, and maybe even set up the old PATHWORKS stuff on Windows 3.1 to talk to a DEC system.
I'm not convinced that a port to Linux or a UNIX-based platform would have saved NetWare. A dedicated "network" operating system made a lot less sense in an era where pre-emptive multitasking was a thing and a server could do more than one dedicated job. IPX was difficult to route in networks that also needed to talk TCP/IP, since you needed switches that could handle both protocols if you did any layer 3 routing, and PDC and later AD basically did everything NDS did, and could do it over either NetBIOS or TCP/IP, which drastically simplified things. AppleTalk was the only real survivor of the LAN battles, and then mostly only up until Mac OS X; talking to CIFS file shares from classic Mac OS was a nightmare out of the box, and it was usually far easier to install AppleTalk on Windows 2000 than to get classic Mac OS to play ball with Windows file sharing in a domain situation.
As for VMS, it could have taken over the niche currently held by IBM pSeries and some of the higher-end AIX stuff, and competed well for multi-user shared VPSes in the era before virtualization. If it weren't for the fact that Itanium hardware is both hideously expensive and has horrid performance on most generic workloads, I would probably have recommended a VMS solution even today to some customers for whom it is very well suited (high reliability/availability requirements, combined with extremely "robust" documentation). The current x86_64 port of OpenVMS, though, might put some life back into the system, and I might re-evaluate it as a choice for folks where standard Linux/*NIX isn't a great fit.
Unfortunately, DEC went bust; Compaq gutted the corpse and killed it, Alpha, and Tru64 when it jumped into bed with Intel on the Itanic. Then the POWER house of cards collapsed, essentially killing any real competition to Intel in every market except the low end and the very high end.
I didn't say DCL's syntax was *wonky*, I said DCL was *clunky*. As is JCL. In fact, I pointed out that it was internally consistent and quite powerful. Much more so than just about anything else at the time. Even now it stands up pretty well.
I think you misunderstood my point of view. I can certainly see applications for VMS today, and it provided (and still does, in some places) quite a lot of value.
Initially, I just wanted to share what is, now, an amusing anecdote from the era of Xenix, not shit on DEC or VMS.
DEC tried (with Ultrix) to get into the p-series/AIX space, but Sun, SGI and even IBM ate their lunch pretty effectively. The standardization and economies of scale for ISA/EISA (and then PCI) killed those guys too. Intel was helped along by players like Apple (moving to the x86 platform) and Microsoft (dropping support for alternative architectures).
I kind of wish there was a more varied ecosystem for microprocessors. I think we'd get a lot more innovation that way.
I'd forgotten about NetWare for OS/2. Given what happened to OS/2, it probably wouldn't have made much of a difference. My thought WRT porting NCP/IPX to Unix had more to do with creating an environment where development could be done on the same platform as production, with the implication that Novell would move to a model which didn't require NLMs and used standard IPC and sockets to integrate functionality -- making NCP (which could have run on top of IP too) a much better competitor to (Blecch!) LANMAN.
IIRC, in the early to mid 90s, most routers had both IP and IPX support standard, so I don't think that would have been much of an issue.
I'd also point out that if it weren't for Phil Karn (KA9Q), Russell Nelson and the PC/TCP packet driver spec, Microsoft might still be using NetBIOS/NetBeui.
Actually, implementing KA9Q would be a great retro project. I haven't messed with it since 1991, but I think I'll download it and see what I can do.
If I ever upgrade my ham ticket to General class, I fully intend to do AX.25 (which KA9Q also supported) over HF, and then port UUCP to work over that type of link for downloading USENET groups over ham radio; it would be nice, if I'm backpacking in Africa with a portable rig, to always have connectivity. I used a fairly ghetto UUCP setup to download mail when I lived in China because of the worksite, and if I did it now, I'd probably also set up netnews and download a fair number of GMANE RSS feeds and possibly a group or two, though a lot of USENET seems dead these days; I dusted off my eternal-september account, but I can't find any groups that have a pulse.