SoylentNews is people

posted by NCommander on Monday February 27, @12:51PM
from the we-knew-there-would-be-pitchforks dept.

Continuation of: Site Update 2/27

So, the recent site update got a lot of attention and a lot of comments. Predictably, opinion was split both ways. I've been out sick and haven't been actively involved in SN for a few days, but I did review the changes on dev before they went out. I'm still not up to responding to you guys personally, and TMB/Paul have had things covered, so I'm just going to write a blanket story. So, let's open this and say THIS ISN'T THE FINAL SET OF HOW THINGS WILL BE. I'm leaving my comments above the fold to make it clear what's going on. I'd put that in a blink tag if that was still in the HTML standard.

The changes to commenting were primarily driven by technical needs. To do D1.5, the site had to load a huge number of comments and do server-side processing to thread them. To give you an example, on a cold page load, before caching kicks in, a few pages on the site would take over a minute to load, render and thread. The only thing that kept the site from drowning in 503s is that the frontend has a lot of caching. Even with that, we can't cache every single bit of the site at once. In a "cold cache" scenario, such as after a varnish or DB update, the site would be borderline unusable until those caches could be loaded. So let me make this clear: this wasn't a change for change's sake. There was (and is) a need to revamp the commenting.

We noted that this change was coming in other meta stories, and even had a landing article on dev for people coming to check it out. No one did. How we use commenting on dev and how we use it on production are two different things; you can't realistically test these things in real world conditions without updating production.

As TMB stated, we couldn't get the same behavior without making the site cry in the corner, and this was fairly extensively tested on dev before it went live. For older users to the site, you may remember this is not the first time we've changed comments, and rather predictably, the roll out of Improved Commenting actually was fairly buggy. This is a more drastic update.

Right now, we're going to keep improving and changing things to address as many issues as possible. To that end, there will be a daily article for at least this week, if not longer, to allow for feedback as we work to make things better. If, at the end of all the tweaking, we can't satisfy the vast majority of folks, a revert remains an available option. We've built this entire site on listening to the community and taking its feedback into account. That isn't going to change now. I'm hoping we've earned enough trust from you guys collectively to be allowed to at least experiment for a bit.

I'm going to leave the rest of the article for the dev crew to use. Due to personal real life issues, I'm likely not going to be around much, so if you don't see me, that's why. I have full faith in the staff in helping manage and keep things going.

~ NCommander

Hi! I'm martyb (aka Bytram) your friendly neighborhood QA/test guy chiming in with my 2¢ on the upgrade/rollout.

Firstly, I apologize that you are seeing ANY issues with the site upgrade. I took this update very seriously and was, unfortunately, only able to perform about half of the testing that I wanted to see done before we went live. That said, there are some issues that were reported that I had not foreseen, so this has been a learning experience for me, too.

Secondly, I'd like to point out what you are NOT seeing -- the many MANY changes that TMB and PJ made as a result of feedback arising from testing. That said, comments are THE thing that makes this site. It's not the timeliness or fine writing of the stories — as I see it, this site is all about providing a venue for discussion.

Look past the fold for the rest of my comments.

Though there were a whole lot of tests that I was able to perform, there were many others that I had still not gotten to yet. I apologize that some of you had to scrape your knuckles on some very rough edges that made it through. In preparation for rollout I had written a series of programs to allow me to automate some aspects of submitting comments in different hierarchies which were key in identifying shortcomings in testing the correct operation of the expand/collapse and hide/show features. I was by no means able to perform an exhaustive test of all of the permutations but I was able to catch a number of issues and I'm sure TMB and PJ will attest that I beat on them pretty hard to make some changes. So far, I've seen no comments complaining about those controls functioning as they should, so YAY on that.

What has not been tested, and for which I hereby request the help of the community, are the user preferences whereby one can provide modifiers to certain aspects of comments. To access these, go to your preferences page, and then click on the "Comments" tab.

Here, you will see a set of modifiers grouped under the header: "Points Modification." The comment's actual score remains unchanged, but these modifiers allow you to provide a nudge to different categories so you could, say, favor "Funny" comments by adding +2 to the score calculation, and hide all comments modded "Offtopic" by changing that modifier to "-6".

The "Reason Modifiers" are:
Insightful Offtopic Spam Interesting Flamebait Disagree Funny Troll Touché Informative Redundant

The "People Modifiers" are:
Friend Fan Foe Freak Friends-of-Friends fof Foes-of-Friends

And so on with modifiers for Anonymous postings, Karma Bonus, New User Modifiers, Small Comment Modifiers, and Long Comment Modifier.

I would appreciate these being explored and verified as to their correct operation. If you choose to help, please mention in the comments which control you tested, and what happened when you set it to -6, -2, +2, and +6.

These values are suggested so as to explore settings that make a given category nearly hidden (a "+5 Interesting" comment with the "Interesting" modifier set to -6 results in an effective score of -1) — set your threshold/breakthrough to 0 and those comments should not be displayed. Conversely, you can set the "Troll" modifier to +6 so even a "-1 Troll" comment would receive an effective score of +5 and should always appear in the comments you see displayed.
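The arithmetic in those two examples can be sketched in a few lines. This is only an illustration of the numbers given above, not rehash's actual code: the function names are made up, and the rule that a comment is shown when its effective score is at or above the threshold is an assumption.

```python
def effective_score(score, modifier):
    """Score as one reader sees it after a personal modifier is applied.

    The comment's stored score is untouched; only this reader's view shifts.
    """
    return score + modifier


def is_displayed(score, modifier, threshold):
    """Assumed rule: show a comment only if its effective score meets the threshold."""
    return effective_score(score, modifier) >= threshold


# "+5 Interesting" with the "Interesting" modifier set to -6
print(effective_score(5, -6))             # -1
print(is_displayed(5, -6, threshold=0))   # False

# "-1 Troll" with the "Troll" modifier set to +6
print(effective_score(-1, 6))             # 5
```

When reporting a test result, comparing what you saw against this kind of back-of-the-envelope calculation makes it easier to tell a bug from a surprising-but-correct setting.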

Lastly, but of extreme importance in my mind, is how impressed I am by the community feedback. Issues were stated, explained why it was problematic, steps required to reproduce, steps taken as an attempt at a workaround -- THIS is what keeps me going and donating my time to this site. We are working together to make this the best site we can. I'm proud to be a member of this community. Together I'm sure we can get the remaining issues worked out to people's satisfaction. And, as NCommander stated, if we are not able to do so, there is a fallback to the old approach. I must admit that some of the new features were a bit jarring to me (I started reading at the green site before it even had UIDs) so there's some long-practice reading/viewing skills that are being challenged, but overall I'm liking the changes. I hope you do, too.

posted by The Mighty Buzzard on Sunday February 26, @01:52AM
from the not-actually-an-NCommander-post dept.

Okay, I know it's been a long time since we did one of these but life does intrude on volunteer dev time. Hopefully this one will be worth the wait. Bear with me if I seem a bit off today, I'm writing this with a really fun head cold.

First, what didn't make it into this update but is directly upcoming. Bitpay is still down on account of them changing the API without notifying existing customers or versioning the new API and leaving the old one still up and functional. It's the first thing I'm going to work on after we get this update rolled out but it will basically require a complete rewrite. Don't expect it any earlier than two months from now because we like to test the complete hell out of any code that deals with your money.

Also, adding a Jobs nexus didn't quite make the cut because we're not entirely sure how/if we want to work it. One thing we are certain about, it would not be for headhunters or HR drones to spam us silly but for registered members who have a specific vacancy they need to fill and would like to throw it open to the community.

The API still has some broken bits but it's been low priority compared to what I've been busy with. I'm thinking I'll jump on it after Bitpay unless paulej72 cracks the whip and makes me fix bugs/implement features instead.

There were several other things that I had lined up for post-Bitpay but I can't remember them just now what with my head feeling like it's stuffed full of dirty gym socks.

Now let's throw the list of what did make it out there and go over it in more detail afterwards.

  • Tweaked the themes a bit where they were off.
  • Changed or fixed some adminy/editory stuff that most of you will never see or care about.
  • Fixed a mess of minor bugs not worth noting individually.
  • Improved Rehash installation. It should almost be possible to just follow directions and have a site working in an hour or two now.
  • Added a very restrictive Content Security Policy.
  • Added a link to the Hall of Fame. It was always there, just not linked to.
  • Return to where you just moderated after moderating. (yay!)
  • Return to where you just were after commenting. (yay some more!)
  • Added a field for department on submissions. Editors get final say but if you have a good one, go for it.
  • Added a Community Reviews nexus.
  • Added a Politics nexus.
  • Added <spoiler> tags for the Reviews nexus in case you want to talk about a novel without ruining it for everyone else. They function everywhere though.
  • Changed really freaking long comments to have a scrollbar now instead of being click-to-show.
  • Massively sped up comment rendering on heavily commented stories.
  • Dimming of comments you've already read. (You can turn this off with the controls on the "Comments" tab of your preferences page if it annoys you.)
  • Added a "*NEW*" badge to new comments in case you don't like dimming but still want to easily see new posts. (Disable it the same place as above.)
  • Removed Nested, Threaded, and Improved threaded comment rendering modes (Necessary due to the changes required for the massive speed-up)
  • Added Threaded-TOS and Threaded-TNG comment rendering modes. (TOS is the default)
  • All comment modes now feature collapsible/expandable comments. (Without javascript)

Morning Update: Really digging the constructive criticism. Some quality thoughts in there. Keep them coming and we'll see how fast we can get a few done. --TMB


Before the specifics, I know some of you are going to see the new Threaded modes and be like "that's pretty awesome" and some of you are going to call us dev types very bad names. Well, this ain't the other site. We're not saying "You Shall Use This Because It's New And Shiny". We're saying something had to be done about page load times approaching a full minute on heavily trafficked stories and the way we pulled and rendered comments made up nearly all of that time.

So, the first thing we did was we stopped pulling every single comment and then removing the ones we didn't want to display. Mostly that means that the comment counts in the dropdown menus for Threshold and Breakthrough are on a per-page basis now.

Next we did away with templates for comments. Wildcarded, case insensitive search and replace, even in perl, is horribly slow and that's a large part of how templates worked. The html and related logic is now hardcoded into the source. This did mean though that we had to entirely rewrite all the comment modes logic. Flat and Threaded-TOS are pretty much identical to the old Flat and Threaded, so there shouldn't be any surprises there except that we got rid of the javascript in Improved Threaded and gave every mode collapsible comments with nothing but CSS. Threaded-TNG is new-ish however. It's essentially Nested but without Threshold or every top-level comment being fully visible. If Nested users absolutely cannot live with that, we'll preempt working on the bitcoin rewrite and slap a Nested mode in as well. It shouldn't take but a week, testing included.

Third, we paginated every mode. I know it was nice being able to see every comment on one page but that meant pulling and rendering every comment and that simply doesn't work if a story has over a hundred comments.
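To make the difference between "pull everything, then throw most of it away" and "pull only the page that was asked for" concrete, here is a toy comparison. It is a sketch under loud assumptions: the table layout, column names, function names, and page size are invented for illustration, it uses an in-memory SQLite database rather than the site's MySQL, and it ignores threading entirely.

```python
import sqlite3

PAGE_SIZE = 100  # hypothetical page size for this sketch


def make_demo_db(n_comments):
    """Build a throwaway database with one story (sid 1) and n_comments comments."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE comments (cid INTEGER PRIMARY KEY, sid INTEGER, points INTEGER)"
    )
    db.executemany(
        "INSERT INTO comments (cid, sid, points) VALUES (?, ?, ?)",
        [(cid, 1, cid % 7 - 1) for cid in range(1, n_comments + 1)],  # scores -1..5
    )
    return db


def fetch_all_then_filter(db, sid, threshold, page):
    """Old approach: load every comment, then filter and slice in application code."""
    rows = db.execute(
        "SELECT cid, points FROM comments WHERE sid = ? ORDER BY cid", (sid,)
    ).fetchall()
    wanted = [row for row in rows if row[1] >= threshold]
    return wanted[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]


def fetch_page_only(db, sid, threshold, page):
    """New approach: let the database return just the rows this page needs."""
    return db.execute(
        "SELECT cid, points FROM comments "
        "WHERE sid = ? AND points >= ? ORDER BY cid LIMIT ? OFFSET ?",
        (sid, threshold, PAGE_SIZE, page * PAGE_SIZE),
    ).fetchall()


db = make_demo_db(350)
same = fetch_all_then_filter(db, 1, 1, 0) == fetch_page_only(db, 1, 1, 0)
print(same)  # True: same page of results, far fewer rows handled by the application
```

Both functions return the same page, but the second never touches rows it will not display, which is the general idea behind the speed-up described above.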

The removal of sorting by score we can't roll back though. Its loss was a necessity due to the way we pull and sort only the comments that the user actually requests. Previously, we were pulling every single comment for a story and then removing the ones we didn't want. That was both bloody stupid and slow as hell, so it had to go. Unfortunately it means we have to do things slightly differently. It may make a triumphant return eventually but it would require some moderately tricky coding with the particular way our code is laid out.

Oh and if you have objections to the new Threaded modes, by all means bitch about specifics in comments here and we'll see what we can do to address them. After having spent so much time recently bashing on exactly these bits of code, we're quite familiar with them and changes/additions shouldn't take too terribly long to whip out.

Now to the specifics.

The buttons on the upper left of each comment don't work exactly like the Javascript version did but we do like how they work. The double chevron either shows or hides the comment tree beneath a comment but it does not change their collapsed/expanded state. The single chevron controls the expanded/collapsed state of each comment individually. Adding another button to expand/collapse every individual comment beneath a given comment may be doable but we haven't figured out how so far. It is high on the wish list but not high enough to delay the release any longer than it already has been.

Flat: Flat is still flat but now with a collapse/expand button that functions like the ones from Improved Threaded.

Threaded-TOS: If you can find significant differences between Improved Threaded and Threaded-TOS, let us know because it's probably a bug. The idea was to make it as much like Improved Threaded as technically possible with just CSS but paginated like Nested so we don't have to render more than 100 comments at a go. We defaulted everyone on Nested/Threaded/Improved threaded to Threaded-TOS to minimize the aggravation of unexpected change. Oh, and Breakthrough now takes precedence over Threshold, so high scoring comments will always be visible even if they're responding to blatant trolling.

Threaded-TNG: All comment trees start fully branched out but with the individual comments either expanded or collapsed. "Comment Below Threshold" functionality is gone. Breakthrough gets compared to a comment's score to decide if it gets expanded or collapsed. Play with it a couple minutes; it's not terribly hard to grok. Why do we need this mode if TOS covers most all of the best bits of the three old modes? Because I like it. You don't have to use it. Shut up.
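For anyone who prefers code to prose, the Threaded-TNG rule described above might be sketched like this. It is not the site's Perl: the dictionary layout, the `[-]`/`[+]` markers, and the choice that a score at or above Breakthrough means "expanded" are all assumptions made for the example.

```python
def tng_state(score, breakthrough):
    """Assumed rule: at or above Breakthrough is expanded, otherwise collapsed."""
    return "expanded" if score >= breakthrough else "collapsed"


def render_tng(comment, breakthrough, depth=0):
    """Return display lines for a comment tree.

    Every comment in the tree is listed (the tree is fully branched out);
    Breakthrough only decides whether the body is shown.
    """
    indent = "  " * depth
    header = f"{comment['subject']} (Score: {comment['score']})"
    if tng_state(comment["score"], breakthrough) == "expanded":
        lines = [f"{indent}[-] {header}: {comment['body']}"]
    else:
        lines = [f"{indent}[+] {header}"]
    for reply in comment.get("replies", []):
        lines.extend(render_tng(reply, breakthrough, depth + 1))
    return lines


thread = {
    "subject": "First!", "score": -1, "body": "lol",
    "replies": [
        {"subject": "Re: First!", "score": 5, "body": "Actually, here is some context."},
    ],
}
for line in render_tng(thread, breakthrough=2):
    print(line)
```

Note that the low-scored parent is collapsed but still listed, and its high-scored reply is shown in full beneath it.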

What happened to Nested? What's old is new again. Threaded-TNG more or less is Nested but with the fun bits of Improved Threaded bolted on as well and without the annoyance of having to allow Javascript to run. Minus Threshold functionality. If you spot any serious differences between the two besides those, give us a heads up, because we didn't. It's a very easy mode to code on though, so if you absolutely cannot live without Threshold it's not at all difficult to clone it, add Threshold back in, and call it Nested.

Why not leave the old comment rendering modes in as well as the new ones? Because by rewriting them we got a rendering speed increase around a factor of two+, to go with the factor of two+ increase we got by pulling only the necessary comments instead of every last comment a story has with every page load. This has been becoming necessary as we increasingly go way above the 100 comment mark on busy stories. It's not cool for you lot to have to wait forty-five seconds to load a page of comments and it's even less cool to peg a cpu core for forty-five seconds to deliver it to you. If you ever again find a story that takes 10+s to load, something's going wrong and we'd appreciate a heads up. We think there's still some room in the code for improvement but this was the lowest-hanging fruit.

Now on to the rest of the details.

The Content Security Policy should cover what's required for operation of this site (plus allowing for Stripe payments) and nothing else. If your browser honors CSPs, it should not be possible to get smacked with XSS or inline script injection on this site any more; even if we write code buggy enough to allow it, which we have once or twice.
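As a rough idea of what such a policy looks like on the wire, the sketch below assembles a Content-Security-Policy header value. The directive list is an assumption chosen for illustration and is not the policy the site actually sends; `https://js.stripe.com` is used as a plausible Stripe source. The point to notice is that `'unsafe-inline'` never appears in `script-src`, which is what tells a CSP-honoring browser to refuse injected inline scripts.

```python
# Hypothetical policy for illustration only; the site's real directives may differ.
EXAMPLE_POLICY = {
    "default-src": ["'self'"],
    "script-src": ["'self'", "https://js.stripe.com"],
    "frame-src": ["https://js.stripe.com"],
}


def build_csp(directives):
    """Join directives into a single Content-Security-Policy header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


header = ("Content-Security-Policy", build_csp(EXAMPLE_POLICY))
print(f"{header[0]}: {header[1]}")
```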

On dimmed comments... This only functions for logged in users currently as it would take some serious work to get it functioning for individual ACs, even using cookies. What it does is when you load a page of comments, it picks the highest comment ID from that story and marks that comment as read by you. Switching between pages of comments or changing your Threshold/sort order should not update which comments you have read, even if new ones have come in since your last read comment ID was set. Hitting the "Mark All as Read" button or hitting your browser's Refresh button on the main story page should take the stored comment ID and set the opacity to 60% on all the comments with a comment ID equal to or less than that. It's not entirely accurate but it's pretty damned close and it doesn't bloat the db much at all. Oh and read histories get wiped after two weeks of not being updated for a particular user/story combination to save on db space as well.
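The bookkeeping described above can be modeled in a few lines. This is a simplified sketch, not the site's implementation: the class and method names are invented, it keeps everything in a Python dictionary instead of the database, and it glosses over exactly when the stored ID is updated. It does show why the approach is cheap: one comment ID per user/story pair.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=2)   # histories idle longer than this are wiped
DIMMED_OPACITY = 0.6             # the 60% opacity mentioned above


class ReadHistory:
    """Track one 'highest comment ID read' value per (user, story) pair."""

    def __init__(self):
        self._marks = {}  # (uid, sid) -> (last_read_cid, updated_at)

    def mark_read(self, uid, sid, story_comment_ids, now):
        """Record the highest comment ID currently on the story as read."""
        if story_comment_ids:
            self._marks[(uid, sid)] = (max(story_comment_ids), now)

    def opacity(self, uid, sid, cid):
        """Dim comments whose ID is at or below the stored ID."""
        mark = self._marks.get((uid, sid))
        if mark is not None and cid <= mark[0]:
            return DIMMED_OPACITY
        return 1.0

    def purge_stale(self, now):
        """Drop histories not updated within the retention window."""
        stale = [key for key, (_, ts) in self._marks.items() if now - ts > RETENTION]
        for key in stale:
            del self._marks[key]
        return len(stale)


history = ReadHistory()
loaded_at = datetime(2017, 2, 26)
history.mark_read(uid=7, sid=1, story_comment_ids=[10, 42, 17], now=loaded_at)
print(history.opacity(7, 1, 42))  # 0.6 (already read, dimmed)
print(history.opacity(7, 1, 43))  # 1.0 (arrived later, not dimmed)
```

The "not entirely accurate" caveat falls out of the model: anything at or below the stored ID counts as read, whether or not it was ever on screen.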

The new comment badge functions exactly opposite of dimmed comments. It puts "* NEW *" in the title bar of comments you haven't read yet. It's there strictly so you can have the same functionality but dislike the aesthetics of comment dimming. You can technically use both if you really want new comments to stand out but that would just be weird.

Returning to where you last moderated works like this. If you moderate one comment, you'll get sent back to that comment. If you moderate several in one go, you should get sent to the one farthest down the page. Moderating does not update the comment ID of what you've read for dimming purposes.
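The "farthest down the page" rule is simple to state in code. The sketch below assumes the page is available as a list of comment IDs in display order; the function name and that representation are hypothetical. Position matters rather than comment ID because in threaded modes a newer reply can appear above an older top-level comment.

```python
def return_target(page_order, moderated):
    """Pick the moderated comment that appears lowest on the page.

    page_order: comment IDs in the order they are displayed.
    moderated:  comment IDs the user just moderated.
    Returns None if none of the moderated comments are on this page.
    """
    positions = {cid: index for index, cid in enumerate(page_order)}
    on_page = [cid for cid in moderated if cid in positions]
    if not on_page:
        return None
    return max(on_page, key=positions.__getitem__)


# Comment 9 is a reply shown under comment 5, so 6 sits lower on the page than 9.
print(return_target([5, 9, 6, 12], {9, 6}))  # 6
```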

Returning to where you just made a comment? That's pretty self-explanatory. It also should not update the comment ID of what you've read for dimming purposes.

The Politics nexus. This does not mean we're looking to have even more political stories. The balance of tech/science/etc... to political stories is not going to change nor will the quality of accepted political submissions. It's primarily a way to let people who are sick and bloody tired of seeing politics here set a preference and never see political stories again. It's also handy if you wish to see what political stories we've run recently as clicking on the nexus link on the left of the page will show you only those stories.

The Reviews nexus has been brought up three separate times that I can remember by different groups of people, so we decided to go ahead with it. It's going to be a book/film/software/hardware/etc... review and discussion place. By my understanding, though I'm not really involved, it's getting its own space because some folks wanted to start what amounts to a site book club. Tech books will of course be welcome but it's open to all genres of printed and bound words. Ditto non-book reviews. Just don't go sending in a review of something we normally wouldn't publish news about on the site. Not enough people are going to be interested in your review of the barber shop down the street from your house, so it won't get published.

Spoiler tags, <spoiler>text you don't want casually seen</spoiler>, work both in stories and comments and are just a bit of css trickery that hide the text between them until the person viewing them hovers over the *SPOILER* text. There's a slight delay, so don't think it's not working because it's not immediate. That's intentional so you don't accidentally trigger showing the contained text by briefly crossing it.

By popular demand, <del> tags were also added.

That's all worth mentioning in this site update. Look for another one hopefully in May or late April. If you find any bugs, please slap them up as issues on our github repo or email them to dev@soylentnews.org.

posted by martyb on Friday February 17, @02:06AM
from the long-and-winding-road dept.

Two of SoylentNews' staff submitted stories noting our three-year anniversary; one a site summary of where we are and a summary of what we've done, and the other a detailed presentation of the very early days and how SoylentNews got started.

Three Whole Years -- Thanks to You!

Three years ago, today, SoylentNews announced its presence to the world. Much has happened along the way of our providing a place for a community to grow and to engage in discussion.

It started as a fork of five-year-old, open-sourced code which had suffered under benign neglect. Perl, Apache, MySQL, and other products had continued on. So we had to deal with dependencies on unsupported and back-level versions of code. A great deal of effort went into bringing the site up-to-date with current versions of that base. See below for mechanicjay's illuminating first-hand account of how that all got started.

[Continues...]

Those of you who were with us then can attest to the fact that site outages were a regular occurrence. Bugs were found and eradicated. New bugs were made, and found as well. We invited the community to vote to name the site. We created documents of incorporation and had them dutifully filed. On July 4th, 2014 we received notice of officially becoming SoylentNews PBC. But I get ahead of myself.

Not content with just running a clone of the old code, the staff embarked on a large number of improvements to the site. Support for Unicode characters (via UTF-8) was an early improvement. Refinements to moderation took place — you could now moderate and comment in the same story. Moderation points were issued to every registered user every single day. An API was written and made available. We have our own Folding@Home team (currently ranked 314 of 226132 teams in the world) which contributes spare compute cycles to help find cures to maladies such as Huntington's Disease. (See the Main F@H site and our team page.) We sent out a call for new editors to help our beleaguered editing team which was approaching burnout; several of you answered the call and we are greatly enriched by their viewpoints and their questioning of the status quo.

And what have we wrought? Our own place on the world-wide web, supported and run entirely by the community. For the numerate in our midst, here are some statistics for the site. As of the time of this writing (20170217_002919 UTC), SoylentNews has:

  • conducted 96 polls
  • posted 2098 user journal articles
  • registered 6496 user nicknames
  • published 15660 stories
  • received 18611 story submissions
  • posted 462690 comments
  • had 52699108 hits on stories

But that's not all! Unwilling to rest on their laurels, our development team has been hard at work bringing improvements to the site — along with some bug fixes. If you want to play with the current, in-development, subject-to-change-without-notice version of the site, hop on over to our development server. Do be aware several specially-crafted stories were created and posted there so as to evoke certain test conditions, so please respect the admonitions stated on those stories. Have an observation, question, or found a bug? We'd love to hear your feedback in the #dev channel on our IRC server.

We could not have done it alone -- a great many of you have contributed to the site. There are the administrative tasks of paying the bills and handling legal obligations. Sysops support to keep our boxes up and running. Writing code and patching bugs (while minimizing the bug writing). Suggesting and testing new code/features and providing constructive feedback. Making financial contributions by signing up for subscriptions. Submitting stories for the editors to poke and prod at. All of this in support of a goal to provide a place where people can submit comments and engage in discussions with other interesting and intelligent people on the 'net. As with any community, there have been some 'heated' discussions. And most refreshing of all are those discussions where nuggets of wisdom and brilliance appear -- and make the whole effort worthwhile.

So, on behalf of the rest of the all-volunteer staff here at SoylentNews, let me say thank you. For your support, engagement, and questioning — we are a better site because of you. May we continue to earn your trust and support for many years to come.

In the comments, please feel free to mention anything significant that happened over these years which was inadvertently omitted, as well as to tell us what we can do better.

So, to wind this up, I have one last question: "emacs or vi?" =)

Reflections on our First Days

For our third year, I have some Reflections on our third day.

In some of the pre-history of SoylentNews, here is some of the stuff that gets lost in the mists of time around the first coordinated development effort -- running on a VM, on a laptop in my basement under the slashcott.org domain. The slashcott had been announced and was to commence in some number of days. A bunch of folks thought it would be an awesome idea to get an independent version of slash running in time for the slashcott -- what could go wrong?

3 years and a ton of life changes for me make some of this a little fuzzy, but I'll do my best to put things together. I've relied heavily on my email archive of that time, which helped spur a bunch of memories. Hopefully this will be a coherent tale. (Maybe for next year I'll mine my personal IRC logs from when we were still on freenode).

At first there was a bunch of coordination in the ##slashcode channel on freenode, and a bunch of emails were also buzzing around trying to coordinate some things and ideas. My first email to Barrabas was on 02/06/2014 [6 Feb 2014 for our non-US readers]. The issue at hand was that "slashcode" had been hastily open sourced 5 years prior, then pretty well abandoned. Not only did you need to build the perl modules from scratch, but it would only build against Apache 1.x. Once you managed to run that gauntlet, even compiled and installed, things barely ran and were pretty horribly broken. Anyway, it soon became apparent that robinld, NCommander and myself were making the most progress on getting something running. As I recall, Robin was the first to succeed in getting an installed, running site, but his VM was stuck behind a corporate firewall.

In the meantime, I had gotten the domain slashcott.org registered while trying to build things myself. At some point, a bunch of us decided to combine forces, Robin shipped me his VM, I got it running on my laptop (as it was the only 64-bit thing I had at the time), we got myself and Ncommander ssh'ed in and we started hacking. For some reason, RedHat vm's were horribly laggy on my openSuse VirtualBox host and work was slow and painful, but progress started to be made.

The only bug I've ever fixed in the code base was a critical piece of the new account email/password generation stuff, as I recall the generated password wasn't actually getting written to the DB. (sadly the evidence of my contribution has been lost, I think I shipped the fix to either robin or ncommander, so they have credit in the git history). Regardless, it was a critical piece - I have an email dated 02/08/2014 with my new account/password, which worked -- it was a huge boon and let us start to let a couple people in to start hammering away to find front-end bugs (of which there were countless). The next big thing I see from mining my email is the first "Nightly stories email", which came out on 02/11/2014 (from the slashcott.org domain). I think we ended up with about 50ish users on slashcott.org (gosh I hope I still have that vmdk stashed somewhere).

On the night of 02/11/2014 (or very early morning of 02/12/2014), after I had given up and gone to bed (I had a newborn and was teaching an undergrad class on the side in addition to my regular 9-5 -- I was beyond toasted after a week), the VM locked up hard. It had done this a couple of times, but I had always been available to poke it with a stick and bring it back. As I was unavailable and no one had exchanged important things like phone numbers yet, NCommander made the executive decision to spin up a linode, which was great. The laggy VM on the laptop wasn't meant to last forever, though I admit I had visions (delusions?) of hosting the site myself on some real hardware at some point. In retrospect, Linode has been an amazing way to run this site and absolutely the right decision.

I got my new account on the li694-22 domain on 02/12/2014; that new account email was for mechanicjay, UID 7 -- which is where I live on the site to this day. I kept the slashcott.org server in sync with code changes for a bit, and it was a pretty handy testing platform until the "official" dev box came online on 02/14/2014. At some point during this week, we had landed on the soylentnews.org domain and that's where we went live on 02/17/2014.

So there you have it, we went from a group of independent pissed off people with no organization and an abandoned broken codebase to launching an honest-to-goodness site in ELEVEN fucking days.


Original Submission #1 | Original Submission #2

posted by NCommander on Tuesday February 07, @11:45AM
from the insert-systemd-rant-here dept.

So, in previous posts, I've talked about the fact that SoylentNews is currently powered by Ubuntu 14.04 plus a single CentOS 6 box. Right now, the sysops have been somewhat deadlocked on what we should do going forward for our underlying operating system, and I am hoping to get community advice. The "obvious" choice is to simply do-release-upgrade to Ubuntu 16.04. We've done in-place upgrades before without major issue, and I'm relatively certain we could upgrade without breaking the world. However, 16.04 introduces systemd into the stack, and it is not easily removable. Furthermore, at least in my personal experience, working with journalctl and such has caused me considerable headaches, which I detailed in a comment a while ago.

Discounting systemd itself, I've also found that Ubuntu 16.04 seems less "polished", for want of a better word. I've had to do considerably more fiddling and tweaking to get it to work as a server distro than I had to do with previous releases, and I've had weird issues with LDAP. The same was also true when I worked with recent versions of Debian. As such, there's been a general feeling among the sysops that it's time to go somewhere else.

Below the fold are basically the options as we see them, and I hope the community can provide some interesting insight or guidance.

Right now, we have about three years before security updates for 14.04 stop and we are absolutely forced to migrate or upgrade. However, we're already hitting pain due to outdated software; I managed to briefly hose the DNS setup over the weekend trying to deploy CAA records for SN due to our version of BIND being outdated. When TLS 1.3 gets standardized, we're going to have a similar problem with our frontend load balancers. As such, I want to get a plan in place for migration so we can start upgrading over the next year instead of panicking and having to do something at the last moment.

The SN Software Stack

As with any discussion of server operating systems, knowing what our workloads are is an important consideration. In short, this is what we use for SN, and the software we have to support:

  • nginx - Loadbalancing/SSL Termination
  • Apache 2.2 + mod_perl - rehash (we run it with a separate instance of Apache and Perl, and not the system copy)
  • MySQL Cluster for production
  • MySQL standard for secondary services
  • Kerberos + Hesiod - single sign-on/authentication
  • Postfix+Squirrelmail - ... mail

In addition, we use mandatory access controls (AppArmor) to limit the amount of stuff a given process can access for critical services to try and help harden security. We'd like to maintain support for this feature on whatever we migrate to, either continuing with AppArmor, switching to SELinux, or using jails/zones if we switch operating systems entirely.

The Options

Right now, we've floated a few options, but we're willing to hear more.

A non-systemd Linux distro

The first choice is simply to migrate over to a distribution where systemd is not present or completely optional. As of writing, Arch Linux, Gentoo, and Slackware are three such options. Our requirement for a Linux distribution is a good record of updates and security support, as I don't wish to be upgrading the system once a week to a new release.

Release-based distributions

I'm aware of the Devuan project, and at first glance, it would seem like an obvious choice; "Debian without systemd" is the de facto tagline. However, I've got concerns about the long-term suitability of the distribution, as well as its intentional choice to replace much of the time-tested Debian infrastructure, such as the testing archive, with a git-powered Jenkins instance in its place. Another option would be Slackware, but Slackware has made no indication that they won't adopt systemd, and it is historically very weak with in-place upgrading and package management in general. Most of the other distributions on without-systemd.org are either LiveCDs or very small minority distros that I would be hesitant to bet the farm on.

Rolling-release distributions

On the other side of the coin, an option favored by at least some of the staff is to migrate to Gentoo or Arch, which are rolling-release. For those unaware, a rolling-release distribution basically always has the latest version of everything. Security updates are handled, for the most part, simply by updating to the latest upstream package. I'm not a huge fan of this option, as we're dependent on self-built software, and it's not unheard of for "emerge world" to break things during upgrades due to feature changes and such. It would essentially require us to manually check release notes and cross our fingers every time we did a major upgrade. We could reduce some of this pain by migrating all our infrastructure to the form of ebuilds so that at least they would get rebuilt as part of upgrading, but I'm very, very hesitant about this option as a whole, especially for multiple machines.

Switch to FreeBSD/illumos/Other

Another way we could handle the problem is to simply jump off the Linux ship entirely. From a personal perspective, I'm not exactly thrilled with the way Linux as a collective whole has gone for several years, and I see the situation only getting worse with time. As an additional benefit, switching off Linux gives us the possibility of using real containers and ZFS, which would allow us to further isolate components of the stack, and give us the option to do rollbacks if ever necessary on a botched upgrade; something that is difficult to impossible with most Linux distributions. As such, I've been favoring this option personally, though I'm not sold enough to make the jump. Two major options attract me:

FreeBSD

FreeBSD has been around a long time, and has both considerable developer support and support for a lot of features we'd like, such as ZFS, jails, and a sane upstream. FreeBSD is split into two components: the core stack, which is what constitutes a release, and the ports collection, which is add-on software. Both can be upgraded (somewhat) independently of each other, so we won't have as much pain with outdated server components. We'd also have the ability to easily create jails for things like rehash, MySQL, and such, and isolate these components from each other in a way that's more iron-clad than AppArmor or SELinux.

illumos

illumos is descended from OpenSolaris, and forked after Oracle closed up the source code for Solaris 11. Development has continued on it (at a, granted, slower pace). Being the originator of ZFS, it has first-class support for it, as well as zones, which are functionally equivalent to FreeBSD jails. illumos also has support for SMF, which is essentially advanced service management and tracking without all the baggage and tendrils throughout the stack that systemd creates. Zones can also be branded to run Linux binaries to some extent, so we can handle migrating the core system over by simply installing illumos, restoring a backup into a branded zone, and then decommissioning said zone piecemeal. As such, as an upgrade choice, this is fairly attractive. If we migrate to illumos, we'll use either the SmartOS distribution or OpenIndiana.

Final Notes

Right now, we're basically on the fence with all options, so hopefully the community can provide their own input, or suggest other options we're not aware of. I look forward to your comments below!

~ NCommander

posted by NCommander on Friday January 20, @04:43PM   Printer-friendly
from the hot-upgrading-database-servers-ftw dept.

Earlier today, we ran an article detailing that Oracle released 270 critical security updates for many of its products, including MySQL Cluster, which we use here to provide high uptime and reliability for SoylentNews. Needless to say, it was time to upgrade both NDB backends and the four MySQLd frontends. While the upgrade did not go completely smoothly, because MySQL strict mode got enabled and broke the site briefly, our total downtime was less than five minutes or so. We had to do a full flush and purge of all caches, which means the site is running a bit laggy until they can repopulate, but I'm pleased to announce we're up to date and secure!

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)]	2 node(s)
id=2	@redacted (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0)
id=3	@redacted (mysql-5.7.17 ndb-7.5.5, Nodegroup: 0, *)

[ndb_mgmd(MGM)]	2 node(s)
id=101	@redacted (mysql-5.7.17 ndb-7.5.5)
id=102	@redacted (mysql-5.7.17 ndb-7.5.5)

[mysqld(API)]	4 node(s)
id=11	@redacted (mysql-5.7.17 ndb-7.5.5)
id=12	@redacted (mysql-5.7.17 ndb-7.5.5)
id=13	@redacted (mysql-5.7.17 ndb-7.5.5)
id=14	@redacted (mysql-5.7.17 ndb-7.5.5)

If you notice any unusual breakages or slowdowns, please let me know in the comments. Otherwise, keep calm and carry on!

~ NCommander

posted by martyb on Friday January 20, @07:00AM   Printer-friendly
from the making-a-legacy dept.

A couple of months ago we ran a story asking the SoylentNews community for volunteers to help with editing, and the community did not let us down; we received a full dozen inquiries! You've probably noticed a few new names at the top of the stories and, quite frankly, their contributions made it possible for the staff to survive the holiday season. Many, many thanks!

If, for whatever reason, you did not want to be an Editor, but still wish to contribute, there are many other areas:

Submit stories
Click the Submit Story link in the "Navigation" slashbox on the left-hand side of the main page. It is not necessary to write perfect prose (though we sure appreciate it when we see it!). If you come across a story that you find interesting and think others might enjoy it too, send it in! We publish, on average, about 450 stories a month. If 1% of the community submitted a story or two each month, it would make a huge difference!
Post comments
You don't need to be a subject-matter expert to comment on a story! (Though we sure do appreciate when such people chime in!) Sometimes the best discussions come about simply because someone asked a question.
Perform Moderation
Moderation is like Olympic Scoring. Everything from a "-1" (not worth the electrons used to store it) to a "5" (one of the best on the site). Each registered user gets 5 mod points per day. Concentrate on promoting the good rather than hiding the bad... we want to make sure the most insightful, interesting, and informative comments are visible.
Help site development and operation
Something bugging you about site behavior? Have experience in running a web site? Know how to run an IRC server? Know your way around a Wiki? Can code Perl in your sleep? Have experience doing QA and/or test? Don't have this knowledge but would like to learn? Join our development and/or operations team.
Support the site
Your financial contributions are critical to our continued operation. Subscribe to SoylentNews or buy SoylentNews swag.
Other?
See something else where you'd like to help out? Let us know — the more the merrier! There's a lot of fun and camaraderie in our team... in large part it's why I continue to contribute to the site. Join in on the fun!

There are many rewards for contributing. Just to be a part of such a diverse and knowledgeable team is indescribable. I have learned so much from some amazingly helpful people. So join up as an editor, submit stories and comments, moderate, or help the site to keep running.

Lastly, spread the word. Share a link to the main page, to a particular story, or even to a single comment.

--martyb

posted by cmn32480 on Friday December 23, @02:38PM   Printer-friendly
from the we-appreciate-the-help! dept.

Hi Guys, Soylent's Editors do a lot behind the scenes to keep the community going. As a gift idea for them this year, please consider submitting lots of stories over the next two days to get the queue nice and full. Then they'll be able to schedule in their appearance on the home page ahead of time and take Christmas (or Hanukkah) off to spend time with their friends and families.

If you've never submitted a story before, here are guidelines for composing a story submission. You submit it here.

My own method is to find tech/science articles from SN's RSS-bot or a dozen other sources like the BBC or sciencenews.org, grab the title, and a couple of paragraphs that communicate the gist. Often I'll add a quip, question, or note of my own, but that's up to your personal taste. It's easy and takes under 5 minutes per story.

Thanks for reading, and have a happy holiday!


[Ed Note: The week between Christmas and New Year's is always slow for submissions and time is a precious commodity for all of us. The more subs in the queue, the further out we can get the story queue, and the more time we have to spend with our loved ones. Any help you can give would be appreciated!]

Original Submission

posted by janrinok on Wednesday November 30, @04:24PM   Printer-friendly
from the fresh-blood dept.

Hopefully you will have noticed a number of new editors that have appeared to help keep this site running. They have been active for over a week but you might not have noticed them if you have been enjoying the Thanksgiving Day holiday, or just spending money during Black Friday (which seems to last longer each year!)

Snow, Charon, FatPhil, Fnord666, and GreatOutdoors have completed their training and are busy making their contributions to the team, and there are several more volunteers who will begin training in the near future. I hope that you will welcome them and keep them busy by providing more and varied submissions for them to battle with. They have already significantly reduced the strain on the editorial team and we are all breathing a collective sigh of relief back here. Thank you for volunteering guys!

posted by cmn32480 on Tuesday November 15, @02:00AM   Printer-friendly
from the editorial-staff-is-getting-TIRED dept.

I will put this to you, the community, in a very straight, simple (hopefully understandable) way.

The editorial staff is a small, hardworking group. There are currently about 5 of us that are actively pushing stories out on a regular basis, and we need help.

We humbly come to you, the community, to solicit for a volunteer or two. We will provide all the necessary training, at a cost of just 3 easy payments of $999.99US, or entirely free if you apply before 1 Apr 2099.

For that pittance, you can expect to learn:

  • The editing process
  • How to get onto the bouncer for IRC
  • The best ways to abuse The Mighty Buzzard
  • The secret staff handshake
  • How to be abused, and learn to like it
  • How to deal with having your name in lights
  • and much much more!

In all seriousness, we all are busy and have lives. So do you, and we get that, but for this community to continue to thrive, we need a little fresh blood on the editorial staff. Some of us have been at this since the site went live almost 3 years ago (janrinok and martyb have posted over 3000 articles EACH). To put it in perspective, the site has only run about 14,500. Some of us came on almost a year later, but like any organization, there has been attrition, and we need to replenish.

We are starting to see some of the tell-tale signs of burnout, and to avoid that, we need your help.

If you are interested, please feel free to reach out in the comments below, via email ([nick] at soylentnews dot org), or hit us on IRC. If we aren't there (we all LOOK like we are logged in all the time due to the bouncer, but we may not actually be there), /join #editorial and leave a message — we will get back to you.

Remember, it isn't all doom and gloom! Working on staff, you will be on a team with a fantastic group of REALLY smart (myself excluded) people. I can honestly say I have made some really good friends from this experience, and I've even gotten to meet one of the guys in meat space. It is something that I am truly glad I took advantage of when the opportunity came around.

Thanks for listening, and with a little luck, we will see one or two of you pretty soon.

Live Long and Prosper,

-cmn32480

[TMB Note: Seriously. You really don't want me having to pick stories.]

[Update: see this comment below if you've expressed interest in volunteering.]

posted by NCommander on Monday November 07, @12:44PM   Printer-friendly
from the whadaya-say? dept.

So, as per usual, I like to occasionally check the pulse on the community to make sure that people for the most part are happy and satisfied with the day-to-day operation of the site. For those of you who are new to the community, first, let me welcome you and explain how these work.

When I open the floor to the community, the intent is to provide a venue to discuss anything related to site operations, content, and anything along those lines. I actively review and comment on these posts, and if one issue pops up multiple times in comments, I generally run follow-up articles to try and help address issues the community feels are important before someone decides to take rehash and form a spinoff. Feel free to leave whatever thoughts you want below.

Contrary to my usual posts, I don't have that much to say on this, so to both the community and editorial team's relief, I'll cut this off right here before it becomes Yet Another NCommander Novel.

~ NCommander

posted by The Mighty Buzzard on Sunday October 30, @02:45PM   Printer-friendly
from the no-fishing-for-me-this-morning dept.

Right, so there's currently a DDoS of our site specifically happening. Part of me is mildly annoyed, part of me is proud that we're worth DDoS-ing now. Since it's only slowing us down a bit and not actually shutting us down, I'm half tempted to just let them run their botnet time out. I suppose we should tweak the firewall a bit though. Sigh, I hate working on weekends.

Update: Okay, that appears to have mitigated it; the site's functional at a reasonable rate of responsiveness.

Update2: Attack's over for now. You may go about your business.

posted by martyb on Monday October 10, @01:26AM   Printer-friendly
from the fun-with-numbers dept.

Since the launch of SoylentNews in February of 2014, there have been 274,870 comment moderations made against the 412,100 comments that our community has posted to our site. Who has posted the most comments? Who garnered the most up-moderations? The most down-moderations?

Such simple questions, but they led to a fun bit of DB querying. The results surprised me, and I thought others might be interested, as well. Most surprising to me was the assessment of comments from Anonymous Cowards.

[Continues...]

Who received the most moderations?

For better or worse, to whom did Soylentils direct their greatest moderation effort?

NICK | UID | TOTAL | DOWN | UP | NET
The Mighty Buzzard | 18 | 2260 | 626 | 1634 | 1008
takyon | 881 | 2315 | 103 | 2212 | 2109
aristarchus | 2645 | 2494 | 615 | 1879 | 1264
c0lo | 156 | 2717 | 183 | 2534 | 2351
Thexalon | 636 | 3225 | 83 | 3142 | 3059
Ethanol-fueled | 2792 | 3447 | 1238 | 2209 | 971
VLM | 445 | 4401 | 346 | 4055 | 3709
Runaway1956 | 2926 | 4531 | 992 | 3539 | 2547
frojack | 1554 | 5855 | 593 | 5262 | 4669
Anonymous Coward | 1 | 78936 | 13002 | 65934 | 52932

The single greatest target of moderation was the "Anonymous Coward" with 78,936 moderations. This was followed by frojack, Runaway1956, VLM, Ethanol-fueled, and Thexalon who garnered over 3000 moderations each.

Who had the most down-moderations?

Here, only the number of down moderations was considered — it mattered not whether it was Flamebait or Troll — they all counted the same.

NICK | UID | TOTAL | DOWN | UP | NET
VLM | 445 | 4401 | 346 | 4055 | 3709
Hairyfeet | 75 | 1620 | 387 | 1233 | 846
MichaelDavidCrawford | 2339 | 1513 | 387 | 1126 | 739
frojack | 1554 | 5855 | 593 | 5262 | 4669
aristarchus | 2645 | 2494 | 615 | 1879 | 1264
The Mighty Buzzard | 18 | 2260 | 626 | 1634 | 1008
jmorris | 4844 | 2144 | 753 | 1391 | 638
Runaway1956 | 2926 | 4531 | 992 | 3539 | 2547
Ethanol-fueled | 2792 | 3447 | 1238 | 2209 | 971
Anonymous Coward | 1 | 78936 | 13002 | 65934 | 52932

Once again, our prolific AC topped the list with 13,002 down-mods. Ethanol-fueled was the only other user who topped 1000 down-mods, coming in with 1238. Runaway1956 made a valiant showing with 992 down-mods.

Who had the most up-moderations?

In the eyes of the community, who most often received an up-mod? Again, no consideration was given for the nature of the up-mod — Insightful, Interesting, or Informative — all were considered the same.

NICK | UID | TOTAL | DOWN | UP | NET
aristarchus | 2645 | 2494 | 615 | 1879 | 1264
Phoenix666 | 552 | 2184 | 80 | 2104 | 2024
Ethanol-fueled | 2792 | 3447 | 1238 | 2209 | 971
takyon | 881 | 2315 | 103 | 2212 | 2109
c0lo | 156 | 2717 | 183 | 2534 | 2351
Thexalon | 636 | 3225 | 83 | 3142 | 3059
Runaway1956 | 2926 | 4531 | 992 | 3539 | 2547
VLM | 445 | 4401 | 346 | 4055 | 3709
frojack | 1554 | 5855 | 593 | 5262 | 4669
Anonymous Coward | 1 | 78936 | 13002 | 65934 | 52932

Once again AC reigns supreme with 65,934 up-mods. This was followed by frojack with 5,262 and VLM with just over 4000.

Who had the highest net-moderation?

Putting it all together — subtracting the number of down-mods from the number of up-mods — who had the highest net moderation on our site?

NICK | UID | TOTAL | DOWN | UP | NET
wonkey_monkey | 279 | 1754 | 117 | 1637 | 1520
maxwell demon | 1608 | 1786 | 55 | 1731 | 1676
Phoenix666 | 552 | 2184 | 80 | 2104 | 2024
takyon | 881 | 2315 | 103 | 2212 | 2109
c0lo | 156 | 2717 | 183 | 2534 | 2351
Runaway1956 | 2926 | 4531 | 992 | 3539 | 2547
Thexalon | 636 | 3225 | 83 | 3142 | 3059
VLM | 445 | 4401 | 346 | 4055 | 3709
frojack | 1554 | 5855 | 593 | 5262 | 4669
Anonymous Coward | 1 | 78936 | 13002 | 65934 | 52932

Once again, the shy but prolific AC tops the list with a net of 52,932 mod points. Only one other Soylentil was able to surpass 4000: frojack with 4,669. Two other Soylentils exceeded 3000: VLM with 3709 and Thexalon with 3059.

Who hath pointy horns?

Who managed to acquire the most down-mods as a percentage of all moderations on their comments? For a tie, number of moderated comments is the second sort field. Who is the devil in our midst?

NICK | UID | TOTAL | #DOWN | %DOWN | #UP | %UP | NET
scarboni888 | 5061 | 1 | 1 | 100.00 | 0 | 0.00 | -1
MooCow | 6048 | 1 | 1 | 100.00 | 0 | 0.00 | -1
cybergimli | 436 | 2 | 2 | 100.00 | 0 | 0.00 | -2
rancidman | 769 | 2 | 2 | 100.00 | 0 | 0.00 | -2
rmdingler | 1038 | 2 | 2 | 100.00 | 0 | 0.00 | -2
SoylentsISay | 1331 | 2 | 2 | 100.00 | 0 | 0.00 | -2
stupid | 2631 | 2 | 2 | 100.00 | 0 | 0.00 | -2
contrapunctus | 3495 | 2 | 2 | 100.00 | 0 | 0.00 | -2
killal -9 bash | 2751 | 5 | 5 | 100.00 | 0 | 0.00 | -5

Pfft, just a few minor imps around here. killal -9 bash topped (bottomed?) the list with 5 down-mods out of 5 moderations.

Who earned a Halo?

Whose comments had the best percentage of up-mods to total-mods? And in the case of ties, received the most up-mods? Who are the angels among us?

NICK | UID | TOTAL | #DOWN | %DOWN | #UP | %UP | NET
dx3bydt3 | 82 | 69 | 0 | 0.00 | 69 | 100.00 | 69
romlok | 1241 | 70 | 0 | 0.00 | 70 | 100.00 | 70
Hawkwind | 3531 | 75 | 0 | 0.00 | 75 | 100.00 | 75
jdccdevel | 1329 | 78 | 0 | 0.00 | 78 | 100.00 | 78
rleigh | 4887 | 102 | 0 | 0.00 | 102 | 100.00 | 102
DrMag | 1860 | 103 | 0 | 0.00 | 103 | 100.00 | 103
SrLnclt | 1473 | 117 | 0 | 0.00 | 117 | 100.00 | 117
Joe | 2583 | 126 | 0 | 0.00 | 126 | 100.00 | 126
Aiwendil | 531 | 164 | 0 | 0.00 | 164 | 100.00 | 164

Here, it appears we've got a flock of angels, or at least people who know which way the wind blows. All folks listed here scored 100.00% meaning all of their moderations were up-mods. Aiwendil topped our list with 164, and we had 4 others — Joe, SrLnclt, DrMag, and rleigh — who each had over 100 such comment moderations... not even a single down-mod among them!

I must admit I was surprised to see the sheer number of positive moderations of AC comments, and the fact that 83.5% of those mods were positive.
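
For anyone who wants to check the arithmetic, here is a minimal sketch of how the TOTAL, NET, and percentage columns relate to the raw up-mod and down-mod counts. It is not the query that was actually run against the database; the function name and the three sample rows (copied from the tables above) are just for illustration.

```python
# Rows copied from the tables above: (nick, down_mods, up_mods).
# This is an illustrative re-computation, not the original DB query.
ROWS = [
    ("Anonymous Coward", 13002, 65934),
    ("frojack", 593, 5262),
    ("Ethanol-fueled", 1238, 2209),
]


def summarize(down, up):
    """Return (total, net, percent_up) for one user's moderation counts."""
    total = down + up          # TOTAL column
    net = up - down            # NET column
    pct_up = round(100.0 * up / total, 2) if total else 0.0
    return total, net, pct_up


for nick, down, up in ROWS:
    total, net, pct_up = summarize(down, up)
    print(f"{nick}: total={total} net={net} up={pct_up}%")
```

Running the AC row through this gives the roughly 83.5% figure quoted above.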

[Update: Added two tables, one each for top percentage of down-mods and of up-mods. -Ed.]

posted by NCommander on Tuesday September 20, @01:00PM   Printer-friendly
from the now-you-can-be-1337-by-knowing-what-a-far-call-is dept.

The Retro-Malware series is an experiment in original content for SoylentNews, written in the hope of motivating people to subscribe to the site and help grow our resources. The previous article talked a bit about the programming environment imposed by DOS and 16-bit Intel segmented programming; it should be read before this one.

Before we get into this installment, I do want to apologize for the delay in getting this article up. A semi-unexpected cross-country drive combined with a distinct lack of surviving programming documentation has made getting this article written up take far longer than expected. Picking up from where we were before, today we're going to look into Terminate-and-Stay-Resident programming, interrupt chaining, and get our first taste of how DOS handles conventional memory. Full annotated code and binaries are available here in the retromalware git repo.

In This Article

  • What Are TSRs
  • Interrupt Handlers And Chaining
  • Calling Conventions
  • Walking through an example TSR
  • Help Wanted

As usual, check past the break for more. In addition, if you are a licensed ham operator or have ham radio equipment, I could use your help; check the details at the end of this article.

[Continues...]

What Are TSRs?

For anyone who used DOS regularly, TSRs (short for Terminate and Stay Resident) were likely a source of both fun and frustration. Originally appearing in DOS 2.0, TSRs, as the name suggests, are programs that exit but leave some part of their code around in memory. TSRs are primarily used to provide device drivers, extended APIs, or hooks that other applications can take advantage of. At the same time, they also could be used (as we will be doing) to install invisible hooks to modify, change, or log system behaviors. In that sense, they can be considered broadly equivalent to extensions on classic Mac OS. The BIOS could be considered a special type of TSR as it's always available in memory to provide services to the operating system and applications.

From a technical perspective, a TSR is any application that exits through int 21h with AH set to 31h. Ralf Brown's Interrupt List has this to say on DOS's API for TSRs:

DOS 2+ - TERMINATE AND STAY RESIDENT

AH = 31h
AL = return code
DX = number of paragraphs to keep resident

Return:
Never

Notes: The value in DX only affects the memory block containing the PSP; additional
memory allocated via AH=48h is not affected. The minimum number of paragraphs
which will remain resident is 11h for DOS 2.x and 06h for DOS 3.0+. Most TSRs can
save some memory by releasing their environment block before terminating
(see #01378 at AH=26h,AH=49h). Any open files remain open, so one should
close any files which will not be used before going resident; to access a file
which is left open from the TSR, one must switch PSP segments first (see AH=50h)

Well, for most people, I suspect that is as clear as mud. Let me try and explain it a bit better. Essentially, when a program flags to DOS that it wants to TSR, DOS simply leaves the number of paragraphs given in DX alone (a paragraph is 16 bytes, and the count starts at the program's PSP), and marks those paragraphs as 'in use' so neither it nor any other well-behaved program will attempt to use them. No relocation or copying is done as part of this process; the memory is simply marked dead, and left 'as is', as we'll see below.
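
To make the paragraph arithmetic concrete, here is a small sketch of how a value for DX could be worked out. The helper names are hypothetical, and the .COM case assumes the resident block starts at the 256-byte PSP, as the note quoted above implies.

```python
PARAGRAPH_SIZE = 16   # bytes per real-mode paragraph
PSP_SIZE = 0x100      # the Program Segment Prefix occupies the first 256 bytes


def paragraphs_to_keep(resident_bytes):
    """Round a byte count up to whole 16-byte paragraphs (the value for DX)."""
    return (resident_bytes + PARAGRAPH_SIZE - 1) // PARAGRAPH_SIZE


def com_resident_paragraphs(code_and_data_bytes):
    """Hypothetical helper: paragraphs for a .COM-style TSR, PSP included."""
    return paragraphs_to_keep(PSP_SIZE + code_and_data_bytes)


print(com_resident_paragraphs(300))
```

Under these assumptions, a TSR with 300 bytes of resident code and data needs 556 bytes including the PSP, which rounds up to 35 paragraphs. Whatever count ends up in DX, DOS leaves that memory exactly where it is.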

This is problematic for a number of reasons. As I mentioned in the previous article, when in real mode, Intel processors can only access up to 1 MiB of memory, and it's the area that all applications, drivers, and device address space need to squeeze into. Of this, only 640 KiB is normally available to applications (this is known as conventional memory). If a TSR is too large, or too many are loaded, it is very easy to run out of RAM to do anything useful with the machine. To make matters worse, DOS provides absolutely no mechanism to manage or uninstall TSRs: once an application is resident, it stays there unless it is specifically designed to unhook itself and free its own memory, and there's no 'official' way of doing even that in the DOS APIs.

This quickly led to an era where you might need a specific boot floppy for a given application so as to have its TSRs available (such as mouse or network drivers), and nothing else, so that there would be enough conventional memory left to fit everything in. While several third-party efforts, such as TesSeRact, tried to standardize TSR installation/removal, none of them became a true de facto standard. Furthermore, it is very possible for a TSR removal to leave memory in a fragmented state which could break other applications. An entire cottage industry of memory optimizers quickly sprang up which could load TSRs into high memory.

At this point, you may be wondering "If TSRs are so miserable, why use them?". The answer, unfortunately, is that it is the only way on DOS to provide any sort of extended functionality. DOS has no concept of shared libraries or multitasking; it was TSR or bust. This brings us to our next topic: interrupt handling.

Interrupt Handlers

While I touched on interrupts in the previous article, I didn't go into too much detail. Interrupts, simply put, are special signals sent to the processor to tell it to stop what it's doing and do something else immediately. These interrupts can be generated by either hardware or software. Interrupts essentially operate like this:

  • Processor is doing work.
  • Interrupt occurs
  • Processor saves location and jumps to interrupt handler
  • Interrupt handler runs
  • Interrupt handler finishes, and returns to the original task

When an interrupt occurs, the processor looks at the Interrupt Vector Table (IVT), located at the very bottom of memory (0000:0000), to determine where it needs to jump to handle that interrupt. The function that handles an interrupt is known as an Interrupt Service Routine (ISR). Assuming there is a valid handler address in the IVT, the processor pushes FLAGS, CS, and IP onto the stack, does the equivalent of a far call to the address stored in that IVT entry, and immediately continues execution there. A 'bare bones' interrupt handler looks something like this:

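previous_hook_offset: dw 0
previous_hook_segment: dw 0

; Assumed definition (not part of the original excerpt): int 21h/AH=09h
; prints the '$'-terminated string at DS:DX.
hello_world_str: db "Hello, world!", 13, 10, "$"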

hook:
	; When we come into an interrupt, only FLAGS, the code segment,
	; and the instruction pointer have been saved for us (by the CPU).
	; It's the responsibility of the handler to preserve everything
	; else it touches.

	; This is saved on the application's local stack, which is fine
	; for now (FreeDOS does the same thing internally) as long as
	; we're not putting any large items on it. We'll look at setting
	; up a local stack later.

	pushf ; Save flags
	pusha ; push all general registers to the stack

	; Setup segments
	push ds
	push es

	; For interrupt handlers, CS=DS normally, and SS either points at:
	; the application stack (aka, whatever was running before we were)
	; or at a local stack setup by the TSR.
	; On x86, it's not possible to directly copy from one segment register
	; to another, so we'll use AX as a scratch:
	mov ax, cs
	mov ds, ax
	mov es, ax

	; Let's add a "hello world" hook:

	; NOTE: Normally it's a bad idea to call DOS interrupts in a TSR
	; because DOS itself is not re-entrant. However, as in this example,
	; we've hooked the unused 0x66, which DOS does not call out of the box,
	; which means we'll never be in this ISR while we're in DOS.
	; If this were real code, we would have to check the INDOS flag for sanity.
	mov ah, 9
	mov dx, hello_world_str
	int 0x21

	; DOS compatibility "quirk?". On DOSBox (which I initially tested this on)
	; there's a default entry in the IVT for all interrupts in F000:xxxx.
	; Documentation suggests that this is also the default behavior
	; for MS-DOS though I can't confirm it.
	;
	; FreeDOS, on the other hand, leaves unused INTs initialized to 0000:0000
	; so blindly far calling it causes a fault. So we need to check if the
	; segment is 0000, and skip chaining if that's the case

	cmp word [previous_hook_segment], 0x0000
	je skip_chain

	; Chain to other TSRs
	pushf ; pushf is required because iret expects to pop flags
	call far [previous_hook_offset]

	skip_chain:

	; We're done, restore to previous state
	pop es
	pop ds
	popa
	popf

	; To return from an interrupt, we use the special iret instruction
	iret

Quite a bit of code for not doing much. As the code comments explain, the interrupt handler has to preserve any information in the registers it wants to use. For this example, we just save everything with a pushf instruction followed by a pusha instruction, which puts the FLAGS register followed by all the general purpose registers (AX-DX, SI, DI, BP, SP) on the stack. Preserving flags in an ISR is extremely important since FLAGS is where things like comparison results are stored; if you corrupt FLAGS, it's completely possible that an application evaluates an "if" statement the wrong way and becomes a source of hard-to-impossible-to-find bugs.
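
As an aside on how the processor finds a handler like the one above in the first place: each IVT entry is four bytes, an offset word followed by a segment word, stored at linear address (interrupt number × 4). The sketch below models that lookup over a plain byte array. It is only a simulation for illustration; the function names and the F000:0123 value are made up.

```python
import struct


def read_vector(memory, int_no):
    """Return (segment, offset) from the 4-byte IVT entry for int_no."""
    # Each entry is little-endian: offset word first, then segment word.
    offset, segment = struct.unpack_from("<HH", memory, int_no * 4)
    return segment, offset


def linear_address(segment, offset):
    """Real-mode address translation: segment * 16 + offset, 20 bits wide."""
    return ((segment << 4) + offset) & 0xFFFFF


# Simulated first 1 KiB of memory: 256 vectors of 4 bytes each, all zero.
memory = bytearray(256 * 4)

# Pretend something installed a handler for int 0x66 at F000:0123.
struct.pack_into("<HH", memory, 0x66 * 4, 0x0123, 0xF000)

segment, offset = read_vector(memory, 0x66)
print(f"int 66h -> {segment:04X}:{offset:04X} "
      f"(linear {linear_address(segment, offset):05X})")

# An untouched vector reads back as 0000:0000, which is the case the
# handler above checks for before chaining.
print(read_vector(memory, 0x67))
```

Note that the layout (offset first, then segment) is the same order the handler's previous_hook_offset and previous_hook_segment words are declared in, which is what lets a far call through that memory location work.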

ISRs are somewhat notorious in that they appear deceptively easy to code, and absolutely disastrous if you get it wrong. One of the major things to be aware of is that it's possible for an interrupt to be interrupted. For example, if your interrupt handler is running, and someone taps on the keyboard, the keyboard handler will preempt you. Depending on what you're doing, this might not be a problem, or it could "lock up" the computer. ISRs can turn interrupts on and off with the sti/cli instructions, but an all-too-common bug is forgetting to turn interrupts back on. Raymond Chen, a developer at Microsoft, wrote an entire chapter in his book "The Old New Thing" dedicated to the things that stupid applications do that Windows had to patch around — such as forgetting how to handle interrupts.

The second consequence is that ISRs should be reentrant. For those who are not hugely familiar with computer programming, reentrancy is the ability for a subroutine to be interrupted, then called again safely. For example, if you're listening to keyboard events, it's possible that two events can come at the same time and the second event preempts the first one. Bad Things(tm) happen if you have non-reentrant ISRs. The only reason this is a 'should' vs. a 'must' is that DOS itself is not reentrant; as the comment explains, you can't safely call a DOS interrupt from an ISR. DOS provides a special global flag known as INDOS to let callers know whether it's safe to call into DOS; checking it was left out above for brevity and because we hooked an otherwise unused interrupt.

The final common pitfall for DOS-based ISRs is it is possible for multiple TSRs to hook the same interrupt. For example, App A and App B can both decide they want the same interrupt. Depending on the application, it may chain interrupts down, or it may claim an interrupt entirely for itself. This can lead to infuriatingly complicated issues to debug if the other TSR is not well-behaved. Microsoft and IBM eventually provided built-in TSR multiplexing in DOS in the form of int 2F, but the API is extremely difficult to use and failed to solve many of the inherent issues.
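To make the chaining behaviour concrete, here is a minimal C model of a chain of hooked handlers. It is only a sketch: the `struct hook` type and `run_chain` function are hypothetical names invented for illustration, and a NULL pointer stands in for the empty 0000:0000 vector that the handler above checks for before chaining.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one installed TSR: the segment it was loaded at
 * and the handler that owned the interrupt before it. NULL stands in for
 * an empty 0000:0000 vector. */
struct hook {
    unsigned short segment;
    const struct hook *previous;
};

/* Run the newest handler first, then follow the chain to older handlers,
 * stopping when the previous vector is empty (the skip_chain case).
 * Returns how many handlers ran. */
int run_chain(const struct hook *newest)
{
    int ran = 0;
    const struct hook *h = newest;

    while (h != NULL) {
        printf("handler in segment %04X\n", (unsigned)h->segment);
        ran++;
        h = h->previous;
    }
    return ran;
}
```

A handler that claims the interrupt for itself corresponds to never following `previous`, which is exactly how a badly behaved TSR silently cuts off everything installed before it.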

The Stack and Calling Conventions

Let's take a momentary digression from TSRs to look at how functions work and how they interact with the stack. From an instruction perspective, Intel processors provide a "call" opcode which pushes the current instruction pointer to the stack, and then unconditionally jumps to a given location. It doesn't, however, define the behavior of how arguments are passed or the management of the stack. As such, developers have created conventions to specify how the stack and arguments should be passed from one function to another.

For non-programmers, the stack can be considered to be a "working space" where a program can store local variables and temporary information such as result values. In contrast to the heap, stacks are relatively small, and are essentially localized to a given function. For historical reasons, the stack grows 'down' from upper memory addresses to lower memory addresses. The stack pointer SP always points to the top of the stack, which is the most recently pushed item. When information is pushed to the stack with a "push" operation, SP is first decremented by the size of the value, and the value is then stored in memory at the location SP now points to. In contrast, removing an item from the stack simply increments SP, allowing new information to overwrite the old.
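The push and pop mechanics can be sketched in C with a byte array standing in for the stack segment. This is a simulation under stated assumptions, not real-mode code: `stack_segment`, `sp`, `push16` and `pop16` are names made up for the example, and the starting value of SP is arbitrary. Note that on x86 the decrement happens before the store.

```c
#include <stdint.h>

/* Hypothetical 64 KiB stack segment and a 16-bit stack pointer. */
static uint8_t  stack_segment[0x10000];
static uint16_t sp = 0xFFFE;

/* push: move SP down by two bytes, then store the word at SS:SP. */
void push16(uint16_t value)
{
    sp = (uint16_t)(sp - 2);
    stack_segment[sp] = (uint8_t)(value & 0xFF);          /* low byte first */
    stack_segment[(uint16_t)(sp + 1)] = (uint8_t)(value >> 8);
}

/* pop: read the word at SS:SP, then move SP back up by two bytes.
 * The old bytes stay in memory until something overwrites them. */
uint16_t pop16(void)
{
    uint16_t low  = stack_segment[sp];
    uint16_t high = stack_segment[(uint16_t)(sp + 1)];

    sp = (uint16_t)(sp + 2);
    return (uint16_t)(low | (high << 8));
}
```

Pushing two words and popping them again returns them in reverse order and leaves SP where it started, which is the property every ISR and function relies on.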

For example, let's assume we have a C function with the following prototype:

// By default, most C compilers use the CDECL calling convention on x86
int example(int a, int b) {
	// We'll do stuff here
	return a+b;
}

Unlike most architectures, x86 defines multiple types of calling conventions. Of these, the most common are stdcall (used primarily by Windows), and cdecl (C Declaration). For the code I write, I'm sticking to the cdecl convention for my own sanity. cdecl is what's known as a "caller cleanup" convention, which means the calling function is responsible for removing the arguments from the stack once the called function returns. Here's what the calling code looks like in assembly:

example_call:
	mov ax, 4
	mov bx, 5

	; Under cdecl, arguments are pushed right to left,
	; so b (bx) goes first and a (ax) goes last
	push bx
	push ax

	; Many C compilers decorate cdecl symbol names with a
	; leading _, so example becomes _example
	call _example

	; Now we need to clean the stack up
	add sp, 4

	; The return value (9) comes back in ax
	; all other registers are smashed (aka their
	; values are not preserved into or out of the
	; function)

Fairly straightforward, right? Let's look at how this function might be implemented so we can discuss the base pointer (BP) as well. Here's what _example looks like:

_example:
	; Setup stack frame
	push bp
	mov bp, sp

	; The stack now has the following layout
	; bp[+0] caller's saved bp
	; bp[+2] return address
	; bp[+4] int a
	; bp[+6] int b

	; Move values from the stack to registers
	mov ax, [bp+4]
	mov bx, [bp+6]
	add ax, bx

	pop bp
	ret

BP, or the base pointer, can be considered a reference point for where each function begins and ends. Whenever we enter or leave a function, the base pointer forms the base of the stack for that function (hence the name). These reference points are known as stack frames, and since every function copies SP (which always points to the top of the stack) to BP, you can always tell where you are relative to other functions. Debuggers, for example, walk the stack to determine where they currently are by comparing the values of BP to known offsets.
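To see where the [bp+4] and [bp+6] offsets come from, the following C sketch replays a cdecl call against a simulated stack. Everything here is an assumption made for illustration: the `mem` array, the `sp_reg` and `bp_reg` variables, and the dummy return address 0x1234. It assumes the arguments are pushed right to left, as cdecl specifies, so that `a` ends up closest to the return address.

```c
#include <stdint.h>

/* Simulated stack segment, addressed in bytes but stored as words. */
static uint16_t mem[0x8000];
static uint16_t sp_reg = 0xFFFE;
static uint16_t bp_reg = 0;

static void push(uint16_t v) { sp_reg -= 2; mem[sp_reg / 2] = v; }
static uint16_t pop(void) { uint16_t v = mem[sp_reg / 2]; sp_reg += 2; return v; }
static uint16_t peek(uint16_t addr) { return mem[addr / 2]; }

/* The callee: mirrors the prologue, body and epilogue of _example. */
static uint16_t example_body(void)
{
    push(bp_reg);                       /* push bp                          */
    bp_reg = sp_reg;                    /* mov bp, sp                       */

    uint16_t a = peek(bp_reg + 4);      /* [bp+0] saved bp, [bp+2] ret addr */
    uint16_t b = peek(bp_reg + 6);

    bp_reg = pop();                     /* pop bp                           */
    return (uint16_t)(a + b);
}

/* The caller: pushes b then a, "calls", and cleans up the arguments. */
uint16_t call_example(uint16_t a, uint16_t b)
{
    push(b);
    push(a);
    push(0x1234);                       /* return address pushed by call    */

    uint16_t result = example_body();

    (void)pop();                        /* ret pops the return address      */
    sp_reg += 4;                        /* add sp, 4                        */
    return result;
}
```

Calling `call_example(4, 5)` yields 9 and leaves `sp_reg` exactly where it started, which is the whole point of the caller cleaning up after itself.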

Near and Far Calls

Before we leave the topic of calling conventions, the final point to bring up is near and far calls. In the previous article, I discussed that 16-bit processors can only reference up to 64 kilobytes of memory directly at any given time. As such, if you need to reference code or data outside that 64k window, you need to change the segment so it's pointing in the right location.

For functions, a call to code that's within the same segment is known as a near call. Near calls are equivalent to normal function calls on most other architectures. Far calls, in contrast, include the target segment and load CS as part of the function call. Far calls are made with the "call far" instruction, and require the called function to use the "retf" instruction to return, since the saved segment has to be popped along with the offset. Far calls have a fairly high performance hit due to the segment change, and thus should be limited as much as possible.

In the previous interrupt handler example code, we saw that we had to do a far call to chain to previous TSRs. The reason for this is that interrupt service handling is essentially a special case of a far call; the processor has to change to the ISR's segment in memory. When we chain to another interrupt handler, we have to do the same thing. If you're still confused, the following example will clear things up.

TSRs In Action

So now that we have the basis of TSRs in our heads, let's look at how they're managed and installed by the operating system. To do that, we need an actual DOS installation. While TSRs do work in DOSbox, DOSbox has some unusual quirks with its environment that make it not 100% accurate to actual DOS (for example, all interrupts have an installed default handler; FreeDOS at least does not do this).

Installing FreeDOS

Fortunately for free software, FreeDOS exists, which is a (mostly) compatible free software re-implementation of DOS 5. Installation is pretty much identical to what DOS 5 would have been like if it was shipped on a CD vs. floppy disks.

The CD is bootable, and starting it up in VirtualBox brings up this boot menu.

*

The installer offers to start FDISK to create a boot partition. Users of MS-DOS FDISK should find this more or less identical to the standard FDISK.COM.

After that, the DOS installation takes a few minutes, and then promptly crashes. For reasons I can't figure out, the included JemmEx memory extender refuses to work under VirtualBox. Fortunately, EMM386 is happy to do the job, and after a quick reboot, I get dumped to C:\

After configuring WatTCP, and firing up the built-in FTP server, I can copy my TSRs over without issue. Of course, given that DOS uses VESA graphics, I can't copy and paste. Fortunately for my sanity, FreeDOS (and MS-DOS) support redirecting the terminal with the CTTY command. After a little bit of fiddling with VirtualBox's settings, I get this:

Copy and paste for the win. Anyway, now that we have a decent testing environment, let's get into the practical aspect of this.

DOS Memory Layout

After doing a clean reboot of the system, FreeDOS reports the following as its memory usage:

C:\>mem

Memory Type       Total       Used     Free
--------------- ---------  -------- --------
Conventional         639K       50K     589K
Upper                 36K       31K       5K
Reserved             349K      349K       0K
Extended (XMS)    31,680K    5,626K  26,054K
---------------- --------  -------- --------
Total memory      32,704K    6,056K  26,648K

Total under 1 MB     675K       81K     594K

Total Expanded (EMS) 31M (32,571,392 bytes)
Free Expanded (EMS)  25M (26,705,920 bytes)

Largest executable program size   589K (602,672 bytes)
Largest free upper memory block     4K ( 4,096 bytes)
FreeDOS is resident in the high memory area.
C:\>

Lots of numbers, right? We'll do a more in-depth article about the types of memory, but let's do a brief primer here so that the output can be understood. Let's break these down step by step.

Conventional

Conventional memory is what applications in DOS generally have available and refers to the lower 640k of the 1 MiB address space. Anything operating in real mode has to fit in this memory area. FreeDOS reports a total of 639k rather than 640k, most likely because the BIOS claims the last kilobyte for its extended BIOS data area. The bottom of this region is spoken for as well: the processor's interrupt table lives at 0x0000, and a small part of COMMAND.COM has to stay resident at all times. On this specific system, I have a few TSRs already installed to provide network services, which is why a 50k block of conventional memory is already used.

Upper/Reserved

Above the 640k line is what's referred to as the "upper memory area", or UMA, and is reserved by DOS. The upper memory area also has things like the monochrome and VGA memory buffers, as well as option ROMs, the DOS kernel, and the BIOS shadow map. Normally, this region of memory shouldn't be used by applications, but because conventional memory can get very crowded, on most systems there are small but usable sections of memory in these areas, known as upper memory blocks (UMBs). A memory manager can determine which blocks are safe to use, and load applications or data into these chunks, a process known as "loading high". When we get into hiding our TSR, use of upper memory will become very important.

Extended Memory

Extended memory is memory that exists above 1M+64k (that 64k is special, see below), and cannot be directly accessed from real mode. Because neither DOS nor the BIOS can operate in 32-bit/protected mode, and because the 80286 processor could not easily switch from protected mode back to real mode, accessing memory above the 1 MiB barrier required various amounts of trickery. Extended memory can extend from 1 MiB up to 4 GiB (which is the architectural limit of 32-bit processors). Accessing extended memory requires either entering protected mode, tricking the processor into unreal mode (which on the 80286 required the LOADALL instruction to put the processor in an invalid state), or using a BIOS service which does one of the previous two things to exchange blocks with conventional memory.

High Memory Area (not shown)

One important line to look at is "FreeDOS is resident in the high memory area." I've stated multiple times that 1 MiB is the limit of what Intel processors can address in real mode. As it turns out, this is only a partial truth. Remember that addressing in real mode is done in the form of segment:offset. So what happens if I load a segment value of FFFF? Well, it turns out we can address almost 64 kilobytes of additional RAM (65,520 bytes, to be exact) beyond the 1 MiB barrier. This is known as the high memory area.
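The size of the high memory area falls straight out of the segment:offset arithmetic. The C sketch below works it through; `linear_address` and `hma_size` are hypothetical helper names, and the calculation assumes the A20 line is enabled so that addresses above 1 MiB do not wrap back to zero.

```c
#include <stdint.h>

/* Linear address for a segment:offset pair, with no wrap-around
 * (assumes A20 is enabled). */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* Number of bytes reachable through segment FFFF that sit at or above
 * the 1 MiB line. */
uint32_t hma_size(void)
{
    uint32_t first = 0x100000;                        /* FFFF:0010 */
    uint32_t last  = linear_address(0xFFFF, 0xFFFF);  /* FFFF:FFFF */

    return last - first + 1;
}
```

Segment FFFF starts at linear address 0xFFFF0, so its first sixteen bytes are still below 1 MiB; that is why the HMA is 65,520 bytes rather than a full 65,536.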

Due to many quirks related to the abomination known as A20 (which will get an entire section in the next article), the high memory area requires special rules and methods to access. The short version is that unless you have a memory manager, or are willing to manipulate the A20 line directly (which is dangerous), the HMA is not usable by general applications. We'll look more at this in a future article.

TSR Loading

So with that all out of the way, let's look at how a TSR is loaded. In the GitHub repository, there's an example TSR known as tsr_example which, when loaded, prints out the segment registers and the segment:offset of the next hook in memory. It's combined with a "callhook" program that simply runs int 0x66 to invoke it. So let's load it and see what happens:

C:\> tsr_demo
DOS loaded the COM with this:
CS: 0C9C
DS: 0C9C
SS: 0C9C
 

When our TSR is loaded, it reads and dumps out the segment registers, showing DOS loaded us at 0C9C. For COM files (or any executable that is 'tiny'), CS=DS=SS. When DOS loads a COM executable, the entire thing is copied into memory, CS/DS/SS are set to the execution point, and the process far calls to CS:0100 to begin execution. If we check our memory usage, we can see that it has dropped:

C:\>mem

Memory Type         Total     Used     Free
---------------- -------- -------- --------
Conventional         639K     115K     524K

NOTE: It shouldn't be using about 65 KiB of RAM per run (used conventional memory went from 50K to 115K); the binary is only 324 bytes! I think I'm calculating the paragraphs-to-preserve number wrong, but I didn't get a chance to fix it by the time this article went up. If someone wants to look at the code, check tsr_examine.asm; the TSR int call is at the very bottom of the file and based off example code I found elsewhere.
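For reference, the usual arithmetic for the paragraphs-to-preserve value looks like the C sketch below. This is not the code from the repository; it is a minimal illustration that assumes the resident size is handed to the DOS terminate-and-stay-resident call (int 0x21, AH=0x31) in DX as a count of 16-byte paragraphs measured from the start of the 256-byte PSP. `resident_paragraphs` and `PSP_BYTES` are names invented for the example.

```c
#include <stdint.h>

/* A COM image is preceded by the 256-byte Program Segment Prefix. */
#define PSP_BYTES 0x100u

/* Paragraphs (16-byte units) to keep resident: PSP plus image,
 * rounded up to the next whole paragraph. */
uint16_t resident_paragraphs(uint16_t image_bytes)
{
    return (uint16_t)((PSP_BYTES + image_bytes + 15u) / 16u);
}
```

For the 324-byte binary this gives 37 paragraphs, or 592 bytes, which is the order of magnitude one would expect to see disappear from conventional memory per load.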

If we run callhook, we can determine that our TSR in fact installed successfully, and the previous hook is at 0000:0000 (which is skipped over).

C:\>callhook
CS: 0C9C
DS: 0C9C
SS: 1CB6
Previous hook is at 0000:0000

Note that SS is different. When a TSR is invoked (in this case by doing int 0x66), it inherits the running state of whatever application that was running at the time. It's the responsibility of the TSR to put the stack back the way it found it when it exits, else you'll cause random corruption in userspace applications.

Now let's see what happens if we load our TSR multiple times:

C:\>tsr_demo
DOS loaded the COM with this:
CS: 1CB6
DS: 1CB6
SS: 1CB6
C:\>tsr_demo
DOS loaded the COM with this:
CS: 2CD0
DS: 2CD0
SS: 2CD0
C:\>tsr_demo
DOS loaded the COM with this:
CS: 3CEA
DS: 3CEA
SS: 3CEA

With each load, we're loading higher in memory. DOS does not automatically rebase or relocate TSRs; they stay at whatever memory segment they were in when they terminated. As DOS automatically loads COM files as low as possible, each run is loaded at the next "available" section of RAM. Calling mem shows that our available conventional memory has dropped:

C:\>mem

Memory Type         Total     Used     Free
---------------- -------- -------- --------
Conventional         639K     308K     331K
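The load segments printed above are enough to work out how much memory each copy of the TSR is holding on to, since segment values count 16-byte paragraphs. The following C sketch does that subtraction; `bytes_between` is a hypothetical helper, and it assumes the higher segment is passed second.

```c
#include <stdint.h>

/* Distance in bytes between two load segments. Segment values are in
 * units of 16-byte paragraphs. Assumes higher_segment >= lower_segment. */
uint32_t bytes_between(uint16_t lower_segment, uint16_t higher_segment)
{
    return (uint32_t)(higher_segment - lower_segment) * 16u;
}
```

Going from 0C9C to 1CB6 is 0x101A paragraphs, or 65,952 bytes, and every later load is spaced the same distance apart. Four copies at that size account for roughly 258K, which matches the jump in used conventional memory from 50K to 308K.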

So what now happens if we run callhook?

C:\>callhook
CS: 3CEA
DS: 3CEA
SS: 4D04
Previous hook is at 2CD0:0103
CS: 2CD0
DS: 2CD0
SS: 4D04
Previous hook is at 1CB6:0103
CS: 1CB6
DS: 1CB6
SS: 4D04
Previous hook is at 0C9C:0103
CS: 0C9C
DS: 0C9C
SS: 4D04
Previous hook is at 0000:0000

We chain through each copy of the TSR, easily visible by CS/DS changing as we work back down the chain until we reach the 'stop' at 0000:0000. By now, I think we have a fairly good grasp on how TSRs work in practice, what DOS gives us, and how interrupts work more in-depth. This article is already past the 4k word mark, so I'm going to cut this off here before the editors stage a revolution. Let me close with the fact that I need some help from the community.

Help Wanted

As I mentioned in Part 1, for getting the keylogged data out of the system, I'm interested in using a non-TCP/IP based protocol. Up until the mid-90s, IPX and NetBIOS-only networks were still relatively common, and it wasn't until the domination of the 'modern' internet that TCP/IP became ubiquitous. After considerable amounts of research, I've decided that fitting in the theme of 'unusual yet neat', I'd like to extract the data out using AX.25 and ham radio equipment. The other alternative I may do is using IPX, as I found the original DOOM source code actually has a complete IPX driver in it. As of right now, I'm somewhat torn between doing this with AX.25 or IPX. The thing is though, I'm going to need some help to make an AX.25-based keylogger a reality.

The use of standard radio would allow the keylogger to work on air-gapped computers and show how a potential exfiltration of data might have been done in environments predating TCP/IP. It would be fairly easy to modify a standard PC to hide a 2m or 70cm transmitter within the case and connect to it via I/O lines in an early form of the NSA's current Tailored Access Operations. It would also mean that the keylogger itself would be fairly useless for use in real-life which aids the goal of preventing proliferation of attack tools.

The problem is right now, I have a serious lack of equipment. While I'm a licensed ham in the United States (KD2JRT/Technician), the only equipment I have are two Baofeng UV-82s. What I need is to figure out a decent way to handle getting data broadcast. I know it's at least theoretically possible to build a cable to hook the Baofengs up to a computer's mic/sound in, and use a software TNC (Terminal Node Controller) to do AX.25. By doing so, I could simply connect the TNC to VirtualBox's serial port emulation, and “blamo”, AX.25 for DOS.

What I need from the community is two-fold:

  • Experience with doing AX.25 data in real life
  • Help building the necessary cables *or* loaning radio equipment with hardware TNCs on it

I'm currently in New York City for the foreseeable future. I could potentially build cables for my Baofengs myself but I don't currently have a soldering iron, and my living situation makes it rather difficult to do electronics work here. Depending on pricing, I can probably cover shipping and handling, or compensate out-of-pocket work done by a community member. If you're interested in helping, post a comment or send me an email (mcasadevall@soylentnews.org), and I'll be in touch.

Finally, if you've enjoyed this article, please consider subscribing or gifting a subscription. No account is required, as you can anonymously gift subscriptions to my alt-account, mcasadevall (6). I'm hoping we can raise enough money to fully pay off the stakeholders of the site, and perhaps get a small budget together to let me dedicate more time to content like this, or buy equipment to explore more obscure pieces of hardware (i.e., digging into doing some INIT coding on classic Mac OS, or something of that nature). I'd like to give thanks to all those subscribed, including jimtheowl after the previous article.

And with that, 73 de NCommander!

posted by cmn32480 on Friday September 16, @10:23AM   Printer-friendly
from the blame-the-mighty-buzzard dept.

We have been informed by Linode (which hosts our servers) that there is some hardware maintenance being performed tonight. The impacted servers are 'fluorine' and 'neon'. Here is the message we received:

Linode continuously monitors the health of our equipment and we've been alerted to a condition which affects the physical server on which your Linode is hosted. While we have determined that this is not an emergency, this should be addressed in order to optimize the performance of your Linode. We have scheduled a maintenance window for the physical server on which your Linode is hosted:

Friday, September 16, 2016 at 1:00 AM EDT (5:00 AM UTC)

Downtime from this maintenance is expected to be no more than 1 hour. Please note, however, that the entire maintenance window may be required. Your Linode will be gracefully powered down and rebooted during the maintenance. Services not configured to start on a reboot will need to be manually started. If this time frame does not work for you, you have the option of migrating to another host which has these settings enabled.

Thanks to redundancy between our front end and database servers, our main site should remain functional. There will, however, be some minor inconveniences. During this period:

  • we will not be able to process credit card payments,
  • comment counts will not update, and
  • emails and web notifications will not go out.

We appreciate your understanding and patience while the servers are being serviced.

We anticipate most of the affected services should auto-restart; those that do not will be addressed starting around 0600 CDT (0700 EDT / 1100 UTC).

UPDATE: All is shiny and happy again.


posted by NCommander on Tuesday August 30 2016, @12:14PM   Printer-friendly
from the int-21h-is-how-cool-kids-did-it dept.

I've made no secret that I'd like to bring original content to SoylentNews, and recently polled the community on their feelings for crowdfunding articles. The overall response was somewhat lukewarm, with most of the disagreement over where the money would go and how authors would be paid. Taking that into account, I decided to write a series of articles for SN in an attempt to drive more subscriptions and readers to the site, and to scratch a personal itch on doing a retro-computing project. The question then became: What to write?

As part of a conversation on IRC, part of me wondered what a modern day keylogger would have looked like running on DOS. In the world of 2016, it's no secret that various three letter agencies engage in mass surveillance and cyberwarfare. A keylogger would be part of any basic set of attack tools. The question is what a potential attack tool would have looked like if it was written during the 1980s. Back in the 1980s, the world was a very different place both from a networking and programming perspective.

For example, in 1988 (the year I was born), the IBM PC/XT and AT would have been a relatively common fixture, and the PS/2 had only recently been released. Most of the personal computing market ran some version of DOS, and networking (which was rare) frequently took the form of Token Ring or ARCNet equipment. Further up the stack, TCP/IP competed with IPX, NetBIOS, and several other protocols for dominance. From the programming side, coding for DOS is very different than any modern platform, as you had to deal with Intel's segmented architecture and interact directly with both the BIOS and the hardware. As such, it's an interesting look at how technology has evolved since.

Now obviously, I don't want to release a ready-made attack tool to be abused by the masses, especially since DOS is still frequently used in embedded and industry roles. As such, I'm going to target a non-IP based protocol for logging, both to explore these technologies and to make the tool as useless as possible in practice. To the extent possible, I will try and keep everything accessible to non-programmers, but this isn't intended as a tutorial for real mode programming. As such I'm not going to go super in-depth in places, but will try to link relevant information. If anyone is confused, post a comment, and I'll answer questions or edit these articles as they go live.

More past the break ...

Looking At Our Target

Back in 1984, IBM released the Personal Computer/AT which can be seen as the common ancestor of all modern PCs. Clone manufacturers copied the basic hardware and software interfaces which made the AT, and created the concept of PC-compatible software. Due to the sheer proliferation of both the AT and its clones, these interfaces became a de-facto standard which continues to this very day. As such, well-written software for the AT can generally be run on modern PCs with a minimum of hassle, and it is completely possible to run ancient versions of DOS and OS/2 on modern hardware due to backwards compatibility.

A typical business PC of the era likely looked something like this:

  • An Intel 8086 or 80286 processor running at 4-6 MHz
  • 256 kilobytes to 1 megabyte of RAM
  • 5-20 MiB HDD + 5.25" floppy disk drive
  • Operating System: DOS 3.x or OS/2 1.x
  • Network: Token Ring connected to a NetWare server, or OS/2 LAN Manager
  • Cost: ~$6000 USD in 1987

To put that in perspective, many of today's microcontrollers have on-par or better specifications than the original PC/AT. From a programming perspective, even taking into account resource limitations, coding for the PC/AT is drastically different from many modern systems due to the segmented memory model used by the 8086 and 80286. Before we dive into the nitty-gritty of a basic 'Hello World' program, we need to take a closer look at the programming model and memory architecture used by the 8086 which was a 16-bit processor.

Real Mode Programming

If the AT is the common ancestor of all PC-compatibles, then the Intel 8086 is the processor equivalent. The 8086 was a 16-bit processor that operated at a top clock speed of 10 MHz, had a 20-bit address bus that supported up to 1 megabyte of RAM, and provided fourteen registers. Registers are essentially very fast storage locations physically located within the processor that are used to perform various operations. Four registers (AX, BX, CX, and DX) are general purpose, meaning they can be used for any operation. Eight (described below) are dedicated to working with segments, and the final registers are the processor's current instruction pointer (IP), and state (FLAGS).

An important point in understanding the differences between modern programming environments and those used by early PCs deals with the difference between 16-bit and 32/64-bit programming. At the most fundamental level, the number of bits a processor has refers to the size of numbers (or integers) it works with internally. As such, the largest possible unsigned number a 16-bit processor can directly work with is 2 to the power of 16 (minus 1) or 65,535. As the name suggests, 32-bit processors work with larger numbers, with the maximum being 4,294,967,295. Thus, a 16-bit processor can only reference up to 64 KiB of memory at a given time while a 32-bit processor can reference up to 4 GiB, and a 64-bit processor can reference up to 16 exbibytes of memory directly.
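The limits quoted above all come from the same formula: the largest unsigned value that fits in n bits is 2 to the power of n, minus 1. A small C sketch makes it easy to check any width, including the 8086's 20-bit address bus. The function name `max_unsigned` is made up for this example.

```c
#include <stdint.h>

/* Largest unsigned integer representable in the given number of bits.
 * Widths of 64 or more are clamped to the 64-bit maximum. */
uint64_t max_unsigned(unsigned bits)
{
    if (bits >= 64)
        return UINT64_MAX;
    return (UINT64_C(1) << bits) - 1;
}
```

`max_unsigned(16)` is 65,535, `max_unsigned(20)` is 1,048,575 (the last byte of a 1 MiB address space), and `max_unsigned(32)` is 4,294,967,295.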

At this point, you may be asking yourselves, "if a 16-bit processor could only work with 64 KiB of RAM directly, how did the 8086 support up to 1 megabyte?" The answer comes from the segmented memory model. Instead of directly referencing a location in RAM, addresses were divided into two 16-bit parts, the segment and the offset. Segments are 64 kilobyte windows of RAM. They could generally be considered the computing equivalent of a postal code, telling the processor where to look for data. The offset then told the processor where exactly within that segment the data it wanted was located. On the 8086, the segment value was shifted left by four bits (that is, multiplied by 16) and the offset was then added to it, creating a 20-bit address that covers 1 megabyte of memory. Segments and offsets are held by the processor in special registers; in short you had the following:

  • Segments
    • CS: Code segment - Application code
    • DS: Data segment - Application data
    • SS: Stack segment - Stack (or working space) location
    • ES: Extra segment - Programmer defined 'spare' segment
  • Offsets
    • SI - Source Index
    • DI - Destination Index
    • BP - Base pointer
    • SP - Stack pointer

As such, memory addresses on the 8086 were written in the form of segment:offset. For example, a given memory address of 0x000FFFFF could be written as F000:FFFF. As a consequence, multiple segment:offset pairs could refer to the same bit of memory; the addresses F555:AAAF, F000:FFFF, and F800:7FFF all refer to the same bit of memory. The segmentation model also had important performance and operational characteristics to consider.
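The aliasing described above is easy to verify with a few lines of C. The sketch below computes the physical address the way the 8086 does (segment times 16, plus offset, truncated to 20 bits) and reduces any segment:offset pair to a single canonical form. The `far_ptr` struct and the three function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* A real-mode address as a segment:offset pair. */
struct far_ptr {
    uint16_t segment;
    uint16_t offset;
};

/* Physical address as the 8086 forms it: segment * 16 + offset,
 * truncated to the chip's 20 address lines. */
uint32_t physical_8086(struct far_ptr p)
{
    return (((uint32_t)p.segment << 4) + p.offset) & 0xFFFFFu;
}

/* Canonical form of an address: the largest possible segment value,
 * leaving an offset between 0 and 15. */
struct far_ptr normalize(struct far_ptr p)
{
    uint32_t linear = physical_8086(p);
    struct far_ptr n;

    n.segment = (uint16_t)(linear >> 4);
    n.offset  = (uint16_t)(linear & 0xFu);
    return n;
}

/* Two pairs alias each other when they map to the same physical byte. */
int same_location(struct far_ptr a, struct far_ptr b)
{
    return physical_8086(a) == physical_8086(b);
}
```

F555:AAAF, F000:FFFF and F800:7FFF all come out as physical address 0xFFFFF, and all three normalize to FFFF:000F. The 20-bit mask also shows the wrap-around on the original chip: FFFF:0010 lands back on address 0.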

The most important was that since data could be within the same segment, or a different segment, you had two different types of pointers to work with them. Near pointers (which are just the 16-bit offset) deal with data within the same segment, and are very fast as no state information has to be changed to reference them. Far pointers pointed to data in a different segment and required multiple operations to work with, as you had to not only load and store the two 16-bit components, you had to change the segment registers to the correct values. In practice, that meant far pointers were extremely costly in terms of execution time. The performance hit was bad enough that it eventually led to one of the greatest (or worst) backward compatibility hacks of all time: the A20 gate, something which I could write a whole article on.

The segmented memory model also meant that any high level programming languages had to incorporate lower-level programming details into them. For example, while C compilers were available for the 8086 (in the form of Microsoft C), the C programming language had to be modified to work with the memory model. Instead of just having the standard C pointer types, you had to deal with near and far pointers, and the layout of data and code within segments to make the whole thing work. As a result, coding for pre-80386 processors required code specifically written for the 8086 and the 80286.

Furthermore, most of the functionality provided by the BIOS and DOS was only available in the form of interrupts. Interrupts are special signals that tell the processor that something needs immediate attention; for example, typing a key on a keyboard generates an IRQ 1 interrupt to let DOS and applications know something happened. Interrupts can be generated in software (the 'int' instruction) or hardware. As interrupt handling can generally only be done in raw assembly, many DOS apps of the era were written (in whole or in part) in Intel assembly. This brings us to our next topic: the DOS programming model.
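How the processor finds the code for a given interrupt number can be sketched in C. In real mode, the interrupt vector table starts at address 0 and holds one four-byte entry per interrupt: a 16-bit offset followed by a 16-bit segment. The snippet below reads an entry out of a plain byte buffer standing in for low memory; `struct vector`, `vector_address` and `read_vector` are hypothetical names, and nothing here touches real hardware.

```c
#include <stdint.h>

/* One interrupt vector: where execution goes when the interrupt fires. */
struct vector {
    uint16_t offset;
    uint16_t segment;
};

/* Each vector is four bytes, so interrupt n lives at address n * 4. */
uint32_t vector_address(uint8_t int_no)
{
    return (uint32_t)int_no * 4u;
}

/* Read a vector from a buffer that models the first 1 KiB of memory.
 * Values are little endian: offset first, then segment. */
struct vector read_vector(const uint8_t *low_memory, uint8_t int_no)
{
    const uint8_t *entry = low_memory + vector_address(int_no);
    struct vector v;

    v.offset  = (uint16_t)(entry[0] | (entry[1] << 8));
    v.segment = (uint16_t)(entry[2] | (entry[3] << 8));
    return v;
}
```

With this layout, the vector for int 0x21 sits at address 0x84. An entry of all zeroes reads back as 0000:0000, which is what an unused interrupt looks like on a system that does not install default handlers.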

Disassembling 'Hello World'

Before digging more into the subject, let's look at the traditional 'Hello World' program written for DOS. All code posted here is assembled with NASM.

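; Hello.asm - Hello World

section .text
org 0x100               ; COM files are loaded at offset 0x100 of their segment

_entry:
 mov ah, 9              ; DOS function 09h: write a $-terminated string
 mov dx, str_hello      ; DS:DX points at the string
 int 0x21               ; call DOS
 ret                    ; return to DOS

section .data
str_hello: db "Hello World",'$'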

Pretty, right? Even those familiar with 32-bit x86 assembly programming may not be able to understand at first glance what this does. To prevent this from getting too long, I'm going to gloss over the specifics of how DOS loads programs, and simply explain what this does. For non-programmers, this may be confusing, but I'll try and explain it below.

The first part of the file has the code segment (marked 'section .text' in NASM) and our program's entry point. With COM files such as this, execution begins at the top of the file. As such, _entry is where we enter the program. We immediately execute two 'mov' instructions to load a value into the top half of AX (AH), and a near pointer to our string into DX. Ignore 9 for now, we'll get to it in a moment. Afterwards, we trip an interrupt, with the number in hex (0x21) after it being the interrupt we want to trip. DOS's functions are exposed as interrupts on 0x20 to 0x2F; 0x21 is roughly equivalent to stdio in C. 0x21 uses the value in AH to determine which subfunction we want, in this case, 9, to write to console. DOS expects DX to point to a string terminated in $; it does not use null-terminated strings like you may expect. After we return from the interrupt, we simply exit the program by calling ret.
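For readers more at home in C, the behaviour of the string output function can be approximated as follows. This is only a sketch of the convention, not how DOS implements it: `dos_print_string` is a made-up name, and the output stream is passed in explicitly so the example stays self-contained.

```c
#include <stddef.h>
#include <stdio.h>

/* Write characters until a '$' is reached, the way DOS function 09h
 * treats the string at DS:DX. The '$' itself is not printed. Like the
 * real call, this keeps going if the terminator is missing, so the
 * caller must guarantee one is present. Returns the number of
 * characters written. */
size_t dos_print_string(const char *s, FILE *out)
{
    size_t n = 0;

    while (s[n] != '$') {
        fputc(s[n], out);
        n++;
    }
    return n;
}
```

One practical consequence of this convention is that function 09h cannot print a literal dollar sign; programs that need one have to use a different output call.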

Under DOS, there is no standard library with nicely named functions to help you out of the box (though many compilers did ship with these such as Watcom C). Instead, you have to load values into registers, and call the correct interrupt to make anything happen. Fortunately, lists of known interrupts are available to make the process less painful. Furthermore, DOS only provides filesystem and network operations. For anything else, you need to talk to the BIOS or hardware directly. The best way to think of DOS from a programming perspective is essentially an extension of the basic input/output functionality that IBM provided in ROM rather than a full operating system.

We'll dig more into the specifics in future articles, but the takeaway here is that if you want to do anything in DOS, interrupts and reference tables are the only way to do so.

Conclusion

In this introductory article, we looked at the basics of how 16-bit real mode programming works and the DOS programming model. While something of a dry read, it's a necessary foundation to understand the basic building blocks of what is to come. In the next article, we'll look more at the DOS API, and terminate-and-stay resident programs, as well as hooking interrupts.