Stories
Slash Boxes
Comments

SoylentNews is people

Meta
posted by NCommander on Tuesday July 14 2015, @04:00PM   Printer-friendly
from the replacing-2000s-tech-with-early-80s-tech dept.

Most system administrators working with a large number machines will be at least passingly familiar with LDAP, or it's Microsoft's incarnation as Active Directory. Like most organizations, we used LDAP to organize shell account information for SN's backend servers, and spent the last year and a half cursing because of it. As such, we've recently replaced LDAP with a much older technology known as Hesiod, which is a DNS-based system of storing user accounts and other similar information. Given Hesiod's unique history (and relative obscurity), I though it would be interesting to write a review and detailed history of this relic, as well as go more in-depth why we migrated.

In this novel:

  • Why We Dumped LDAP
  • Project Athena
  • Overview of Hesiod
  • Drawbacks
  • In Closing

Read past the break for a look at this piece of living history.

Why We Dumped LDAP

One of the golden rules of system administration is "if it ain't broke, don't fix it". Given that LDAP is generally considered critical infrastructure for sites that depend on it, its worth spending a few moments explaining why we replaced it. Our LDAP backend was powered by OpenLDAP, which is generally the de facto standard for LDAP servers on Linux. In our experience though, OpenLDAP is extremely difficult to configure due to storing its configuration information within the LDAP tree itself (under cn=config), and being incredibly difficult to examine its current state, as well as recovering from any misconfiguration. In practice, I found it necessary to dump the entire LDAP configuration, modify the raw LDIF files, and then reimport with slapcat, and then pray. Painful, but manageable since, in practice, the overall server configuration shouldn't change frequently.

Unfortunately, every aspect of OpenLDAP has proven to be painful to administer. In keeping with the idea that none of our critical infrastructure should have single points of failure, we established replica servers from our master, and configured client systems to look at the replicas in case the master server take a dive (or is restarting). While a noble idea, we found that frequently without warning or cause, replication would either get out of sync, or simply stop working all together with no useful error messages being logged by slapd. Furthermore, when failover worked, systems would start to lag as nss_ldap kept trying to query the master for 5-10 seconds before switching to the slave for each and every query. As a whole, the entire setup was incredibly brittle.

While many of these issues could be laid at OpenLDAP (vs. LDAP itself as a protocol), other issues compounded to make life miserable. While there are other LDAP implementations such as 389 Directory Server, the simple fact of the matter is that due to schema differences, no two LDAP instances are directly compatible with each other; one can't simply copy the data out of OpenLDAP and import it directly into 389. The issue is further compounded if one is using extended schemas (as we were to store SSH public keys). As such, when slapd started to hang without warning, and without clear indication as of why, the pain got to the point of looking for a replacement rather than keep going with what we were using.

As it turns out, there are relatively few alternatives to LDAP in general, and even fewer supported by most Linux distributions. Out of the box, most Linux distributions can support LDAP, NIS, and Hesiod. Although NIS is still well supported by most Linux distributions, it suffers from security issues, and many same issues with regards to replication and failover. As such, I pushed to replace LDAP with Hesiod, which was originally designed as part of Project Athena.

Project Athena

Hesiod was one of the many systems to originate out of Project Athena, a joint project launched between MIT, DEC, and IBM in the early 80s to create a system of distributed computing across a campus, eventually terminating in 1991. Designed to work across multiple operating systems, and architectures, the original implementation of Athena laid out the following goals:

  • To develop computer-based learning tools that are usable in multiple educational environments
  • Establish a base of knowledge for future decisions about educational computing.
  • Create a computational environment supporting multiple hardware types
  • Encourage the sharing of ideas, code, data, and experience across MIT

As such, work coming from Project Athena was released as free-and-open source software, and provided a major cornerstone in early desktop and networking environments that are commonly in use today such as X Windows, and Kerberos.

As of 2015, 34 years after Athena was started, its underlying technology is still at MIT today, in the form of DebAthena.

Overview of Hesiod

Moving away from the history, and onto the actual technology itself, as indicated above, Hesiod is based in DNS, and takes the form of TXT records (the TXT record type itself was designed for Hesiod, as was the HS class). A sample Hesiod record for a user account looks like this:

mcasadevall.passwd      IN TXT          "mcasadevall:*:2500:2500:Michael Casadevall:/home/mcasadevall:/bin/bash"
2500.uid                IN CNAME        mcasadevall.passwd
mcasadevall.grplist     IN TXT          "sysops:2501:dev_team:2503:prod_access:2504"

For those familiar with the format of /etc/passwd, the format is obvious enough. Out of the box, hesiod supports distributing users and groups, printcap records (for use with LPRng), mount tables, and service locatator records. With minor effort, we were also able to get it to support SSH public keys. Since Hesiod is based on DNS, data can be replicated via normal zone transfers, as well as updated via dynamic DNS updates. Since DNS is not normally enumerable in normal operation, CNAME records are required to allow lookups for ids to be successful.

New types of records can be created by simply adding a new TXT record. For instance, for each user, we encode their SSH public keys as a (username).ssh TXT record. The standard hesinfo can properly query and access these records, making it easy to script:

mcasadevalllithium~$ hesinfo mcasadevall ssh
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA4T3rFl8HondKnGq3+OEAoXzhsZL3YyzRIMCFQeD6aLLHCoVGAwUs3cg7bqUVshGb3udz5Wl/C4ym1aF5Uk5xaZWr2ByKZG6ZPFQb2MZbOG+Lcd5A14gSS2+Hw6+LIoMM8u6CJvIjbTHVI2wbz/ClINDEcJC0bh+YpuaKWyt2iExHATq153ST3dih+sDDK8bq6bFMKM8sdJHl9soKGo7V7i6jIn8E84XmcdTq8Gm2gt6VhOIb/wtr1ix7nxzZ7qCxAQr//FhJ8yVsmHx7wRwkndS7muPfVlVd5jBYPN74AvNicGrQsaPtbkAIwlxOrL92BsS6xtb+sO2iJYHK/EJMoQ== mcasadevall@blacksteel

As such, Hesiod is easy to expand, and provides both command line applications, and the libhesiod API to both query and expand the information Hesiod is able to deliver, and can be deployed to any environment where a sysadmin can control DNS records. As of writing, a set of utilities to integrate and easily manage Hesiod on Amazon EC2's DNS Service (known as Route53) exist in the form of Hesiod53.

Drawbacks to Hesiod

Hesiod inherits several drawbacks due to being based upon DNS. Primarily, it can be affected by various cache poisoning attacks, or hijacking upstream DNS servers. These weaknesses can be mitigated by implementation of DNSCurse or client-side validation of DNSSEC records (standard DNSSEC does not autheticate the "last mile" for DNS queries). Like NIS, if password hashes are stored in Hesiod, they're world-readable, and vulnerable to offline analysis; for this reason, Hesiod should be deployed alongside Kerberos (and pam_krb5) for secure authentication of users and services. At SN, we've been using Kerberos since day 1 for server-to-server communication (and single-sign on for sysadmins), so this was trivial for us. Other organizations may have more difficulty.

Furthermore, under normal circumstances, DNS records can not be enumerated, and nss_hesiod will not provide any records if an application queries for a full list of users (for example, getent passwd on a shell will only return system local users). This may break some utilities who are dependent on getting a full list of users, though in over a month of testing on our development system (lithium), we weren't able to find any sort of breakage.

Finally, although this problem is not inherent to Hesiod, at least on Linux systems, attempts to query users not in /etc/passwd can hang early boot for several minutes. The same issue manifests itself with use of nss_ldap and SSSD. As of writing, we have not determined a satisfactory workaround for the problem, but as our core services are redundant and support automatic failover, a 5-10 minute restart time isn't a serious issue for us.

Finally, although most UNIX and UNIX-likes support Hesiod, there's no support for it on Windows or Mac OS X.

In Closing

Due to its ease of use, we're expectant that Hesiod will drastically reduce the pain of system administration, and removes a service that has proven to be both problematic, and overly complex. While I don't expect a major upswing in Hesiod usage, in practice, it works very well in cloud environments, and for those who find the use of LDAP painful, I highly recommend you experiment in evaluating it as long as one is mindful of it's limitations

I hope you all enjoyed this look at this rather obsecure, but interesting piece of history, and if people are interested, I can be tempted to write more articles of this nature.

~ NCommander

 
This discussion has been archived. No new comments can be posted.
Display Options Threshold/Breakthrough Mark All as Read Mark All as Unread
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 3, Informative) by VLM on Wednesday July 15 2015, @02:46PM

    by VLM (445) Subscriber Badge on Wednesday July 15 2015, @02:46PM (#209396)

    Ah OK. Traditionally password files and LDAP and all that were used mostly for login passwords. You know, for keyboard interactive telnet and ssh sessions. Like 90% of the world's login password is "Password" and we need to store that someplace.

    Now a days its all about the kerberos to do actual hand typed password stuff, and between machines its all about ssh shared keys so no password is involved. If my id_rsa.pub is in your authorized_keys I can log in no password required.

    So ironically the passwd file is all about all the demographic data that is everything but passwords. Like the numeric filesystem UID for VLM is 1003 here, or VLM user enjoys /usr/local/bin/bash more so than /bin/dash as his login shell, or /etc/groups says vlm is part of wheel group and sudoers file says wheel group members get to have their way, etc. Hand typed passwords all live in /etc/shadow since the 90s anyways which is another long story.

    The other question you had was update sync and frequency. Well thats pretty much what puppet and friends live to do, all day long, like their primary purpose is someone changes a file on the puppetmaster and within 15 minutes every machine has that new file, where the new file may very well be /etc/passwd or /etc/groups or whatever. There are other less ... forcible ways to have puppet modify passwd and group files than just overwriting files, but KISS principle, here's the golden copy of the password file now copy it everywhere.

    You'd be surprised how rarely someone changes their homedir or login shell so its not like you have to update /etc/passwd very often. Literal hand typed passwords are changed in kerberos. "between two machines" logins are all ssh key based and have nothing to do with /etc/passwd shadow or group beyond like "where is vlm's homedir so I can look at his .ssh/authorized_keys to see if "he" can log in here?" This is an interesting way to break into a non-SSL secured LDAP distributed system BTW (bad guy MITM the LDAP and says uh, sure, just today vlm's homedir is /autofs/badguy/naughty which happens to have bad guy's ssh key info not mine). Another hilarious one is a "special" LDAP response by an attacker that says today VLM's UID is going to be zero, yeah that guy.. So secure your ldap!

    Stereotypical puppet config manager style is you overwrite local changes. That's kind of the point of centralized configuration management. Its considered "good form" or "polite" to put some kind of comment at the top of any file distributed by puppet something like "# This file is distributed by puppet" and if you can chmod it to be read only to kind of drive home the point, then do so. You'll still end up with people experimenting and then wondering why all their manual changes revert at the quarter hour mark.

    I suppose it depends on cluster purpose. One cluster with one user doing one parallel thing (rendering frames of 3d movies or whatever) doesn't have the issues that some kind of institutional general purpose 1000s of different users educational cluster would have.

    Starting Score:    1  point
    Moderation   +1  
       Informative=1, Total=1
    Extra 'Informative' Modifier   0  
    Karma-Bonus Modifier   +1  

    Total Score:   3