
posted by NCommander on Wednesday July 03 2019, @11:00AM
from the ada-is-unfortunately-obscure dept.

So here's a question for the SoylentNews community: do you scrap a working prototype that served as your proof of concept in favor of recoding it in a more mainline programming language? Let me provide some context:

A few months ago, I began work on a comprehensive system, known as DNSCatcher, for proofing and validating the behavior of DNS recursive resolvers. To briefly summarize, DNSCatcher acts as a cross-check system for DNS to validate whether the records returned by your DNS server actually match reality. Due to the design and implementation of the DNS protocol, recursive resolvers such as your router's or ISP's can essentially lie about the status and contents of any DNS record they choose, in part due to limitations of the current implementation of DNSSEC. I presented some of my initial work at Internet Freedom Festival 2019 and, more in passing, at the ICANN 65 meeting, and got an unexpectedly positive response. As such, I'm currently working to secure funding and bring this work forward in a sustainable manner, with the hopeful intent of eventually standardizing the protocol and mechanisms as an RFC. My current proof of concept is on GitHub here.
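To make the cross-check idea concrete, here is a minimal sketch in Rust. It is not DNSCatcher's actual protocol, which this post does not describe: the names `cross_check` and `Verdict` are hypothetical, and it assumes the answers from the local resolver and from some trusted reference have already been fetched as lists of addresses.

```rust
use std::collections::BTreeSet;
use std::net::IpAddr;

/// Outcome of comparing a resolver's answers with a trusted reference.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Every address the local resolver returned is also in the reference set.
    Consistent,
    /// The local resolver returned addresses the reference does not know about.
    Mismatch { unexpected: Vec<IpAddr> },
}

/// Flag any address from the local resolver that the reference did not return.
pub fn cross_check(local: &[IpAddr], reference: &[IpAddr]) -> Verdict {
    let trusted: BTreeSet<IpAddr> = reference.iter().copied().collect();
    let unexpected: Vec<IpAddr> = local
        .iter()
        .copied()
        .filter(|addr| !trusted.contains(addr))
        .collect();
    if unexpected.is_empty() {
        Verdict::Consistent
    } else {
        Verdict::Mismatch { unexpected }
    }
}
```

A real system would need more than set comparison, since CDNs and round-robin records legitimately return different answers to different clients; the sketch only shows the shape of the check.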

Now, the part I need to tackle is: do I keep going with what I have, or restart from scratch? The proof of concept was written in the Ada programming language, which I am rather concerned will drastically limit community involvement and uptake. I chose Ada for very specific reasons when coding this, but I feel I need an outside opinion on the best course to take before pushing on, hence this post. Past the fold, I'll go into my reasons for coding it in Ada and the alternatives I'm considering, in the hope of getting further feedback from the community.

So why did I choose Ada? Well, when I started the project, I had the following requirements I wanted to meet.

  • High Performance with Low Overhead
  • Ease of Deployment and Distribution
  • Scalability to the level of BIND or better in processing DNS requests in real time
  • Realistically deployable on embedded devices such as routers
  • As a single person project (as of writing), I wanted the following attributes from my programming language:
    • Strong typing
    • Reduced or eliminated requirements on using pointers or raw memory access to reduce risk of introducing security exploits
    • Excellent support for asynchronous multithreading.
    • Verification and strong static analysis tooling (in Ada's case gnatcheck + SPARK)
    • High level of abstraction of basic system primitives to ease in portability
    • High level of difficulty in accidentally blowing a foot off

Ada is designed for high-reliability systems, and the most common implementation is GNAT, which is built around GCC, making it a highly portable language. Furthermore, Ada's core language has extremely powerful multitasking and object-oriented features out of the box. It also incorporates formal validation and programming-by-contract features to help ensure programs are bug free. During the development of the prototype, I felt very justified in my choice, as Ada made implementing some of the more difficult aspects of a DNS server relatively trivial. It was also powerful enough to handle the mess that is the DNS wire protocol with a minimum of fuss and only a handful of forced casts in the entire codebase, a good sign that the code is correct and that stupid mistakes are being prevented.
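For readers who have not dealt with the DNS wire protocol, the sketch below decodes the fixed 12-byte message header defined in RFC 1035, section 4.1.1. It is written in Rust rather than Ada and is not taken from the DNSCatcher codebase; `DnsHeader`, `ParseError`, and `parse_header` are names made up for this illustration.

```rust
/// The fixed 12-byte header that starts every DNS message (RFC 1035, 4.1.1).
#[derive(Debug, PartialEq, Eq)]
pub struct DnsHeader {
    pub id: u16,
    pub is_response: bool,
    pub opcode: u8,
    pub recursion_desired: bool,
    pub recursion_available: bool,
    pub rcode: u8,
    pub question_count: u16,
    pub answer_count: u16,
    pub authority_count: u16,
    pub additional_count: u16,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ParseError {
    /// The packet is shorter than the fixed header.
    TooShort { needed: usize, got: usize },
}

/// Read a big-endian u16 at `offset`; the caller has already checked the length.
fn be_u16(buf: &[u8], offset: usize) -> u16 {
    u16::from_be_bytes([buf[offset], buf[offset + 1]])
}

/// Decode the header fields; all multi-byte values are in network byte order.
pub fn parse_header(packet: &[u8]) -> Result<DnsHeader, ParseError> {
    const HEADER_LEN: usize = 12;
    if packet.len() < HEADER_LEN {
        return Err(ParseError::TooShort { needed: HEADER_LEN, got: packet.len() });
    }
    // Flag word layout: QR(1) Opcode(4) AA(1) TC(1) RD(1) RA(1) Z(3) RCODE(4).
    let flags = be_u16(packet, 2);
    Ok(DnsHeader {
        id: be_u16(packet, 0),
        is_response: flags & 0x8000 != 0,
        opcode: ((flags >> 11) & 0x0F) as u8,
        recursion_desired: flags & 0x0100 != 0,
        recursion_available: flags & 0x0080 != 0,
        rcode: (flags & 0x000F) as u8,
        question_count: be_u16(packet, 4),
        answer_count: be_u16(packet, 6),
        authority_count: be_u16(packet, 8),
        additional_count: be_u16(packet, 10),
    })
}
```

The header is the easy part; name compression and variable-length resource records are where most of the mess lives, and they are deliberately left out here.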

However, Ada is unfortunately obscure in the land of software development, which means it's an uphill battle to attract programmers to contribute to it. It also suffers from a distinct lack of native libraries, and thus for some functionality it's necessary to bridge out to C APIs (which Ada does make relatively straightforward). It also doesn't help that the Ada compilers in some distributions (such as Ubuntu) are buggy out of the box and can fail in unusual ways. Since I want this project to hopefully go mainstream, I now need to face the difficult question of whether I should push through or scrap my current code and rewrite it, and if so, what to write it in.

Taking into account the criteria I listed above, I've been looking at the following options: C, C++, Go, or Rust. To a lesser extent, I'm also considering Java, despite the fact that the presence of the JVM would essentially preclude deployment on embedded devices and carry a higher memory cost. These all have a lot of upsides and downsides. I'm open to other options that the community may put forward, but at the moment these are the languages I'm leaning most strongly towards. Let me write my thoughts on each, and maybe the community can help me make a decision on what to do.

Starting from the top, let's look at C:

C has a reputation as "portable assembler" which is well earned. It is also the programming language I have the most experience with. It is extremely commonly used in FOSS software, and most major DNS software is written in C, such as BIND, Unbound, and dnsmasq. As such, the barrier to entry for community contributions and adoption is relatively low. Everything can run C code, but C itself isn't very portable without a lot of work, and the standard library is bare bones to say the least. Even simple acts like opening sockets require #ifdefs for Windows (to initialize Winsock) vs. Linux, and multithreading is both non-standardized and can be complex. Furthermore, since C operates so close to the bare metal, a single mistake can be a security vulnerability, which is painful. On top of that, C has very poor support for Unicode, making support for internationalized domain names (IDNs) much more difficult than necessary. Working with SQL databases from C is also a frustrating experience due to how C handles strings, and any attempt to create a web interface on top of DNSCatcher would almost certainly be better done in any other programming language. My opinion is that C is a relatively poor fit for this project, but I can't rule it out entirely because it also has the lowest barrier to entry for bringing others on board, and is the easiest to deploy once ported to a given environment.

C++ is my least favorite option, but deserves consideration as well:

Almost everything I wrote about C is also true for C++. On the plus side, C++'s standard library is much more powerful, with the STL providing complex data types out of the box (which were directly inspired by and/or copied from Ada), which helps reduce the amount of code that needs to be written. Combined with Boost, it is a very powerful programming language. However, I find C++ difficult to work with and extremely hard to debug when something goes wrong, and the language itself has more than a few surprises. In the course of my career, I have worked on and debugged multiple C++ codebases, and have found myself agreeing with more than a few of the points of the C++ FQA. The largest advantages C++ gives me over C are native object-oriented interfaces and a better standard library.

Next up on the list is the Go programming language (Golang):

Looking at Golang as an option for this project is something of a mixed bag. I have used Go before, in building part of the FTL video streaming service used by Mixer, as its original ingest daemon. An additional plus is that Golang's toolchain generates static binaries which are easy to deploy, and it's relatively straightforward to cross-compile with Go to various architectures. Golang itself also has excellent support for multitasking via goroutines and channels. However, these capabilities come with a cost. The first is that Golang itself doesn't support asynchronous actions well: most API calls are blocking, and channels are synchronous unless buffered. This means that for high performance you either have to run hundreds to thousands of goroutines at once, or handle manual locking through a mutex. This further complicates the ordering of operations when working with an underlying database and DNS transaction tracking. While not insurmountable, Go also has pain points when interacting with pre-existing C code.

While Go code can be compiled to a shared library, and C code can be integrated via cgo, the use of goroutines can cause complications when pre-existing C code spawns its own threads, or Go code is called from threaded code. While Golang provides runtime.LockOSThread, which can help mitigate issues with code that needs to execute on the main thread (and thus maintain TLS logic), it's quite difficult to do in practice. Furthermore, Golang lacks strong typing like that provided by Ada and Rust, and what types it does have are hampered by a lack of generics. On the upside, Go has excellent support for creating server applications and HTTP interfaces out of the box, which drastically reduces the amount of programming required and the need to bring in a second programming language to handle front-end operations. As such, I have very mixed feelings on using Golang for this project, although I concede it is a viable alternative.

Last of our natively compiled languages is Rust, the option I'm mostly leaning towards if I go for a rewrite:

Rust shares many of the features I value in Ada, such as extremely strong typing and, through the borrow checker, features that inherently reduce the risk of unintentional behavior or security exploits. In addition, being built on LLVM, Rust's portability is almost as good as Ada/GCC. Unlike Ada, Rust has an excellent collection of add-on packages and libraries through its crates system, and in some ways it surpasses Ada, as it handles memory deallocation better than Ada's Unchecked_Deallocation. Rust also provides good frameworks for handling import of data, such as the 'diesel' ORM, and for providing web interfaces.
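As a small illustration of what strong typing buys here, the sketch below wraps two values that are both plain 16-bit integers on the wire, a DNS transaction ID and a UDP source port, in distinct types so the compiler refuses to let them be swapped. The types `TransactionId`, `Port`, and `PendingQuery` are hypothetical and exist only for this example.

```rust
/// A DNS transaction ID; a distinct type even though it is just a u16 on the wire.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TransactionId(pub u16);

/// A UDP port; also a u16, but not interchangeable with TransactionId.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Port(pub u16);

/// An outstanding query waiting for its reply (hypothetical bookkeeping type).
pub struct PendingQuery {
    id: TransactionId,
    source_port: Port,
}

impl PendingQuery {
    pub fn new(id: TransactionId, source_port: Port) -> Self {
        PendingQuery { id, source_port }
    }

    /// A reply is accepted only if both the ID and the port line up.
    /// Passing the arguments in the wrong order is a compile-time error,
    /// because TransactionId and Port are different types.
    pub fn matches(&self, reply_id: TransactionId, reply_port: Port) -> bool {
        self.id == reply_id && self.source_port == reply_port
    }
}
```

With bare `u16` parameters, `matches(port, id)` would compile and silently misbehave; with the wrappers it does not compile at all. Ada offers the same guarantee through derived and distinct numeric types.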

There is, however, a price to be paid for all this. Unlike Ada, which has special types and constructs for interacting with tasks, Rust's threading model is quite different. Data can either be shared through Rust's channels (which are similar in concept to Go's), or through a shared memory structure that is regulated through mutexes. The standard channels make it easy to feed multiple producers into one consumer, but life gets difficult if you want multiple consumers. Data integrity is handled by the borrow checker, but in certain cases it is impossible to determine at compile time whether a borrow is safe. For these cases, Rust provides the reference-counted pointers Rc and Arc, usually paired with RefCell or Mutex, which enforce the borrowing rules at runtime; RefCell aborts with a panic if the rules are violated.

In general, I have found working with Rust to be somewhat of a mixed bag. Like Ada, if my code compiles, I'm reasonably confident it's going to work. Ada has a reputation for an extremely pedantic compiler, which is well earned. Rust, on the other hand, is even stricter due to the way mutability and data ownership work within the language. While the mantra of "the borrow checker is always right" can help one endure, more than once I have wanted to eject my computer out the window when coding in Rust. I have also found that I occasionally have to refactor code significantly if I end up using data in a way I didn't initially expect. While these aren't show stoppers, it does mean developing in Rust can be relatively challenging, even compared to Ada, even if the language is more mainstream.

Finally, the last language on the list is Java, which is unique in that it doesn't compile to native code:

I hesitated to even include Java on this list, but of all the non-compiled languages, it comes closest to meeting my needs. While the JVM is unfortunately both heavy and memory hungry, Java does offer a sandbox which eliminates entire classes of security bugs and exploits by the sheer fact that it isn't running native code directly. Java itself has excellent support for threading and multitasking, as well as a very rich core library. J2EE also adds a fairly usable (if somewhat heavy) framework for developing web frontends and sharing code between the DNSCatcher server and frontend components. Java is the most popular programming language in the world, according to the TIOBE index. Although Java doesn't have the greatest reputation in the FOSS space, due to historical issues with Sun and now Oracle, it's still a very viable choice if I'm willing to give up deployment to embedded devices, and the nature of the JVM brings a fair bit of relief in the number of ways I can blow a foot off.

Anyway, SN community, those are my thoughts on my options. As of this writing, I'm leaning towards either rewriting in Rust or staying with Ada, but I'm keeping an open mind and hope you guys can either highlight options I'm unaware of, or help me make this decision with confidence.

~ 73 de NCommander

 