
Journal by cafebabe

An ex-housemate was a part-time touring musician. He had a normal job but he also toured internationally as a keyboard player in various bands. Obviously, he owned multiple musical synthesizers with different features and, obviously, some of them suffered rough treatment.

Well, one synthesizer got dropped a fair distance and didn't work well after that event. It sorta worked but it was described to me and a friend as playing multiple notes when only one key was pressed. We immediately suspected what was wrong. Computer keyboards and musical keyboards are usually arranged as a switch matrix. In this arrangement, there is a conceptual rectangular grid of switches (controlled and monitored by a microprocessor). The matrix is sequenced such that the microprocessor powers one row of switches at a time and the status of individual switches can be selectively read. If this grid is approximately square, it minimizes the input and output wires to the microprocessor and therefore reduces the cost of the microprocessor's physical packaging.
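
For the curious, the scanning logic amounts to a few lines. Here's a minimal sketch in Python, assuming hypothetical drive_row() and read_columns() helpers for the I/O pins (a real implementation lives in the keyboard's microcontroller firmware):-

ROWS, COLS = 8, 8    # 64 switches from 8 + 8 = 16 wires

def scan_matrix(drive_row, read_columns):
    """Return the set of (row, col) switches currently closed."""
    pressed = set()
    for row in range(ROWS):
        drive_row(row)                # power exactly one row
        columns = read_columns()      # sample all column inputs at once
        for col in range(COLS):
            if columns & (1 << col):  # bit set means switch closed
                pressed.add((row, col))
    return pressed

Crossed wires between rows or columns make one closed switch appear at several grid positions - hence multiple notes from a single key press.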

So, it was very probably a trivial case of crossed wiring. What was the cause? We were told that the synthesizer had landed on an obstruction (a bass drum?) squarely in the middle. That's a good clue. A full-size synthesizer is significantly larger than its circuitry but, to keep the gadget evenly weighted, the complicated parts are usually placed in the middle. But what causes crossed wires rather than total failure? We had a working theory: electro-magnetic shielding.

Being programmers who don't carry screwdrivers, the musician supplied the tools. We removed a few screws and found that the (unusual) four-row connector had been stamped into a cheap and nasty folded plastic-laminated sheet of really thin aluminium. The sheet was pressed against the connector by the cracked, indented case. That was the cause of the crossed wires. Well, we ripped that sheet out and re-assembled the synthesizer with a full set of screws.

It worked first time.

Our musician friend was surprised and confused. If the synthesizer worked without that dull shiny sheet then why was it included? Surely that made it less reliable and increased the cost?

Because, regulations. (Do we have to explain FCC Part 15 to a musician?)

That's ridiculous. It is a synthesizer. It isn't going to make head-in-microwave-oven levels of electro-magnetic radiation under any circumstances. And how does a folded sheet prevent radiation leaking from the edges? It doesn't.

We all learned something from that. Our musician friend probably learned the most but it left him very disconcerted that his trusty synthesizer worked fine after a factory-fitted component was completely removed.

This came to mind because I've been evaluating a Raspberry Pi for a commercial venture requiring hardware control. The result of the evaluation is that a Raspberry Pi (or one of the numerous clones) is not suitable for any task requiring hardware control, but not for the obvious reasons. I'd be extremely uneasy using a Raspberry Pi or equivalent to control the environment of a fish tank. Yes, thousands of people are doing this, or more critical tasks, such as controlling blood insulin levels with a Raspberry Pi or an Intel Edison. However, some people have no better choices.

I have no love for the Raspberry Pi. It was initially selected over the clones or other platforms based on a group conversation at my local hackerspace. We performed a back-of-an-envelope calculation to estimate the quantity of unused Raspberry Pis. Our lower-bound estimate was 1.8 million functional but unused Raspberry Pis. How does this come about? Well, geeks get the concept and think "Oh, that's a great gadget! I'll tinker with one when I've got some spare time." That spare time never arrives and another geek gadget sits in a cupboard and gets forgotten.

Indeed, the Raspberry Pi I have on loan was purchased for tinkering and also to teach a specific kid about programming and electronics. After a brief flurry of activity, it was placed in a cupboard and forgotten for two years. I may be able to evaluate a more recent model which is currently resident in someone else's cupboard. This proves the point that there is a vast pool of surplus units. But why is this such a concern?

This is important because a vast surplus pool allows potential customers to obtain components when they become scarce or expensive - independent of our efforts to maintain supply. Indeed, there is a paradox of long-term supply. If care is taken to ensure that customers have long-term supplies from second-sources then sales from the primary source increase and second-sources become less important. Further reinforcement occurs because long-term supply increases re-sale value and decreases depreciation. In the worst case, if a customer pursues a venture which is nonviable, losses are reduced because capital assets with long-term support return more pennies on the dollar. This in itself makes our venture and our customers' ventures more viable.

So, there's an argument for ensuring that even the most awkward to obtain component is a commodity. The remainder of the exercise is a trade-off between cost, functionality and estimated future supply.

Exponential Decay

My estimate is that currently available models of Raspberry Pi will be widely available for more than 10 years. Further development and sales may extend that beyond 20 years. How is that estimated? An exponential decay of working units can be calculated from a known starting value. The only other figure which is required is lambda, the rate of decay over time.

Some of these figures are widely known or can be easily computed. For example, the half-life of vulnerability to a particular virus or network attack is about 22 days. That is, half of the vulnerable computers get patched or retired in 22 days. Half of the remainder get patched or retired in the next 22 days. Half of that remainder get patched in the following 22 days, and so on, until the remainder is negligible. Is it really 22 days? Well, desktop computers may be unused when users go on holiday. Servers may be unmaintained. And millions of network-enabled devices have no upgrade path.
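
As a sketch of the arithmetic (the 22-day half-life is the estimate above, not a measurement; the population figure is arbitrary):-

import math

def survivors(initial, half_life, elapsed):
    """Exponential decay: N(t) = N0 * exp(-lambda * t), where lambda = ln(2) / half_life."""
    decay_rate = math.log(2) / half_life    # lambda
    return initial * math.exp(-decay_rate * elapsed)

# Half of the vulnerable machines get patched or retired every 22 days.
print(round(survivors(1000000, 22, 22)))    # 500000 after one half-life
print(round(survivors(1000000, 22, 220)))   # ~977 after ten half-lives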

Lambda for server application retirement is about 15 months. Just consider stuff that comes and goes on the Internet. Yeah, some of it sticks around. However, I'm aware of a music company and a record company which continuously re-write their mission-critical software. Feel free to poke holes in the figures but it should be obvious that there is a large gulf between a running server and a secure server. It is also foreseeable that a customer may have a mission-critical application and no hardware or secure means to run it.

If you have a large system, this type of attrition is visible as you work. If you have 3000 harddisks, about 10 fail every day. You may do a quick back-check here and think I'm crazy for suggesting that the MTBF [Mean Time Between Failures] of a Seagate harddisk is about 300 days. I don't doubt that you have a desktop computer with one harddisk which gets lightly used for 40 hours per week and lasts more than five years. Put 144 in a rack, run your application at [redacted] throughput and watch them shake each other to pieces. Not many of those last for five years. And the ones that survive aren't worth running due to advances in technology.
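
The back-check is one line of arithmetic:-

fleet_size = 3000     # harddisks in the system
mtbf_days = 300       # approximate MTBF under heavy, continuous load
print(fleet_size / mtbf_days)    # 10.0 expected failures per day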

Even if you think that a harddisk lasts five years or more, are you sure you're not invoking bath-tub-curve survivor bias? My bad experience of a single-unit purchase was, of all times, on behalf of my musician ex-housemate. In addition to lowering my average, it made me look like an unreliable idiot.

Background

I'm a little intimidated by hardware. I understand Boolean logic and I've even made a circuit with an analog integrator and a VCO [Voltage Controlled Oscillator]. However, I'm intimidated by the scale of integration. At a push, I'll perform server hardware maintenance. I've even reset the root password on an SGI Indy. But I won't open a laptop and I certainly wouldn't attempt to repair a cellphone or tablet. These credit card size computers are very definitely in the latter camp. Almost all of the credit card size computers use Micro USB power and Micro SD storage which is interchangeable with phones and tablets. Indeed, many of the cheap tablets have a circuit board which is, erm, about the size of a credit card computer.

I've also had bad experiences with the accompanying software. Most recently, our venture was burned by a Texas Instruments ARM evaluation board. After about US$50 and 12 person hours, we got absolutely nowhere. Not even deployment of the blinking LED example. (Apparently, something might have worked over USB if the development software wasn't installed in a virtual machine. I don't know what facilities it requires from a bare-metal installation but that's *exactly* why it wasn't trusted with such access.)

Some people don't have these concerns. They'll download anything from anywhere and run it on any insecure combination of hardware and operating system. Obviously, it'll fail at inopportune times but that doesn't deter the ignorant from extracting maximum goodwill. For example, here's a short conversation started by someone with far less knowledge and far more bravado:-

[Idiot sidles up to me in hackerspace.]
Idiot: Do you know anything about software? [I'm so glad he didn't say coding.]
Me: Maybe.
Idiot: Do you know anything about for loops?
Me: Yes. Is this for a project?
Idiot: Maybe.
Me: Is this for a paid project?
Idiot: Maybe.
Me: Am I getting any of the proceeds?
Idiot: No.
Me: [Very unamused.] Fuck off.
[Idiot sidles up to the nearest person who was out of earshot and repeats enquiries.]

I don't mind helping students or the intellectually curious but I'm not free tech support for your hipster venture, especially when I've got my own commercial venture. We haven't spoken since but further investigation revealed that the idiot obtained a paying contract to make a robot. He had successfully installed development software, copy-pasted example code from the Internet and got servos moving. However, he has no freaking idea how to write software.

In another example, I know an old electro-mechanical engineer who got a Raspberry Pi streaming with XBMC within four weeks. That was starting from a Windows desktop (and while coping with a senile, dying parent). Apparently, if you start from a Windows installation, getting a bootable Micro SD card is one of the hardest steps.

Maybe I'm encountering bias where people are loud about their successes. However, I'm astounded that it took me six weeks to get confident with a Raspberry Pi. By confident, I mean installing the PerlPowerTools (117 Unix utilities written in Perl), a Perl init system, compiling a database from source code with a custom set of compiler flags and then adapting SQL stored procedures. Unfortunately, very little of this has been repeatable.

I started with a two-year-old version of Raspbian [Raspberry Pi Debian Linux]. This was state-of-the-art when placed in a cupboard and I saw no reason for an immediate upgrade, especially if the upgrade included systemd. So, I began with tempered bravado. My first task was to establish a network connection. Given that I started with the Upstart init system, this was achieved by removing the Micro SD card, mounting it on my desktop and adding the following to $MOUNT/etc/rc.local:-

ifconfig eth0 10.55.44.33 netmask 255.255.255.0

I set my desktop to an address in the adjacent /27, 10.55.44.65. My next task was to find the username and password for the default account, if any. $MOUNT/etc/passwd indicated that there was an account pi with UID 1000. The password was unknown. The first attempt, password, failed but removing the Micro SD card and copying my desktop's password hash from /etc/shadow was sufficient to bootstrap. (I've subsequently been told that the password pi would have worked. On more recent versions, the default password has been changed to raspberry. Either is highly insecure and should be changed before network deployment.)
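
For anyone repeating the bootstrap, it amounts to the following sketch, run as root on the desktop with the card's root partition mounted. The mount point and the myuser donor account are illustrative:-

# Copy a known password hash over the pi account's unknown hash.
MOUNT = "/mnt/sdcard"    # wherever the card's root partition is mounted
DONOR = "myuser"         # a local account whose password is known

def get_hash(shadow_path, user):
    with open(shadow_path) as f:
        for line in f:
            fields = line.rstrip("\n").split(":")
            if fields[0] == user:
                return fields[1]
    raise KeyError(user)

donor_hash = get_hash("/etc/shadow", DONOR)

shadow = MOUNT + "/etc/shadow"
with open(shadow) as f:
    lines = [line.rstrip("\n").split(":") for line in f]
for fields in lines:
    if fields[0] == "pi":
        fields[1] = donor_hash
with open(shadow, "w") as f:
    f.write("\n".join(":".join(fields) for fields in lines) + "\n")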

The loaned Raspberry Pi required a suitable box to prevent accidental damage. The typical solution is to laser cut or 3D print one of the numerous, freely available designs. I couldn't be bothered with this. It was supplied in a small plastic box and I considered using scissors to punch holes into it for power and network. Instead, I used a surplus plastic box which previously contained hummus. Pro-tip: Punch the holes for network and USB where the lid is easiest to open. For comedy value, name the host according to the label on the box. For my installation, /etc/hostname was set to hummus01 and the box has been further labeled:-

HUMMUS01
10.55.44.33
255.255.255.0

As a further riff on this joke: We put the soup into soupercomputing!

Splunking around the legacy 2GB installation, I found two non-JavaScript graphical web browsers, C, C++, Perl, Python, Ruby, Scratch, Squeak, support for a specific type of accelerometer and minimal support for GPIO [General Purpose Input and Output wires]. With one online purchase and some copy-pasta code, it should be possible for a bright and motivated 11 year old to make an interactive application which uses switches and/or an accelerometer. The current 4GB installation adds Java, Mathematica, Minecraft and significantly more out-of-the-box hardware support. This is ideal for its primary purpose of education and the learning curve has been smoothed in many instances. For example, the example Python programs in one directory all have the line:-

import pygame
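
As an aside, a complete program in that style might look like the following minimal sketch - my own reconstruction using the stock pygame API, not one of the bundled examples:-

import pygame

# Open a window and run an event loop until the window is closed.
pygame.init()
screen = pygame.display.set_mode((640, 480))
pygame.display.set_caption("hello")

running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:    # window close button
            running = False
    screen.fill((0, 0, 0))               # black background
    pygame.display.flip()                # swap the frame to the screen

pygame.quit()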

The pygame library appears to handle window creation, sprites and input mechanisms. With further hand-waving, the remainder is intended to be a contemporary version of Logo. Ignoring the shovelware, the filing system is incredibly similar to an x86 Debian derivative - with the exception of /lib/arm-linux-gnueabi or /lib/arm-linux-gnueabihf rather than /lib/i386-linux-gnu or suchlike. However, bootstrap and virtual memory differ significantly. If no storage card is present or readable, the HDMI output is a bi-linear interpolation of a 2*2 grid of red, yellow, green and blue pixels. This looks quite pretty but indicates significant failure - usually PEBKAC [Problem Exists Between Keyboard And Chair] failing to re-insert a card. Further errors are indicated through a table of blink codes which has changed between revisions. So, three blinks on one revision means something completely different on another revision. Doh!

The first 128KB of a suitably formatted storage card is not allocated to a partition but quite obviously contains bootstrap code for the primary Broadcom VideoCore processor. (Although the operating system is centered around the ARM processor(s), ARM is the junior partner in the hardware design.) The 128KB block, in a well-known location, provides enough intelligence to read a standard partition table and a FAT32 filing system in primary partition one. This provides more bootstrap for the VideoCore processor and an ARM kernel. Previously, a hard-coded set of bootstrap files allocated 64MB, 96MB or 128MB RAM to the VideoCore (or a skeleton 16MB). Contemporary versions provide "overlays" for hardware and a file to provide Linux kernel parameters. This is very different to primary partition one being formatted in ext3 and (effectively) containing /boot/grub.

The typical swap partition is omitted. I presume this is due to a variety of reasons, including distribution of the operating system as a raw image requiring dd onto the target storage card. Instead, /etc/init.d/dphys-swapfile (and configuration parameters in /etc/dphys-swapfile) sizes and places the swapfile, /var/swap. Unfortunately, swapfile performance is atrocious. After variously increasing and decreasing the swapfile to work in a limited space, I found that 111 extents in an ext4 filing system incur little overhead. Unfortunately, storage bandwidth is a very significant limitation.
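
For reference, the swapfile size is controlled by a single parameter in /etc/dphys-swapfile:-

# size of /var/swap in megabytes
CONF_SWAPSIZE=100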

A single-core Raspberry Pi is broadly comparable to an IBM T40 laptop with 512MB RAM and an Intel Celeron Mobile processor from the year 2006. RAM and network bandwidth are similar. For tight loops, processing power is similar. The Raspberry Pi omits the screen and keyboard but is significantly cheaper, smaller, more ruggedized and draws about 1/10 of the energy. However, some of the energy reduction is not due to improved manufacturing techniques but due to a significant reduction in bandwidth from processor to memory and from memory to storage. ARM processors have small, deterministic processor caches. This is of minimal concern when running Perl, Python or Java bytecode but it really grinds when doing intensive tasks, such as compilation. This is especially true if compilation spills into virtual memory.

Many find it laughable that a system with 512MB RAM is described as embedded. However, it has become painfully slow to self-host a contemporary operating system without 1GB RAM per core for multiple cores. Although MySQL Server 5.0.x would compile on a Pentium 90 with 32MB RAM in 4.5 hours, MySQL Server 5.7.x requires at least 815MB RAM - or about 36 hours if you haven't got this headroom. (gcc and clang require slightly less RAM or much more time.) Indeed, FreeBSD doesn't consider ARM to be a primary platform because it would take an estimated four months to compile and test the 19000 supported packages. On amd64, this work completes within eight hours due to the deployment of 24-core hyper-threaded compile farms. However, given that FreeBSD for amd64 has gained ARM and MIPS emulation, and native 64-core ARM servers are imminent, better support from FreeBSD will become the norm.

Returning to the difficulty with swap: the Broadcom BCM2835 VideoCore's bootstrap SD card interface becomes mmcblk0 and, in particular, you become acquainted with the ARM Linux kernel process mmcqd/0 using 100% CPU power when communicating with /var/swap on /dev/mmcblk0p2. A benchmark with iozone found that my installation has about 0.4MB/s of storage bandwidth. If you're accustomed to 20MB/s from an ATA disk, divide that by 40 and round it downward. If you're accustomed to 50MB/s from SATA, try 1% of that transfer rate. In use, this isn't an obvious problem. For example, it is possible to develop interpreted scripts in a responsive environment. The problem is that significant backlogs of uncommitted state can accumulate. In scenarios where 300MB RAM is allocated to disk buffers, it may take more than 10 minutes before a system can be safely switched off. I first encountered this after typing sync and finding that it took more than six minutes to complete.

This merely requires patience, understanding and a reliable source of power. However, an alarming complication can be found when viewing logs. The storage cards don't have 100% fidelity and it is possible to see log entries in which writes to specific sectors time out after 120 seconds! This default should be raised significantly, in proportion to RAM and throughput. 1200 seconds may be insufficient for a model with 1GB RAM.

I looked to see if I could increase performance using the mount -o noatime trick. Unfortunately, it had already been applied. /etc/sysctl.d/98-rpi.conf also contains vm.swappiness=1 which, presumably, aids responsiveness.

Some latency may be specific to my installation but there is a strict upper bound on storage communication speed. Despite SD cards and Micro SD cards using a serial protocol, they have eight pins or so. Have you ever wondered about the extra pins? Well, a card and host may negotiate wider, four-bit communication. Unfortunately, the BCM2835 hardware datasheets indicate that the SPI interface for the SD host is decidedly single channel. Furthermore, the clock rate is an even division of the Raspberry Pi's master 750MHz clock - capped at 15MHz due to the capacitance of the circuitry. This leads to an upper bound which is less than 2MB/s - and speed steps which are somewhat short of the best case (14.4MHz, 13.9MHz, 13.4MHz, 12.9MHz, 12.5MHz and so on). And, obviously, this throughput excludes protocol overhead.
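
The speed steps follow directly from the even-divisor arithmetic. Taking the figures above at face value:-

MASTER_MHZ = 750.0    # master clock
CAP_MHZ = 15.0        # cap imposed by circuit capacitance

# The SD clock is the master clock divided by an even integer.
divisor, steps = 2, []
while len(steps) < 6:
    rate = MASTER_MHZ / divisor
    if rate <= CAP_MHZ:
        steps.append(round(rate, 1))
    divisor += 2

print(steps)    # [15.0, 14.4, 13.9, 13.4, 12.9, 12.5]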

The 0.4MB/s speed limitation is notably absent from a USB2.0 to SD bridge, which sustained approximately 9MB/s with the same card. A Raspberry Pi is unable to saturate a 480Mb/s USB2.0 interface and transfer speed remains inferior to sequential ATA access. However, 9MB/s is a significant improvement. Furthermore, if multiple swap partitions or swapfiles are configured, a Linux kernel will allocate virtual memory across them according to their configured priorities. Unfortunately, default init scripts only permit multiple instances of swap if they are bodged into /etc/rc.local. A further problem is that USB2.0 to SD bridges tend to be cheap units with a short MTBF [Mean Time Between Failure]. This would drag down an already flaky system.
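
The bodge amounts to something like the following in /etc/rc.local, assuming a second swapfile has already been created with mkswap on the USB-attached card. Paths and priority values are illustrative; the kernel uses higher-priority volumes first and stripes across volumes of equal priority:-

swapon -p 5 /var/swap
swapon -p 10 /media/usbsd/swap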

The most significant problem with reliability is the power supply. 7W at 5V is 1.4A. This is drawn from a USB power regulator which typically supplies 0.5A or less to a phone or tablet. Indeed, such gadgets are able to sip energy to the extent that a drop in voltage causes a momentary disconnect. This allows a regulator's reservoir capacitor to re-charge. However, the VideoCore is a constant power drain. A VideoCore typically forms the brains of a DVD player and this places it in a different league for power consumption. The fourth generation's inclusion of an ARM core, FPU and MMU was speculative and feature matching. A push from techies based in the same city overcame the NDA [Non-Disclosure Agreement] limitations which usually preclude such devices from wider use. In its home market, the chip is sold at cost and then optimized codecs are licensed for fixed production runs. It is quite a departure from usual practice to sell the chip for educational or general use and then give away two codecs. Old habits die hard and an official codec pack can be purchased by end-users. However, I have yet to find anyone who has deployed an official or unofficial copy.

Returning to power, it would be a reasonable assumption that a device with a Micro USB power connector could be used in conjunction with a Micro USB battery booster. These devices are buck/boost circuits, typically arranged around one, two or four 18650 lithium-ion batteries. This battery size is very common in products exported from China and is somewhere between AA and C cells. Unfortunately, I discovered that a battery booster is completely unsuitable for use with a Raspberry Pi while demonstrating the Perl init system and the PerlPowerTools at my local Perl User Group.

It booted on the third attempt, which is, unfortunately, normal behavior when cold booting with an empty reservoir capacitor. (Yet another reason why this type of computer is unsuitable for hardware control.) However, the system became unusable within five minutes to the extent that it was not possible to run the Perl version of the du [disk usage] command. Repeated invocations led to the same error because the corrupt sectors were already cached.

Root cause analysis found one byte of corruption when loading a 3KB Perl script (of which less than 2.5KB was active code). However, the system was initially able to load a 4MB kernel and a 1.5MB Perl interpreter without difficulty. I strongly suspect that sustained load causes voltage drop which then causes increased ripple from the external boost circuitry. The Raspberry Pi's regulator and reservoir capacitor cannot smooth the energy to the extent required for reliable operation. Given that the reservoir capacitor is a wet electrolytic capacitor, this effect will become increasingly likely as the capacitor ages, dries and decreases in capacitance.

Analysis reveals that the longevity of Raspberry Pi systems is greatly compromised by the power electronics. Power circuitry is a common cause of failure but it is needlessly severe in this design. It is relatively easy to solder additional capacitance in parallel. (Is this a tacit acknowledgement?) This should be considered prior to long-term deployment.

The deal-breaker, which precludes serious usage of a Raspberry Pi (or any of the clones), is the lack of ECC [Error Correction Code] memory. For video decode, a bit error is overwhelmingly likely to cause a momentary glitch on screen before being expunged by the next codec key frame. In other applications, a bit error is much more severe. The lack of error detection or correction is completely unsuitable for long-term unmonitored tasks or, indeed, anything requiring hardware control. I've witnessed corruption which makes software randomly unusable. It isn't just the inability to run a 3KB script. make easily gets snagged during compilation. And it is typical for the process to stop at the same place each time. It is therefore disconcerting that some errors can be averted with a reboot. Unfortunately, there are deeper problems with the directory structure which cannot be averted.

After gaining experience and confidence with a loaned Raspberry Pi, I attempted to make long-overdue improvements to signage at my local hackerspace. I'd asked for a password and/or source code. Neither was forthcoming. The most information I received was that the source code was on GitHub. This is marginally more specific than saying it was on the Internet. After several weeks of inaction, I just pulled the storage card and attempted to locate the script. I quickly found the script in one of the 22 home directories. (Oops!) However, iterating changes led to a situation where even the original script failed to run because the interpreter was corrupted after repeated power cycling. A more significant problem was that fsck only makes cursory checks: fsck.ext4 -f -n found that more than 200 errors had accumulated in the directory structure. Sectors were double-allocated to icons and Python scripts.

I'm not stupid enough to foobar that further and I didn't walk away from the problem as if nothing happened. I explained to three or four responsible adults how I'd borked the display. The original system administrator took the opportunity to upgrade and copy accounts across. Unfortunately, this means one more computer has succumbed to systemd. That really sucks because inaction would have been beneficial.

Regardless, a back-check on the loaned Raspberry Pi storage card found that identifiable errors in the directory structure accumulate at the rate of approximately one per day. The working theory is that directory sectors get cached in non-ECC RAM and corrupted over time. Perfectly good data then gets over-written with bad data when the cached directory structure gets modified and written back to storage. This is entirely consistent with observations from large-scale systems. A further complication is that random sectors receive writes due to storage communication errors.

I've yet to cover the main problem with contemporary Linux distributions. PulseAudio, udev, dbus and systemd are written by people who have been brought up using Microsoft Windows and think this type of design is acceptable in a reliable system. (Designs with a lack of source address are a particular bugbear.) They aren't smart enough to be kernel developers and this is combined with an attitude of "Works for me!"/WONTFIX/"I've got mine!" which makes Linus Torvalds's famously brusque and forthright style look friendly. Ignoring the second-system effect, there's the ADHD behavior of "Ooh! Binary logs! Ooh! DNS! Ooh! Embedded web server! Ooh! QR codes! Ooh! Consoles! Ooh! Bluetooth headset! Ooh! Virtual servers!" combined with an attitude of "It's free!" which completely misses points about code quality or TCO [Total Cost of Ownership].

There's other stuff which looks suspiciously like a re-run of Dual EC DRBG. A single corrupt byte incurs silent journal truncation? Combine that with forced log rotation and it becomes trivial to hide intrusion. Or maybe this gem from /var/log/kern.log generated by the May 2016 distribution of Raspbian:-

random: systemd urandom read with 64 bits of entropy available

WTF? Is there any reason to devise and widely distribute an entropy implementation which is worse than Bruce Schneier's Yarrow or Fortuna? Could something be devised which at least requires a *full* cabinet of equipment in the NSA's virtualized cracking farm? Y'know, the current implementation probably requires a rainbow table of 10TB or less. Could we at least raise the bar to well-funded adversaries only?

If you're a casual user, you may think these concerns are academic. However, as someone who has looked under the hood (and thrown out most of what was found), I assure you that the quality of system components has declined significantly over three or four years. A quick summary is as follows:-

  1. Your system may incur 20 seconds or more of unnecessary boot time. In the 1980s, Steve Jobs regarded one additional second as unacceptable.
  2. Your system will succumb to bit-rot at eight times the rate of older systems.
  3. This assumes you won't get hit by one of the 60 or more unidentified critical security bugs.

I'd like to substantiate each statement.

Boot time with systemd is unnecessarily slow. The primary author of systemd, Lennart Poettering, claims the opposite. This may be true when developing systemd in a virtual machine and the host operating system caches files. However, in real scenarios, this is insultingly false and has been widely debunked. In one case, a server takes more than 10 minutes to perform a memory integrity check before it potentially saves 0.5 seconds with systemd's parallel boot. Wow. Such vast savings. That was entirely worth 260000 lines of source code.

Let's take a more mundane example. The Upstart init system is 120KB. /lib/systemd/systemd is 1106KB and its support binaries are almost the same size. /bin/systemctl alone is 498KB. On my loaned Raspberry Pi, the systemd binaries take more than four seconds to read from storage. Furthermore, /lib/systemd/systemd accumulates more than 10 seconds of processing time on the single-core processor. That isn't wall-clock time. That excludes kernel processing and storage communication overhead. Overall, it takes about 50 seconds to boot a Raspberry Pi with a default systemd configuration. Out of the 435MB of RAM and the default 100MB of virtual memory available to userspace processes, 90MB is already used. Comically, a further 100MB of RAM is initially used to cache files and directories accessed during boot.

Emptying the contents of /etc/dbus-1/system.d reduces program overhead from 90MB to 60MB. Replacing /sbin/init with something sane further reduces overhead to 30MB. Saving 60MB is very significant in a system with 435MB. It also reduces boot time by 20 seconds.

Extraneous code, which is multiple times the size of its predecessor, has to be retained with almost complete fidelity. However, it has already been established that directory structure bit-rot occurs at the rate of one sector per day and this excludes bit-rot to files due to random writes and other causes. Reducing the total number of sectors involved in the boot process reduces susceptibility to inevitable boot failure. Empirically, it is possible to reduce dependencies by a factor of eight. This is the difference between having a system which works for three months or two years.

Then there's the whole thing with LOC [Lines Of Code].

Programmers write programs. Sometimes, those programs work. Very often, programs have bugs. In aggregate, a bunch of competent programmers will write a fairly fixed proportion of lines with bugs. Oddly, this is around one bug per 50 lines of program - regardless of the programming language used. A proportion of these bugs are serious and a smaller proportion are security flaws.

Let's take some examples which are close to hand. Our start-up recently wrote a program to transform files. It has about 50 lines of active code - and two known bugs. That code doesn't even have any loops! The Perl init system has about 400 lines of active code - with five known bugs and more suspected. Overall, this fits the observation of having approximately one bug per 50 lines.

Edsger Dijkstra noted that testing can only confirm the presence of bugs, never their absence. It has also been noted that software wears in rather than wears out. So, would you rather run software which was written last week by an obnoxious kid or would you rather run software which has been run on five million computers for 10 years? The latter reduces problems by at least a factor of 10, although the remainder can surprise. As examples, a critical Microsoft Windows bug was found after 15 years and a severe GNU bug was found after 18 years. Some of the innocuous but more numerous bugs may hang around for more than 25 years before being fixed.

However, Lennart Poettering and Kay Sievers have collectively written more than 300000 lines of code - code which crashes kernels, leaves computers permanently borked and hogs RAM and processing power with no discernible benefit - and the attitude is that bugs - and even workarounds - are someone else's problem. Submit patches, we're told. I'm not even sure that patches would be accepted because that would be an admission of error, in addition to accepting software which was NIH [Not Invented Here].

There is a strong argument to write code in a high-level language unless a low-level language is strictly necessary. Separate from making a project tractable, it reduces the quantity of bugs and the quantity of severe bugs which become apparent after deployment. There's no magic here. This was known in the 1970s. There's also a strong argument to fix bugs in specification and design rather than code. That's the reason why a productive programmer at NASA writes four lines of code per year and the result is more reliable than a military-spec airframe.

Anyhow, I mentioned the Perl init system to a Lisp programmer and this immediately caught his attention. Indeed, he's taken it as a challenge to write something smaller. We're like a two-handed sketch version of a well-known cartoon about software ostensibly being written in Lisp but actually hacked in Perl. However, the long-term solution may be neither C, Lisp nor Perl. It is, however, entirely compatible with the GNU philosophy and the BSD philosophy. /sbin/init could invoke make in a directory where the default rule is to perform all of the boot tasks. Even if it was a heinous recursive script, it would be unlikely to attain 1% of the size or complexity of systemd. The limitation is that make would require an extension to catch and handle the POSIX signals SIGCHLD and SIGTERM. However, if this could simplify (or unify) SysV init, daemontools, Upstart, systemd and numerous other init systems in a few lines of code then it would be worth pursuing.
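
A hypothetical sketch of the idea - the target names and recipes are mine (the ifconfig line is from earlier in this article):-

# /etc/boot/Makefile - the default rule performs all of the boot tasks.
# make's dependency graph provides the ordering and the parallelism (make -j).

all: network syslog

filesystems:
	mount -a

network: filesystems
	ifconfig eth0 10.55.44.33 netmask 255.255.255.0

syslog: filesystems
	/usr/sbin/syslogd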

One of the many outright lies propagated about systemd was that it would aid portability. No, it hasn't. Indeed, if this was the intention, it could have been achieved with another iteration of the LSB [Linux Standard Base], zero lines of code and zero bugs.

I'll finish with an exchange with another programmer who is now quite bored with init stuff:-

Geek: You can have a cookie if you promise to stop doing strange things with init systems.
Me: You know I can't promise that.
Geek: [Laughs and gives me a cookie anyhow.]

Epilog

While I was repeatedly trashing a loaned Raspberry Pi installation and another installation at my local hackerspace, problems with virtual machines, systemd and Windows led to very real consequences for end-users. I wasn't to blame but this is illustrative of the type of scenario we wish to avoid when deploying infrastructure.

Over the last 30 days, my local hackerspace's laser cutter has been out of action for more than four days solely due to problems with virtual machines, systemd and Windows.

After a fairly serious hardware hacking injury requiring two rounds of surgery and the possibility of amputating a finger, idle chatter about authenticating access to machinery was hastily implemented. I assumed that it was cobbled together with minimum effort while avoiding any really obvious future limitations. However, this assumption was wrong. Root cause analysis has found the exact opposite scenario. It has been cobbled together using as many industry-standard components as possible for the purpose of résumé padding.

I thought access to machinery used an ad hoc protocol to a fixed network address. No. It used DNS [Domain Name System], DHCP [Dynamic Host Configuration Protocol] and LDAP [the Lightweight Directory Access Protocol] which all run in separate virtual machines. I have misgivings about using DHCP in such a small environment but we'll let it slide because it allows volunteers to apply industry-standard practice. Furthermore, the infrastructure is maintained and security fixes are applied. Unfortunately, due to résumé stuffing, every new whiz-bang feature is also being applied. That includes systemd.

This is where the first incident started. A monitoring system missed 1-3 pings at five-minute intervals. This occurred daily within a three-hour window during a quiet period. The initial assumption was that yet another anti-social asshole had found yet another method of being anti-social: cold rebooting important servers without warning, but only when it was quiet. This led to a small amount of needless recrimination before discussion with local independent retailers found they were similarly affected. So, we were all being affected by a daily electrical brown-out.

With a straightforward init system, servers handled this scenario in a predictable manner. However, systemd execution order is not straightforward. This leads to a combinatorial explosion of states between the virtual hosts. This leads to scenarios in which a particular dip and duration in energy is sufficient to leave LDAP non-functional. Two people wasted six hours each waiting in vain for the laser cutter to become operational and a similar amount of time has been wasted on a workaround: specifically, a UPS [Un-interruptible Power Supply]. I'm wholly unimpressed because it goes completely against the adage "Never fix a hardware problem in software. Never fix a software problem in hardware." But there we are. Deficiencies in systemd are being rectified with money from hackerspace donations.

Since then, a separate software incident left the laser cutter inoperative. The computer with the laser cutter CAM software upgraded to Microsoft Windows 10 for reasons which may never be determined. Unfortunately, the application is incompatible with Microsoft Windows 10 and therefore the laser cutter stopped working for the second time within a month. Fixing Windows was cheaper and easier than fixing systemd. And during this period there has been no fault with the hardware. Downtime has been entirely due to flaky software.

So, here's some things in my local hackerspace listed in decreasing order of reliability:-

  1. A Chinese hobbyist laser cutter.
  2. Microsoft Windows.
  3. systemd.

That's messed up - and I'm being really polite about the situation.

 

Reply to: Slackware?

    (Score: 2) by turgid (4318) Subscriber Badge on Thursday June 02 2016, @06:39PM (#354170)

    At the risk of sounding like a broken record, when I got my Raspberry Pi (1.5 years ago?) I tried something like raspbian and realised it was garbage straight away and put the Slackware port on it and never looked back. Mind you I haven't used it for ages. I was just using it as a headless host on my LAN for compiling code (stupid, useless home-made stuff) remotely.

    Before that, at work I'd been developing for Arago Linux on a couple of TI dev boards, Sitara and DaVinci IIRC, with Umbongo and then Mint on my workstation.
