My Ideal Processor, Part Foo+1

Posted by cafebabe on Sunday April 22 2018, @08:25PM (#3171)
0 Comments
Hardware

This is part one of a four part proposal for a trustworthy computer consisting of ALU, registers, memory interface, network cards and compiler. Part one concerns ALU and registers.

It is possible to prototype processors using FPGA [Field Programmable Gate Array] development boards. They have numerous advantages and disadvantages. They are small, relatively cheap, relatively fast and can be re-purposed for other projects but they typically incorporate DRM and require the use of untrusted software. If the purpose of the project is to create a trustworthy computer, an untrusted FPGA would not be the final solution. Therefore, we have to "keep it real" and devise a solution which can be implemented without an FPGA. It may be common for the solution to be deployed on an FPGA but a solution which does not specifically require an FPGA has numerous advantages. For example, any solution which vastly under-utilizes an FPGA may be implemented as a multi-core processor, whereas any solution which is crammed into an FPGA may be expensive or impractical to implement in other forms. So, the intended scale of integration is DIP chips in sockets which can be hand soldered. However, the trade-offs of other implementations are acknowledged.

There are numerous techniques to make minimal processors. It is possible to apply the design principles of the Apollo Guidance Computer to make something which can be programmed like a 6502 or similar. Unfortunately, it will be significantly slower than a 6502 because it will use 64KB EPROM or similar with 350ns access time. Further circuitry, such as counter chips, may require a system to run below 1MHz. A lack of instruction pipe-lining and the 64 bit registers will also require significantly more clock cycles per instruction. It would be unrealistic to expect even 10% of the performance of a Commodore 64. Regardless, in the same manner that it is possible to get a 6502 to add 64 bit numbers, a smaller data bus and ALU does not prevent 64 bit data from being processed on minimal hardware. Likewise, memory bank switching allows a very large amount of memory to be addressed by a processor. It would be faster and more efficient to not multiplex the ALU, data bus or address bus. However, the minimal implementation is cheaper, more tractable and may be replaced with more substantial implementations.

The system will be based around one 8 bit ROM. One output from the ROM determines read or write. The other seven outputs determine which internal register to read into an internal accumulator or write from it. This has the obvious deficiency, described in NAND2Tetris, in which instructions will typically be a series of read then write operations. This can obviously be improved with the use of two ROMs and no intermediate accumulator. However, Amdahl's law applies because there are also cases where efficiency does not double.
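
As a minimal sketch of this scheme (not part of the schematics; the ROM contents, register numbering and the choice of bit 7 as the read/write signal are all invented for illustration), the following Perl models the cycle: one bit of each ROM byte selects read or write and the remaining seven bits select a register, with the internal accumulator in between:-

# Sketch of the single-ROM micro-code cycle. Bit 7 of each ROM byte
# selects read or write; bits 0-6 select one of 128 registers.
# ROM contents and register numbering are invented for illustration.
use strict;
use warnings;

my @read_regs   = (0 .. 63, (0) x 64);  # first half: immediate constants 0-63
my @write_regs  = (0) x 128;
my $accumulator = 0;

# Invented micro-code: read constant 42 into the accumulator,
# then write the accumulator into write-only register 5.
my @rom = (0x2A, 0x85);

foreach my $byte (@rom) {
  my $is_write = ($byte >> 7) & 1;
  my $reg      = $byte & 0x7F;
  if ($is_write) {
    $write_regs[$reg] = $accumulator;
  } else {
    $accumulator = $read_regs[$reg];
  }
}
print "write register 5 holds $write_regs[5]\n";  # prints 42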

The native machine has 128 × 8 bit read-only registers and 128 × 8 bit write-only registers. Of these, half of the read-only registers provide immediate constants. Therefore, any value from zero to 63 can be loaded into the accumulator. From this, it is possible to:-

  • Short jump within a 64 instruction page.
  • Medium jump within a 4096 instruction block.
  • Long jump across the whole ROM.

Some of the registers provide access to ALU functionality. In particular, it is possible to set or clear a carry flag by writing to specific locations. Within the native machine, the window to main memory may be eight bytes or less. However, the virtual machine is not aware of this restriction. Anyhow, a minimal system proposal requires:-

  • 5 × 4 bit binary counter chips for the native program counter. This is arranged into six bits, six bits and four bits to allow page, block and global jumps.
  • 2 × 8 bit latch chips so that the parts of a medium jump or long jump can be applied synchronously. These are not essential but they avoid a large number of trampolines in the micro-code.
  • One 64KB EPROM or similar.
  • One 8 bit latch for the internal accumulator.
  • Numerous latches for all of the general purpose registers.
  • Latches for general register selection.
  • Latches for temporary registers.
  • Latches and counters for the virtual program counter.
  • Logic chips for the ALU. This includes internal flags which are not available via the virtual machine.
  • Logic chips for register decode.
  • Logic chips for virtual instruction decode.

Within the 128 × 8 bit write-only registers:-

  • Three addresses allow the native program counter to be set.
  • A range of addresses allow ALU flags to be cleared or set.
  • A range of addresses allow ALU operands to be written.
  • Three addresses allow register operands to be selected.
  • Eight addresses allow writes to a general purpose register.
  • Four or more addresses allow a main memory address to be specified.
  • One or more addresses allow writes to main memory.
  • Four or more addresses allow the virtual program counter to be set.
  • One address allows the virtual program counter to be incremented.
  • Four or more addresses allow the virtual stack pointer to be set.
  • Virtual instructions may be written into specific addresses for the purpose of splitting byte-code fields. This functionality is copied directly from the design of the Apollo Guidance Computer.

Within the 128 × 8 bit read-only registers:-

  • The first half of the address-space allows constants from zero to 63 to be loaded into the internal accumulator.
  • A range of addresses allow ALU operations to be queried.
  • 24 addresses allow reads from general purpose registers.
  • One or more addresses allow reads from main memory.
  • Four or more addresses allow the virtual program counter to be queried.
  • Four or more addresses allow the virtual stack pointer to be queried.
  • A range of addresses conditionally provide a subset of known values. This allows bit-field decode, conditional execution and jump tables to be implemented within the native machine.

The native machine does not have subroutines or conditional instructions. It is not a general purpose 8 bit micro-processor. It is micro-code for the purpose of implementing one specific computer architecture. Conditional sequences, such as loading variable length instructions into a bit-field decoder, require a terminating jump at the end of the sequence and multiple jump tables for opcode and addressing mode. These conditional addresses can be obtained from dedicated register locations.

It is also possible to return to the beginning of the fetch-execute cycle via the use of a dedicated register. This allows an interrupt to be handled without incurring additional clock cycles to test an interrupt flag. Given that there are no conditional instructions, this is the most logical implementation. The read of the interrupt vector also allows atomic sampling of an interrupt signal and therefore it avoids the obscure corner case where a transitioning interrupt signal level causes indeterminate behavior across the hardware.

It is not strictly necessary to implement the virtual machine's program counter with dedicated counter chips. However, omission would require a significant number of clock cycles to perform an increment operation via the ALU. This differs from the Apollo Guidance Computer in which all hardware counter increments stole bus cycles to perform the increment via the ALU. However, in the case of the Apollo Guidance Computer, all counters were the width of the ALU. In the proposed design, all counters are significantly wider than the ALU and therefore increment operations require dedicated hardware or micro-code. While increment of the virtual machine program counter is an optional quick win, bi-directional increment/decrement of the stack occurs far less frequently and is more difficult to implement as a dedicated unit. In both cases, the virtual machine's architecture constrains read and write operations within one instruction cache line and data cache line. Within a micro-coded implementation, this allows the bottom byte of the address to be incremented without concern for ripple carry into other bytes of the address. To achieve this, unaligned read and write operations are split into multiple virtual machine instructions. You'll see why this is important in the next section.
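
As a sketch of the bottom-byte increment (assuming, for illustration only, a flat 64 bit address and a 256 byte granule; the proposal only requires that one instruction stays within one cache line):-

# Sketch: increment only the bottom byte of an address. The virtual
# machine never crosses a cache line within one instruction, so no
# ripple carry into the upper bytes is required.
sub bump_within_line {
  my ($addr) = @_;
  my $low = ($addr + 1) & 0xFF;   # one byte increment, no ripple carry
  return ($addr & ~0xFF) | $low;
}

printf "%x\n", bump_within_line(0x10000042);  # prints 10000043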

Many native instruction sequences will be read, write, read, write and the occasional poke to clear a flag or suchlike. This could be implemented with a pair of micro-coded EPROMs and no intermediate accumulator. However, this increases cost, slows development and doesn't double performance. Regardless, it is a trivial option to increase performance after initial development.

My Ideal Processor, Part Foo+0

Posted by cafebabe on Sunday April 22 2018, @08:18PM (#3170)
0 Comments
Hardware

In response to CID634544 from UID791:-

I have no interest in price/performance anymore. Whatever it is, it is. I'm extremely interested in price/security/openness. How open is the architecture? Are there any blobs anywhere? Is there a security chip like the Intel ME? If so, do I have access to the source code? Can I compile my own security engine to run on this dedicated security processor?

Those are the questions I have now. I would build a system today with far less power than the bleeding edge processors out there, but all the security we want and need. I firmly believe now that security can ONLY be obtained with full and absolute transparency. No security is obtained through obscurity of the methods and processes.

Intel and AMD can both do whatever the hell they want, but the first company to deliver the security we truly need will start getting a lot of orders. Even if they're lower on the performance/feature totem pole.

I have been working on a practical solution to your problem. It would have been preferable to have a solution prior to Dec 2017, when Spectre and Meltdown became widely known. However, a solution remains urgent. I have an outline design in which all components are open to inspection. This will be described in several parts.

The first part is a minimal, micro-coded system which implements a processor with 64 bit registers but, to reduce costs, not a correspondingly dimensioned ALU. You said that you would accept this solution irrespective of its speed. I am concerned that you may wish to place further constraints on your specification because my proposal is likely to be significantly slower than a Commodore 64 or Apple II. Obviously, if computers with a 16 bit address bus were suitable, you'd be using them already. However, if you weigh, for example, the market viability of Tandem's high-availability, 16 bit stack processors against my concerns about transistor switching speed, ALU latency and system integrity, you'll see that a native, flat, 32 bit address-space is a beguiling false economy from which I have freshly emerged.

The second part is the memory interface and card bus system. This also covers physical considerations, such as enclosures.

The third part covers contemporary expectations about network interfaces. Unfortunately, this may require a workaround involving bit-banging with an Arduino or similar. Allowing such an untrusted unit into a system requires all data to be distributed across multiple network interfaces. In the trivial case, this only provides link layer security across redundantly wired LANs. In restricted cases, this may also work across WANs.

The fourth part concerns writing a compiler for such a system. I have a viable proposal which allows a complete bootstrap from an insecure environment. However, all of the work on trustworthy hardware is moot if trustworthy software cannot be maintained.

Make Your Own Boxes With Rounded Corners

Posted by cafebabe on Sunday April 22 2018, @08:12PM (#3169)
0 Comments
Hardware

(The tools and materials lists look like a Dave Barry homage but this is not a spoof.)

Tools: bandsaw and all associated safety equipment, tape measure, coarse file, sandpaper.

Materials: two or more ball pit balls, drain-pipe, sheets of wood or plastic, glue, (optional) piano hinge, (optional) suitcase latches, (optional) aluminium strips, (optional) Meccano pitch 5/32 inch nuts and bolts, (optional) yoga mat or child safety mat, (not required) 60000 feet of tram cable.

In the 1970s, rounded corners were a statement of finesse and sophistication. In the television series UFO, Colonel Edward Straker's office has a lava-lamp style mood screen and is white with rounded corners, with one of the rounded corners being a mini-bar. Totally shagadelic. It also helped that they dressed like Sgt. Pepper's Lonely Hearts Club Band. The white, rounded corners are copied in UFO's submarine cabins. When the third season of UFO was cancelled and work started on the inferior Space: 1999, Commander John Koenig's office had more than a passing resemblance to Ed Straker's office. These design elements and others are often mis-attributed to a Steve Jobs wet dream but, since the 1970s, audio and computer equipment has cycled around box colors of white, beige, brown, gray, silver or black. Everything except white, silver and black has been dropped from this rotation but rounded corners on physical products remain a necessity when consumers are a danger to themselves. For example, consumers are barely able to operate a micro-wave oven safely. It is from this situation that website styling follows physical products. If all of your products have rounded corners then, for consistency, web site style may be similar.

For a very long time, I've wanted to make boxes of arbitrary size with rounded corners and I feel like a dumb-ass for arriving at a solution after so long. The answer came about two months ago and, since then, I've been loudly telling people about it. The answer is to use ball pit balls for corners, drain-pipe for the edges and arbitrary sheets for flat sections. If a ball has a radius (or diameter) which is the same as or larger than that of the drain-pipe then the ball can be trimmed to size. I was able to find ball pit balls and drain-pipe with 68mm diameter. Unfortunately, the palettes are horribly mis-matched. Drain-pipe is typically available in white, gray, brown, black and (rarely) green. Unfortunately, gray and brown tend to be large diameter sewage pipes. My local supplier only had 50mm and 68mm pipe in black. To contrast with edge color, ball pit balls are only available in garish kid colors, such as bright purple, lurid pink and lime green. So, edges and corners are likely to be in clashing colors. Regardless, it is relatively trivial to manufacture boxes of arbitrary size.

I assumed that ball pit balls are ridiculously cheap because they are required in such large quantities. With a two inch diameter or similar, a hypothetical 10×10×10 block of balls is quite small and 10000 balls would barely fill a closet shower. A typical ball pool must have hundreds of thousands of balls and therefore they must be cheap in volume. Indeed, they are. 200 are less than US$20 and volume pricing becomes increasingly favorable. I was also able to get 10 for £0.99 but many retailers don't stock them during winter. The plastic is presumably ABS, BPA or similar plastic commonly used for food containers.

Unfortunately, the balls are sold by diameter and not thickness. They are incredibly thin to the extent that two or more layers should be glued together. Given that they are incredibly thin and flexible, any two sections of ball effectively have the same radius. Therefore, two or more layers can be easily glued together. The balls are presumably manufactured with injection molding. The balls have two smoothed injection points at the "poles" and a seam at the "equator". This facilitates cutting one ball into eight demi-semi-hemi-spheres when only using blunt scissors. With a little practice, this can be achieved with tolerance of 3mm (1/8 inch) or better. However, the balls are manufactured very cheaply and the molds are not spun before the plastic sets. Therefore, one hemi-sphere may have notably thicker plastic. To compensate, place the four heavier pieces in one pile and the four lighter pieces in another pile. Then match a piece from each pile when gluing layers together. This achieves the most consistent thickness after gluing. If cutting tolerances are bad and a piece from the top of one pile doesn't match a piece from the top of another pile then rotate one pile around with the intention of improving tolerances. There is a permutation explosion of possible matches but rotating one pile should obtain fairly optimal matches.
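
If you would rather not eyeball the matching, here is a sketch of the pile-rotation trick in Perl. The thicknesses are invented; the heuristic tries each rotation of the light pile against the heavy pile and keeps the rotation which gives the most even glued pairs:-

# Sketch of the pile-rotation heuristic: pair heavy and light pieces so
# that each glued pair has a similar combined thickness. Values invented.
use strict;
use warnings;
use List::Util qw(max min);

my @heavy = (1.30, 1.24, 1.21, 1.35);  # mm, thicker hemisphere pieces
my @light = (0.82, 0.95, 0.88, 0.79);  # mm, thinner pieces

my ($best_spread, $best_rotation);
for my $r (0 .. $#light) {
  my @rotated = (@light[$r .. $#light], @light[0 .. $r - 1]);
  my @pairs   = map { $heavy[$_] + $rotated[$_] } 0 .. $#heavy;
  my $spread  = max(@pairs) - min(@pairs);
  ($best_spread, $best_rotation) = ($spread, $r)
    if !defined $best_spread || $spread < $best_spread;
}
printf "rotate light pile by %d; spread %.2fmm\n", $best_rotation, $best_spread;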

I tried three types of glue and there is no particular preference. I tried unbranded crazy glue which is presumably an aroma compound because it smells of acrid mint. I also tried water-proof silicone sealant and kiddie PVA glue. All work well. However, all leak from the sides, all glue corners badly and all take a long time to set. Presumably, this is due to ingress of air mostly occurring through the thin seam between the layers of plastic while the liquid glue sits quite deep behind this seam. Predictably, crazy glue makes the most mess. Silicone takes more than one week to set. PVA gives the brightest finish. I am concerned that layers may flake due to ingress of water. In which case, silicone may be preferable.

My efforts with a hand saw have been completely laughable. The cheapest saw was completely impractical. However, a really good hand saw makes an unbearable resonance when sawing a plastic tube. This is completely impractical in my apartment at any time of day. Unfortunately, I discovered this after purchasing a 2m drain-pipe. I didn't want to take the full length of drain-pipe to my local makerspace. This led to the humorous situation of me, wearing a Hello Kitty T-shirt, in a quiet street, sawing drain-pipe. This allowed me to take six small sections to my local makerspace. This was enough to make two boxes or one disaster. Unfortunately, the bandsaw was unavailable and a wooden bench vice was unsuitable without the use of rags to hold the pipe in place.

Anyhow, the plan is to cut pipe lengthways into quarter sections. This provides four edges with a 90° curve. Admittedly, this requires very accurate sawing. In volume, a jig to hold the pipe would aid such accuracy. The best part is that it is possible to cut all of the pipes without measurement because sections of the same length are always used in multiples of four. Sections of a particular length are always parallel to each other. If it aids visualization, each dimension (length, width, height) may be regarded as one section of pipe exploded into four complementary curves of the same length. From this, it is possible (or even desirable) to mix pipe colors to aid identification and assembly of pieces. It is only after cutting edges that accurate measurements are required. This is to make the flat sections of the box to the correct sizes. These may be a mix of wood or plastic of any color and of varying thickness.

So far, I have only described the outer surface of the box. An unseen layer is required to hold the outer layer together. For this, I considered purchasing a US$50 flight case and disassembling it. However, a thought experiment of case assembly is sufficient. A flight case typically has eight molded plastic corners. Each of these corners has three tongues. Cut aluminium can be slotted over each tongue. The extruded aluminium also has flat edges which allows the visible external panels to be glued. The tongues and edges are all missing from my design. I've only described the external shell but it can be held together with a second layer of flat (or curved) pieces. This may be aided with foam. A thick and solid type of foam may be sourced from yoga mats or child play mats. The thickest, dense foam is intended to be used as gymnasium flooring or play area flooring. [Insert Mark Twain "but I repeat myself" joke.] This thick foam is available in continuous rolls or inter-locking sections.

A box can be made with two sections which are hinged together. This may be achieved with piano hinge which is, conveniently, available in piano length sections. It may also be cut to length without the hinge falling apart. (Indeed, it is possible to make unrounded boxes through the use of piano hinge. A friend made transparent cuboid computer cases where all 12 edges used piano hinge bolted to plastic sheets. Where all eight corners are formed from freely hinged edges, none of the edges have any freedom of movement and the box is relatively rigid. On this basis, you may want to order surplus piano hinge.)

Alternatively, strips of aluminium can be used to make a card frame system. I originally envisioned a system which is a multiple of 2 inch × 4 inch × 4 inch (5cm × 10cm × 10cm) in a flat orientation with one inch or so of padding and rounding on all sides (left, right, front, back, top, bottom). Within one unit, it is possible to fit three or more separate circuit boards. This would be a minimum of 12 square inches (approximately 77cm²) of circuit board. Nowadays, across two units, it is possible to fit nine credit card computers. Some of those credit card computers are quad-core and 64 bit, 16 core processors are foreseeable. That would be 144 cores in 2 inch × 4 inch × 8 inch (5cm × 10cm × 20cm). It is also possible to make units where the external dimensions approximate 19 inch rack boxes. For example, 4 inch × 16 inch × 16 inch (plus 68mm diameter tubing) easily fits within 4U. (Note that this is entirely compatible with the car dash-board design and allows equipment to be used during journeys and then removed without drawing blood.)

I considered making a rounded box, stereo audio amplifier for a partially-sighted friend. We previously bought speakers to watch Breaking Bad. However, these speakers used impressively hair-thin wires and drivers which required the molded box to hold the magnet against the coil and cone. Cable strain relief was a knot in the wire. This only cost £1 (approximately US$1.50) and had no amplifier but I was unaware of the construction until I tried repairing a strained wire at my local makerspace. The response to my laughter was akin to "What are you laughing at? Have you not seen inside recent Chinese exports?" I was unable to repair the cable strain and my friend's attempt to fix it also failed. So, we watched the end of Season 4 and all of Season 5 audibly squinting in monophonic. Now I can make a small box with a headphone jack, two speakers, one of 10 PAM8403 stereo audio amplifiers purchased for robotics and a 5 Volt USB charger. Unfortunately, I've only got black drain-pipe which isn't particularly good. My partially-sighted friend regularly loses his black Casio F-91W terrorist watch and his black Dell mouse. Regardless, it is possible to make four of the corners red [port, left audio channel] and four of the corners green [starboard, right audio channel]. Actually, I wonder why stereo phono leads don't follow international shipping standards.

I can also make a box for the alarm clock project. A friend kindly ordered 15 large, red, seven segment digits. That's two sets of six for HH:MM:SS (or YY/MM/DD) plus spares. However, this is a case where I received significantly more than I expected. The description said one inch *segments* but I assumed that the entire seven segment module would be a metric inch tall. Oh, no. The segments really are one inch and therefore the digits are more than two inches tall. That should definitely be readable when I'm drowsy and bleary. However, digits this size don't fit into the hummus tubs which I use for credit card computers and prototyping. I now require corresponding lengths of drain-pipe to be cut without incurring serious injury.

My attempts at hardware hacking may be slow and abysmal but, to one of my friends, I look like a fricking wizard; especially after a friend stepped on a Casio F-91W watch and I repaired the catch with a blue paper-clip (to match the watch's blue trim).

Make Your Own Camera Tripod

Posted by cafebabe on Sunday April 22 2018, @07:56PM (#3168)
0 Comments
Hardware

Tools: scissors, (optional) drill.

Materials: six bamboo sticks, string, (optional) washers, (optional) nuts and (optional) bolt.

A friend suggested making a camera tripod from bamboo and the result was surprisingly good to both of us. I only wanted three bamboo sticks for testing and maybe a few spares. After some comedy at my local hydroponic shop, I purchased one pack of 25 bamboo sticks rather than three packs of 25 sticks. Optimistically, this was enough to make two or more tripods with plenty of opportunity to make repairs.

Bamboo sticks taper and therefore sticks can be used in pairs such that the cross-sectional area of the bamboo remains fairly constant along the length of each tripod leg. At the splayed end of the tripod legs, string should be knotted around each pair of sticks and the knots should be spaced so that sticks lean by about 15°. At the center, all sticks should be closely knotted in pairs and then knotted together and/or held together with elastic bands.

The platform for a camera can be made from an off-cut of wood. Cut three tapered ridges which take into account the off-centeredness of the stick pairs. With sticks wedged into the wood block, this platform is sufficiently stable to rest a camcorder. Although the picture may not be level, it is more than sufficient for motion capture. A level picture can be obtained with the use of a Gorilla Grip or similar. Alternatively, it is possible to add a standard camera clamp by drilling a hole and adding one bolt, one or two washers and a nut. Apparently, there is one standard bolt size for still cameras and one standard bolt size for motion cameras. The extra sturdiness was required for the weight of film canisters. However, as digital cameras are decreasing in size, the small bolt size is becoming increasingly common for still cameras and motion cameras.

Hardware And Software For Lucid Dreaming, Part 1

Posted by cafebabe on Thursday April 19 2018, @08:25PM (#3158)
6 Comments
/dev/random

Over the last few months, my access to the Internet has been very restricted. I reverted to reading heavily. In particular, after a few iterations of selecting interesting web pages from Wikipedia.Org, I have more than 7000 web pages to read. I thought that I could read through the bulk of long pages with the aid of text-to-speech software. Unfortunately, I fall asleep while pages are being spoken. It is also difficult to pause. This isn't a huge productivity gain and it also leads to less rested sleep. It is fairly similar to sleeping with a radio switched on.

After bothering to implement a script which pipes text to espeak or similar, it seemed like a waste not to use this functionality. I wasn't sure what should be spoken but large quantities of text aren't received verbatim and are likely to be detrimental. Perhaps something in a loop would be more effective? Or timed? Ah! It is possible to make lucid dreaming hardware with a micro-controller and this was one of many possible projects for when my ability with micro-controllers improved. However, this project has now been reduced to a script for a laptop or desktop computer and the optional use of headphones. (Indeed, it appears that I've had the ability to do this for many years but only had the impetus due to lack of bandwidth leading to a huge backlog of text. If I had more bandwidth, I'd probably be listening to podcasts or similar.)

In my local makerspace, there were a few issues of Make magazine. One issue had instructions for making a lucid dream machine from sunglasses, a micro-controller, red LEDs and a momentary switch. Before going to sleep, wear the sunglasses with the micro-controller and push the switch. The micro-controller is programmed to do precisely nothing for four hours. It then blinks the LEDs a few times every five minutes. If a person is dreaming during one of these blinks, it may be interpreted within the dream as car brake lights or similar. After several incidences over many nights, this should be sufficient to prompt "Ah! I'm dreaming!" and in the long-term, this should be sufficient to bootstrap lucid dreaming without a micro-controller contraption. The micro-controller (or script) aids the relatively difficult first step of lucid dreaming.

The article in the magazine noted an unusual side-effect which is definitely worth repeating. I have not encountered this problem but it is entirely plausible. People using lucid dream aids are more likely to experience false awakenings and this greatly increases the chance of urinating during sleep. Apparently, this is worse among the type of people who keep a dream diary. Apparently, after a lucid dream, a person "wakes", writes in their dream journal, goes to the bathroom, "wakes", writes in the dream journal, goes to the bathroom, "wakes", skips the journal because urination becomes more urgent, goes to the bathroom, "wakes", bathroom, and suchlike. This cycle can occur eight times or more and you only have to fail a reality check once with a full bladder before waking in a urine soaked bed. Welcome to reality. Make sure that you note the incident in your dream journal.

With downsides noted, it is possible to obtain similar functionality with a small script to sequence text-to-speech messages. From experiences in dreams, I suspected this would be more effective than LEDs. One night, I fell asleep with 24 hour news on television. In the dream, I was in an attic with other people. I attempted to watch the television in the attic in the dream but the view was repeatedly obstructed by items in the attic or other people. Despite continual obstruction, I did not leave the attic. This was how a brain integrated an audio channel without its video channel. It could not fake the video channel nor mask the audio channel and so was in a situation where it required the presence of a plausible audio source without video. From this, I know that it is possible to convey more information than blinking LEDs - up to 100% fidelity with zero feedback. However, interpretation is extremely random.

The micro-controller design stays dormant for four hours. In Perl or similar, this would be sleep 4*60*60. LED blink is replaced with echo "This is a test." | espeak or similar. I recommend that the rate of speech is slowed from the default. Despite this, I've found that framing errors occur with a repetitive prefix. For example, "Alert! Alert! Alert! Alert! This is a dream!" gets interpreted as "Lerta! Lerta! Lerta!" and the message is missed while you ponder "What's 'Lerta'?" This may be a semi-deliberate action from a brain which is attempting to hold together a coherent experience.

I mentioned my project to a friend. My friend suggested writing a phone app because accelerometers can be used to estimate a dream period. I may have further conversations with my friend because I have more ambitious plans. Said friend introduced me to the SCP Foundation, which is a mix of Cthulhu mythos and a warehouse of artefacts; possibly inspired by a scene from an Indiana Jones film. My plan is to make stateful messages which can be picked up at any point and prompt a dream narrative akin to SCP: Containment Breach or Five Nights At Freddy's. You may ask "Are you insane? Deliberately inducing nightmares?" and I would answer "People watch horror films and play zombie games. Bang for buck, this may be much more effective." At the very least, it should be obvious that lucid dream software (or any closed source accelerometer app) should be inspected very thoroughly; in a manner which does not apply to other software.

Since mentioning the project to my friend, I've conducted four nights of testing. The first was a complete failure due to incorrect insertion of a headphone jack. The third night was unsuccessful due to timing being completely wrong. I suspect this experiment makes me more sensitive to auditory input from other sources. On the third night, I may have interpreted some drama among house-mates. However, making enquiries about events which may or may not have occurred may induce more drama.

The second night was quite good. In the dream, I was in my local makerspace despite it not looking like my local makerspace. After receiving one of the messages, I recall being slouched over a chair, with headphones around my neck, talking to a person in the dream about lucid dreaming software. There are numerous logical faults with this situation. Most significantly, if I hear a message from the software, it is because an instance of the software is running and the reason it is running is to provide auditory prompts while I am dreaming. The fourth night was a long science-fiction dream. At one point, I was Captain Janeway and Neelix told me that I looked ill. When I gained some notion that it was a narrative, the dream shifted to an office dream where the text-to-speech was interpreted as a hateful door entry system and therefore ignored. It then shifted to a scenario where a house-mate who creates drama was attacked by a giant with Thor's hammer.

Overall, I've made more progress in four nights than would be expected with blinking LEDs. Unfortunately, I may not have anything further to report on this topic for an extended period. Regardless, here is some example code which can be adapted:-

#!/usr/bin/perl

# Example Prompts For Lucid Dreaming
# (C)2018 The Consortium.
# 20180417 finish

# requires espeak to be installed.

use strict;
use warnings;

my $wait = 3 * 60 * 60;  # initial delay in seconds
my $rand = 90;           # random jitter in seconds

my $pre = "Alert!";
my @say = (
  'This is a dream.',
  'You are dreaming.',
  'This is not real.'
);

while (1) {
  sleep(int($wait + rand($rand)));
  $wait = $wait / 6 + 30;  # prompts become progressively more frequent
  open(my $out, '|-', 'espeak -s 60') or die "cannot run espeak: $!";
  print $out join(' ', ($pre) x 4, $say[int(rand(@say))]), "\n";
  close($out);
}

Addendum 1: The micro-controller implementation may induce photo-sensitive epilepsy. Risk may be reduced by avoiding flashing sequences from 2Hz to 55Hz and only using LEDs which are either red, green or blue. Risk of epilepsy can be eliminated by using the audio implementation.

My Ideal Operating System, Part 1

Posted by cafebabe on Tuesday April 17 2018, @03:01PM (#3152)
1 Comment
OS

I'm quite impressed with the concept of an exo-kernel. It is one of many variants in which functionality is statically or dynamically linked with user-space code. This variant could be concisely described as applying the principles of micro-controller development to desktop applications and servers.

In the case of micro-controllers, it is typical to include what you need, when you need it, and deploy on bare hardware. Need SPI to access Micro SD and read-only FAT32 to play MP3 audio? Well, include those libraries. Otherwise don't.

In the case of desktop applications, it is possible to include stub libraries when executing within one process of a legacy operating system or include full libraries to deploy on bare hardware or a virtual machine.

In the case of a server, the general trend is towards containers of various guises. While there are good reasons to aggregate under-utilized systems into one physical server, peak performance may be significantly reduced. For x86, the penalty was historically 15% due to Intel's wilful violation of the Popek and Goldberg virtualization requirements. After Spectre and Meltdown, some servers incur more than 1/3 additional overhead. Ignoring performance penalties, container bloat and the associated technical debt, the trend is to place each network service and application in its own container. This creates numerous failure modes when they start in a random order. This occurs because init systems avoid race conditions within one container but, if each service runs in a separate container, this trivial safeguard is defeated.

Regardless, in the case of a server, an application may require a JavaScript Just In Time compiler, read-only NFS access to obtain source code for compilation and a database connection. All of this may run inside a container with externally enforced privileges. However, there is considerable overhead to provide network connections within the container's kernel-space while the compiler (and application) run in user-space. In the unlikely event that a malicious party escapes from the JavaScript, nothing is gained if network connections are managed in a separate memory-space. If we wish to optimize for the common case, we should have application and networking all in user-space or all in kernel-space. Either option requires a small elevation of privileges but the increased efficiency is considerable compared to the increased risk.

Running an application inside a container may require a fixed allocation of memory unless there is an agreed channel to request more. People may recoil in horror at the concept of provisioning memory and storage for applications but the alternative is the arrangement popularized by Microsoft and Apple where virtual memory is over-committed until a system becomes unstable and unresponsive. The default should be a system which is as secure and responsive as an 8 bit computer - and having an overview of what a system is doing at all times.

Similar arrangements may apply to storage. It is possible to have an arrangement where a kernel enforces access to local storage partitions and ensures that file meta-data is vaguely consistent but applications otherwise have raw access to sectors. If this seems similar to the UCSD p-code filing system, a Xerox Alto or my ideal filing system, that is entirely understandable. Xerox implementations of OO, GUIs and storage remain contentious but storage is the least explored.

The concept of an exo-kernel makes this feasible at the current scale of complexity and has certain benefits. For example, I previously proposed use of an untrusted computer for multi-media and trustworthy computers for physical security and process control. Trustworthy computers currently fall into three cases:-

  1. Relatively trustworthy micro-controllers of 40MHz or more. These have limited power dissipation and may be programmed on-site to user requirements. This limits the ability to implement unwanted functionality. It may be possible to access micro-controller memory via radio but this is a tedious task if each site has a bespoke configuration.
  2. Legacy 8 bit computers of 2MHz or less. Tampered firmware must work within very limited resources. It is also slow and difficult to tamper with a system which is constructed 10 years or more after an attack.
  3. A mini-computer design which is likely to run at 0.1MHz or less. It cannot rely upon security by obscurity but a surface mount 8 bit micro-coded mini-computer simulating a 64 bit virtual machine is, at present, an unusual case for an aspiring attacker.

In the previous proposal, there is a strict separation of multi-media and physical processes with the exception that some audio functionality may be available on trustworthy devices. This was limited to a micro-controller which may encode or decode lossy voice data, decode lossy MP3 audio or decode lossless balanced ternary Ambisonics at reduced quality. Slower devices may decode monophonic lossless balanced ternary audio at low quality. The current proposal offers more choices for current and future hardware. As one of many choices, the Contiki operating system is worth consideration. It was originally a GUI operating system for a Commodore 64 with optional networking. It is now typically used on AVR micro-controllers without a GUI. I previously assumed that Contiki was written in optimized 6502 assembly and then re-written for other systems but this is completely wrong. It is 100% portable C which runs on Commodore 64, MacOS, Linux, AVR, ARM and more.

How does Contiki achieve cross-platform portability with no architecture specific assembly to save processor registers during context switches? That's easy. It doesn't context switch because it implements shared-memory, co-operative multi-tasking. How else do you expect it to work on systems without super-user mode or memory management? I've suffered Amiga WorkBench 1.3, Apple MacOS 6 and RISC OS, so I know that co-operative multi-tasking is flaky. However, when everything is written in a strict subset of C and statically checked, larger implementations are less flaky than legacy systems.
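
Contiki's actual mechanism is protothreads in portable C. Purely to illustrate shared-memory, co-operative multi-tasking, here is a Perl sketch in which each task is a closure that does one slice of work and voluntarily returns:-

# Sketch of shared-memory, co-operative multi-tasking. Nothing is
# pre-empted, so a task which never returns hangs the whole system.
use strict;
use warnings;

my $ticks = 0;  # shared state; no locking is required
my @tasks = (
  sub { $ticks++ },                                  # "clock" task
  sub { print "tick $ticks\n" if $ticks % 3 == 0 },  # "display" task
);

for my $round (1 .. 9) {  # a real system loops forever; bounded here
  $_->() for @tasks;      # round-robin: each task runs until it returns
}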

Contiki's example applications include a text web browser and desktop calculator. Typically, these are compiled together as one program and deployed as one system image. The process list is typically fixed at compilation but it is possible to load additional functionality into a running process. This is akin to adding a plug-in or dynamic library. Although it is possible to have dynamic libraries and suchlike, this increases system requirements. Specifically, it requires a filing system and some platform specific understanding of library format. Although there is a suggested GUI and Internet Protocol stack, there are no assumptions about audio, interrupts or filing system. Although Contiki is not advertised as an exo-kernel, it is entirely compatible with the philosophy to include what you want, when you want.

With relatively little work, it would be possible to make a text console window system and/or web browser and/or streaming media player with the responsiveness, stability and security of an Amiga 500 running a Mod player on interrupt. It is also possible to migrate unmodified binaries to a real-time operating system. In this arrangement, all GUI tasks run co-operatively in shared memory in the bottom priority thread. All real-time processes pre-empt the GUI. If the GUI goes astray, it can be re-initialized in a fraction of a second with minimal loss of state and without affecting critical tasks. This arrangement also allows development and testing under Unix via XWindows or Aqua. In the long-term, it may be possible to use Contiki as a scaffold and then entirely discard its code.

If media player plug-ins are restricted to one scripting language (such as Lua which runs happily on many micro-controllers), it is possible to make a media player interface which is vastly more responsive than Kodi - even when running on vastly inferior hardware. As an example, an 84MHz Atmel micro-controller may drive a VGA display and play stereo audio at 31kHz. Similar micro-controllers are available in bulk for less than US$1. Although this arrangement has a strict playback rate and no facility for video decode, it is otherwise superior to a 900MHz Raspberry Pi running Kodi.

My Ideal Filing System, Part 2

Posted by cafebabe on Thursday April 05 2018, @04:54PM (#3125)
4 Comments
Software

I might get funding for a de-duplicated filing system. I took the opportunity to send this to a friend with a background in physics, accounting, video editing, COBOL, Java and Windows:-

I'm feeling flush so I'm sending you a letter. Don't complain that you only receive demands for money. You also get the occasional note from a nutjob.

I won't mention health because it can be extremely frustrating to read about such matters. However, I assume that we're both ambling along to some extent. I made moderate progress with electronics and processor design but nothing of particular merit. Gardening is also progressing. I'm growing some parsley for when we next have fish.

I think that I stopped my ex-boss from bidding on chunks of old super-computer clusters. Even if GPUs weren't cheaper, we'd soon get caught by the horrendous cost of electricity. Due to Moore's law, a 10 year old computer is likely to require at least 30 times more energy per unit of computation. I think this also stopped various cryptographic currency schemes which would presumably run on said hardware.

I presume that my ex-boss continues to seek funding from investors diversifying out of minerals. That would explain continued focus on hardware and high finance. My ex-boss recently suggested a scheme involving file storage. The unstated assumption would be that storage is not out-sourced to a third-party. In particular, this would avoid the dumb-ass case of out-sourcing storage to a direct competitor.

I was initially hostile to this idea due to previous research. The technology is viable but the market invariably has one party which accidentally or deliberately prices storage below cost. It only takes one idiot (or genius) to skew the market for everyone. There are also periods when one party freely hosts illicit content and flames out. MegaUpload and RapidShare are two examples.

Anyhow, there is an outside chance that the third iteration of my de-duplicating filing system may gain significant funding and widespread deployment. We had a long discussion about de-duplication during a two hour walk. I'm not sure if you followed all of the details. Apologies if you followed very closely.

The first iteration of my work was prototyped inside a database. It used fixed length records and implemented block de-duplication. Deployed systems typically use anything from 1KB blocks to 4MB blocks (with alarming variation in the checks for unique data). Larger block sizes are better for databases and video. Unexpectedly, 4KB works particularly well with legacy Microsoft Office documents and you can confirm for yourself that these files are exact multiples of 4KB. However, block de-duplication fails horribly with PKZip files. This is important because PKZip is the basis for numerous contemporary file formats, including Java Archives and .docx
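
To see why block de-duplication fails after an insertion, here is a sketch in Perl with the block size shrunk from 4KB to eight bytes and invented data. One inserted character near the front of a file shifts every later block boundary, so almost no blocks are shared between the two versions:-

# Sketch of fixed-size block de-duplication. Block size shrunk from 4KB
# to eight bytes so the shift effect is visible; data is invented.
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

sub block_hashes {
  my ($data, $size) = @_;
  my %seen;
  $seen{ md5_hex(substr($data, $_ * $size, $size)) } = 1
    for 0 .. int(length($data) / $size) - 1;
  return \%seen;
}

my $v1 = "The quick brown fox jumps over the lazy dog, twice over.";
my $v2 = substr($v1, 0, 3) . "," . substr($v1, 3);  # one inserted comma

my ($a, $b) = (block_hashes($v1, 8), block_hashes($v2, 8));
my $shared  = grep { $b->{$_} } keys %$a;
print "$shared blocks shared\n";  # zero: every boundary has shifted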

The second iteration of my work was an attempt to migrate to variable length blocks. The implementation split files at markers typically found within JPEGs and PKZip files. However, I never found a general case for this process and work was slowed because performance was awful. Compression improved the best case at the expense of the worst case. It also led to the problem of how to efficiently store thousands of different lengths of compressed blocks. When prototyping in a database, this can be delegated to the database but how would this be implemented efficiently in a file or raw disk partition? A solution must exist because it was widely used on desktop computers via Stacker and MSDOS 6's DoubleSpace. The computer scientist, Donald Knuth, possibly in 1964, described Fibonacci lengths for memory allocation. This is more efficient than the obvious powers of two because the ratio between successive sizes is approximately 1.6 and therefore waste is reduced. With some concessions for powers of two, this is implemented in Sun Microsystems' Arena memory allocator. The same principle applies to storage. For prototyping, a crude arrangement is a de-normalised database with one fixed length database table for each Fibonacci length. There is an analogous arrangement for raw disk.
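
A sketch of the Fibonacci buckets, as I read the scheme, with invented block lengths: round each block up to the next Fibonacci length and compare the waste against powers of two. The worst case waste is roughly 1 - 1/1.618, which is approximately the 3/8 pessimistic figure mentioned in the next paragraph:-

# Sketch of Fibonacci-length allocation buckets versus powers of two.
# Bucket sets and block lengths are invented for illustration.
use strict;
use warnings;

sub next_bucket {
  my ($need, @buckets) = @_;
  for my $b (@buckets) { return $b if $b >= $need }
  return undef;  # larger than the biggest bucket
}

my @fib = (13, 21, 34, 55, 89, 144, 233, 377, 610, 987);
my @pow = (16, 32, 64, 128, 256, 512, 1024);

for my $need (100, 300, 600) {  # invented compressed block lengths
  my ($f, $p) = (next_bucket($need, @fib), next_bucket($need, @pow));
  printf "%4d bytes -> fib %4d (waste %3d), pow2 %4d (waste %3d)\n",
    $need, $f, $f - $need, $p, $p - $need;
}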

None of these prototypes handled random access or full-text indexing, like EMC's systems. I was also concerned that, in the most pessimistic case, 3/8 of storage was wasted and, in the typical case, 3/16 is wasted. Waste can be reduced by tweaking figures. However, it should be apparent that there are diminishing returns that cannot be eliminated - unless the uncompressed data is stored in blocks of Fibonacci length.

That's the basis for the third iteration. Although it overcomes previous limitations, it has not been prototyped because it is expected to require far more work than the previous iterations. This is particularly true after a similar number of iterations prototyping full-text indexing. Someone sensible and numerate should ensure that talk of Fibonacci numbers, golden ratios, factorials, random walks, birthday paradoxes and Poisson distributions is not a numerological fringe lunatic heading towards a third educational but uncompetitive implementation. (If you're mad enough, there may also be opportunity to define your own rôle in this venture.)

I've been surprised how far prototypes scale. 1KB blocks with a 32 bit primary key allow 2^42 bytes (4TB) to be stored - and this was never a limitation when it only transferred 11 blocks per second. Search engines are similar. It is ridiculously easy to spread search terms over 4096 hash elements and compress the data to increase retrieval speed by a factor of four. This works for millions of web pages without difficulty. However, at scale, a comprehensive query syntax requires at least six tiers of application servers.

For Fibonacci file de-duplication, the base case is a one byte block. Or perhaps a one bit block. That's obviously ridiculous but it is feasible to map 13 byte sequences to an 8 byte reference and 21 byte sequences to a different set of 8 byte references. You may be incredulous that such a system works. If you are able to refute this assertion then you would save me considerable work. However, if I am correct then we may have the basis of a venture where units historically sold for US$20000 and clusters up to US$3 million. These figures are optimistic but managed systems generally sold for about 20 times more than the cost of the hardware.

I still like your idea of putting a weedy computer into a 4U box primarily so that it may display a huge logo on the front. Unfortunately, implementation is likely to require substantial quantities of RAM. Rules of thumb indicate that ISAM databases typically scale about 10 times further than cache RAM. More advanced schemes, such as InnoDB, typically scale about 100 times further than cache RAM. ZFS scales about 1000 times further than cache RAM. Given the proposed structure, it is extremely optimistic to allocate one byte of RAM per kilobyte of storage. Systems which fail to observe this ratio may encounter a sudden performance wall. This precludes the use of very small computers without RAM expansion. In particular, a Raspberry Pi is not suitable to control a dozen hard-disks. A Raspberry Pi is not sufficiently reliable anyway but this should give an indication that proper server hardware is required with all of the associated costs.

I hope to convince you that such a system would work. I would achieve this by defining a huge number of cases and mostly showing that cases can be reduced to previous cases. The most trivial case is opening an uncompressed text file, inserting one comma and saving the file with a different name. In this case, blocks (of any size) prior to the change may be shared among both versions of the file. If the block size is large then the tail of any file has minimal opportunity to obtain the benefits of block de-duplication. However, if the blocks are 21 bytes or smaller then common blocks are likely to occur among five versions or so. In the worst case, the probability of 21 files having no common tail is the inverse of the factorial of 21. This is less than 1/(5×10^19). And for the 22nd case, there is zero probability that savings do not occur.

From empirical testing, formal names, common phrases and URLs are easily de-duplicated. Likewise, the boilerplate of database websites has common sequences. Take 1000 web pages from any forum or encyclopedia and it is trivial to reduce the sequences by a factor of three. You may be concerned that this scheme fails when it encounters the first binary file. This is not the case. Identical files are trivial to de-duplicate. Identical files with staggered offsets (as is typical within email attachments) have a strict upper bound on duplication. Compressed files are similarly amenable. For example, within PKZip archives, compression re-starts for each file within the archive. Therefore, for each unchanged file within an archive, there is also a strict upper bound for duplication. Of particular note, this overcomes all of the shortcomings of implemented prototypes.

The most interesting cases occur with remote desktops and video. From discussions about the shortcomings of Citrix, I argued that video in a web page blurs the distinction between desktops and video. However, it remains common practice to use distinct techniques to maintain the crispness and responsiveness of each while reducing bandwidth, memory and processing power. De-duplicating video greatly reduces the value of streaming quadtree video but it does not eliminate it. Therefore, the numerous cases of video compression should be regarded as one case within this proposal.

In the trivial case, consider a desktop with one byte per pixel and each window (or screen) being a grid of 64×64 pixel tiles. Previously, each tile was divided into ragged quadtrees and each leaf used different encoding under various non-obvious quality metrics. (This is why I was using pictures and video of Elysium, Iron Man and Avril Lavigne as test data.) The current proposal works more like your suggestion to use RLE [Run-Length Encoding]. Specifically, on a Fibonacci length boundary, of 13 pixels or more, contiguous pixels (plain or patterned) may be shared within one tile, between tiles and across unlimited application window redraws. Most curiously, contiguous pixels may be shared by unlimited users within an office. This allows screen decoration, window decoration and application interface to have references to common sequences of pixels across desktops. This can be achieved in a relatively secure manner by assigning references randomly and disconnecting clients which attempt to probe for unauthorised chunks of screen.
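
As a sketch of the run sharing, simplified to ignore everything except the 13 pixel threshold (the pixel values and the reference scheme are invented):-

# Sketch: run-length encode a row of one-byte pixels, sharing any plain
# run of 13 pixels or more via a reference into a common pool. The
# reference format (value:length) is invented for illustration.
use strict;
use warnings;

my @row = ((7) x 40, 3, 3, (9) x 20, 5);  # invented pixel row
my (@out, %pool);

my $i = 0;
while ($i < @row) {
  my $j = $i;
  $j++ while $j < @row && $row[$j] == $row[$i];
  my $len = $j - $i;
  if ($len >= 13) {          # long plain run: emit a shared reference
    my $ref = sprintf "%02x:%d", $row[$i], $len;
    $pool{$ref}++;
    push @out, "ref($ref)";
  } else {                   # short run: store the pixels literally
    push @out, ($row[$i]) x $len;
  }
  $i = $j;
}
print join(' ', @out), "\n";  # ref(07:40) 3 3 ref(09:20) 5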

In the general case, each user has access to zero or more transient storage pools and zero or more persistent storage pools. Each pool may be shared within one trust domain. This includes public read-only or read/write access. Each pool may be encrypted. Each pool may be accessed via 8 byte references and/or file name. Each pool may have a full-text index. And pools may be used in a union because this allows text, HTML, archives and multi-media to be indexed and searched in a useful manner.

However, the feature that I think will interest you the most is the ability to shoot, edit and distribute video without using lossy compression. This arrangement has a passing resemblance to the Avid proxy format which stores 16×16 pixel tiles as a 32 bit lossy texture. Instead, 64×64 pixel tiles (or similar) are stored as 64 bit lossless references and no proxy is required. You may think that pixel dance would prevent de-duplication but you are not thinking on a large enough scale. Four variations of brightness across 13 samples is a maximum of 2^26 (64 million) variations. However, these variations are not restricted to local matches. Variations may be matched almost anywhere across any frame of video. Or, indeed, any data in the same storage pool. Admittedly, savings for the first million videos or so will be dismal but returns are increasingly likely as the volume of data increases. A well known result from search engines is that unique search terms grow proportionately to the square root of the number of documents. This applies to a corpus of documents of any language and any average length. I'm applying the same result to byte sequences within multi-media. For 13 bytes (104 bits), the square root is 2^52 permutations. Longer sequences occur less frequently but retain an essential supporting rôle.

Lossy compression of audio and video provides better than average matching because pixel dance is smoothed and the remaining data has correlation. Data may be locally compressed with some efficiency but the remaining data is skewed towards some permutations. Any skew improves the probability of matches beyond chance. This becomes ridiculously common when there are exabytes of data within the system. This may seem like an article of faith with shades of network effects and dot com economics but there is an upper bound to this lunacy.

Sun Microsystems found that 2^128 bytes is the maximum useful size for a contiguous filing system. In classical physics, the minimum energy required to store or transmit one bit of information is one Planck unit of energy (or the absence of one Planck unit of energy). Very approximately, 2^64 Planck units of energy is sufficient to boil a kettle and 2^128 Planck units of energy is sufficient to boil all of the water in all of the world's oceans. Therefore, operations on a sufficiently large filing system are sufficient to cause environmental havoc. My insight is to take this bound, apply well known results from search engines and resource allocation, and reduce everything successively down to references of 64 bits. If that isn't sufficient then it may not be possible for any solar system to support requirements. In this case, the filing system may be installed adjacent to, inside, or using a black hole. No known vendor supports this option.

I suggested that I work on a presentation while my ex-boss worked on a business plan which explained how to avoid obvious problems. I reduced a market overview and USP [Unique Selling Point] to 23 slides. I haven't seen any business plan but I optimistically assume that slides have been shown to investors. As a contingency, I am now working on a business plan. This is an extremely competitive market with an alarming number of failures. I prefer to offer on-site storage to the private sector. We may have to offer off-site storage. In the worst case, we may allow one instance to be used as a dumping ground. Sale of apparel is definitely unviable. Advertising is also unlikely to work. Ignore the intrusiveness of advertising, even to the extent that it is a security problem. As advertising has become more targeted, the signalling power of an expensive advert is lost. This is why adverts on the web are somewhere between tacky and payola.

In a public storage pool, each sequence of bytes requires an owner and access count. Each owner would be billed periodically for storage and would receive a proportion of retrieval fees. However, an owner may receive a charge-back if fees have been received for trademarked, copyrighted or public domain material. This would be at a punitive rate for the purpose of discouraging abuse. Ideally, illegal content would be assigned to a locked account and this would prevent copies being distributed. Unfortunately, retaining a reference version of illegal content is invariably illegal.

In this arrangement, it is expected that the most popular accounts would receive payment, a mid tier would obtain indefinite free use and a bottom tier would pay a nominal fee to access data. Anti-social use would be met with contractual and economic penalties. However, persistent trolls have been known to collectively raise US$12000 or more. So, penalties may have to be very steep.

Illegal content varies with jurisdiction. By different chains of reasoning, it may be illegal to depict almost anyone. In some jurisdictions, depiction of women is illegal. In other jurisdictions (including the ones in which we are most likely to incorporate), equality applies. Therefore, it would be illegal to block depictions of women without also blocking depictions of men. By a different chain, if people are created in God's image, if Allah is God and if depiction of Allah is illegal blasphemy then it is obviously illegal to depict anyone. By another chain, identity and self-identity may also make depiction illegal. Unfortunately, this cannot be resolved by asking someone what they want to view because profiling nationality is illegal in Sweden and profiling religion is illegal in Germany.

However, I have yet to mention the most ludicrous case. In some jurisdictions, such as the UK, illegal child pornography extends to the depiction of cartoons and text files. In the most absurd case, a sequence of punctuation may be illegal. In the case of a de-duplicated filing system, some sequences of punctuation, possibly by the same author, may appear in multiple files. In some files, a sequence of punctuation may be legal. In other files, the same instance of punctuation may be illegal. A similar situation may occur with medical information. A sentence recorded by a doctor may require "secure" storage. The same text may be published by someone else. Some content may be licensed on a discriminatory basis; most commonly this involves separate cases for private sector and/or public sector and/or individuals. This can be resolved contractually by defining all access as commercial. Oddly, this doesn't affect the GNU General Public License Version 2 but excludes a large amount of photography and music. See previous cases for photography. There is no technological solution for these problems. Indeed, it should be apparent that almost everyone has files which would be illegal in one jurisdiction or another.

Competitor analysis has been more fruitful. One competitor gets so much correct and so much completely wrong. TarSnap.Com can be concisely described as a progression of my second prototype. Data is split in a manner which covers all cases. This was achieved with a mathematical proof which leads to the use of a rolling hash. This sequentially splits a file into variable lengths with a skewed Poisson distribution. However, throughput (and compression) is probably terrible because it uses the same zlib compression algorithm and, entertainingly, a defensive architecture due to historical buffer overflows in zlib. Marketing and encryption are good but this is all wasted by out-sourcing storage to a direct competitor. Indeed, the system is only viable because the sub-contractor does not charge for the extensive internal bandwidth required to perform integrity checks. So, data is encrypted and the only copy is sent directly to a rival where it is mixed with two tiers of other clients' data. That'll be spectacular when it fails. Unfortunately, that's the competent end of the off-site storage market.
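A sketch of rolling-hash splitting as I understand it; the polynomial, window and mask below are illustrative rather than TarSnap's actual parameters:

    #include <stdint.h>
    #include <stddef.h>

    #define WINDOW 48        /* bytes in the rolling window */
    #define MASK   0x1FFFu   /* mean chunk length near 8KB */
    #define BASE   257u

    /* Split on content: declare a boundary where the rolling hash of
       the last WINDOW bytes has its low bits all zero. */
    size_t next_boundary(const uint8_t *data, size_t len)
    {
        uint32_t hash = 0;
        uint32_t base_pow = 1;               /* BASE^WINDOW mod 2^32 */
        for (int i = 0; i < WINDOW; i++) {
            base_pow *= BASE;
        }

        for (size_t i = 0; i < len; i++) {
            hash = hash * BASE + data[i];
            if (i >= WINDOW) {
                hash -= base_pow * data[i - WINDOW];  /* slide window */
            }
            if ((i + 1 >= WINDOW) && ((hash & MASK) == 0)) {
                return i + 1;                /* cut after byte i */
            }
        }
        return len;                          /* final, short chunk */
    }

Because cut points depend only on content, an insertion near the start of a file moves boundaries locally and leaves later chunks, and their de-duplication, intact.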

I investigated implementation of a general purpose de-duplicating filing system. It has become accepted practice to develop and deploy filing systems outside a kernel by using FUSE, PUFFS or similar. This increases security, isolation and monitoring. It also allows filing systems to be implemented, in any language, using any existing filing system, database or network interface. In the manner that it is possible to mount a read-only CDROM image and view the contents, it is possible to mount a de-duplicated storage pool if it has a defined format within any other filing system. With suitable hashing and caching, a really dumb implementation would have considerable merit. For my immediate purposes, I could use it as a local cache of current and historical web sites.
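The "really dumb implementation" could look like the following sketch over any existing filing system; the hash is a placeholder (FNV-1a) and collision handling is omitted, so a deployed version needs a cryptographic hash and verification:

    #include <stdio.h>
    #include <stdint.h>

    /* FNV-1a, strictly a placeholder for a cryptographic hash. */
    static uint64_t fnv1a(const uint8_t *p, size_t n)
    {
        uint64_t h = 1469598103934665603ULL;
        while (n--) { h ^= *p++; h *= 1099511628211ULL; }
        return h;
    }

    /* Store a chunk under its own hash; duplicates land on the same
       file name and cost nothing extra. Assumes a pool/ directory
       already exists. Returns the 64 bit reference. */
    uint64_t store_chunk(const uint8_t *data, size_t len)
    {
        char name[32];
        uint64_t ref = fnv1a(data, len);
        snprintf(name, sizeof name, "pool/%016llx",
                 (unsigned long long)ref);

        FILE *f = fopen(name, "rb");
        if (f != NULL) { fclose(f); return ref; }  /* already present */

        f = fopen(name, "wb");
        if (f != NULL) { fwrite(data, 1, len, f); fclose(f); }
        return ref;
    }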

Anyhow, please back-check my figures. Optionally, tell me that it is an impractical idea or consider defining your own rôle writing monitoring software, accounting software, video editing software or whatever you'd like to do.

Yours De-Dupididly

Evil Doctor Compressor

I didn't include the slides but the collated text is as follows:-

Fibonacci Filing System

The Problem

  • The volume of digital data is growing exponentially.
  • Per hour, more scientific data is synthesized and extrapolated than the sum of all scientific observations from the 20th century.

Moore's Law

  • Components on a silicon chip double every two years.
  • Digital camera quality doubles every 17 months.
  • Computer network capacity doubles every nine months.
  • We are drowning in data!

Coping Strategy: Compression

  • Techniques to reduce volume of data include:-
    • Content-aware compression.
    • Content-oblivious compression.
  • If content is known then compression may be lossy or lossless.
  • Lossy compression is typically applied to audio and video.

Code-Book Compression

  • A succession of compression techniques is increasingly effective but requires an increasing amount of working memory:-
    • PKZip - Uses tiers of small code-books.
    • GZip - Refinement of PKZip.
    • BZip2 - Burrows-Wheeler Transform.
    • LZMA - Uses code-book with millions of entries but requires up to 900MB RAM.

File De-Duplication

  • A filing system may eliminate storage of redundant data.
  • Covers all cases because it applies to all data handled by all applications.
  • Techniques work at different levels of granularity:-
    • Hash de-duplication applies to checksum of file.
    • Block de-duplication works on fixed-size chunks.
    • Byte de-duplication provides tightest compression.

Hash De-Duplication

  • Used in closed systems to provide a modicum of tamper-proofing.
    • Used in legal and medical systems.
  • Used in open systems to provide a document reference.
    • Root cause of Dropbox and MegaUpload's privacy and copyright problems.
  • Does not work with changing data.

Block De-Duplication

  • Implemented by LVM, ZFS and other systems.
  • Commonly used to provide daily, rolling snapshots of live systems.
  • Block size is mostly immaterial.

Byte De-Duplication

  • Most effective scheme.
  • Most fragile scheme.
  • Most energy intensive scheme.
  • Candidates for compression may require up to 0.5 million comparison operations.
  • May be implemented with custom hardware to reduce time and energy.

The Unique Selling Point

  • General trend is towards larger block sizes.
  • This provides an approximate fit for the throughput of hard-disks and solid-state storage.
  • However, unconventional block sizes approximate optimal efficiency without the processing overheads.

Fibonacci Numbers

  • Common sequence in mathematics.
  • Described by Leonardo of Pisa (Fibonacci) in 1202 when studying breeding of rabbits but occurs throughout nature:-
    • Flower petals.
    • Pine-cones.
    • Snail shells.
  • Used by humans in Greek architecture and computer memory allocation.

Counting Theorem

  • Fibonacci numbers may be used in compression schemes if the counting theorem is not violated. Specifically, there must be a unique mapping:-
    • From data to code-word.
    • From code-word to data.
  • We propose:-
    • Mapping 13 byte data to eight byte code-word.
    • Mapping 21 byte data to eight byte code-word.

System Capacity

  • LZ compression schemes, such as PKZip, use tiers of code-books with thousands of entries.
  • LZMA compression schemes use one code-book with millions of entries.
  • Proposed scheme uses two code-books, each with 16 billion billion (2^64) entries.
  • This is sufficient to hold a maximum of 557056 petabytes (approximately 544 exabytes) of unique data and any multiple of duplicate data (see arithmetic below).
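  (Back-checking: 2^64 entries × (13 + 21) bytes = 34 × 2^64 bytes = 544 × 2^60 bytes ≈ 6.3 × 10^20 bytes, i.e. 544 exabytes or 557056 petabytes.)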

Theoretical Upper Bound

  • ZFS uses 128 bit addressing on the basis that:-
  • ℏ × 2^128 is vast.
  • Where:-
    • ℏ is the minimum detectable quantity of energy
    • and "vast" is enough energy to boil all of the water in all of the world's oceans.

Infinite Monkey Test

  • Can an infinite number of monkeys hitting keys on an infinite number of typewriters exhaust storage? Yes but with difficulty.
  • Historical text files used 96 symbols.
  • For a 13 byte sequence, this is a maximum of 0.00029% of the possible input permutations but it exceeds the capacity of the code-book by a factor of more than three million (see arithmetic below).
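  (Back-checking: 96^13 / 256^13 = (3/8)^13 ≈ 2.9 × 10^-6 = 0.00029%. Capacity ratio: 96^13 / 2^64 = (2^65 × 3^13) / 2^64 = 2 × 3^13 ≈ 3.2 million.)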

Infinite Monkey Test

  • Capacity is exceeded if input occurs without duplication.
  • This becomes increasingly difficult as data accumulates within a system.
  • Can be achieved maliciously if a pseudo-random stream is designed such that it never repeats.
  • This case can be countered with a traditional file quota arrangement.
  • It is also trivial to identify anomalous use.

Birthday Paradox

  • Why is it increasingly difficult to exhaust storage capacity?
  • This is due to the counter-intuitive birthday paradox where it is more likely than not that, among 23 people, two share a birthday.
  • Matches become likely once the number of candidates approaches the square root of the key space.
  • This bridges the gap between a code-book with 2^64 entries and an upper bound of 2^128 (or less). (See arithmetic below.)
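  (Back-checking: with 23 people, the probability of a shared birthday is 1 − 365!/(365^23 × 342!) ≈ 0.507. In general, matches become likely once entries approach √d for a key space of size d; √(2^128) = 2^64, which is exactly the code-book size.)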

Video De-Duplication

  • Standard video encoding may greatly reduce volume of data. However:-
    • Compression between adjacent pixels may occur due to reduction in contrast.
    • Compression is unlikely to involve more than 16 frames of video.
  • A global code-book provides additional opportunities for processing lossless or lossy video.

Audio De-Duplication

  • Trivial to share silence in uncompressed audio.
  • De-duplication of near silence increases as examples become more common.
  • Easier to de-duplicate compressed audio.
  • AMR (used in cell phones) uses scheme which minimizes latency and survives high loss and corruption. Data is also fixed-length.
  • Duplicates become increasingly common with duration of conversation.

Clustering

  • An unlimited number of computers may participate in one storage pool.
  • Each node is allocated a unique numeric range within each code-book.
  • New code-words are replicated and distributed using standard techniques.
  • Duplicate mappings may occur within a concurrent system. Normally, this is inconsequential.

Full-Text Indexing

  • Where block size is smaller than an average sentence (and smaller than a long word):-
    • There is an upper bound for the quantity of search-terms within a fragment of text.
    • There is an upper bound for search-term length within a fragment of text.
    • There are strictly zero or one search-terms spanning the boundary between adjacent fragments of text.
  • Reduced size and scope means that a search engine is almost a corollary of de-duplication.

Applications

  • Traditional applications include:-
    • Enterprise document and mailbox storage.
    • File versioning and database snapshots.
    • Media distribution.
    • Clustered web caching.

Applications

  • Novel applications include any permutation of Project Xanadu, search engine, media broadcast, remote desktop and virtual reality:-
    • Distributed storage and caching with full-text index.
    • Hyper-text and hyper-media with robust citations and micro-payment.
    • Multi-cast, lossless, live, streaming surround sound and/or high-definition video.

My Ideal Alarm Clock, Part 1

Posted by cafebabe on Tuesday January 16 2018, @08:15PM (#2932)
4 Comments
Hardware

After thinking about micro-controllers and high availability using a clock as an example, I realised that I'm in the unusual position of being able to make my ideal alarm clock. This is only slightly more advanced than the generic quiz game buzzer implementation with the addition of interrupts, LED multiplexing and switch de-bounce. This is all within my ability. Or, at least, it should be. Indeed, I've seen a founder of the Arduino Project struggle with an 8×8 LED matrix. This person also described a frying pan technique for surface mount soldering. Well, I've seen their software and I've seen their hardware and I conclude that making an alarm clock can be achieved with similar perseverance. Specifically, I have:-

  • Derived an H-Bridge from first principles.
  • Likewise for MOSFET rectification.
  • Put a rectified sine wave through a D-Class amplifier to obtain bi-directional motor movement with more control than an H-Bridge.
  • Used a combination of tilt switches, analog integrator, VCO [Voltage Controlled Oscillator] and servos to make an approximation of the Nekko EEG Cat Ears.
  • Arranged decimal counter chips to light 2×10 LEDs in sequence.
  • Written PIC assembly, 6502 assembly, Arduino C and similar.
  • Written signal handling software in C and Perl.

However, I:-

  • Struggle to deploy compiled code due to the opaque protocols being used.
  • Dropped modulo operator functionality somewhere while avoiding patronizing user interfaces.
  • Failed to get DACs working but this may be voltage level incompatibility.

I have reached the stage, which is common among electronic engineering students, where making a clock is the test of ability. Success is not certain but much else remains theoretical until this works. Anyhow, let's do this properly. We may even get a MISRA C alarm clock.

Specification

Functionality For Version 1:-

  • Buttons for inputs. Probably about six.
  • Multiplexed, seven segment LED digits for output. May be jumbo size. Probably about six. Specifically, HH:MM:SS in 24 hour format only.
  • One modulated square wave in audio range to drive speaker. Can be derived from software or multiple oscillators.
  • One digital output which stays high when alarm triggers. Can be used to make clock radio. Or, because we're geeks, drive a relay for coffee maker or similar.
  • Serial output in format compatible with gpsd. In combination with ntpd, this allows home network time to be set from alarm clock.

Functionality For Version 2: Three channel PWM output for dawn light prior to alarm.

Functionality For Version 3: Battery back-up RTC [Real-Time Clock]. This allows clock to count time when unpowered.

Much of this can be implemented with decimal counter chips. Counter chips with decoded outputs are preferable for Nixie Tubes while BCD outputs are preferable for seven segment displays.

Full functionality requires software. Due to laziness, I suggest using an Arduino Nano (Atmel AVR micro-controller with 8KB ROM). This also allows clock to be powered from USB. For example, a wall-wart phone charger. With or without RTC, date handling is awkward. Will conceptually reserve two places of LED multiplexing for century. So, display may be CCYY/MM/DD or YY/MM/DD or HH:MM:SS. This requires either 8×7 outputs or 6×7 outputs. With either a four digit year or a two digit year, it is possible to calculate day of week and have a seven day alarm clock. The trivial use case is setting the alarm one hour later for Saturday and Sunday.
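As a sketch, day of week can be computed with Sakamoto's method; it implements the full Gregorian century rule, so it extends beyond Year 2099 (wiring it into the Arduino code is left out):

    /* Day of week for the Gregorian calendar, 0 = Sunday.
       Valid for year >= 1753 and well beyond 2099. */
    int day_of_week(int y, int m, int d)
    {
        static const int t[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
        if (m < 3) {
            y -= 1;
        }
        return (y + y/4 - y/100 + y/400 + t[m - 1] + d) % 7;
    }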

With some changes to software, it is possible to make joke clocks which don't have Monday. It may also be possible to make Julian/Gregorian, Chinese or Mayan mode. Nominally, day of week output uses LED seven segment decimal point or similar. Internal calculation may or may not start from Sunday being zero. If possible, day of week calculation should extend beyond Year 2099.

After thinking about this problem for three days, I wrote most of the software in one programming session. Buttons are defined as "prev", "next", "up", "down", alarm "toggle", "snooze", "start" and "stop". From this, it is possible to set year, month, day, hour, minute, second and seven alarms to hour, minute and second. Like the buzzer game, all buttons may be wired high or low. This also allows "toggle" to be a momentary switch or a latching switch. "Snooze" suppresses alarm for 180 seconds plus random jitter. (The fixed 300 seconds of Symbian 70 allows me to fall asleep 18 times or more over 90 minutes or more.)
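The jitter can be sketched with Arduino's random() function; the 60 second spread is my assumption:

    /* Snooze for 180 seconds plus up to 60 seconds of random jitter,
       so the body cannot adapt to a fixed snooze interval.
       random() is Arduino's bounded pseudo-random function. */
    unsigned long snooze_duration_ms(void)
    {
        return 180000UL + (unsigned long)random(0, 60001);  /* 180-240s */
    }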

Pressing "stop" once cancels the alarm sound but leaves the relay active. Pressing "stop" again cancels the relay. Pressing "start" enables the relay but not the alarm sound. Unfortunately, without the serial functionality, the compiled program requires 7KB or the 8KB ROM. I've not exhausted the I/O of Arduino Due but have exhausted I/O of Arduino Nano with 16 multiplex outputs, 8 digital inputs and 3 digital outputs. I considered mixing common cathode and common anode seven segment displays. Effectively, half of the LEDs would be wired backwards in an attempt to get more output. Unfortunately, this is a subset of the infamous Charlieplexing where LEDs are lit by a difference of Voltage between I/O pins. Although it is possible to control n(n-1) segments for large n, it works best when the total lit segments is 10 or less. There are also current limits. Furthermore, wiring arrangements may be fragile.

It is possible to use external shift registers but multiplexing the multiplexing will make the software particularly obtuse and I may have difficulty getting this working. In the most extreme case, it is possible to reduce I/O to seven lines: input stream, output stream, clock, latch, speaker output, relay output and dedicated output to gpsd. I'm not controlling a mains relay via shift registers. Nor am I sending serial via a shift register.

I checked code quality and the majority of warnings derive from #include <Arduino.h>. Regardless, the case statements to set time are too long. This may also be why compiled output is 7KB. That's too much for a clock and significantly more functionality is planned. Switch de-bounce also requires improvement.
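For the de-bounce improvement, one candidate is the stable-for-N-milliseconds approach sketched below; the names and the 20 millisecond threshold are my choices:

    /* Accept a new button state only after the raw reading has been
       stable for DEBOUNCE_MS milliseconds. Caller supplies millis(). */
    #define DEBOUNCE_MS 20UL

    typedef struct {
        unsigned char stable;   /* last accepted state */
        unsigned char pending;  /* last raw reading */
        unsigned long since;    /* time when pending last changed */
    } debounce_t;

    unsigned char debounce(debounce_t *b, unsigned char raw,
                           unsigned long now)
    {
        if (raw != b->pending) {
            b->pending = raw;       /* contact bounce: restart timer */
            b->since = now;
        } else if (((now - b->since) >= DEBOUNCE_MS)
                   && (raw != b->stable)) {
            b->stable = raw;        /* stable long enough: accept */
        }
        return b->stable;
    }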

After a US$8 retail purchase of a mixed bag of LEDs and LCDs, which, crazily, includes a 1 row × 16 column LCD text display, I'm also considering a really fancy power-on light test. Something like a PWM snaking figure-of-eight across each digit.

What Kind Of Shonky Software Do People Write?

Posted by cafebabe on Tuesday January 16 2018, @08:12PM (#2931)
2 Comments
Software

I've been curious about programming standards and, now that I'm actively writing software for process control (rather than wrangling other people's software without success), it would be a good time to make the software as reliable as possible. Arguably, writing in C or C++ is the first mistake but it is hard to avoid when, for example, I2C devices only have library code written in C. Even a person who refused to deploy C would be required to read it and use it for reference tests.

I found an old, unauthorized copy of the MISRA C 2004 Specification. This was originally developed for vehicle safety and is now used in medical, aerospace and other industries. And now you can add the space cadets in the hydroponics industry.

I feared looking at MISRA because I was under the false impression that it involved bone-headed stuff like writing conditions backwards to prevent accidental assignment. Feel free to do that but the preferred method is static analysis. Overall, the MISRA guidelines are very good to the extent that I'm considering a license for the current version.

Reading the standard proceeds as follows. A right-minded person initially thinks "What kind of shonky code are people writing?" Sooner or later, this is followed by "Ah, I need to change something in my current project." This is followed by "I need to change something in a previous project." This is followed by "I need to change everything I've ever written." This is followed by the later realization that not even a typical example of "Hello World" is written correctly. (Specifically, no function prototype or library call return code check.)

Writing C to MISRA standard is an attainable goal but almost everything falls short. I mentioned this to some programmers. No-one thought any of it was idiotic but some parts of conversation began with "I don't do that because ..." and omissions included declarations, error checks and default cases. I defend multiple returns from a function under specific circumstances in which execution is effectively a try ... catch construct across two functions. However, that works best in a server environment for the purpose of avoiding memory leaks. For MISRA's focus, embedded applications, even the use of dynamic memory allocation is discouraged. And that removes one of the main uses for multiple returns. An improvised try ... catch can also be used to fail fast. This seems problematic until the difference between library code and application code is considered. Library code throws errors which cannot be handled due to lack of context. Application code catches errors because the context for handling them is known. Unfortunately, combining "Never test for an error which you cannot handle." with an embedded system in which all unhandled exceptions are tied to "reboot" may lead to a device failure mode in which there is a constant cycle of reboots and no further diagnostic information.
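A sketch of that division of labour, with all names invented for illustration: the library reports errors it cannot interpret and the application, which has the context, decides what to do.

    /* Library code: report, don't decide. */
    typedef enum { SENSOR_OK, SENSOR_TIMEOUT, SENSOR_CRC_FAIL } sensor_status;

    static sensor_status sensor_read(int *out_celsius)
    {
        *out_celsius = 21;   /* stub so that the sketch is self-contained */
        return SENSOR_OK;
    }

    /* Application code: the context for handling errors is known here.
       Single exit, as MISRA prefers; retries are an application policy
       which the library cannot know. */
    int read_temperature(void)
    {
        int celsius = -40;   /* documented sentinel: "sensor absent" */
        int value = 0;
        int attempt;

        for (attempt = 0; (attempt < 3) && (celsius == -40); attempt++) {
            if (sensor_read(&value) == SENSOR_OK) {
                celsius = value;
            }
        }
        return celsius;
    }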

So, what kind of shonky software do people write? I let a Lisp hacker look at the code for a virtual processor. The code review can be summarized as "Your code embodies everything that's wrong about C." followed by particular ire for a portable attempt to define a 16 bit signed integer which is clearly labelled as working on gcc and clang but possibly not further. After "Well, at least you didn't use recursive macros." I noted the part where 4 byte BER fetch is implemented as a macro which is nested within itself.

That's a problem. Some embedded C is stuck in a timewarp between Kernighan & Ritchie's traditional C and the ISO C 1999 Standard. My style is closest to the ISO C 1990 Standard and that works fine for embedded software. However, between 1969 (when C was being developed on a 16 bit, two's complement computer) and 1999 (when int16_t was standardized), it wasn't possible to reliably define a signed 16 bit integer and, after almost 50 years of C, it remains problematic in some cases.
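For illustration, the modern definition alongside a pre-C99 fallback sketch; the fallback checks ranges via <limits.h> and refuses to compile otherwise (on mainstream targets this suffices, but it is a sketch, not a portability guarantee):

    #include <limits.h>

    #if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
      #include <stdint.h>
      typedef int16_t my_int16;    /* exact width, from the C99 standard */
    #elif SHRT_MAX == 32767
      typedef short my_int16;      /* short is 16 bit on this compiler */
    #elif INT_MAX == 32767
      typedef int my_int16;        /* int is 16 bit on this compiler */
    #else
      #error "cannot find a 16 bit signed integer type"
    #endif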

This is part of a wider problem where implicit assumptions and conflations are common reasons for software being inadequate. Things like currency handling, date formats (especially two digit years), character encoding, disability support - and the magnitude of numbers.

It also fits the observation that the average age of the installed base of a technology is half of the age of the technology itself unless steps are taken to actively skew the age. This was certainly the case for web browsers before automatic updates became common. However, it seems to apply to programmers. Effort to drop C entirely (often by people encouraging WebAssembly) is an over-reaction (and contradictory). However, tests and standards have to be greatly improved.

I wrote some regular expressions in Perl4 to test quality of C. Yes, the old joke is "Now I've got two problems." More seriously, dodgy scaffolding is allowed by MISRA C 2004 but not in deployed software. In this case, you can verify that nothing I've previously written will pass all of these tests. The worrying part is that I only wrote five easy tests out of about 100.

After unpacking the archive on a Unix system, make quality applies tests to hello.c and finds a missing function prototype but doesn't find the missing return check. Flags on the compiler find these problems and produce a warning that argc and argv aren't used. Technically, these should also be const. So, that's six problems with "Hello World". gcc and my crude checks find an overlapping subset but important problems are missed.
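For reference, one way to write "Hello World" which passes those particular objections; omitting argc and argv avoids both the unused-parameter and const complaints:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)                        /* prototype-style definition */
    {
        int status = EXIT_SUCCESS;

        /* Check the return code of the library call. */
        if (printf("Hello World\n") < 0) {
            status = EXIT_FAILURE;
        }
        return status;
    }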

begin 644 code-quality-check-dist-20180112-104505.tar.gz
M'XL("+&16%H"`V-O9&4M<75A;&ET>2UC:&5C:RUD:7-T+3(P,3@P,3$R+3$P
M-#4P-2YT87(`[1IK4]PXDJ_QKU#(D'D0C^UYAA!SQ1)RH0ZRV4`J=8O98&QY
MQH=?*WM@"+"__;HEOP8F9)*09*O6316VI%:KN]4OR6.%-I7_G)B>FUS(UIA:
MI[+MQHG<4;6GJJ9U9$WM]=6^LF>>4L?UZ-)7@`HP[/?QJ0W[:OF)T!OV.DM:
MM],9J`.U/QPNJ5J_W^DM$77I!\`D3DQ&R-(_%!XIDY@I)VZ@T.",^+#+DO2(
M;(%5D-^$59`MM`JR/37]R*,D,P3`:FPUT4S(P9C"C"`.6>)._+;T",8R^R&H
MWD22]C;_LZUS\B]V]@_TM]N;+_:VR<[K_8/-W=V<*!$6:,:Q.PK:D9>V3[S0
M.BV:SB2P$C<,9(^6D"(6)F'19#29,#X\IIX7MBUI:TL?69:TO[WU3I?ER&2F
M3\S8#&1@T3K5->FW=YN[NCPB\GO3\^`_G2;,)')$;3-(7(L`HJV[<;CV=&WM
MF;:VM@8X\=BTPW-XB4(W2"B33>8F8VA;)G@1.E;V#LH<!3@C8:Z5"':3BXC&
MT.>[('$PFNVTPN",LA@$E=[L[&[G/(]&ENR[@3RF9A2['ZG>[0P'3\G-83J-
MS,#6NP-IZ]?=7W79<9D)6AD!/3T,J/3ZW1[0=%)V0EC*\5`2)PAE5#^UY8^4
MA7'1`Q(`CX$9I'V@G2C"+M],QM+^SN\%C[XYE=T`\"D\XB"643R/ZCV8:5,'
M]!2%$;R'OIO(#LR@<JI`09J:S+L0%&`B]*7$/!<F<"H^92,J@XY@]X($.1I9
M,2#XV5N<OWEF/FPZN$>,>J%IIX*!N=AR'%'K9EL62"6)QB&&1IM&R5@?D#FR
M,FI-8,?.:(JD2OMOME'+C(;,AI6Y)2,W64=FR[E**0@U+C6\,(QX2&;NR00Q
M2V-G%':.(Z2&(L;<R)2M2+8\V.9\IRS*I*W-K5?`#+?$8F5]@/H4G4@LUK6B
M;9Y0K]SQOXD?X0SIY>[FO_5:`[VI26H-]!U\HJ7B$VT.GVAE^$3KX$_0!Q\'
M5IJ2!([V3'H`S2WL1)K-S%^)'(I724KS$V#*`E7>OH5]1=K*;"!8##N++8MA
MYY%I,?0\""V&?B.P29+E43-`L9DOT$EG@R@V/5."B0?CZ#$C"#@7'UP5T,!+
MP&`Q?A`'HA7$!K*J0@!C@*?".E.3C<#05`)1*Y[X9`,W:!,VZ"_I@64F>0M0
M,9@3^91T"B3I`7"13RC6AH4AUB`^#NEYOWQLFPEPL/+?%7_%EE=>K>RM[!^W
MDVE"'C\&0I@2FJ0L`M@#LSZX&E!L6#9IMQ'1/[5=1NJUQHN=MTW0XOZ[EVJS
MUCC8`3;JP)05H83DV`/);()H^VKSF-3;;67^G#)QS/VR'$P@FKB6')X'&(`P
M$GQBP39,J'^*&:75)%=7Q+BY@/6-U,#U8LJU+S/G3L%&'UW0QAKG_U-H8E%!
M^03P.U\R8?KQ2[`3-J'IGJJ%E<`D??DX.K?1S*A-ZO&TW5*FTWK1%C7HH2JO
M'?%_K4-5.\J:7?&&PYV\LY^_#<3;++UO)76\C)N9"H!R@@3SK'NY;-K"F+D"
M"NE15WJ)&)JK7F_5;TT$;T`MW)C)%7-K.K?Z@^9<(G?7?]:"]?^-\'>?];\Z
MU$3]W]-ZW7Y_`/7_8-@;5/7_#ZG_'^8'@(@R+%9Y_<]+_I<A(YM\QWV*)=9.
M@&6^[?*JX8[Z/R__U4%6_C\B8>!=D/.00?5SCA4R5"P7A)=SX**8!(+$Y.5-
MG%S`00!J'>+&)$S&E)V[,27QQ'%<RP4T6.!\#(>%!A224'39C1K67OKS_8,7
M.Z\WFDUR*1'B.FGW7\H?AYOR[T>'?Q@-<'YB)$>K#71KZ/QPM-KD/2VC@>-&
M$]ZRGDO%#04M0FH\,^LU;1V:U[/D&ZYSQ?DID[J$I>"_\?#YAGZDXZM^9#25
MG"!/R00XWG[[EBS7U&?DW&18ZCXC61%`N'A<V9`B>&^=0-XF9KXE;2-8%AQ=
M?\[/[\?_LV+I7L__VJ"KI?[?[W0'7?3_?E^K_/_OX/];Y"6>"L&]$Q9ZY/W8
MA=/)3DQ>APF,0T5Z#N[[RP79)+^@;?RSHT)*.U:,$SO,)K1;_S*N18,SF)/F
M3S5M-A5E%*[/LFB<-)R07>7QQ3C)YA:!)%_RT##JAK%\E-$I<V/4D0FC/G=L
MF8\M%V-"C3D3)1T4RY8IE!%R(M?\_P+"'/*Y9<JW8R-E+&2?B8QH%I,D=FU*
M0H><!E#+DQ$<SWV396$2@4[=I*%.>UHSY9/`^39C\^$=.K^\F\7%PK=GXNF?
MW[-8J4>)D)IS>/UMX?P[Q_\;!]3[B?^=/KR+^]^AVAWVEM2.J@ZJ^]^?'O\A
M]F?&O$N#43)>)+IK6AK=RW?`YAF]]V@O?.J#%4Z"1%?7I9H5>A,_N-T16V%$
M>0?Z>&F<-XO1_.WS>:0(?O`'AU<EC-=GNA_-Z<1`8@0&.UJM96,0>/(`V88`
M\_BQX"*/[F695G6/[X*8D<:O&2'G8Q1BKJ[.J5_O.Q%"4VA2F\/A9Q@4<V9B
M,G)H7->*Z)L-B!D;`_4>PC*0B]'42H1G0W(F8[H3JV7.T\TIS5V?QVB)SUI"
M_0@ORAHSRE%*Z,TL82$51-\8=@H"7R\J>"(S1Y2(=8702+Z<(Z_+8M]T(I&>
MD*N20C+A2H(5JE'*F,VLQN$R%9MWASP9R[E<Q7:5.<_8*A:>PU79GY02ZBQ;
MA:J_A*U/J?1'IO/OG/^SJ_5[/?]UU'YV_]/MPW\\_PVT?I7_?W[^?Y-_C_RI
M![MX<D).*8V$-_.O;GH\=IV$YU>1?(HV\ELTT:G%#$@D#1R#L'A%IPEEP97X
M/G,%Z2:]O2G2C!@B]7HY9?)/*_1/4E^OY[G/IM:EX.%:%RO-Y+!\RF5YBC-O
MBC2SZK7T)85((\WBZYBFC4OC>K%DGG8U#M>-RZ.F\ABUW*AI3VJ=)[5N4Z$N
MUBG`"9R)J`DG_QJ%S6KPCT.G]"(F*R!]?K1^F'/*!034Z^:777A9'B@C35@P
MO4[L$")]$"9D#.&VE-#*L?4.WIRYO%E?Q5LIE\YE+>=]0=X*O=VA-K2@G&$2
M4%+"6:3LL5W'H0R\:+Z6S<`N2P;<IM+]O`-I!7_C_)]_S[[7^]]N?YCE_V&O
MB_>_0U6MSO]_B_O?O8F7N/B[K[<B,^U#VJ;%YZ#\AN#G%@BU.&'F!3_%NP'>
M`2YVC/^^I^""D]M'6N-$N%)Z!PI9NU1ZI#/3T_KMP_#L65@@;VC??@S&CTLI
M.0);[T/N2@N2^-9I>$:XF]?6MV1KM^X6F&]?=CL!N0:I\3Z!<8=`?F:?*:.0
MPR8!&E!*5!P6!?M5$OOZ^)_^6.FKUL`@/^CU/OG]7QOT\M__]O#^5^MWM>K[
M_P\!I45>\1^6O0^99Y.6`K'9#2QO`C'@>9S8;M@>;_`?>Q'?=(,&OIAL9#VQ
MQB9KM?#]K.2D3F-9D#M'<@_![?C-BG#.AMJLG+"""BJHH((**JB@@@HJJ*""
C"BJHH((**JB@@@HJJ*"""BJHH((**JC@GN'_5VX!IP!0````
`
end

(Usual instructions for uudecode process.)

My Ideal Processor, Part 10

Posted by cafebabe on Tuesday January 16 2018, @08:10PM (#2930)
0 Comments
Hardware

(This is the 61st of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I've been working on a virtual processor and I've got benchmark figures. The virtual processor is intended for process control but it may have general use or use on FPGA, GPU, optical computer or mainframe. In particular, there is a market for object code portability and high uptime across x86, ARM, AVR and other processor architectures.

I've been working on the instruction set and I've worked through a long list of decisions which are loosely connected. These aren't choices which can be made in isolation and without consequence elsewhere in a design. However, each choice makes minimal progress because a set of individually sensible choices may combine into something impractical. For example, 64 registers × 64 bit requires a 4 kilobit context switch. Numerous registers are great. Wide registers are great. But fast interrupts are also great.

After going through this process repeatedly for more than three years, I've devised a fairly conventional processor architecture (3-address, Harvard architecture) which provides cross-compatibility for 8 bit, 16 bit, 32 bit, 64 bit or possibly larger registers and address pointers.

I wrote a fetch-execute instruction interpreter and then called it from a loop which tries all permutations of the 2^28 instructions. This crashed repeatedly. The optional floating point encountered divide by zero. After #include <signal.h> so that SIGFPE (floating point exception) could be set to SIG_IGN (ignore signal), it was trivial to set SIGUSR1 and SIGUSR2 to trigger virtual processor interrupts. It optionally outputs a PID file to make signal invocation easier. That's great fun. With one Unix command, it is possible to send virtual interrupts to a virtual processor.
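A sketch of the signal plumbing, with the virtual machine internals reduced to placeholders:

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t pending_irq = 0;

    static void raise_irq(int signum)
    {
        /* Map Unix signals onto virtual interrupt lines. */
        pending_irq = (signum == SIGUSR1) ? 1 : 2;
    }

    int main(void)
    {
        FILE *pid_file;

        signal(SIGFPE, SIG_IGN);     /* survive float divide by zero */
        signal(SIGUSR1, raise_irq);  /* virtual interrupt line 1 */
        signal(SIGUSR2, raise_irq);  /* virtual interrupt line 2 */

        /* PID file so `kill -USR1 $(cat vcpu.pid)` raises an interrupt. */
        pid_file = fopen("vcpu.pid", "w");
        if (pid_file != NULL) {
            fprintf(pid_file, "%ld\n", (long)getpid());
            fclose(pid_file);
        }

        while (pending_irq == 0) {
            /* fetch-execute loop would run here */
            pause();                 /* sketch: wait for a signal */
        }
        return 0;
    }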

Further testing found further problems in approximate order of difficulty. Apparently, I missed a bound check on a virtual stack. After these problems, the test ran smoothly. I expected the first full run of 256 million instructions to require about 30 minutes. It finished within five minutes. That's approximately one million instructions per second. After brief disassembly and seeing 140 byte stack references, I applied the standard optimization flags. This reduced the frequency of stack references and their severity to 40 bytes. I also obtained the usual doubling of performance. That's approximately two million virtual instructions per second for integer performance and 0.4 MIPS for a mixed integer and float workload. Optimistically, on a Raspberry Pi Model B Revision 2 with 750MHz clock and 697 BogoMIPS (approximately 14 instructions per 15 clock cycles), it takes a minimum of 132 clock cycles to go around the fetch-execute loop but many of the instructions exit early due to bogus opcodes, bound check failures or similar.

Despite being a 64 bit SIMD implementation which is vastly more powerful than a 16 bit computer from the 1980s, this arrangement captures less than 1% of the efficiency of the host architecture. I looked at the disassembly more closely. My code has addition of 8 bit integers and this involves a loop with eight fixed iterations. gcc correctly unrolls this loop and converts it to SIMD instructions. However, this doesn't occur for floating point instructions. This may be a limitation of the compiler or the target architecture. (ARMv6 with Neon.)

What am I trying to achieve? Optimistically, it is possible to achieve 1/3 efficiency. That would be one clock cycle to fetch a virtual instruction, one clock cycle for branch fan-out to each case statement and one clock cycle to implement a virtual instruction on native hardware. There is no need to return to start because duplicate fetch and decode can be placed at the end of each case. Therefore, bytecode interpreter execution flow may bounce around unstructured code as required.
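A sketch of that layout using the computed goto extension of gcc and clang (a plain switch in a loop is the portable equivalent); the three opcodes are invented:

    #include <stdint.h>
    #include <stdio.h>

    /* Duplicate fetch and decode at the end of each case: control
       bounces from handler to handler and never returns to a loop head. */
    static int run(const uint8_t *code)
    {
        static const void *dispatch[] = { &&op_halt, &&op_inc, &&op_dec };
        int acc = 0;

        goto *dispatch[*code++];      /* initial fetch and decode */

    op_inc:
        acc++;
        goto *dispatch[*code++];      /* duplicated fetch and decode */
    op_dec:
        acc--;
        goto *dispatch[*code++];
    op_halt:
        return acc;
    }

    int main(void)
    {
        const uint8_t program[] = { 1, 1, 2, 0 };  /* inc, inc, dec, halt */
        printf("%d\n", run(program));              /* prints 1 */
        return 0;
    }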

Nowadays, I presume that fetch and branch incurs instruction pipeline stall of at least two clock cycles. Also, I'm attempting to implement 8 register × 64 bit on hardware which is 8 register × 32 bit. Furthermore, operations on arbitrary virtual registers (nominally R0-R7) requires transfer to internal registers (nominally A, B and C) where cases may perform A=B+C or A+=B*C. This particular step would be easier with a stack virtual machine but difficulties occur elsewhere.

Current efficiency is a mystery. Maximum efficiency is a mystery. My implementation won't match the execution speed of Dalvik which is deprecated in favor of ART [Android RunTime] which is claimed by Oracle to have less than 1% of the efficiency of HotSpot in some circumstances. That's a familiar figure.

My interpreter and registers may exceed the size of the native processor cache. Like Dalvik, the opcode is in the least significant bits. This avoids bit shift prior to branch. That sounds clever and planned but it isn't possible to put an opcode elsewhere and get a broad range of 1 byte BER packed instructions. However, when testing instruction patterns sequentially, the cases of the instruction interpreter are called in rotation. This may incur significant cache churn. If this is the case, I might get significantly more throughput by testing a more realistic distribution of opcodes.

If this was intended to run games and apps, it would be fast enough to run Leisure Suit Larry but not fast enough to run Fruit Ninja.