Journal of cafebabe (894)
Sunday August 07, 22
03:35 PM
/dev/random

I've been absent from SoylentNews for quite a while. A reading week extended into two years and my Internet access has been limited; in part because I thought that my favorite Internet cafe had closed. I'd like to participate in discussions but I am, invariably, at least one week behind.

In my absence, I've been designing vehicles. With limited funds, limited Internet access and limited opportunity to test, though, this has largely been restricted to the endless task of making a car computer. I'm astounded how little innovation has come from Silicon Valley and vehicle manufacturers over the last few years. I'm also astounded that my vaporware remains competitive with Tesla. Specifically, the general availability of the XV6401 matches the Tesla Cybertruck.

I find it bizarre that car companies, graphics companies, start-ups and Silicon Valley behemoths are all competing to make autonomous vehicles which can be rented like a taxi. However, their failure has been apparent. Apple has stagnated into minimal growth and scams. Searches for Apple products peaked in 2013. Sales of Apple hardware peaked in 2016. In-app purchases peaked in 2019. In 2022, Apple is re-negotiating in-app purchases with large companies, such as Uber. This might become more widespread by 2025. Until then, enjoy Apple's television with Jennifer Aniston and Apple credit cards with Goldman Sachs. (For reasons which escape me, they didn't use a vampire squid as the logo.) Likewise, the three year cadence of Tesla products stalled years ago. Felon Musk has been promising a fully autonomous vehicle Next Year(TM) almost every year from 2014 to 2021. Similarly, the Real Model Y is also available Next Year(TM). The hope and the shipping product increasingly diverge. Felon Musk promised over-the-air updates which add functionality. Instead, Tesla renters butt dial a US$6000 upgrade which isn't transferable to the next sucker. Well, did you expect integrity (or an ecological product) from a guy who is his own step-uncle and has nine children via three women? That we know of. What a keeper.

Ignoring people who literally own countries, Elon Musk became the richest person in the world by flipping a B2B marketing turd that no-one remembers and then investing at the tail end of PayPal. After flipping that B2C turd to eBay, Elon Musk invested in the tail end of Tesla, which took a modified version of the Lotus Elise with a borked gearbox and got it safety certified to drive on UK roads. Worryingly, after more than a decade, Tesla remains dependent on Lotus product developments. Regardless, Tesla launched Elon Musk into the big league of too big to fail. Specifically, every venture is now a B2G government welfare scam. This includes $2 billion of direct eco subsidies, $5 billion of indirect eco subsidies, $16 billion for StarLink (because we really needed another Iridium) and every tunneling contract. All of Musk's ventures are structured around government funding. The minority of B2C is disproportionately to government employees - because hardly anyone else can afford $50000 on eight year credit. You can fool some of the people all of the time but after Elon Musk's ex-wife admitted illegal activity to permanently take down a gossip site, it appears that Elon Musk may have followed similar techniques to keep stock levitating. Even here, we have scams within scams. For two weeks, Tesla accepted Bitcoin - and then it didn't. Not due to new information but due to some bunkum about carbon emissions or something. However, you might have missed the part where sales of the temporarily inflated Bitcoin increased Elon Musk's personal fortune by $150 million. That was a clever transfer of wealth from Tesla's goodwill, via the crypto Ponzi suckers, to Elon Musk's personal wealth.

I thought that Elon Musk's ventures would collapse around the privately held SpaceX but it now seems more likely to collapse around the publicly held Tesla - which remains unable to achieve volume and is losing money on every vehicle manufactured. This is especially true when electric vehicles get less eco subsidy and the cost of electricity has increased. Now that the astro-turfing has failed, Elon Musk is trying to peg Tesla's worthless stock to other Silicon Valley turds, like Twitter which lost 15% of its value when it banned Donald Trump. Elon Musk - who requires adult supervision when using Twitter - may try to escape a $1 billion exit clause after establishing that the majority of Twitter accounts and Tweets are astro-turfed. You can't scam an honest man. However, some scams look really obvious to other scammers.

Jeff Bezos - who self-identifies as an astronaut - is diversifying out of Amazon. You should do the same because Amazon is going out of business. Amazon is running out of staff to exploit - staff who turn violent or urinate in bottles to meet unrealistic deadlines. Amazon has fired more than 20 million staff; mostly for performance which is acceptable in similar roles at other companies. If you're stupid enough to sell on Amazon, you don't want to be too good or too bad. If you're drop-shipping from China and using Amazon hosting then expect to be cut out of the loop by Amazon's full time Chinese negotiators. Likewise, if you've been banned from selling on Amazon then expect to pay a quasi-para-legal more than $1000 to file your appeal - which will be read by a member of staff who is paid less than US$15 per hour and given less than four minutes to read it.

Seattle's big four (Boeing, Starbucks, Microsoft, Amazon) are all going to be unionized. In 2017, I suggested that computer professionals in small companies should unionize for professional development and to avoid exploitative conditions. This was met with all kinds of nonsense, such as objections about seniority (when you're the only computer person in a company) and about interfering in commerce (when conveying truthful market information). This was disproportionately from anonymous messages - some of which were likely to be astro-turfing. Anyhow, people with more recognition than you, with better pay and conditions, think they're being screwed. You're probably being screwed too. Here are the most important points to remember. Corporations are people and money equals free speech, therefore discussing your salary with colleagues is an inalienable right. No pay rise, or a pay rise below inflation, is a pay cut. Asking for a raise (or seeking more pay elsewhere) is the rational response to inflation and never its cause. Requiring references is market asymmetry. An exclusive contract is an unfair contract. In many places (UK, Germany), automated hiring and firing is illegal. And many people in the UK are unaware that it is illegal for one shift to start within 12 hours of the end of another shift. If you finish at 10PM, for whatever reason, it is illegal to start work before 10AM.

The film and television industry would like to thank Netflix for all of the revenue and research. Friends and The Office accounted for more than 3% of video watched. Netflix pays to see if it is successful and Netflix has become a revolving showcase of television which will be exclusively available elsewhere. Meanwhile, original content from Netflix, Apple, Facebook, Google and everyone else has less than half of the success of established studios. Wow. Who knew that it took experience and established techniques to make profitable entertainment? Meanwhile, Facebook continues to plagiarize video and YouTube has devolved into unwatchable adverts. A video may be preceded by double unskippable adverts which are longer than the content. Or a video may have double unskippable adverts every five minutes. People stopped using Lycos when AltaVista had more content and fewer adverts and people stopped using AltaVista when Google had more content and fewer adverts. Well, people are ready to move on. Unfortunately, Google's 10 nuclear power stations of server base load is highly multiplexed, given that most nodes simultaneously operate as search spider, query engine, mail server, database replication, video transcoder, streaming server, neural network trainer, protein folding, real time speech translation and a long list of miscellaneous tasks. Anyone stupid enough to start a competing service gets cloned within one week while failing to match Google's costs. However, nothing is permanent at Google except the services of most interest to spooks: search and mail. Google has discontinued more than 100 services - including two social networks.

Regardless, all of that flatland Khan Academy back catalog is worthless when 3D motion capture requires less bandwidth. The transition to immersive 3D will be slow and painful. However, future generations will establish a floor of expected quality. When I was a kid, I didn't watch black and white films. Kids today won't watch NTSC. Kids of tomorrow won't watch flat bitmaps. This definitely won't occur before 2023 but I have no idea how long it will be deferred. SnapChat has been pushing 3D glasses since at least 2017. Projects, such as CastAR, have been in gestation since 2015 or before. However, until a sane format emerges, and sane hardware for viewing, it will be /dev/null for experimental content.

Part of the general stagnation may be due to the chip shortage which has become quite severe. Taiwan chip manufacturers recently overcame drought with the aid of reverse osmosis hardware. Chip manufacture is now affected by a global neon shortage. Apparently, more than half the suitably pure neon was refined in Ukraine. There is an estimated 1-6 month supply of neon - and that estimate is from two months ago.

I ordered some Atmel AT28C256 EEPROMs in Dec 2021 and I'm expecting delivery in Aug 2023. When Atmel isn't supplying the brains for most Arduino products, Atmel supplies the same 12V tolerant microcontrollers to vehicle manufacturers. Atmel also supplies low end touchscreen sensors and programmable logic. Atmel is very good at packaging components in DIP and probably does so at a loss. This allows easy prototyping on 0.1 inch "breadboards". However, this is exactly how Atmel gained a huge quantity of timewasters who want newbie help or low volume quantities of components. I presume that Atmel makes big batches of DIP components infrequently; possibly once every three years. Ignoring the entirely absent EEPROMs, hobbyists have been quite boastful that Atmel's DIP components are unaffected by the chip shortage. However, Atmel products are becoming rare and Atmel isn't updating products - or support software. Atmel's Windows-only programmable logic applications haven't been updated for more than 10 years; in part because this would benefit conversion software written by rivals. Anyhow, Atmel's products, finances and support are increasingly dicey and this may result in all Arduino development migrating to ESP8266, ESP32, ARM or RISC-V. This will horribly fragment hobbyist development. Supported libraries for Arduino Uno (Atmel AVR) and Arduino Due (Atmel ARM) differ more than they share - and this is the most similar case.

I've been working on a 6502 system for automotive applications. I thought this was a niche within a niche. However, Mike Naberezny (the owner of 6502.org and author of Py65) recently discovered a 6502 in the dashboard of a Volkswagen Jetta. I have moderate interest in the 6502's 16 bit extension. Firstly, it has the awful designation 65816, which is supposed to denote 8/16 bit operation. Secondly, it achieves this with mode bits which make the instruction stream ambiguous. Regardless, it was used in the Apple IIgs and SNES and remains available to the present - or at least until the last batch of TSMC 600nm DIP chips is exhausted. Hobbyist 65816 computers are available with up to 64MB RAM (Foenix) and this project only forked from Commander X16 due to different objectives. Despite being the easier project, and despite rejecting the 16 bit extensions, Commander X16 has been emulator-only vaporware for years and may collapse now that the VERA graphics FPGA is available separately.

Despite the constant fragmentation in computing, something rather wonderful is happening. The cost and capability of one unit of a bespoke computer is now more favorable than the mass-produced 16 bit systems of the 1990s. 128KB of 15ns SRAM is less than US$1 and 600nm chips (manufactured with optical lenses) can be run much faster than their official speed. Running a 14MHz 6502 or 65816 at 20MHz isn't really considered overclocking. Indeed, 25MHz or 28MHz is possible with care. 30MHz is possible with overvolting. A few more tricks, like asymmetric clocks, may make 35MHz or 40MHz possible. Obviously, this isn't recommended in critical applications. However, it shows the scope of what is possible. Indeed, a Commodore VIC20 with 10 times the processing power, 10 times the RAM and 10 times the graphics is cheaper and more reliable than a vintage VIC20.

There is a qualitative difference too. A 30MHz 6502 can simulate itself running at 1MHz. This may be Wirth's law in action. (Increases in hardware are more than soaked up by inefficient software.) However, it allows an 8 bit processor to be soldered to 8 bit RAM, 8 bit ROM and 8 bit I/O. This minimal 8 bit system may then operate as a 32 bit virtual machine.

Using text to speech, I've easily read more than 20 million words. I've got to the point where I'll scrape a website, de-duplicate the text - and then find that I've only got one million words to read. Whatever. If I make an effort then I'll finish within two weeks. Initially, I started using text to speech because I wanted lucid dreaming. Specifically, I wanted floating and flying dreams. Unfortunately, this did not occur. I have come to an understanding with my subconscious that I cannot have both lucid dreams and deep, intuitive insight. This is because the dream period for intuitive leaps is the same period for lucid dreaming. It is one or the other and I choose insight. Unfortunately, I discovered this after a series of increasingly oblique deceptions to ignore the audio stimulus followed by increasingly unpleasant dreams.

Regardless, I've discovered the weird concept of MallWorld dreams. I have no idea if it is a neural artifact or the collective unconscious. Regardless, the common but ambiguous themes are striking. Most significantly, the endless toilet dream should be familiar to anyone. An endless school, hospital, office or retail space may also be familiar. Actually, since discovering MallWorld, I find it odd that I've had two ship dreams but zero spaceship dreams. Likewise, I don't dream about cartoon characters. Given that I've binge watched huge amounts of science fiction and cartoons, I find this to be curious. Others report a similar variety of dreams.

Sunday May 01, 22
03:23 PM
/dev/random

Sometimes the comedy writes itself:-

Unfortunately, this means the future will be broken.

Monday October 11, 21
10:17 AM
Hardware

(This is part of an occasional series which began with Make Your Own Camera Tripod and Make Your Own Boxes With Rounded Corners.)

Tools: Scissors, needle.

Materials: Broken umbrella, elastic, sewing thread, (optional) glue.

A while back, I invited my friend to a beginner ballet class. I hoped my friend would become a regular and that we could meet often but unfortunately not. My friend was satisfied to attend one class and never repeat it. Unfortunately, this one class remains notable. My friend wore a home-made circle skirt during the class. Furthermore, my friend chose to wear this with the most incongruent velour leggings. It was very anomalous and it obviously made an impression. About 16 months after this class, the dance teacher asked me "How's your friend? Pieris? The one with the skirt?" Oh, jeepers. Said teacher had seen hundreds of new people since Pieris and yet few, if any, had made an equal impression.

Pieris is obsessed with making circle skirts. When Pieris isn't making circle skirts afresh, Pieris is often making retrospective improvements or repairs. Pieris has identical length circle skirts in 15 plain colors and various designs which are striped, checked and patterned. If the material is available, Pieris has a circle skirt in black, red, yellow, green, blue or purple gingham - in 1/4 inch, 1/2 inch, 3/4 inch, 1 inch, 1+1/2 inch and 2 inch widths. I've made four circle skirts with Pieris and, unsurprisingly, I've received a circle skirt as a Christmas gift from Pieris. I'd be only mildly surprised if Pieris made circle skirts in all official tartan clan patterns. Indeed, Pieris had at least five tartan patterns in a collection of more than 160 designs. It would be fair to say that my friend's primary interest in life is circle skirts and it is possible that, for at least ten years, my friend has only appeared in public wearing a home-made circle skirt.

My friend, Pieris, now looks beyond pristine fabric for clothes-making. A bonkers example came at a bus garage, in the rain, when Pieris saw a broken, discarded umbrella in a litter bin - and salvaged it to make a leopard print circle skirt. When you only have a hammer, every problem looks like a nail. When you're obsessed with making circle skirts, anything may be the next circle skirt. Unfortunately, this type of insanity is contagious. I mentioned umbrella skirts to friends and they thought it would be funny to make a skirt from a stereotypical red, yellow, green and blue golfing umbrella. It was rather uncanny to find such an umbrella on the way home; especially when the weather wasn't wet or windy. That's how and why I started salvaging discarded umbrellas. After this bizarre incident, I've collected more umbrellas; mostly large black ones. I intend to use them as material for a junk couture beginner sewing class but opportunities to teach have been limited. Given that recycling unlikely materials into clothes has become a trendy activity, I thought it would be worthwhile to pass on some wisdom.

Firstly, very large umbrellas are preferable unless you are making child size clothing or a very short skirt. A zipper is rarely practical with thin plastic and therefore the skirt is likely to be elasticated. This requires a waistband circumference which fits snugly over hips - or wider to prevent tearing. This is likely to reduce length by 6-8 inches (15-20cm) - or significantly more if the elastication is to be hidden within the material. Actually, elastic is likely to be visible unless you begin with a particularly large umbrella.
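
As a rough worked example (my own numbers, assuming a full-circle cut from the umbrella's center): the waistband must stretch over the hips, so its radius is the hip circumference divided by 2*pi:

    inner radius = hips / (2 * pi) = 40in / 6.28 ≈ 6.4in
    skirt length ≈ umbrella radius - inner radius ≈ 30in - 6.4in ≈ 23.6in

which is where the 6-8 inch (15-20cm) loss comes from, before any allowance for hemming or hiding the elastic.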

Secondly, a dark or highly patterned umbrella is preferable. For example, leopard print. White or pink umbrellas tend to be too revealing - and especially so when wet. Short or bright umbrellas work as an underskirt to add volume but they are not suitable on their own. A bright underskirt may be preferable with dark elastication to avoid unsightly seams. In all cases, 3/4 inch, 1 inch, 2cm or 2.5cm wide elastic is highly preferable.

Thirdly, an umbrella is invariably discarded because the frame is busted. The covering may have light scuffing but is otherwise in good condition. If the pattern is quite busy then minor holes can be repaired with glue. One exception to covering quality - which I have discovered the hard way - is that an umbrella discarded by a fence, wall or tree may be covered in dog urine. If the urine has dried, this may not be immediately apparent. A busted umbrella found on a park bench or in a litter bin is very unlikely to have this problem.

Fourthly, in the season when discarded umbrellas are most plentiful, it may be prudent to carry sharp scissors. This eliminates awkwardly carrying a broken umbrella frame. Removing the material from the frame is a relatively dangerous operation which may be hazardous to eyes. I've found that a relatively safe method is to hold the umbrella handle under one arm while cutting the fabric from the frame with the other arm. Furthermore, cut at the bottom of the umbrella. This keeps your head near to the fulcrum of the umbrella rather than the most dangerous stray ends. The material in the middle of the umbrella can be cut very sloppily because a larger hole will be made for the waist.

Fifthly, opinions differ regarding umbrella trimmings. Pieris removes plastic or metal ends which affix the covering to the ends of the umbrella spokes. I prefer to keep them because it weights the skirt and reduces inadvertent blustery moments. Likewise, Pieris removes any strap which is used to keep an umbrella furled. I believe that straps remain convenient for storage. However, it should be noted that I'm vastly less experienced and it should be particularly obvious that I've spent less time considering umbrella skirts in detail. For a beginner, leaving the strap eliminates one source of error and, anyhow, if you're going to wear an umbrella, own it and don't leave any doubt. (Who knew that umbrella skirt manufacture had ideological differences?)

Finally, an umbrella skirt is very suited to wet weather given that it is invariably water-proof and/or quick drying. Plastic material also matches water-proof nylon and vinyl clothing. A particularly suitable combination is red nylon jacket, black umbrella skirt, red leggings, black boots and, of course, a black umbrella in good condition to match the skirt.

Tuesday September 28, 21
05:10 PM
Software

For previous unconventional ideas regarding video codecs, see:-

My plan for a video codec is quite vast. Ultimately, it is a replacement for XWindows and Wayland. However, I like to keep it real. Therefore, I've been thinking about the minimal subset. I believe this would be known as MVP [Minimum Viable Product]. In my case, the MVP runs on a Commodore 64.

There are a few fundamental concepts about video encoding and I question some of them. The first concept is lossless/lossy compression. The quantity of video is usually too vast to store, transmit or display without whittling it down somehow. Traditionally, this is described as a lossless or lossy conversion. However, this misses a more fundamental concept of content aware and content oblivious compression. Only content aware compression can be lossy. Indeed, video compression is often arranged into two tiers which cover both cases. The interesting tier is the content aware compression where several liberties are taken with the raw data. The boring tier is often mentioned in passing as "entropy encoding". This will do the tedious task of, for example, Huffman compressing the quantized DCT [Discrete Cosine Transform] wave amplitudes.

This leads to the second concept: image kernels. Images have spatial coherence and video additionally has temporal coherence. One pixel looks like its neighbors. One frame looks like the next. Such properties can be used for many types of trickery. A fundamental concept of many video encodings is that individual pixels cannot be updated. Unfortunately, resolution has often been the enemy of quality. Cinepak AVI allowed 2*2 pixel blocks to be updated. Unfortunately, the specification of Cinepak is vague and the color-space is a bodged binary approximation of YUV. Regardless, it is possible to play Cinepak AVI on an 8MHz ARMv1. By the time we get to JPEG and MPEG1, the H.261 DCT is commonplace and the 8*8 transform is often used with 16*16 macro-block subsampling. HEVC moved to 2*4 and 4*4 blocks (and popularized a natty variant of a quad-tree) and we might get back to 2*2 blocks in future codecs. (There is no need for a 1*1 kernel because that would be lossless and therefore redundant.)

Via the tile mosaic theorem, a two dimensional variant of the hand-waving Central Limit Theorem, it is possible to construct a tile agnostic image/video codec such that tiles are chosen to minimize or eliminate encoding error. Furthermore, subsets may be selected for particular applications, such as low bandwidth encoding, low latency encoding, symmetric encode/decode, minimal energy consumption, maximum resilience or compatibility with legacy formats. More choice means a better fit. However, more choice does not imply over-choice.

This leads to the third concept: block transforms. I've recently discovered a very crude 16*16 image kernel which can be encoded and decoded on 8 bit hardware. It comes from questioning the prevalence of H.261 DCT. When the Joint Photographic Experts Group crowd-sourced an image encoding, the result was used extensively in JPEG, MJPEG, MPEG1, MPEG2, MPEG4 and elsewhere. This trick has only recently fallen out of fashion after more than 25 years of use. There are many ways to encode waves and they are very loosely equivalent. This includes Maclaurin expansion, Taylor series, Fourier Transform, Discrete Fourier Transform, Cosine Transform, Discrete Cosine Transform, various techniques involving waves, chirps, wavelets and chirplets - and Walsh functions. Many of these only work in one dimension. Thankfully, in all cases, two dimensional data can be reduced to one dimension with a zigzag or spiral transform. Maclaurin and Taylor are useless for image/video because they don't work with discrete pixels. Nor do they lead to any useful reduction of data.
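
To illustrate the zigzag reduction, here is a minimal sketch in Perl which walks the anti-diagonals of an 8*8 block; real codecs hard-code the resulting table and conventions differ on whether the second step goes right or down:

# Emit a zigzag ordering for an 8*8 block: walk each anti-diagonal
# and reverse direction on alternate diagonals, as JPEG does.
my $w = 8;
my @order;
for my $d (0 .. 2 * $w - 2) {          # anti-diagonal index
  my @run;
  for my $x (0 .. $w - 1) {
    my $y = $d - $x;
    push @run, [$x, $y] if $y >= 0 && $y < $w;
  }
  @run = reverse @run if $d % 2;       # alternate direction
  push @order, @run;
}
printf "(%d,%d) ", @$_ for @order;
print "\n";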

Fourier has the disadvantage of producing two sets of co-efficients: one set for sine and one set for cosine. Some audio codecs, such as early versions of RealAudio, solved this problem by discarding all phase information. This reduces data by a factor of two while leaving one set of ordered co-efficients. Unfortunately, trumpet and drums sound particularly horrendous after this reduction. While this is borderline acceptable for low bandwidth audio, it doesn't work for imagery. DCT solves this problem. The half steps of cosines lead to one ordered set of small, integer co-efficients which preserve phase information. Technically, DCT could be applied to audio but more advanced techniques had already been discovered. Indeed, you probably wouldn't apply DCT to audio because the processing power requirement is O(n^2) for n audio samples. Predictive encoding is preferable for audio. Meanwhile, image compression with chirps, wavelets and chirplets was a minority interest due to the required processing power. Although, admittedly, chirplets are extremely useful for photo stitching. It is mainly through JPEG2000 and the Red One camera that lossless wavelet spirals went from a spy satellite image format to a film industry intermediate format.
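
To make the "half steps" concrete, here is a naive O(n^2) DCT-II of one row of samples (a Perl sketch; the sample values are arbitrary and production codecs use hard-coded fast transforms):

# Naive DCT-II: co-efficient k is the correlation of the input with
# a cosine sampled at half-step offsets (n + 1/2), which is why one
# ordered set of co-efficients preserves phase.
my @x = (52, 55, 61, 66, 70, 61, 64, 73);   # one row of pixels
my $n = @x;
my $pi = 4 * atan2(1, 1);
for my $k (0 .. $n - 1) {
  my $sum = 0;
  $sum += $x[$_] * cos($pi / $n * ($_ + 0.5) * $k) for 0 .. $n - 1;
  printf "%9.2f", $sum;                     # co-efficient 0 is the DC term
}
print "\n";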

Starting from the question "Can I play video on a Commodore 64?" the answer is "Yes, but the quality will be appalling." A Commodore 64 doesn't allow independent choice of background color within each tile. This will result in horizontal banding; similar to the limitations of Cinepak. Ignoring the Commodore 64's four-color-per-tile, half-resolution mode, the remainder falls into the neglected case of Walsh tiles. Quality is going to be terrible for multiple reasons. Firstly, it can only represent the most prominent wave whereas JPEG DCT can represent a collection of waves. Secondly, Walsh tiles can only represent hard edges. In the general case, smooth edges are preferable. Thirdly, pixel ordering determines fringing artefacts and there is no good ordering which eliminates all cases of fringing. Minority cases where a hard edge is preferable, such as titling, will also be bad because the fringing is bad. The easiest way to minimize these undesirable interactions is to have a hierarchical encoding; much like hierarchical JPEG. Unfortunately, this isn't possible on the target hardware. Fourthly, there is the problem that an 8*8 transform only leads to 64 encodings. That's six bits per byte. That's highly undesirable because it discards 1/4 of the potential video bandwidth. That's something which should definitely be avoided when playing video on a low power system.

The encoding limitation can be overcome with one of many ugly hacks. For example, it is possible to store four tiles in three bytes. This is the opposite of uuencode or binhex encoding where three bytes of binary data are held in four bytes of text. Unfortunately, the target hardware is very poor at unpacking bit fields due to only having single bit rotate instructions. Single bit rotation adversely impacts more advanced encoding schemes, such as Lempel-Ziv or Shannon-Fano compression. Therefore, it is preferable to use the remaining two bits for an unrelated purpose. For example, different schemes to minimize fringing or text symbols for titling.
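
A sketch of the four-tiles-in-three-bytes packing in Perl (on the 6502 itself, this costs the single bit shifts mentioned above, which is exactly why it is unattractive):

# Pack four 6 bit tile codes into three bytes and unpack them again;
# the reverse of uuencode/binhex, which spread three bytes over four.
sub pack4 {
  my @t = @_;
  return (($t[0] << 2) | ($t[1] >> 4),
          (($t[1] & 0x0F) << 4) | ($t[2] >> 2),
          (($t[2] & 0x03) << 6) | $t[3]);
}
sub unpack4 {
  my @b = @_;
  return ($b[0] >> 2,
          (($b[0] & 0x03) << 4) | ($b[1] >> 4),
          (($b[1] & 0x0F) << 2) | ($b[2] >> 6),
          $b[2] & 0x3F);
}
print join(',', unpack4(pack4(63, 0, 42, 21))), "\n";   # prints 63,0,42,21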

A Commodore 64's 8*8 tile size is the major limitation. While the byte wide tiles and a square ratio seem suited to 8*8 DCT encoding (or a substitute), the choice of JPEG DCT was a compromise between image quality and compression ratio with the available processing power. If more processing power had been commonly available, JPEG DCT might have been 16*16 pixels. Or maybe something unusual, such as 12*12 pixels. This would have led to 144 or 256 quantized co-efficients being entropy encoded with a different scheme to handle the potentially larger steps between non zero co-efficients.

Asking the question "Can I play video on idealized hardware and map it down to a VIC20 or Commodore64?" is a productive line of thought. I had already chosen 16*16 tiles and the ability to set foreground and background independently within each tile. Now I have 256 tiles and I can redundantly define tile bitmaps on lesser hardware. However, I have yet to define a zigzag/spiral encoding. Nor do I have a set of 256 Walsh functions. I hadn't taken into account that Walsh functions have their own ordering problems. However, after staring at the Wikipedia and Wolfram diagrams for about one hour and then faffing a script, I generated a subset of one dimensional Walsh functions as follows:-

# perl -e '$w=8;for($y=1;$y<=$w;$y++){for($x=0;$x<$w;$x++){if(($y*$x+$y/2)%($w*2)<$w){print"1"}else{print"0"}}for($x=$w-1;$x>=0;--$x){if(($y*$x+$y/2)%($w*2)<$w){print"0"}else{print"1"}}print"\n"}'
1111111100000000
1111000011110000
1110011100011000
1100110011001100
1101101100100100
1001011010010110
1010010101011010
1010101010101010

where $w must be even and should be a multiple of four, +$y/2 is a fudge factor to increase orthogonality and %($w*2)<$w is a shader idiom to produce stripes. (See Christmas 2020 JavaScript archive for example stripe shader.) The fudge factor is particularly important. It ensures that half of the bits flip with each increase of frequency. This is a very minimal requirement for a set of Boolean orthogonal functions. It ensures that encodings are not bunched and are chosen with equal probability.

This is encouraging; in part because PETSCII already has a subset of Walsh tiles. Specifically, a dithered tile, squares in opposite corners and half filled tiles vertically and horizontally. Actually, this led to the next question "Can I encode video on a Commodore64?" and the answer is yes for specific encodings. The insight comes from PETSCII. It is possible to encode the horizontal and vertical frequency of each tile separately. For the horizontal frequency, the average of each column is reduced to 16 RGB values and the nearest pattern is found. This process is repeated for the vertical frequency. When the frequencies have been found, the 256 pixels can be partitioned into two sets of 128 and the average color for each partition can be determined as a linear sum before being right shifted by 7 bits (divide 128). None of this requires multiplication and, assuming 8 bit per channel video, none of this requires handling anything larger than 16 bit integers.
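
Here is a sketch of that encode step in Perl for a grayscale tile, using the eight patterns generated above. The matching metric (agreement with a threshold at the mean) is my assumption; the 6502 version would replace the divisions with shifts, as described:

# Encode one 16*16 grayscale tile as (h pattern, v pattern, fg, bg).
my @walsh = (0xFF00, 0xF0F0, 0xE718, 0xCCCC,
             0xDB24, 0x9696, 0xA55A, 0xAAAA);   # rows from the one-liner

sub bit { my ($pat, $i) = @_; ($pat >> (15 - $i)) & 1 }

# Choose the pattern which best tracks 16 averages (high versus low);
# the scoring rule here is an assumption, not the journal's exact metric.
sub nearest {
  my @avg = @_;
  my $mid = 0; $mid += $_ / 16 for @avg;
  my ($best, $score) = (0, -1);
  for my $p (0 .. $#walsh) {
    my $s = grep { bit($walsh[$p], $_) == ($avg[$_] > $mid ? 1 : 0) } 0 .. 15;
    ($best, $score) = ($p, $s) if $s > $score;
  }
  return $best;
}

sub encode_tile {
  my ($px) = @_;                       # $px->[$y][$x], values 0..255
  my (@col, @row);
  for my $y (0 .. 15) { for my $x (0 .. 15) {
    $col[$x] += $px->[$y][$x]; $row[$y] += $px->[$y][$x];
  } }
  my $h = nearest(map { $_ / 16 } @col);
  my $v = nearest(map { $_ / 16 } @row);
  my ($fg, $bg) = (0, 0);
  for my $y (0 .. 15) { for my $x (0 .. 15) {
    # A pixel's partition is the horizontal bit XOR the vertical bit;
    # each partition holds exactly 128 pixels, hence the shift by 7.
    bit($walsh[$h], $x) ^ bit($walsh[$v], $y)
      ? ($fg += $px->[$y][$x]) : ($bg += $px->[$y][$x]);
  } }
  return ($h, $v, $fg >> 7, $bg >> 7);
}

my @tile = map { [map { int rand 256 } 0 .. 15] } 0 .. 15;
printf "h=%d v=%d fg=%d bg=%d\n", encode_tile(\@tile);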

So far, I've only defined 8 of the 16 Walsh functions. I require another 8 Walsh functions. If we consider Walsh functions as a Boolean version of Fourier, we've only got the cosines. We also lack zero frequency (described in JPEG as the DC co-efficient). Staggered pairs of Walsh functions allow textures to be approximated as integer waves with 90° offsets. (This approximates positive sine, positive cosine and - by reversing foreground and background - negative sine, negative cosine.) This doesn't work well for diagonal textures or textures which aren't an integer frequency. In particular, like JPEG, it doesn't handle waves which are longer than the block size. The preferred solution, like JPEG, is a 2:1 hierarchical encoding to trivially increase the block size. Increasingly detailed passes encode residual data and, possibly, a final entropy pass allows lossless encoding. The entropy pass is common in audio codecs, such as MLP [Meridian Lossless Packing] but absent from JPEG.

A Walsh tile is encoded as two nybbles where each nybble represents a frequency and phase. For display only, the arrangement of frequency, phase and nybble field is immaterial. Encodings map to bitmap patterns which raster to display. No further processing is required. For downward compatibility, encoding choice is very important. Firstly, I choose alternating bits as encoding zero and all ones as encoding one. The remaining 14 encodings are frequency pairs for 1-7 full waves. Secondly, I choose a discontiguous field arrangement such that the top bit for the horizontal and vertical nybbles are in bit 6 and bit 7 of a byte. The reason for this is extremely non-obvious but has historical precedent. 6502 peripheral chips, such as 6522 VIA and 6551 ACIA have the most important flags of the status register in bit 6 and bit 7. When a peripheral chip status register is copied to 6502 accumulator, bit 6 is copied to the oVerflow flag (also in bit 6) and bit 7 is copied to the Negative flag (also in bit 7). From here, opcodes such as BVC [Branch oVerflow Clear] do not require any preceding bit test. This only works with the top two bits and is typically used between 6502 and 65xx series peripheral chips. However, it works equally well with nybble encoded Walsh tiles stored as discontiguous fields. This is very much like my 8 bit floating point proposal. Although, rather than storing an IEEE754 style exponent and mantissa as discontiguous fields, we store two frequency nybbles as discontiguous fields.
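
A sketch of one possible byte layout in Perl. Only the placement of the two top bits in bit 6 and bit 7 comes from the scheme above; where the remaining six bits land is my assumption:

# Pack two 4 bit frequency codes as discontiguous fields: the top bit
# of the vertical nybble lands in bit 7 (Negative flag) and the top
# bit of the horizontal nybble lands in bit 6 (oVerflow flag).
# The placement of the low bits is an assumption for illustration.
sub pack_tile {
  my ($h, $v) = @_;
  return (($v & 8) << 4) | (($h & 8) << 3) | (($v & 7) << 3) | ($h & 7);
}
sub unpack_tile {
  my ($b) = @_;
  return (((($b >> 3) & 8) | ($b & 7)),          # horizontal nybble
          ((($b >> 4) & 8) | (($b >> 3) & 7)));  # vertical nybble
}
my ($h, $v) = unpack_tile(pack_tile(9, 5));
print "$h $v\n";                                 # prints 9 5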

This awkward arrangement has no performance impact on video decode to 16*16 tile display. However, it offers two decode strategies when downsampling to 8*8 tile display, as commonly provided by the VIC or VIC-II chip found in a VIC20 or Commodore64. The first strategy is redundant tile bitmaps such that all horizontal or vertical encodings specifying five or more waves display alternating bits in the horizontal or vertical axis. The second strategy is to use BPL and BVC opcodes to skip bit mask operations. This correctly clips all encodings which exceed the resolution of the 8*8 tile while reducing 8 bit Walsh tile encodings to the contiguous range 0-63. Or 192-255, if that is more convenient. This is a classic time versus space choice which saves 1.5KB RAM allocated to tile bitmap. It also allows other uses for the remaining 192 tiles, such as window decoration.

Commodore64 video in window? Are you mad?

This is the most sensible part. A standard benchmark for 8 bit video is the quality of rendering the Bad Apple cartoon which mostly features line art of a witch, the moon and a rotten apple. One particularly sneaky trick to render it at 640*512 on an Acorn BBC Micro with a 2MHz 6502 is to encode the video as an executable binary. In this case, the "codec" is a raw sequence of 6502 instructions to update the screen buffer. An extension of this technique would be to transcode 640*400 video to 40*25 Walsh tiles. I believe that the cool kids call this 16:10 aspect ratio. I call it Commodore 64 aspect ratio. The laziest way to achieve this is by splitting frames with ffmpeg:-

# ffmpeg -i foo.mp4 -ss 20 -t 40 -vf "scale=1600:900,crop=1440:900:80:0,scale=640:400" frame/%08d.png

This example starts from 20 seconds and limits output to a duration of 40 seconds. It will scale, crop and re-scale video of any resolution (ideally 16:9 or wider) and dump frames at the original frame rate to a directory with film industry eight decimal digit numbering starting from 00000001.png which is suitable for further processing with a prototype encoder which invokes libgd which invokes libpng which invokes zlib. For very rudimentary testing, modified frames may be assembled into a GIF animation using GIMP or re-assembled into a video with the original, unmodified audio using ffmpeg. For playback in a window on a Commodore 64, render to an off screen buffer using a raw instruction stream. Using a dynamically generated instruction sequence, the visible part of a window may be copied to screen. Assuming you aren't moving windows around the screen, this process gets *faster* as the video becomes more obscured.

There is a further trick. A full video exceeds available RAM and therefore frames of video are loaded into RAM in bundles. One or more bundles may be loaded while another is played. Within each bundle, common elements across frames may be grouped into subroutines - and the common elements do not have to occur in successive frames. Indeed, we get a birthday paradox where there are n(n-1) opportunities to find an exact match for each channel of each tile. Using increment and decrement instructions, there is also opportunity to encode near matches and none of this excludes a B+ tree of key frame diffs writing to a 1000 byte buffer. With 256 Walsh tiles and in ignorance of temporal coherence, 16 frames of unrelated video are overwhelmingly likely to share tile data and duplication *must* occur by frame 257. This is the principle of a de-duplicating filing system: create an environment where collisions are inevitable. In the case of Walsh tiles, any attempt to find duplicates is compounding and the ability to store an additional frame delta is extremely non-linear. The major penalty is that subroutine call and return incurs a total of 12 clock cycles. It also requires three bytes of RAM for each frame to call each subroutine and one byte of RAM to terminate each subroutine. There is opportunity to tail call but finding the tightest arrangement may be a combinatorial explosion. In the trivial case, a buffer will be populated non-consecutively. The section unique to each frame will be written first. Even here, writes are not sequential. Horizontal strips of high noise can be block copied in a loop. The remainder may aggregate sequential ranges of constants. For example, it is possible to LDX #0 and then write all bytes which should be zero then INX and write all bytes which should be one. These idioms may be repeated within each subroutine and the last subroutine invocation may be JMP rather than JSR. Only frames with no commonality end with RTS. It is possible to increment or decrement values before, during or after subroutine calls. This provides additional scope to pattern match and compact the binary, although it increases complexity.
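
As a toy illustration of why collisions are inevitable (a Perl sketch with uniformly random tiles, which is the worst case; real frames have far more repetition):

# With only 256 possible Walsh tile values, even unrelated frames of
# 1000 tiles each share almost all of their tile data.
my %seen;
for my $frame (1 .. 16) {
  my $new = 0;
  for (1 .. 1000) {
    $new++ unless $seen{ int rand 256 }++;
  }
  print "frame $frame: $new previously unseen tile values\n";
}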

You may think that it is crazy to implement a codec as arbitrary code execution. However, there has been a move within the 6502 community to implement a 65816 system with memory protection running at 20MHz or more. 65816 is 6502 with 8/16 bit extensions. This means more opcodes, more compact programs, more bus cycles moving data and the ability to index 64KB as a linear array from an index register. All of this is potentially beneficial for 1000 Walsh tile video. In particular, it may be preferable to encode with the 16 bit extensions while keeping decode 8 bit compatible. Whether or not a system has (or uses) 16 bit extensions, a privilege bit would be beneficial. With minimal privilege, arbitrary execution of the "codec" is restricted to its own address-space. The buffer can then be selectively copied to display by a kernel. This only requires two context switches per frame of video. Therefore, 60FPS video only requires 120 context switches per second. A 2MHz 6502 can handle 50000 interrupts per second. So, 120 context switches per second at 20MHz is a very minor overhead. Audio and retrieval from storage may incur significantly more context switching. However, audio does not require arbitrary code execution. Indeed, in the trivial case, it is possible to play uncompressed PCM audio. This is, of course, more secure and stable than the recent buffer overflow found within GStreamer's NES music player. The music player requires 6502 binary execution. This was implemented in C without safeguards and deployed on millions of computers. As one wag noted, "Poettering strikes again!"

If you want to run this codec on a 4GHz computer - and do so more competently than Lennart Poettering - load one bundle of frames into Mike Chambers' 6502 implementation (or similar) and execute 6502 bytecode until it reaches the correct nesting of RTS or exceeds a ridiculous instruction count. Then render buffer to screen using native software. Repeat for each frame then load next bundle. Your operating system probably provides read-ahead and therefore latency to load the next bundle is minimal. Although we are "executing" the codec, there is no sane opportunity for malicious code to escape a virtual machine. Executed code isn't native, isn't JIT, cannot perform Rowhammer, cannot poison a cache hierarchy, cannot perform timing attacks and has no operating system or I/O during execution. Further safeguards may include wiping temporary variables, subroutine/branch limits, disallowing read/write above the highest loaded address and making write/execute mutually exclusive. This prevents data leak between bundles, prevents self modifying code and prevents another level of bytecode execution. I would nuke it from orbit but such action is incompatible with a good frame rate. Although the codec is explicitly Turing complete, much of the cleverness is in the encoder and this is mostly restricted to a XOR of horizontal and vertical Walsh functions, a mechanism for downsampling, a target rich environment for birthday paradox collisions and generation of CPU instructions rather than conceptual blitter instructions. Indeed, the major concerns for decode are correct implementation of the audio/video container format and timing for smooth playback. Surprisingly, Turing completeness is not the major concern. Indeed, given that there is a formal execution model, it is possible to perform static analysis of codec output using general purpose analysis software.

Ideally, I'd like to implement Walsh tiles as a complement or replacement for JPEG tiles within an agnostic quadtree codec and, ideally, I'd like to use this as a general network windowing system. However, this requires a significant amount of work. The minimal implementation is executing 6502 bytecode to obtain a flat grid of tiles. This can be scaled up to any desired resolution by assigning, for example, 256*256 pixel tile encode/decode to separate cores; possibly 500 or so GPU cores. Yes, with very little modification, this system can be adapted to render 8K video with CUDA or a multi-core CPU. In this case, 40*25 Walsh tiles retain downward compatibility. 40*25 also works as a 16:10 video thumbnail.

Sunday March 14, 21
02:52 PM
Science

Nassim Nicholas Taleb's most famous book, The Black Swan: The Impact of the Highly Improbable, is over-rated but has many individual observations relating to complexity. Taleb, a former investment banker, quit and may have written this book partly as therapy after chasing Mammon so intensely. It was immediately popular with banking friends and released shortly before the banking corruption of 2008.

Among other observations, Taleb compares power cuts to earthquakes and generally cautions us against applying Gaussian statistics to Poisson events - which include the scale-free network of financial transactions.

As an example, if two people have a combined height of 12 feet, they are unlikely to differ by more than six inches. Whereas, among two authors with a combined income of US$1 million, one author is likely to account for 97%. (Such observation was popular with publishers.) I have independently found that "wisdom of the crowd" works best when the answer is Gaussian, such as candy in a jar, rather than the success of a venture, such as a start-up, film or book. Have you ever used the Unix ping command and received a standard deviation of network latency? That's bunkum. Packet latency is Poisson.

Buoyed by this understanding and the hindsight that financial "stress tests" are useless, I took a more empirical approach to my second reading of the book. In particular, there is a small passage about deriving numerical constants from stochastic (random) processes. As a former DBA for a renderfarm, I am quite accustomed to industrial scale ray tracing with MentalRay. This ray tracer is stochastic and sometimes fails to converge. (To a lesser extent, crowd scenes with Massive also have this problem.) The result is an industrial scale halting problem where, at best, repeat runs have pixels which vary by one value. I've had similar fun with random replacement cache algorithms. While it is preferable to only amalgamate deterministic systems, such wishes are thwarted by matters such as network packet drop. So, whether you like it or not, a sufficiently large collection of bits begins to look analog. Empirically, a commercially viable system with 2^46 bits of RAM is way past that threshold.

Anyhow, the stochastic method to calculate pi is a combination of Pythagoras and the choice of random co-ordinates within a unit square. Specifically, we wish to calculate πr^2 where r=1. Or perhaps 1/4 or 1/8 of a unit circle. Yes, this is calculating pi by throwing darts at a dartboard. It is also possible to discard the randomness and make a linear scan. This is not elegant or efficient. However, it can be applied to more complicated circumstances including situations where no exact solution is known or possible.
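
For completeness, the dartboard version really is one line of Perl (a sketch; accuracy only improves with the square root of the sample count):

# perl -e '$n=1000000;for(1..$n){$x=rand;$y=rand;$c++ if $x*$x+$y*$y<1}print $c/$n*4,"\n"'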

We can calculate pi using a Perl one-liner. Specifically:-

perl -e '$l=10000;for($x=0;$x<$l;$x++){for($y=$x+1;$y<$l;$y++){if($x*$x+$y*$y<$l*$l){$c++}}}print $c/($l*$l)*8,"\n";'

This calculates pi to three or more significant digits. However, it should be immediately apparent that doubling the accuracy quadruples processing. This is a brute force and ignorance approach. This trivial method of calculating pi has an O(n^2) problem. Regardless, it is occasionally useful to calculate or sidestep constants by using successive approximation. In ignorance of the golden ratio and the 60-way symmetry of an icosahedron, I used dot product and binary successive approximation. Considering an icosahedron as a North Pole, a South Pole and two staggered rings each with five points, almost any two angles can be used in a game of higher/lower to determine a suitable latitude for the rings. The initial iteration is fiddly and the result has minor rounding error. Regardless, it is sufficient for rendering. This formulation has the icosahedron on point rather than the "house roof" cube formulation which, admittedly, only requires 45° rotation.
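
A sketch of that higher/lower game in Perl. I've compared squared chord lengths rather than dot products, which amounts to the same comparison on a unit sphere:

# Find the ring height of an on-point icosahedron by bisection: raise
# or lower the ring of five points until the pole-to-ring edge equals
# the edge between adjacent ring points (72 degrees apart).
my $pi = 4 * atan2(1, 1);
my ($lo, $hi) = (0, 1);                # ring height z on a unit sphere
for (1 .. 48) {
  my $z    = ($lo + $hi) / 2;
  my $pole = 2 - 2 * $z;                                 # pole-to-ring chord^2
  my $ring = 2 * (1 - $z * $z) * (1 - cos(2 * $pi / 5)); # ring-to-ring chord^2
  if ($pole > $ring) { $lo = $z } else { $hi = $z }      # higher/lower
}
printf "z = %.6f (exact value 1/sqrt(5) = %.6f)\n", $lo, 1 / sqrt(5);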

In the case of calculating pi, it is possible to take advantage of properties of a circle quadrant, such as being monotonic or contiguous. This allows tracking of the edge of a circle over successive lines or recursively dividing a circle into a quadtree. (The latter works really well with contiguous fractals.)
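
For example, because a circle quadrant is monotonic, the boundary column never increases as the scan moves down the rows. Tracking it turns the O(n^2) scan into O(n) (a sketch):

# perl -e '$l=10000;$x=$l;for($y=0;$y<$l;$y++){$x-- while $x>0 && $x*$x+$y*$y>=$l*$l;$c+=$x}print $c/($l*$l)*4,"\n"'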

Anyhow, Happy Pi Day.

Monday December 28, 20
11:49 AM
Code

Every Christmas, I like to publish a fun program. Previous efforts include a brute force Sudoku solver, quaternion Mandelbrot animation and toroid animation. It is more fun if it includes fractals, spheres, toroids or hexagons. Or spherical cows or spherical reindeer. This year, I learned that there are variants of a Mandelbrot fractal which involve exponents larger than two. The result has a pleasing symmetry despite the off-by-one nature of the exponent. Specifically, z_{n+1} = z_n^p + z_0 has (p-1)-way rotational symmetry and z^7 has a pleasing six-way symmetry which looks like a snowflake.

It is relatively easy to adapt a Mandelbrot program to generate z^3 and z^4. Unfortunately, they run slower and therefore z^7 requires efficient implementation. I couldn't be bothered to apply z^3 then z^4 or manually calculate z^7. Instead, I wrote a program to generate the algebra terms for me. Yes, this is only a toy program but I'm writing programs to write programs. Anyhow, my program ran first time. However, z = (x + iy)^7 produced 2^7 uncollected algebra terms - which is no advance on using paper. After five or so further attempts, I ran:-

#!/usr/bin/perl -W

# Expand (a+bi)^n symbolically. %term maps a product of 'a's and 'b's
# to a phase (power of i, mod 4) and an integer coefficient.
$term{''}{0}=1;

for($l0=1;$l0<8;$l0++) {
  # Multiply every existing term by (a+bi): appending 'a' keeps the
  # phase; appending 'b' contributes one factor of i (phase+1 mod 4).
  %temp=();
  foreach $ele0 (keys %term) {
    foreach $ele1 (keys %{$term{$ele0}}) {
      $temp{'a'.$ele0}{$ele1}+=$term{$ele0}{$ele1};
      $temp{$ele0.'b'}{($ele1+1)%4}+=$term{$ele0}{$ele1};
    }
  }
  %term=%temp;

  # Print the collected terms: phases 0/2 are real (+/-), 1/3 imaginary.
  print 'z^',$l0,'=';
  foreach $ele0 (sort keys %term) {
    foreach $ele1 (sort keys %{$term{$ele0}}) {
      if($ele1==0) {
        print '+',$term{$ele0}{$ele1},$ele0;
      } elsif($ele1==1) {
        print '+',$term{$ele0}{$ele1},$ele0,'i';
      } elsif($ele1==2) {
        print '-',$term{$ele0}{$ele1},$ele0;
      } elsif($ele1==3) {
        print '-',$term{$ele0}{$ele1},$ele0,'i';
      }
    }
  }
  print "\n";
}

and produced:-

z^1=+1a+1bi
z^2=+1aa+2abi-1bb
z^3=+1aaa+3aabi-3abb-1bbbi
z^4=+1aaaa+4aaabi-6aabb-4abbbi+1bbbb
z^5=+1aaaaa+5aaaabi-10aaabb-10aabbbi+5abbbb+1bbbbbi
z^6=+1aaaaaa+6aaaaabi-15aaaabb-20aaabbbi+15aabbbb+6abbbbbi-1bbbbbb
z^7=+1aaaaaaa+7aaaaaabi-21aaaaabb-35aaaabbbi+35aaabbbb+21aabbbbbi-7abbbbbb-1bbbbbbbi

which, for speed, was adapted into the rather impenetrable:-

/* One z -> z^7 + c step: (p,q) are Re,Im of z; (x,y) are Re,Im of c. */
p2=p*p;
q2=q*q;
p3=p2*p;
q3=q2*q;
temp=p3*p3*p-21.0*p3*p2*q2+35.0*p3*q3*q-7.0*p*q3*q3+x;
q=7.0*p3*p3*q-35.0*p3*p*q3+21.0*p2*q3*q2-q3*q3*q+y;
p=temp;

It is obvious with hindsight that the collected terms are binomial coefficients and therefore look like rows from Pascal's triangle. It is also obvious with hindsight that the sequence cycles through four phases of real/imaginary terms because each multiplication provides an additional opportunity to multiply by i. Regardless, getting a computer to do the algebra avoids the most likely "Why isn't this working?" scenario.

In a further example of laziness, I couldn't be bothered to use libgd, learn how to use the JavaScript Canvas class or similar. Instead, I wrote a small library in JavaScript to make a table of pixels using the deprecated document.write method. Specifically, the first row is 1024 transparent single cells. This discourages a web browser from "optimizing" the layout. The remainder is scanlines of pixels. Where adjacent pixels are the same color, they are aggregated into one table span. This significantly reduces memory allocation and therefore it significantly reduces processing time. The result is suitable for textures including spots, stripes, checks, rainbows and Mandelbrot fractals.
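
The library itself is JavaScript; purely to illustrate the run-length aggregation, here is the same idea as a Perl sketch which emits one table row per scanline:

# Aggregate one scanline of pixel colors into <td colspan=...> runs.
sub scanline_html {
  my @px = @_;                          # e.g. ('#f00', '#f00', '#00f')
  my $html = '<tr>';
  my ($run, $n) = ($px[0], 0);
  for my $p (@px, undef) {              # trailing undef flushes last run
    if (defined $p && $p eq $run) { $n++; next }
    $html .= qq(<td colspan="$n" style="background:$run"></td>);
    ($run, $n) = ($p, 1);
  }
  return $html . '</tr>';
}
print scanline_html(('#f00') x 5, ('#00f') x 3), "\n";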

If you are similarly too lazy (or don't trust JavaScript), example output is provided. This might make you more inclined to try your own variants.

If you want more of this, I have been particularly impressed with Mandelbrot fractal animations where a fractional exponent is animated. This causes a Mandelbrot fractal to sprout additional heads as animation progresses. It should be apparent that such an animation requires an efficient exponent function which may be supplied with fractional real/imaginary inputs. I haven't determined the details of such a function but I presume it requires something like a gamma function with sine/cosine for the imaginary component. This was supposed to be fun.
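
My hedged guess at the missing exponent function: for a real fractional exponent p, the usual polar form z^p = r^p(cos(p*θ) + i*sin(p*θ)) suffices; a gamma function would only enter if p itself were complex. A Perl sketch of one iteration step:

# One z -> z^p + c step for fractional real p, via the polar form.
sub step {
  my ($zr, $zi, $p, $cr, $ci) = @_;
  my $r     = ($zr * $zr + $zi * $zi) ** ($p / 2);   # |z|^p
  my $theta = atan2($zi, $zr) * $p;                  # p * arg(z)
  return ($r * cos($theta) + $cr, $r * sin($theta) + $ci);
}
my ($zr, $zi) = (0.3, 0.5);            # start from z = c, avoiding atan2(0,0)
($zr, $zi) = step($zr, $zi, 6.5, 0.3, 0.5) for 1 .. 10;
print "$zr $zi\n";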

begin 644 html-table-texture.tar.gz
M'XL("*KAZ%\"`VAT;6PM=&%B;&4M=&5X='5R92YT87(`[5Q;<]NX%59V^J1?
MP2A.=*%H`;PI&8N>M&G3V9GN2[HS?>AV,T<48W%#4S1%-?)F/=._W:<6MP-2
M%UOVUG&Z"3")1`('!SBW[X"$C'EUGCD53+/$J9)UM2J3T32#_/WQ6?JN=4^%
ML!+Z/O^F8W_<_)8E\%K4"\>NYXU=2EJ$AF/BM2S2>H"R6E906E:K*M/EQ2K)
MKJ,[U/X;+7_^]O7S%_"H]:CUKU;K/Z(\_O?O'K59Q9"W\Y;6-]_\Y5'KI&7*
MEU?FN_$?SY/X_?*8MSQ(_%,2!BK^@_$X8'1T[(Y#$_\/42;<S*?MR3R!&?NJ
MTBI+3E\)#YB,Y%U[,E*MT\7LDGTMXS(M*HNEB;,5G"51YR?X)\C*CK4LXZ@C
M'.KXIV7G=#*2#0>ZL>;'CM-NBXYO9R5\>#M-SM*\US]1=9"E9WE=^6Y1]M81
M.5E/GA/V:=M]ZV/;LIJT\2);G0OBJPT>23Y##I>,PZ7@<+G)(4OSI![,LJX9
MSK+BZ#NHYL?OL@6CZ*WMRX'7'U%"^G:S_M)98_V)Z):^Z\5/762"HQ;I.LGX
MO!>KLM=]\EJ4KNIR9279,K%N[B(C2G=IR_\-F93P5TU-J[KV:.0XW-K:8B-!
MQ"^4Y4?26PQL?M'XOS@OLF3M%(L/24F.B^Q3XS\+"X'_/EO_L?L6<8GG^0;_
M'Z(\>3Q:+<O1-,U'15)FEO.W=OMEF4`6];H<1EX6<UA&/8Y;`C&/,AK1$_8Y
M><X_$0A?5N<%B216\FL:U;AYE!&&G.R390#(H.P)_GU>4^-HL5K.)9=A%[K'
M1YSD[XSB'\/NM'E[LDE-AT=\@I*R5U_;M/_4/U'P)^41S,7\A$BB-Q.*,2O3
MO+*Z/__8'3*)AMVH>Z>9,RBOQXTB4H.Z8FPSOEOS%UB^U9'>W''83;O7=W9W
M.CNW&]6[N6-C5*5,2=7Y(>_P+&)2P1>.__<`_P?QWZ.(_SYU79?COQL2@_^?
M"_^/JJ0\_]CM7GTD5Q'5P$\$\!,!_!K_GE;)>5&#?0+QW#I*LH18O??)Y9*W
ME^>(,4T"B@0?Y7"BT]55#4='G/-'D0Q$DZ"@5W;4I%>5)QM]1-,QRQM7'WNB
M762#FWHBN(G91D*FW=1`&JFA*>=R45;606$;5-=)S*&9$V\DD8ULL#O_H;A#
M^37"2S;TCFQJM-]AY>YAY=Q]1MX=V6S,R.2ASX#_U;KZQ/CO>^-0X'](/,(R
M`,?_@%"#_P^"_Y:`_>V<W_[Y1QK98$]3=N6R*W8)J0W3U)E.697'JWB=J`1>
M"ZR&-\/4@>F458B^OB`4E))4T@IB$-3`R8'3<P+%0S`!SH7?@V#(/E)..TUM
M]L%G$4CFDKMBK_C+`=0(<@@YAAH$Y"@@A@$Q#LB!0(P$<BB08X$81\Y,3DW.
M34T.Q.Q`3@_D_#B9FK>:.,B9\SI08O`OFW],Q3!"8Z$22DF%8J%<2C"43(FF
M9$/A4#HEGI(/!502HH@HHQ(2E)0@Q00I)RA!04H*2E10LH(4%J2TH,0%)2](
M@4%)#%)DD#++.6OM*O4J_:*"E891Q:ACI6206@:E9E!Z!JEH4)H&J6J0NI:<
MT4)H(K21,A(H*X$T$T@[26)M0V5$D%:4M:",*KY2*:^('>6Y8S0R6EF;6=L9
M#:TMC:9&6VMC:VNCN='>VN!H<6UR;7,TNK8ZFAWMK@V/EM>FU[9'XZ/UM?FU
M_=$!M`>@"Z`/:"<`]`)0;@#*#P`=`90G`+H"H"^`<@90W@#H#H#^`,HA`#T"
ME$N`\@E`IP#E%8!N`>@7H!P#T#-`N08HWP!T#D#O`.4>H/P#T$%`>0B@BP#Z
M""C_P*C`L,"XT(&!D:%#0\<&!@=&APX/'1\8(#I",$0P1G20@(H2P#`!C!-0
M@0(8*:!"!52L``8+8+2`"A=0\0(8,*`B!C!D`&-&RJ*C58>KCE<,6!VQ&+(8
MLSIH`:,65-B"BEO`P`45N8"A"QB[<I0Z\C'T,?9U\(.*?L#P!XQ_V47C@P8(
M0(20+:`A0UXX\FNJW(-EBO:-J9N"33%Y4Q9F+E<EQ=Q->>![PKP>EXG6V9H*
MZ/&E0X5"K[Z4@M99ETJ`"I0C4B(-P+\%:2"%H,VD1A7>A>C7-%`6=0FJE%>)
MGB&*3YN021$SQSHX7(I^X07:,N)2\.'-BM,8-4AK%9KU_Y[U_SGDLR2;EHOJ
MGO:`#[W_<:G/UO_CL4M#XA.?K_^I[YKU_V?;__U.>X#UNH2X@NPWL!?LN,=\
M?U9\VA$Y)B3XE7O"@M.EX'2YA]/^O>%KA[>L(EK+=Q@7T>4)OAOJQ1$YZ<43
MOB?\[%FO&!3VQ>!BXA^3_DE<[RZP0?D+-M;LL&9[C2]#+B(VTJ!@59=8543R
MO57]KH3O,HL!]FX:I\S(ZUY\B^UE2>E0LZW\]>!_O)HF#X7_GH_O_WD*$/A/
MS>]_#/X;_&_@/\L`GH3\S3P@*UDU3Q`F'YAR/_@_3XJ'6O_[KL;_P"6N7/^;
M]_\&_PW^\]XN1W^-]V[$R#3$>U'A-AJ]Z,*M6V7F\`;\G^-2GB?8%2-P;2^0
M=Q?LGS.6:87?-1/+6'7@)$C/R6S)RA4]7.="<C%YYTO#__5]_0G`P=__C,<:
M_ZDK?O_/BL%_@_\&_^\'_QU:X[=KRQN)W\X.[(=UGG!<@FG"#G62<`W4?P7X
M7R3Y0ZW_O2#4^$]H*-;_@7G_8_#?X/_]X+_K4`1RUPXTD#=1O\X/BI:G!]O`
M_5>+_Q<KF#T4_M?O__T@D.__??/[?X/_!O_O!_^=4"&Z:V_COJ_R@N.KM&#0
M_FO'_Q+2?+KX<'\'0!PZ_V5,7'G^0Q"X+A'O?_S`O/_Y?/C_1GJ`.0#B%@=`
M[.+:QH$0`[9\[H\"TN^?_%^BW)[X7Q:+ZCZ/?SG\^X^QK^+?'_,'?W[^"PE,
M_'^V^/\K]P`3_;>(?K:TLR\'KE[>.6N\*9SFR3"%./MEP#X4Z4;KQ59K&16.
MOEE&%_5-,8C4BNQB$*F%7CF(2D4ZB);Z?)D>6U!.`L;WEU_8Y1(OR[JV5+7]
M.YTI<YMC:%Z_]CQS#,UO=/VW9%!7),L'6_]1XH7J_*^0)0.7XW]`S/DOGQ'_
MI0>8#/"_'`!VIX.^7KVZ(\*:@[Y,^43XC[%[;V,<^OO_D.+Z/V`K?X[_81":
MY_\'*:.1]6:QJAAB+*WO%]8;!A%):7V;6]]S+UBVV^]6>5REB]S:16:!4+-%
MO#KG>\4?RK1*>IV)H+.FBY(QBH@5)UE6P&R6YF?J;EE`+.Y.?\@[`L#*A#E>
M+L[QV!JN"</7#%=J-EDD5LF+J-N]F>M&*CG(]A`?@:K[N8SNP`:SU3Y.W4DU
MLQ@!4UT>T=-)>GXF\ZP^J[EC?4AGU3SJT(XU3]*S>151"[(JZG2L$4O"U>RT
M.^P<GLL\6</L;4IZN9P(RU]Y?1Z-[&=UB5"P2E22A&Z1T%T2=XO$W27QMDB\
M71)_B\3?)0FV2()=DG"+)-PE&6^1C'=)GF^1/-\E>;%%\F*/ZK;5^_L]--OZ
M_<,>FFT%O]I#LZWA/^ZAV5;QG_;0;.OXM:2YSJ6:_O3,#5#]5AZQ&Y8$G!S?
MCTF&O2UW[.6GIW[_*0W[]K:C\LK^/E]6+^5(+SX<5-UA-NQ:TS.^P"JC3G<8
M#[N=&P--]+A-L"E8NC[F-I9VL=93+WL<B;VA^'&TZ.]]WTAZ"_U><1'%8C3;
MOLUHN%F#@\6G]:%33!M/NDTU]^*!3^R@OUOW@NRI9(]O_8;#U$S5HA5GW)0Q
MOZ6,^:UDK%?`C27]%K]?"=;U0OHZR!=+Z<.,F+[X'^YO3''SN<>R]C[YW&+'
M<?]34/,98>-)2'*\8>/SNJ>B6TREL?W9V`"]VQ;H=7\$N_?/8#<W+G'K<M_F
MY4W;E[N/83=N839/)]OW&-94??-1;,-!S-.(*::88HHIIIABBBFFF&***::8
48HHIIIABBBGW5?X+!:N#Y0!X````
`
end

(Usual instructions for uudecode process.)

Saturday December 19, 20
05:03 PM
Hardware

When I joined SoylentNews around 2015, I was directionless; without a plan. Since then, I have gained a purpose. In this period, I have determined that:-

After determining fairly conclusively that we are heading for an economic depression, global war and all of the associated strife, I was quite depressed. I was especially depressed given that the Anglosphere is very likely to lose a global war. However, it is the severity of the situation which is alarming. In Aug 2019, it was apparent that, socially and economically, there were echoes of all of the turmoil of the last century or so. It was particularly bad for the economic charts to align with periods either side of two global wars, the social chaos of 1968 and the economic crashes of 1987, 2001 and 2008. And we are currently in a gilded age which makes Rockefeller, Chase and Carnegie look like amateurs. Even if the 2020s depression is merely the same depth as the 1920s depression, the global population is much greater and therefore the suffering will also be much greater. Unfortunately, the 2020s depression is likely to be 2-4 times worse.

From the 1920s to the 1970s, the stock market had companies of substance, like Ford and Dunlop. Now we have Pinterest and Chuck E. Cheese. Really? Nolan Bushnell, 6502 processor programmer and founder of Atari, thought that it would be a great idea to make an animatronic pizza restaurant and games arcade. Hundreds of these places have been ambling along - some for more than 30 years - before a stock market flotation and pandemic restrictions. Anyone who "invested" in this business has been deceived but some people got rich flipping that turd. To quote Matt Taibbi, "You know that the economy is fucked when the rich are running out of things to steal."

We're screwed. Politicians won't help you. Bankers won't help you. Celebrities won't help you. You'll have to dig yourself and everyone around you out of a financial hole, using little more than scraps while receiving approximately zero positive feedback. Oh, and expect this to last more than a decade. Perhaps two or three.

I thought about this for a while and decided that you need things which you can build and repair yourself. Like cars and computers. Or maybe computers in cars. Unfortunately, we have to do this on hard level with extra obstacles thrown in the way. For example, there are an increasing number of regulations which didn't exist when the Ford Model T, Citroën 2CV or Volkswagen Beetle were in production. Indeed, the only reason these vehicles are out of production is increased expectations and increased regulation. For example, air bags are mandatory, despite them regularly failing and the chemistry being a nightmare. Indeed, the regulations are so entrenched that any superior system, such as the expanding foam featured in Demolition Man, will have to work in conjunction with faulty air bags. It is a metaphor of our times.

Bureaucrats want to add more regulations. It is quite likely that hydrocarbon road vehicles will be illegal by 2035 or sooner. Even Hitachi is planning to make a battery powered train which I've previously argued is inane. While such regulation is good in principle, use of hydrocarbons is already down, in part due to tele-commuting. To make matters worse, major cities, like London, are expanding road restrictions. Specifically, there will be extra fees for vehicles which aren't designed for Euro 4, 5 or 6 particulate standards. (Dieselgate was a vain and foolhardy attempt to meet the precise wording of the Euro 6 standard.) Any vehicle *design* which fails to meet the standard will be charged up to £300 (US$400) per day if it is anywhere within an urban area of almost one million people. This scheme is likely to expand to cover more than three million people. It is also likely to set a price cap for schemes established by other cities. The last time I bought a car, it cost US$500. People will now be expected to pay a similar amount daily. While this is aimed at commercial use, these costs will be added to the cost of distributing food and other goods.

Meanwhile, the UK's highest court confirmed that one of the world's largest airports, London Heathrow, can expand. This decision came on the same day that a London inquest first determined air pollution as the cause of a fatal asthma attack. While drivers are being heavily taxed and simultaneously discouraged from using public transport, the aviation industry has zero tax on fuel and is currently complaining that sales tax will have to be paid in airport gift shops.

Meanwhile, some of us have to do real work rather than shill on Twitbook. So, some of us require transport. Nowadays, all of that transport requires computers. That includes planes, trains and automobiles. Cars require computers for the mandatory air bags. They also require a computer for the mandatory emission control - which pretty much makes fuel injection mandatory - or an alternative source of power, such as hydrogen or lithium, which requires a computer as the last line of defense to prevent explosion. Admittedly, it would help if said computer was designed rigorously. This requires partitioning tasks for maximum safety rather than maximum sales. For example, by not allowing an optional micro-controller, running on a common bus, to have unrestricted access to engine, gears, brakes and steering for the purpose of automatic parking. And if you must do that, don't make DBus available over TCP port 6667 on Sprint's data network; that would have avoided recalling 1.4 million vehicles.

Incidents like this led me to work on a 256 bit cell network protocol with CRC32 and two phase commit. The latter is commonly used with databases but it offers considerable benefit for process control. Unfortunately, I thought that I was clever to make a fully software implementation but I subsequently discovered that a hardware implementation uses fewer transistors and less energy. This is particularly true if there are multiple communication channels.
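As a sketch of the checksum half of this (the two phase commit is a separate state machine and is omitted), here is a bitwise CRC32 in Perl using the standard reflected polynomial 0xEDB88320. The split of the 256 bit cell into 28 payload bytes plus a 4 byte checksum is my assumption for illustration:-

#!/usr/bin/perl
use strict; use warnings;

# Bitwise CRC32 (reflected polynomial 0xEDB88320); slow but table-free.
sub crc32 {
    my ($data) = @_;
    my $crc = 0xFFFFFFFF;
    for my $byte (unpack 'C*', $data) {
        $crc ^= $byte;
        for (1 .. 8) {
            $crc = ($crc >> 1) ^ (($crc & 1) ? 0xEDB88320 : 0);
        }
    }
    return $crc ^ 0xFFFFFFFF;
}

# A 256 bit cell: 28 payload bytes followed by a little endian CRC.
my $payload = pack 'A28', 'hello world';
my $cell    = $payload . pack 'V', crc32($payload);
printf "%d byte cell, crc %08x\n", length($cell), crc32($payload);

In hardware, the same polynomial reduces to a linear feedback shift register clocked once per bit, which is why the hardware implementation wins on transistor count and energy.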

Unfortunately, my skill with electronics has only made theoretical advances. This is because I've had difficulty ordering components. Also, meetings with off-line hardware hacker types have been cancelled. I expected a maximum of a four month delay during the pandemic. It now looks like a 13 month delay. Regardless, I've used the time wisely and patiently. As previously promised, I've listened to all of MIT OCW 3.091 Solid State Chemistry. Indeed, I believe that I've listened to every lecture at least six times and got to the point where I can anticipate phrases within 30 hours of chemistry lectures. It has been surprisingly useful regarding the temperature range of semiconductors, air bags, oxygen sensors for fuel injection and alloys. In particular, I'm now fairly certain how Great Uncle Dick made the world's first aerospace grade titanium.

I've been using text-to-speech quite intensively. I've probably listened to more than 10 million words over 10 months. Yup. That's one million words per month in addition to undergraduate chemistry. Text has mostly been general news, financial news, the weird stuff and 8 bit computing. I've also been reading science fiction. In addition to re-reading the two million words of DeathWorlders, I've read most of the fan fiction. Most notably, Rantarian's infrequent but large bursts where he writes up to 1000 words per hour. When I ran out, I had a hankering for more fan fiction. After some truly awful Totally Spies fan fiction and some moderately less awful UFO fan fiction, I found some bearably awful Blake's 7 fan fiction. Actually, after reading DeathWorlders and Blake's 7 fan fiction, I appear to have an inexplicable appreciation for bad space pirate fiction. Regardless, I have yet to untangle the timeline of influence between the BBC production of Blake's 7 which used a BBC Micro as a prop and the Elite space trading/piracy game which was originally written for the BBC Micro by David Braben before he wrote demonstration software for ARM, wrote Frontier and then started the Raspberry Pi Foundation. Indeed, there is quite a nexus between 6502, ARM, Raspberry Pi, Arduino and Z80.

On the 8 bit forums, I've noticed a kinship between enthusiasts of RCA1802, TMS9900, 6502 and, to a lesser extent, MC6809 due to a feeling that these processor architectures were cut short but had much more to offer. While there is a general kinship in 8 bit and 16 bit computing, there is a remnant of rivalry between MOS Technology/Zilog and Motorola/Intel. There is also a collective ambivalence towards Microchip AVR, Parallax Propeller and FPGA - along with minor grumbling about new-fangled technology, such as EEPROM. Don't mention cache, FPU or 32 bit addressing. Regardless, 6502 and Z80 remain relevant in vehicles. For example, 6502 was used in Volkswagen Jetta dashboards and Z80 can be cross-assembled to AVR which is overpriced but commonly used in automotive applications. I discovered empirically that it is possible to spark AVR across 15V with no long-term consequence. This type of hardening is ideal in a vehicle. I don't know if UID2828's selection of AVR was careful study or being trendy but it is definitely a good choice for small-scale production in such constraints.

I've noticed that the 6502 and Z80 community is split into tiers. The top tier packages (presumably) licensed retro games with (presumably) original emulation on FPGA, gives out the full packaged design to sub-contractors and gets screwed on the royalties of 750000 units. The next tier makes less-packaged boards or open source FPGA designs, like Foenix C256 or ZX Spectrum Next. This can raise about US$1 million per funding round on Kickstarter. However, the economies of scale for tooling FPGA production are such that there is only one opportunity to purchase any design revision before it is deprecated. The next tier wires breadboards, maybe makes videos and sells kits but definitely publishes plans and promotes work, like two identical SoylentNews articles promoting 68Katy. There is some interplay with the next tier. For example, The 8Bit Guy solicits an open source, commodity parts, 6502 successor to the Commodore 64 but the mere problem of implementation is beyond him. Unfortunately, there are also reasonable claims that The 8Bit Guy copies designs and diagrams from forums without attribution. If only someone could forward a full computer (hardware and software) with suitable license which could be sold immediately!

Of course, the next tier is the people working on personal projects and publishing their work. This includes Contiki, GeckOS and AcheronVM. However, the state of open source hardware in 2020 is similar to the state of open source software in 2005: someone is ready to package it and make more money than the original developer. I'm not sure how to overcome this but I have some observations:-

  • Software alone is rarely successful. This is especially true if the minimum wage is US$15 and Indian programmers work for US$3. (Quality may or may not differ.)
  • Kickstarter success is no guarantee of project success. Foenix C256 had five successful rounds of funding before the project became acrimonious. Jeri Ellsworth raised more than US$1 million on Kickstarter before customers received full refunds. This was due to Android founder, Andy Rubin, investing 1/6 of his US$90 million sexual harassment pay-off.
  • A domain name is no indication of success.
  • Hosting a hardware design or software on GitHub almost guarantees failure. Also, if working on OS or language, it is really dumb to host on a platform owned by a direct competitor.
  • Being very active on multiple forums almost guarantees failure.
  • A computer system without a memory map is wishful thinking.
  • Average sales for a bare circuit board are less than 10 units per year. This is particularly pertinent.

I've gained some purpose by being critical of other people's work. For example, I strongly dislike the work of Zaha Hadid who had two styles. An example of the more hideous style is the Antwerp Port Authority Building Extension. The other style is sexually suggestive. If you are particularly puerile, search for Vagina Airport or, my favorite, Vagina Stadium with a roof which retracts in two halves. Zaha branched into clothing and it is equally awful.

Upon reflection, I noticed that my own work also has two styles. One style is sleek yet practical. For example, I proposed a clamshell touchscreen laptop and phone similar to the OLPC design with two separate panels and a seam at the hinge. Companies like Samsung could have saved considerable embarrassment and expense with this style rather than a foolish attempt to make a continuous folding panel. I'd prefer a continuous panel but two flat panels is cheaper and easier to repair. Oh, of course. Samsung doesn't want you to repair your equipment. Buy more shiny!

My other style is more industrial. For example, 19 inch rack equipment which fits into a utility cupboard or is only used by trained staff. Some of this spills into consumer-facing "get 'er done" style but only because I cannot abide imitation which falls short. How often do products imitate Microsoft, Sony, Apple or another distinctive company? How often are they inappropriate or inferior? To me, Tesla looks like Porsche styled by Jony Ive at Apple. Except that Apple would never allow wonky doors or a cooling system patched with laminated wood and cable ties. Even the key for a Tesla looks like an Apple mouse. Well, the current versions are preferably opened with an Apple iPhone. So, the key previously looked like an Apple product - until it was replaced with a real Apple product.

Actually, I hope that we get to see a convergence of car and computer styling. Jony Ive now runs an independent design company, although its major client is Apple. Through this arrangement, Jony Ive is available to style vehicles for Lamborghini and is a lead contender for doing so. In which case, we'll get to see Apple styling - but now with properly fitting doors. I am particularly curious to see a 1980s Essex Boy style 800BHP car because it won't be subtle. Consider the Mark 1 Ford Escort Cosworth from Fast & Furious 6 with a power to weight ratio which would shame many motorbikes. Now imagine this aesthetic applied to a Lamborghini. And someone who has worked on iPhone styling can fix the gaudy yellow accelerometer display found in a Lamborghini Aventador SVJ.

In such circumstances, I cannot compete with "fit and finish" and I won't make a pale imitation. Instead, I choose an alternative path of development starting from the period where it was possible to compete. That's why my outline vehicle design is a mash-up of Lamborghini Countach, Suzuki Jimny and BigTrak. (Am I the only person to notice that Lamborghini Countach development in particular follows the fortunes and styling of Commodore products from 1979-1985?)

Starting with old, trustworthy, repairable technology does not exclude modern practices, such as source code static analysis. I may be ahead here. A good method to determine company plans is to check hiring requirements. It was from this method that Apple was first confirmed to be working on a smartphone. By the same method, I discovered that McLaren sought someone with Arduino experience. I hope this is only for prototyping otherwise it may violate Arduino's GPL2 license. I also hope it was for prototyping because the Arduino library fails the MISRA standard and is therefore unsuitable for automotive and medical (which is also a focus for McLaren).

This is the worrying part. There are people in makerspaces working to a higher standard than large, well-known companies - and they receive approximately zero recognition or reward for their effort. Meanwhile, Nikola (a company which in no manner could be confused with Tesla, Edison or Nio) makes all of Elon Musk's ventures look sound and respectable. According to the amusingly named Hindenburg Research, Trevor Milton, the founder of Nikola, ran a series of dodgy companies, mostly with a single letter prefixing a dictionary word (in the style of EBay) before founding Nikola which attempted the Dot Com ruse of spending finance money on superficial but visible matters for the purpose of getting a larger round of investment. After possibly selling no more than five vehicles to one unsatisfied customer, this was eventually parlayed into General Motors acquiring 10% of Nikola in exchange for Nikola naming one vehicle manufactured by General Motors and for General Motors to sell the existing vehicle with both names. After this, Nikola was briefly worth US$14 billion. Given that Trevor Milton owned 20% of Nikola, his personal wealth peaked at US$2.8 billion and this allowed him to buy the most expensive home in Utah. I'm not sure that it was worth the money but it is now the largest recorded transaction for a residential property in Utah.

Apparently, Nikola outsourced all design and manufacture to the extent that software I've published on SoylentNews may exceed Nikola's intellectual property portfolio. And my venture is worth about US$1.50. Before the founder resigned due to allegations of sexual impropriety, he claimed that Nikola was the only vehicle company in the world which could run off-grid. Really? I'm not sure that claim is unique among vehicle companies in London. Three or more members of the London HackSpace have worked at such a company. Actually, it is little more than a hobby business. The wife runs a theater and the husband runs a fuel cell company from their shared office at the back of the theater. The whole building can be powered off-grid using fuel cells. Indeed, the (once) profitable theater is a reference site for the energy company which currently has a contract to design hydrogen powered garbage trucks for Liverpool Council. I strongly doubt that Nikola's solar/hydrogen claims were ever true. Meanwhile, a company which meets or exceeds Nikola's claim is struggling to stay afloat.

Obviously, there is a balance between substance and style. Nikola quite obviously has no substance. Many geeks have no style. Worse, many hobby computer projects are formless and functionless: bare circuit board and no purpose. Well, they can maybe run BASIC or Forth or something. Apple's hilariously amateur premiere at the Consumer Electronics Show was ahead of this. The first Apple computer was demonstrated on a wallpaper pasting table, inside a cornflake packet which was sprayed white, presumably by Steve Jobs. More than 40 years later, people routinely fail to match this standard. Any box will do. That includes a hummus tub or one made with drain pipe. (The latter is a homage to Steve Jobs' association with rounded corners.)

If your vaporware computer has a memory map and a box it would be in the top 20% of vaporware. A real product with a box probably multiplies sales by a factor of 20. In this respect, ZX Spectrum Next is not a representative example because it was styled by Rick Dickinson's design company. Rick Dickinson is not quite as famous as Jony Ive for computer design but he is the designer of the original Sinclair computer cases and has therefore worked with punched metal, injection molded plastic and membrane keyboards for 40 years. I wish that I could afford his services.

Beyond an embedded computer with the amorphous purpose of doing stuff in a vehicle, I have a plan for a rather unconventional computer. It is approximately 2m wide and 1m tall. It is angled like a draughting table and has an optional counter-weighted keyboard which can be moved around. The 8K display (7680×4320) primarily displays 16×32 pixel fixed width text. (480×135 cells.) That's 480 columns for text windowing. Furthermore, windows use captions rather than titles. This allows windows to be moved on the touchscreen without hands obscuring the contents. Imagine a mix of LCARS, the flatscreens from TekWar or maybe Oblivion.

This process of imagining a product is helpful because it excludes options. This isn't all things to all people. It is the first step of product design. As a result, I find that it is not suitable for parallel bus expansion. Nor is it particularly suitable for use with a mouse. It might be suitable for video conferencing. If you *must* have a camera, it can be covered and the view is preferable to camera placement on a smartwatch which provides a rather snooty view of the wearer's nostrils and not much else.

Limitations are also an advantage. Nowadays, it is trivial to make a capacitive touchscreen with 8 inch (20cm) diagonal. 60 inch (1.5m) is relatively cheap at low resolution. 2.25m diagonal is barely possible at low resolution. The issue is 2 m² of capacitive sensing in a field of mains hum and other interference. In such circumstances, I'm surprised it works at any scale. Regardless, for my purposes, only 480 distinct horizontal positions are required. Even here, it is not a huge concern if it is off-by-one. So, it is possible to make a text interface where a graphic interface would fail. The coarseness of the interface works in conditions which are otherwise infeasible or impossible. Unfortunately, proportional text has become common to the extent that many people find fixed width text unbearable. Programmers are an exception and even this is a generalization. Thankfully, fixed width text is significantly less of a limitation for CJK [Chinese, Japanese, Korean]. So, curiously, this design may only be suitable for geeks and Asians. Or maybe Asian geeks.

I am now working within these constraints. Some factors are unchanged, such as network packet sizes, lossless 13.1 surround sound and all of the features of peripherals, such as working within a 16 bit address-space and Arduino compatibility. Others may be greatly simplified or eliminated. For example, I can drop many of the fancy features of the proposed video codec, such as bloom, hpels, tpels and qpels.

Anyhow, I have a form and a function.

Saturday August 08, 20
12:38 PM
Hardware

I've been working on a 64 bit extension to the 6502 processor architecture. This is for the purpose of implementing a secure computer which also has a hope of working after post industrial collapse.

Along the way, I have found a practical use for 8 bit floating point numbers. Floating point representations were historically used for scientific calculations. The two components of a floating point number - the exponent and mantissa - work in a manner similar to logarithms, slide rules and the scientific representation of numbers. For example, 1.32×10⁴ = 13,200. Why not just write the latter? Scientific notation works over a *very* large scale and is therefore useful for cosmology, biology and nanofabrication. For computing, floating point may use binary in preference to decimal. Also, it is not typical to store both the exponent and mantissa within 8 bits.

8 bit computers were known for parsimony. In particular, 8 bit computers were notable for a complete absence of FPU [Floating Point Unit]. This is also true for the numerous extensions of 6502 and Z80. Extending such systems to 64 bits falls within a particularly trivial case where data, address, SIMD and float may all fit into a small number of general registers. In particular, this reduces to two cases: SIMD integers and SIMD floats. However, there is an asymmetry in the system which has a semi-serious use. Integers may be 8 bit, 16 bit, 32 bit or 64 bit. If the data is less than the maximum size, there is little penalty for hardware to process multiple pieces of data in parallel; possibly using a different configuration of the same hardware. For example, it is possible to process 2-4 pieces of 16 bit integer data in parallel.
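The trick with integer SIMD is to stop carries crossing lane boundaries. A minimal sketch of the idea in Perl (assuming a 64 bit build of perl), packing four 16 bit integers into one 64 bit word and adding them in parallel:-

#!/usr/bin/perl
use strict; use warnings;

# $H marks the top bit of each 16 bit lane.
my $H = 0x8000_8000_8000_8000;

# Add the low 15 bits of each lane normally, then patch the top
# bits back in with XOR so no carry escapes into the next lane.
sub add4x16 {
    my ($x, $y) = @_;
    return (($x & ~$H) + ($y & ~$H)) ^ (($x ^ $y) & $H);
}

printf "%016x\n", add4x16(0x0001_0002_0003_0004, 0x0010_0020_0030_0040);
# prints 0011002200330044 - four independent sums from one addition

Hardware has it easier than this software trick: the same adder simply has its carry chain broken at lane boundaries, which is the "different configuration of the same hardware" mentioned above.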

However, there are commonly fewer options for floating point SIMD. In part, this is due to mis-matched historical sizes which include 16 bit, 32 bit, 40 bit, 45 bit, 48 bit, 64 bit, 80 bit, 96 bit, 128 bit, 192 bit and 256 bit. After 10 or more drafts of the IEEE754 floating point standard, this settled down to powers of two; commonly 32 bit or 64 bit. However, multiples and submultiples of 32 bit are becoming increasingly common. Historically, FPUs and GPUs have commonly used single precision 32 bit, maybe double precision 64 bit and maybe a more detailed internal representation to minimize error between steps of a calculation. However, half precision 16 bit is becoming common. This is sufficient for some graphical applications. It also reduces energy and computation when working with neural networks. Unfortunately, some manufacturers differentiate and market segment their "gaming" GPUs and their "datacenter" GPUs by, for example, not allowing a single step conversion from 64 bit double precision to 16 bit half precision. This can be performed via the intermediate 32 bit representation but it requires more energy and more time using suboptimal code.

Much of the detail of SIMD, FPU and GPU is obscured by historical detail. If you have 56 minutes spare, I highly recommend a lecture by Danny Hillis which explains the multiple iterations of the Connection Machine. (Not a trivial matter because it attracted the mercurial attention of Richard Feynman.) In addition to explaining how to make and test a highly reliable cluster with thousands of nodes, it explains how to start with 1 bit processors, maintain downward compatibility while introducing 32 bit single precision floating point support, how to interface high bandwidth storage, how to adapt and migrate to generic hardware and how to introduce concurrent access. Towards the end, it resembles a GPU. CUDA, possibly the most successful parallel version of the C programming language, continues the trend to present. In particular, the use of bitmasks to gang 32 "threads" into a "warp" is consistent with the second generation of the Connection Machine while also explaining why GPU support for 64 bit double precision has been relatively sparse. Many of CUDA's remaining limitations are due to downward compatibility of a PCI GPU using 32 bit addressing.

Anyhow, I'm taking a processor architecture with no hardware support for floating point and skipping everything prior to the fifth generation of the Connection Machine, Intel 80487, Intel AVX and ARM Neon. This is a blessing and a curse. In part, it is a blessing because it alleviates any requirement for separate registers or unusual data sizes. However, whether or not 16 bit half precision float is supported, it should be blindingly obvious that there is an asymmetry between 8 bit, 16 bit, 32 bit and 64 bit integers versus 32 bit and 64 bit floats. In particular, there is no standard for 8 bit floats.

Initially, I rejected 8 bit floating point representation as an absurd reduction. Perhaps I could find some other use for the unused instructions. However, there is some use for this limited precision. In the preferred implementation, a 5 bit exponent and 3 bit mantissa approximate positive integers up to 35 bits with very coarse accuracy. It is possible to adjust the scale, reduce the error or include negative numbers. However, a four bit exponent (or less) is of particularly marginal utility when 16 bit float is more widely supported. In all cases, the use of greater precision may be preferred. If you remain committed to 8 bit floats, it is possible to use 2^8, 2^16 or 2^24 bytes to hold a table with one, two or three inputs. An example would be the cube of the input or the sum of squares. Historically, this would have been a huge overhead. However, if you want a 64KB or 16MB table, it is because you have a vastly larger pool of data to process.
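For concreteness, a sketch of the preferred 5 bit exponent, 3 bit mantissa format in Perl. The implicit leading bit and round-to-nearest encoding are my assumptions; the decoded values match the palette example further down:-

#!/usr/bin/perl
use strict; use warnings;

# Low 3 bits: mantissa. High 5 bits: exponent. An implicit leading
# bit gives mantissa values 8..15, so the maximum is 15 << 31,
# i.e. positive integers up to roughly 35 bits.
sub tf_decode {
    my ($byte) = @_;
    return (8 | ($byte & 7)) << ($byte >> 3);
}

sub tf_encode {
    my ($v) = @_;
    my $e = 0;
    while ($v > 15) { $v = ($v + 1) >> 1; $e++ }   # shift down, round
    return 0xFF if $e > 31;                        # saturate on overflow
    $v = 8 if $v < 8;                              # minimum mantissa
    return ($e << 3) | ($v & 7);
}

printf "%8d -> 0x%02x -> %d\n", $_, tf_encode($_), tf_decode(tf_encode($_))
    for 10, 100, 1000, 1_000_000;    # e.g. 1000000 round-trips to 1048576

# A 256 entry table keyed by the encoded byte, e.g. cube of the input.
# (Large entries silently become floating point in perl; fine for a sketch.)
my @cube = map { tf_decode($_) ** 3 } 0 .. 255;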

It may be desirable to reserve one or more values to represent values such as overflow or infinity. I could get fancy and explain this with ⊤ (top), ⊥ (bottom) and more obtuse notation. However, it is best explained with a semi-numerate example from Terry Pratchett. Take a bridge troll who understands "one", "two" and "many". This gives us:-

  • One + one = two.
  • One + two = many.
  • Anything + many = many.
  • One * one = one.
  • One * two = two.
  • Two * two = many.
  • Anything * many = many.

Repeat for 8 bit, 16 bit, 32 bit or 64 bit floating point values but using a much longer sequence of finite values before we hit the end.
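The troll's rules translate directly into saturating arithmetic. A minimal sketch in Perl, with 'many' standing in for the reserved overflow/infinity value:-

#!/usr/bin/perl
use strict; use warnings;

use constant MANY => 'many';   # the absorbing value

sub clamp { my ($n) = @_; return $n > 2 ? MANY : $n }

sub troll_add {
    my ($x, $y) = @_;
    return MANY if $x eq MANY or $y eq MANY;
    return clamp($x + $y);
}

sub troll_mul {
    my ($x, $y) = @_;
    return MANY if $x eq MANY or $y eq MANY;
    return clamp($x * $y);
}

print troll_add(1, 1), "\n";      # prints 2
print troll_add(1, 2), "\n";      # prints many
print troll_mul(2, 2), "\n";      # prints many
print troll_add(MANY, 1), "\n";   # prints many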

I'll finish with a practical example. Specifically, it is possible to convert an image to indexed color. This is typically available in programs such as Adobe PhotoShop and GIMP. The palette may be chosen dynamically based upon the content of the image, taken from a previous reduction, or from a pre-defined palette. Unfortunately, GIMP's algorithm is limited to a maximum of 256 palette entries. (I presume that Adobe PhotoShop has a similar format and limitations but I don't have an example handy.) The given example is monochrome to fit within the 256 limit. Regardless, it should give an empirical feel for 8 bit TinyFloat for the representation of images, although without the full dynamic range which may exceed 10,000,000,000:1. Optionally, run the following program:-

perl -e 'print "GIMP Palette\nName: TinyFloat\n#\n";for($e=0;$e<5;$e++){for($m=8;$m<16;$m++){$v=$m<<$e;printf("%3i %3i %3i\n",$v,$v,$v)}}'

to obtain:-

GIMP Palette
Name: TinyFloat
#
  8   8   8
  9   9   9
 10  10  10
 11  11  11
 12  12  12
 13  13  13
 14  14  14
 15  15  15
 16  16  16
 18  18  18
 20  20  20
 22  22  22
 24  24  24
 26  26  26
 28  28  28
 30  30  30
 32  32  32
 36  36  36
 40  40  40
 44  44  44
 48  48  48
 52  52  52
 56  56  56
 60  60  60
 64  64  64
 72  72  72
 80  80  80
 88  88  88
 96  96  96
104 104 104
112 112 112
120 120 120
128 128 128
144 144 144
160 160 160
176 176 176
192 192 192
208 208 208
224 224 224
240 240 240

Install as /usr/share/gimp/2.0/palettes/TinyFloat.gpl or similar. Have fun converting arbitrary images.

Sunday August 02, 20
11:35 PM
Hardware

In 2019, there was a flurry of interest in a post-apocalyptic operating system called Collapse OS. Since then, the primary author has been overwhelmed and this is especially true after a pandemic and global disruption to supply chains. In the absence of peak oil, resource collapse and some of the more dystopian conspiracy theories about population control and reduction, we have fragile systems and a growing mountain of technical debt. In particular, computer security is worsening. This is a problem when computers are increasingly critical and inter-connected.

A common problem occurs when a company founder (or a random power user) writes a poor program due to ignorance or convenience. This software becomes central to the organization and is often supported by ancillary programs. Documentation is poor and the original authors move on. Whatever the cause, bad software is bad because it accretes and becomes an archeology expedition. The situation is worsened by the relatively short career of a competent programmer. It is often 15 years or less. Some people get stale and/or promoted. Some people quit the industry. Some people turn to hard drugs. Others turn to fantasy, the hypothetical or neo-retro-computing.

There are definitely things that we can learn from the past. While I would be agreeable and adaptable to a lifestyle modeled wholesale somewhere around the 1950s-1980s, I'm probably in the minority. Considering only piecemeal changes, the parsimony of 8 bit computing is a welcome contrast to the technical debt of 2020.

I have an interest in micro-processor design and this has led me to research the precursors to the golden age of 8 bit home computing (1980-1984). Of particular interest is improvement to instruction density because this pays dividends as systems scale. In particular, it magnifies system effectiveness as caching (and cache tiers) grow.

I've enjoyed tracing the path of various architectures and seeing successive iterations of design. For example, the path from DEC's 12 bit PDP-8 to Intersil's 12 bit 6100 to PIC (with 12 bit, 14 bit, 16 bit and 18 bit instructions) to AVR. Likewise, from DataPoint 2200 to Intel 8080 to Zilog's products and x86. The shrink from mini-computer to micro-computer was more often inspiration than binary compatibility. Even when components were shared, incompatibility abounded. For example, it was common practice to dual boot a mini-computer. During the day, it would run an interactive operating system. Overnight, it would run a batch processing system. Commodore, which bought MOS Technology and had the rights to the 6502 micro-processor, couldn't make two computer designs compatible with each other. And it was common for people using computers from the same vendor to use incompatible floppy disk formats.

I'm astounded by the scattershot development of 1970s mini-computers, early micro-processors and the systems built around them. During the 1970s and 1980s, it was common for teams to be assembled - to design a mini-computer, micro-processor or micro-computer - and then disband at the end of the project. As noted in Tracy Kidder's book: The Soul Of A New Machine, a product could fail even if there was sufficient staff to cover attrition. However, downward compatibility wasn't a huge issue. Even in the 1980s, computers were mostly sold using analog processes. That included fully optical photo-copying and tallying sales manually, on paper. Since antiquity, calculations have been assisted with an abacus. Romans advanced the technology with the nine digit decimal pocket abacus - which had approximately the same form factor and utility as a smartphone running a calculator application - and was probably used in a similar manner to split restaurant bills The Roman Way. In the 17th century, this was partially replaced with logarithms and slide-rules. Although mechanical calculators pre-date Blaise Pascal and Charles Babbage (an early victim of the Second System Effect), the ability to make reliable, commercial components began in the 20th century. Since the 1930s, there has been an increasingly rapid diffusion of mechanical and electronic systems. By the 1960s, large companies could afford a computer. By the 1980s, accountants could afford a computer. By the 1990s, relatively poor people could afford a computer for entertainment. By the 2010s, it was possible to shoot and edit digital films using pocket devices. Only relatively recently has it become normal to sell computers using computers - and it is much more recent to retail computers over the Internet.

In the 1970s, downward compatibility wasn't a concern and clean-sheet designs were the norm. A napkin sketch and 500 lines of assembly could save five salaries. Every doubling of transistor count added scope for similar projects. By 1977, there was the very real fear that robots would take all of our jobs and that there would be rioting in the streets. This was perhaps premature by 40 years. However, when Fred Brooks wrote The Mythical Man-Month, the trope of the boy genius bedroom programmer had already been established and debunked. By 1977, autonomous tractors had already been prototyped. By 1980, people like Clive Sinclair were already planning fully electric, highway, autonomous vehicles guided by nothing more than an 8 bit Z80. (Perhaps this is possible but requires an exaflop of processing power to refine the algorithm?)

Mini-computer Operating Systems were in a huge state of flux in the 1970s. Unfortunately, technical merits were unresolved during the transition to micro-computers. This has created a huge amount of turbulence over more than 40 years. Even in 2001, it was not obvious that more than one billion people would use a pocket Unix system as a primary computer. Unfortunately, GUI in 2020 is as varied as OS in 1980 - and I doubt that we'll have anything sane and consistent before instruction sets fragment.

Every decade or so, process shrink spawns another computer market. However, the jump from transistor mini-computers to silicon chips was different because it slowly ate everything that came before. Initially it was the workstations and the departmental servers. Nowadays, the average mainframe is an Intel Xeon with all of the error checking features enabled. And, until recently, supercomputing had devolved into an international pissing match of who could assemble and run the largest x86 Linux cluster. Unfortunately, that has only been disrupted by another diffusion of industrialization. In the 1980s, the firehose of incompatible systems from the US overspilled and mixed with the firehose of incompatible systems from the UK and then flooded into other countries. While the saga of Microsoft and Apple in the US is relatively well known, a similar situation occurred in the UK with Acorn and Sinclair. Meanwhile, exports from the UK led to curious products, such as a ZX Spectrum with BASIC keywords translated into Spanish. Or a Yugoslavian design inspired by the ZX Spectrum but with 1/4 of the ROM. (The optional, second 4KB EPROM provided advanced functions.) Indeed, anything which strayed near the Iron Curtain was grabbed, cloned or thoroughly inspected for ideas. This has led to an unusual number of Russian and Chinese designs which are based on Californian RISC designs. If the self-reported figures from China are believed then the most powerful computer in the world consisted of 10 million cores; loosely inspired by DEC Alpha.

1980s computing was heavily characterized by MOS Technology's 6502 and Zilog's Z80 which were directly copied from Motorola's 6800 and Intel's 8080. Zilog's design was a superset of an obsolete but familiar design. However, the 6502 architecture was intended to be a cheaper, nastier, almost pin compatible design which undercut Motorola and stole goodwill. Regardless, 6502 is a fine example of parsimony. Instructions were removed to the extent that it is the leading example of a processor architecture which does not define 2/3 of the opcodes in the first release. The intention was to make each chip smaller and therefore more numerous at the same scale as Motorola's 6800. It was also 1/6 of the price because Motorola was handling technical pre-sales for an almost identical product. There is also the matter that the 6502 designers had abandoned their work on the 6800 before defecting to MOS Technology. This considerably hobbled Motorola. Well, Motorola sued and won on a technicality. The financial impact allowed acquisition by Commodore where the design was milked until it was obsolete. And then it was milked further by price gouging, economies of scale and vertical integration. Zilog actively invested in design improvement with Z180, Z280 and Z380 extensions and mutually incompatible Z800, Z8000 and Z80000. However, 6502 and Z80 were largely cheap, ersatz, one-hit-wonders before customers migrated to Motorola and Intel. During this period, it was Motorola - not Intel - which was known for the craziest heatsinks and chip packages. Likewise, it was Microsoft - not Apple - which had the most precarious finances. The rôles change but the actors don't.

In 1982, the UK had an estimated 400 micro-computer companies. The US had thousands. Many had vaporware, zero sales or only software. Even the successful companies had numerous failures. Apple failed with the Apple 3 and Apple Lisa before success with Apple Macintosh. Acorn had an inordinate number of unsold, cost-reduced Acorn Electrons. Quite infamously, Atari dumped an inordinate number of game cartridges in the New Mexico desert. By 1984, home computing had become a tired fad. In 1979, it was common for a home computer to have 128 bytes RAM and a 256 byte monitor program. By 1984, 128KB RAM and text menu directory browsing was common. Casual customers were bored with Space Invaders, Pac-Man and Centipede but the economies of scale aided academia, industry and the development of 16 bit systems.

1979-1983 was a period of economic trouble and Reagan/Thatcher economics. It also overlapped with an economic bubble in the computer industry. The end of that tech bubble was fueled by the Great DRAM Fire Of 1983. A manufacturing disaster caused a shortage. That stimulated demand. That kept the fad running. In the 2010s, DRAM was often sold in powers of two: one gigabyte, two gigabytes, four gigabytes. In the 1980s, DRAM was often sold in powers of four: four kilobits, 16 kilobits, 64 kilobits. A shortage of the newly introduced 256 kilobit chips caused a comical shortage elsewhere. Imagine a shortage of $1 bills causing people to use $0.25 quarters, the shortage of quarters causing people to use $0.05 nickels and the shortage of nickels causing people to use $0.01 pennies. This type of lunacy is entirely normal in the computer industry. A system with 512KB RAM would ordinarily require 16 chips. However, the shortage led to bodge-boards with 64 (or considerably more) chips. The harddisk shortage of 2011 was minor compared to this comedy. Although, depressingly, the cause was similar.

Moore's law of computing is the observation that transistor count doubles every two years. A re-statement of this law is that computing requires an additional bit of addressing every two years. With the full benefit of hindsight, the obtuse 20 bit addressing scheme of the 8086 processor architecture gave Intel an extra four spins of Moore's law. Meanwhile, every 16 bit addressing scheme became increasingly mired in bank switching. Bank switching is perfectly acceptable within a virtual machine or micro-coded processor implementation. However, if every idiot programmer has to handle every instance of bank switching in every part of a program then the result is going to be bloated and flaky. Unfortunately, in the 1980s, the typical compiler, interpreter or virtual machine added an overhead of at least 10:1. That put any sane implementation at least three generations (six years) behind. To quote Tim Cook, "No-one buys sour milk." As I noted in Jul 2018:-

With the exception of some market leaders, the majority of organizations grow slower than Moore's law and the related laws for bandwidth and image quality until their function becomes trivial. As an example, it is difficult to write a spell check within 128KB RAM. A dictionary of words is typically larger than 128KB and word stemming was quite awkward to implement at speed. For this reason, when Microsoft Word required a spell check function, Microsoft merely acquired a company with a working implementation. It seems outrageous to acquire a company to obtain a spell check. It can be written very concisely in a scripting language but that doesn't work on an 8MHz system with 128KB RAM. Likewise, it is difficult to write a search engine within 16MB RAM but trivial to write in a scripting language with 1GB RAM.

While Acorn introduced people to the joys of "Sideways RAM", Intel had the luxury of repeatedly failing with 80186, 80286 and iAPX 432 (and subsequently 860, 960, Itanium and probably others). Indeed, the gap from 8086 to 80386 is about eight years and the gap to 80586 is also about eight years. Meanwhile, Microsoft scooped up the DataPoint 2200, Intel 8080 and CP/M weenies and consolidated a monopoly while giving customers every insecure or easy to implement feature. We can speculate about IBM buying micro-processors from Intel with AMD as a hypothetical second-source. However, a possible consideration was avoiding high-drama characters, such as Clive Sinclair, Federico Faggin, Chuck Peddle, Jack Tramiel, Steve Jobs and Gary Kildall.

There were so many opportunities for x86 to not dominate. For example, the 6502 architecture was released in 1976. By 1977, Atari began work on 6516: a clock-cycle accurate, downward compatible, 16 bit extension. Unfortunately, the project was abandoned. When Apple received the (unrelated) 65816 which met this criterion, it was deliberately underclocked to ensure that it was slower than the first Apple Macintosh. Acorn could have made ARM binary compatible with 6502. However, I've previously noted that such compatibility is easiest when starting from ARMv6 with Thumb extensions - which itself cribs from every iteration of Intel Pentium MMX. And Commodore? Which owned MOS Technology? This is the same Commodore which subsequently spent USD0.5 million to reverse engineer its own Amiga chips because it lost the plans. Similar opportunities were missed with Z80, RCA1802, TMS9900, MC68000, NS32000 and others, although possibly not as numerous. IBM could also have chosen another architecture, although it was unlikely to be from a direct competitor, such as RCA.

Boot-strapping a computer is a crucial consideration. It is usually performed with the previous generation of hardware. Specifically, early 8 bit home computers couldn't self-host. Work outsourced to Microsoft was initially assembled on a rented mini-computer. Work outsourced to Shepardson Microsystems, such as Apple's 8 bit ProDOS and Atari BASIC, was assembled on a Data General mini-computer. Perhaps the latter systems could self-host but I am unaware of any serious effort to attempt it. Acorn and Apple, who are in many ways trans-Atlantic fraternal twins, both started with a 6502 and a 256 byte monitor program. However, that doesn't imply that either system was fully self-hosted. For example, when a Commodore PET production delay led to delayed royalties to Microsoft, Apple switched from Steve Wozniak's Integer BASIC to Microsoft's floating point BASIC. From that point onwards, many Apple customers relied upon software which had been cross-assembled from a mini-computer. Likewise, the first version of Apple's ProDOS was written in 35 days on a mini-computer using punch cards. It was a similar situation for Z80 systems, such as the Japanese MSX computers which used various extensions of Microsoft BASIC.

Like Russia and China, Japan has its own twist on technology. That includes its own fork of ICL mainframes, its own fork of 8086 and numerous Z80 systems from Casio, Sega, Sharp and others. It is a minor sport in Japan to port NetBSD to yet another Z80 system. However, the proliferation of Z80 systems within Japan does not explain the widespread availability of Z80 systems outside of Japan. This is due to the history of Japan's industrialization. Japan was particularly committed to exporting quality electronics after World War 2. Less obviously, Japan's electricity grid has aided export to rich consumers in the developed world. Specifically, Japan's first two public electrical generators were a European 50Hz generator and a US 60Hz generator. Consequently, Japan doesn't use a single mains frequency throughout the country. This has the advantage that domestic electronics products are invariably suitable for global export. This has contributed to Japanese games consoles from multiple manufacturers being common in Europe and North America. The disadvantage to mixed 50Hz/60Hz mains came after the Fukushima nuclear disaster. Relatively little power can be transferred over DC grid ties. Ordinarily, this is sufficient to balance power. However, it was insufficient to prevent power cuts in Tokyo despite surplus generator capacity.

Anyhow, when Collapse OS started, Z80 was the most common and workable micro-processor which could be adapted with a 15 Watt soldering iron. Unlike many other designs, such as 6502, Z80 remains in production. Should this change due to industrial collapse, numerous examples are available at DIP scale. Unfortunately, programming the things is *horrible*. Worse than 8086. Most significantly, everything takes a long time. Like the RCA1802 and early PIC, Z80 uses a four phase clock. Despite the Z80 being released in 1976, the cycle-efficient eZ80 was only released in 2001. In general, 4MHz Z80 has similar bus bandwidth to 2MHz 6502. However, despite the Z80 having at least four times as many registers and instructions, there are places where Z80 is inferior to 6502 or other choices.

Connectivity between Z80 registers is poor. Transfer via stack covers all cases. However, that's inane. It is particularly slow due to stack operations which only work on register pairs. One of these pairs is the accumulator and flags. This arrangement is not upwardly compatible with 16 bit, 32 bit or 64 bit extensions. It is for this reason that Z800, Z8000, Z80000 and x86 separate these fused registers. When not using stack, instruction encodings allow reference to seven registers and a memory reference. One memory reference. Which is terrible for traversing data structures. A linked list is the most trivial case. There are idioms and workarounds. However, they have pointless limitations. For example, there are index registers which escape the memory reference. However, they are not downwardly compatible with 8080, nor do they work independently of the alternate register set which may be reserved for interrupts. Furthermore, handling index register upper and lower bytes separately explicitly breaks upward compatibility with Z180. So, it is possible to use the escaped index registers in a manner which is neither upward compatible, downward compatible nor interrupt compatible.

I mention such detail because I admire the sheer bloody-mindedness of self-hosting a Z80 Operating System on a Sega Master System, in 8KB RAM. This is an art which has fallen out of fashion since the late 1970s but could be urgently needed.

In its current form, Collapse OS optionally uses a PS/2 keyboard or Sega joypad. It optionally runs on an RC2014 Z80 system. It implements software SPI to maintain its own storage format on MicroSD cards. I appreciate the quirky storage format. It has been a historical problem. Indeed, when working on a quiz game buzzer, a feature request to play sound samples led to an investigation of playing WAV or MP3 from MicroSD. Read only access to FAT32 is by far the most difficult part. That's more difficult than decoding and playing MP3 without dropping sound samples.

There are three major components in the core of Collapse OS: monitor, line editor and assembler. These components and all of the lesser components can be assembled in two passes while accumulating no more than 8KB of state. Likewise for linking. Obviously, such a system can be expanded outwards. Maybe a better text editor, a cross-assembler or compiler. A stated intention is to migrate to AVR. However, this does not have to be self-hosting. It is sufficient to self-host on Z80 and use such a system to program AVR micro-controllers, such as commonly used in Arduino hardware. Again, I admire the intention to program an Arduino from a Sega Master System.
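To make the two pass structure concrete, here is a toy assembler in Perl for a three instruction subset (opcodes borrowed from 6502: immediate LDA, absolute JMP, NOP; the mini-ISA is otherwise my invention). Pass one collects label addresses - the only state which must be accumulated - and pass two emits bytes, which is how forward references get resolved without keeping the whole program in memory:-

#!/usr/bin/perl
use strict; use warnings;

my %size   = (NOP => 1, LDA => 2, JMP => 3);
my %opcode = (NOP => 0xEA, LDA => 0xA9, JMP => 0x4C);
my @lines  = ('start: LDA 7', 'NOP', 'JMP start');

# Pass 1: assign an address to every label.
my (%label, $addr);
$addr = 0;
for (@lines) {
    my ($lab, $op) = /^(?:(\w+):)?\s*(\w+)/;
    $label{$lab} = $addr if defined $lab;
    $addr += $size{$op};
}

# Pass 2: emit bytes, resolving label references.
my @out;
for (@lines) {
    my ($lab, $op, $arg) = /^(?:(\w+):)?\s*(\w+)\s*(\S*)/;
    push @out, $opcode{$op};
    push @out, $arg if $op eq 'LDA';
    push @out, $label{$arg} & 0xFF, ($label{$arg} >> 8) & 0xFF if $op eq 'JMP';
}
printf '%02X ', $_ for @out;   # A9 07 EA 4C 00 00
print "\n";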

In support of Collapse OS, I attempted to port Steve Wozniak's SWEET16 to Z80. Perhaps this is not a beginner's project for Z80. However, it is possible that SWEET16 and a 6502 emulator are the first and last things I ever write in Z80. Outside of Collapse OS, I may write other Z80 assembly. For example, a Z80 to AVR cross-assembler has considerable merit but requires extensive test programs. Indeed, a chain of interpreters and cross-assemblers gain a network effect. Specifically, Forth interpreted on SWEET16 interpreted on 6502 interpreted on Z80 cross-assembled to AVR. SWEET16 on 6502 on AVR is publicly available. Likewise for Forth on 6502. SWEET16 on Z80 offers more options. In all cases, it limits execution to one layer of interpreter overhead. Given the relative efficiency of 4MHz Z80 versus 20MHz AVR, many of these options equal or exceed native Z80 for speed, size, legibility and portability.

Collapse OS is broadly aligned with my effort to implement a 64 bit extension to 6502. Firstly, there is the shared goal of undoing a large amount of technical debt. Secondly, there is the shared goal of self-hosting on DIP scale hardware with considerably less than 2^31 bytes of state. Thirdly, there is the shared goal of AVR as a possible target architecture. However, we differ as much as we agree. This includes storage format, instruction set, implementation language and user interface. I encourage old 6502 applications on new hardware. Collapse OS encourages new applications on old Z80 hardware. My vaporware is a multi-core, pre-emptive, networking, graphical system with a dedicated card bus system. Whereas, Collapse OS is a working, single-tasking, command line system which runs on legacy hardware.

Regardless, we both strongly agree that the current level of bloat is unmanageable. It manifests as seemingly unrelated problems, such as RowHammer, buffer overflow or the inability to repair hardware. However, it is a symptom of laziness and externalized cost which has been ongoing for decades. For example, there was a period in the 1990s where multiple platforms attempted the stretch goals of implementing downward compatibility, upward compatibility, modular re-use of software and migration to new hardware at minimal cost while entering new markets. This includes Apple's Copland, Microsoft's Chicago (the full implementation, not Windows95) and Sega's 32X. It also includes 3DO, a vaporware games console which could be expanded into a USD10,000 Virtual Reality system. Much of this is deprecated. However, some of this technical debt has been inflated but not paid in full. For example, the Java Applets and ActiveX which remain in use can be traced back to this era. Much of the vendor bloat in POSIX, SNMP and BGP also began in this era.

I've previously mentioned that a Linux system with 512MB RAM is typically regarded as an embedded system because it typically doesn't self-host a C compiler. Specifically, a Raspberry Pi running Raspbian is unable to compile GCC or LLVM with the default compiler settings because the result exceeds the per-process limit of 2GB virtual memory. With a mere change of compiler settings, it is possible to get this down to 360MB. However, it still requires 30-45 hours to compile. For similar reasons, experts recommend that compiling the optional 22,000 packages of FreeBSD should not be attempted without 48 hardware threads and a minimum of 2GB RAM per thread. Obviously, a system with a minimum of 96GB RAM is overwhelmingly likely to be 64 bit. Regardless, such a system will still get snagged on heavyweight packages, such as GCC, LLVM, MySQL Server, Postgres, Firefox, Chrome, OpenOffice, GIMP, QEMU, JBoss and ffmpeg.

Wirth's law of software bloat is such that 2^31 bytes of memory is dicey for compiling a popular application. I target 2^24 bytes - in part so I don't have to solder more than 3×8 bit latches or 32×128KB static RAM chips. 2^16 bytes was a threshold avoided in the 1980s because it risked delay or failure. And yet, Collapse OS happily self-hosts in 2^13 bytes. This leads to multiple questions.

Is it possible to self-host in less than 2^13 bytes? Unknown but systems from the 1980s suggest yes.

  • A Sinclair ZX Spectrum has a Z80, 16KB ROM and 16KB or 48KB RAM. This is a minimum of 2^15 bytes.
  • A Jupiter Ace makes an exceptionally poor first impression. It could be mistaken for a ZX81 if it didn't have a Forth compiler. It has a Z80. It has 8KB ROM of which 3KB is Operating System and 5KB is Forth keywords. It also has two synchronous banks of RAM: 2KB for monochrome display and 1KB for application. If functionality is trimmed, this would be 2^14 bytes.
  • The 4KB ROM B of a Z80 Galaksija is broadly equivalent to the functionality of Collapse OS. Including state for assembly, this is 2^13 bytes.
  • TinyBASIC is 4KB on 8080 and Z80. However, traditional implementations use two layers of interpreter and have terrible performance. Regardless, state remains 2^13 bytes.
  • Older systems self-host with less state. However, they may require, for example, micro-code which exceeds the complexity of Z80.

An alternative question is: How much software fits into a micro-controller with 16 bit address-space? Empirically, I've discovered that within an Atmel AVR AtMega328P with 32KB flash there is space for six or more dialects of 6502 interpreter. I've also discovered that my own implementation of Arduino digital I/O and time delay requires less than 2KB. A trivial bytecode interpreter requires 2-4KB. Cell networking with error correction requires a maximum of 6.5KB. If a system uses a combination of native functions and Forth style bytecode, it is possible to fit Arduino style tutorial programs, buzzer game, alarm clock with I2C RTC, analog servo control, digital LEDs, power control, cell networking protocol and multiple user applications. With consideration for embedded programming standards, this is all suitable for automotive and medical use. Furthermore, this can be self-hosting and accessed via serial from Windows, Mac or Linux with no software install.
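For scale, the whole of a usable bytecode dispatch loop is about a dozen lines. A sketch in Perl with a hypothetical five opcode, stack based instruction set; the 2-4KB figure above refers to a native version of this kind of structure, not this script:-

#!/usr/bin/perl
use strict; use warnings;

# Hypothetical opcodes: 0 halt, 1 push literal, 2 add, 3 multiply, 4 print.
sub run {
    my @code = @_;
    my (@stack, $pc);
    $pc = 0;
    while ($pc < @code) {
        my $op = $code[$pc++];
        if    ($op == 0) { last }
        elsif ($op == 1) { push @stack, $code[$pc++] }
        elsif ($op == 2) { my $b = pop @stack; $stack[-1] += $b }
        elsif ($op == 3) { my $b = pop @stack; $stack[-1] *= $b }
        elsif ($op == 4) { print pop(@stack), "\n" }
        else             { die "bad opcode $op" }
    }
}

run(1, 2, 1, 3, 2, 1, 4, 3, 4, 0);   # computes (2 + 3) * 4 and prints 20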

Obviously, there are disadvantages. The suggested micro-controller's flash is only suitable for 10,000 writes. Therefore, to mitigate this limitation, a wear leveling and/or bad block scheme should be considered. (A sketch follows below.) However, the major disadvantage is that anything like Forth is a hugely impenetrable mess which makes Perl look pleasant. It could be made more similar to BASIC or Lua. Alternatively, it could be more like Java. From the outside, Java looks like a simplified version of C++ without pointers or multiple inheritance. This is a con. From the inside, Java looks like Forth had a fight with 6502, Z80 and 8086 and they all lost. Specifically, it has a stack with 16 bit alignment, zero page and four register references. This raises many questions about bytecode, block structure and graph-coloring which I may summarize separately.
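As promised, a minimal round-robin wear leveling sketch in Perl, simulating flash cells with an array. Each record carries a sequence number, so the newest record can be found at boot and writes rotate across N slots, dividing wear by N. The slot count and record layout are illustrative assumptions:-

#!/usr/bin/perl
use strict; use warnings;

my $N = 8;                     # slots sacrificed to wear leveling
my @flash = map { { seq => 0, value => undef } } 1 .. $N;

sub newest { (sort { $b->{seq} <=> $a->{seq} } @flash)[0] }

sub save {
    my ($value) = @_;
    my $seq  = newest()->{seq};
    my $slot = $seq % $N;      # successive writes rotate through slots
    $flash[$slot] = { seq => $seq + 1, value => $value };
}

sub load { newest()->{value} }

save($_) for 1 .. 20;          # 20 writes, but no cell is written more than 3 times
print load(), "\n";            # prints 20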

A limitation common to many programming languages is the cluttered name-space of function names. Object oriented languages typically arrange this into a strict tree hierarchy. Forth with a loose class hierarchy would be a considerable advantage. In addition to an explicit named tree structure, it partially solves the twin problems of line editing and execution order. BASIC typically solves this with line numbers and an absence of block structure. This is a particular problem because GOTO in Microsoft BASIC typically has O(n^2) overhead. This can be replaced with a directory structure of block structured one-liners and a significantly faster interpreter.
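The O(n^2) claim is easy to demonstrate: each GOTO in a classic line numbered BASIC re-scans the line table from the top, so a loop built from GOTO pays O(n) per jump. A sketch in Perl contrasting the linear scan with an index (the program text is illustrative only):-

#!/usr/bin/perl
use strict; use warnings;

my @program = map { [ $_ * 10, "REM line $_" ] } 1 .. 1000;

# The Microsoft BASIC way: scan from the top on every jump.
sub goto_scan {
    my ($target) = @_;
    for my $i (0 .. $#program) {
        return $i if $program[$i][0] == $target;
    }
    die "undefined line $target";
}

# The indexed way: build the table once, then every jump is O(1).
my %index = map { ( $program[$_][0] => $_ ) } 0 .. $#program;
sub goto_index { $index{ $_[0] } }

print goto_scan(9500), ' ', goto_index(9500), "\n";   # 949 949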

Forth has a traditional representation in which token names are stored as a linked list and each element begins with a one byte value (five bits for length of name, three bits for flags). This could be replaced with a tree of names with very little overhead, if any. (In both cases, a name is a 16 bit pointer.) If not using the traditional representation, it may also be desirable to use a variation of DEC's Radix50 encoding. Specifically, 40^3 = 64,000, which fits within 2^16. Therefore, it is possible to store three alpha-numeric characters in two bytes, as sketched below. There will be a little overhead for encode and decode routines. However, in aggregate, it'll save space. (With hindsight, I am unsure why this is not implemented more widely.) It may be worthwhile to take further ideas from Jackpot, one of James Gosling's lesser known projects, which edited programs as syntax trees. Or perhaps Jack, the simplified Java-like language used in some of the exercises of From NAND to Tetris.
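
A sketch of the packing arithmetic, assuming a 40 symbol alphabet (the exact symbol set is a free choice; DEC used space, A-Z, $, ., % and 0-9):

    #include <stdint.h>

    /* 40 symbols: space, A-Z, 0-9 and three punctuation marks. */
    static const char alphabet[] = " ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.-_";

    static uint8_t index_of(char c)
    {
        for (uint8_t i = 0; i < 40; i++)
            if (alphabet[i] == c)
                return i;
        return 0;                               /* unknown characters map to space */
    }

    uint16_t pack3(const char *s)               /* three characters into 16 bits */
    {
        return (index_of(s[0]) * 40 + index_of(s[1])) * 40 + index_of(s[2]);
    }

    void unpack3(uint16_t v, char *s)           /* 16 bits back to three characters */
    {
        s[2] = alphabet[v % 40]; v /= 40;
        s[1] = alphabet[v % 40]; v /= 40;
        s[0] = alphabet[v % 40];
    }

The maximum packed value is 39×1600 + 39×40 + 39 = 63,999, safely below 2^16, so a six character name costs four bytes instead of six.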

When I suggest that it is possible to make a network switch with 1KB RAM or remote desktop into a micro-controller with all the finesse of Commodore 64 graphics, I also suggest that functionality can be modified without reboot or loss of data. Obviously, that's a dangerous proposition for deployment. However, it provides significant options for development and testing in a field where feedback is notoriously poor.

Sunday March 22, 20
04:04 PM
Hardware

Statement Of Problem

I've made several attempts to implement a unique and useful processor architecture. These have all been fatally flawed for various reasons.

After resolving this with a nybble virtual machine and assembler, I realized that I have been designing perfectly secure processor architectures because they have no compiler, no operating system and no applications. Therefore, I re-state the problem. I wish to implement a system where it is possible to efficiently compile and run code, such as C. It may also be an advantage to run legacy binaries. This may provide an initial set of applications to aid boot-strap.

Rationale

The laziest way to provide compiler support is to extend an architecture. The easiest architectures to extend are the oldest ones with the least transistors. There are plenty of candidates. After the scatter-shot development of micro-processors in the mid 1970s and the scatter-shot retailing of home computers in the early 1980s, leading candidates include 6502 and Z80. Oddly, both were cheap ersatz copies of other architectures: 6502 of the Motorola 6800 and Z80 of the Intel 8080. Both had numerous forks and derivatives. Both are dying but not quite dead architectures. Admittedly, MOS Technology and Zilog were both high-drama and this was a contributing factor to IBM being the inadvertent king-maker for Wintel dominance.

Unfortunately, 40 years of x86 development has built a huge pile of technical debt. Spectre, Meltdown and the 22 or so variants of speculative execution flaws affect multiple processor architectures from multiple companies. This will continue without end. Chips are specified using a hardware description language. Similar to programming languages, there are numerous incompatible dialects of numerous incompatible languages. Similar to programming, hardware descriptions have one bug per 50 lines and one critical bug per 50,000 lines. How many bugs are in modern processors? We'd have to start with the dialect of hardware description language and the length of the definition. Unfortunately, that's all secret. Continued development under this model implies that critical bugs will be added more frequently than they are removed. Open development occasionally withstands scrutiny but it has a severe fragmentation problem.

I propose a second path of development from the early micro-processors to the present in which we avoid known errors. And if this fails, I suggest a third or fourth attempt. Essentially, pretend that x86 never occurred, nor the constant churn of RISC processors which chase its market share. And so, with great reluctance, I announce a 64 bit extension to 6502. There have been numerous 16 bit extensions including 65816, 65Org02 and Atari's failed 6516. There have been numerous 32 bit extensions including 65j02, 65GZ032 and the failed Terbium. Most notably, QVC shipped 450,000 game consoles with 65j02 processor. However, I am unable to find reference to a 64 bit extension or to any extension implemented in C. I am able to find a fairly faithful 8 bit implementation of 6502 written in AVR assembly. On a 20MHz ATmega328P (the chip used in the Arduino Nano), it uses 2KB ROM and emulates a 1MHz 6502. Like many projects in this field, the firmware is free for non-commercial use. Although it may co-exist with legacy hardware and software, it doesn't appear to be actively developed. On the same micro-controller, I found that 32KB ROM is sufficient to implement a 64 bit extension which maintains similar compatibility with 6502 and full compatibility with Arduino.

I've been asked why I didn't extend a different architecture; most frequently Z80, 8088 or MC68000. Z80 has already been extended unsuccessfully with Z180, Z280 and Z380 - and used as a template for the unsuccessful Z800, Z8000 and Z80000. Of most note is the cycle efficient eZ80 which was released in 2001. 8088 was extended by NEC and then deprecated in favor of ARM. MC68000 was extended by Motorola and then deprecated in favor of PowerPC. I'm concerned that further attempts at processor architecture extension would be too similar to x86 and PowerPC. I'm also concerned that ARM is dying. It is being squeezed out of the market by security problems and strong competitors with Russian and Chinese support. This includes MIPS with better compiler support, Alpha derivative Sunway with better performance, Xtensa with better code density and RISC-V with better code density and licensing. Unfortunately, ARM's most obvious replacement, RISC-V, is going to fragment horribly. We've had Linux fragmentation and Android fragmentation. Now we're going to have RISC-V fragmentation. Don't believe me? Take an example from 6502 development. Take an open source hardware description. Double the width of the data paths. Double the address lines. Upload to code repository. The result is numerous incompatible variants. (Most impressively, one variant runs at 70MHz on FPGA and is fast enough to display 1024×768 VGA.) Now consider RISC-V, with more extensions than C++ and multiple open source implementations. This is going to get very messy.

So Few Opcodes!

6502 has some curious properties for architecture extension and perhaps I'm in a long line of people who have been suckered by them. Firstly, 6502 doesn't have any prefix instructions. It is far easier to start from an architecture which doesn't have prefixes or mode bits. Secondly, the only mode bits are decimal mode (widely ignored) and an interrupt mask. Thirdly, it is preferable to have no user flags but, if you must have flags, the negative, overflow and carry flags are in the most convenient place for emulation. Someone was thinking ahead. Fourthly, the original implementation has 151 defined opcodes which are arranged in convenient blocks. Instruction sets invariably use more than 2/3 of available bit patterns in the first iteration. The history of 6502 development provides a counter-example. Most conveniently, in the base specification, all opcodes of the form xxxxxx11 are undefined. That's a regular 1/4 of the opcode map. The original NMOS implementation was quite sparse and executed most undefined opcodes without complaint (a few lock the processor). Most commonly, this was exploited to simultaneously load one constant into two registers. 65C02, 65CE02 and 65816 replace these opcodes with mutually incompatible instructions. Indeed, there are at least six mutually incompatible extensions used in military, industrial and consumer products. However, the original 151 opcodes are implemented consistently (outside of decimal mode). Indeed, the core instruction set is preferable to Java as a bytecode format due to compactness and simplicity.
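
A sketch of how an emulator can claim the xxxxxx11 quarter of the opcode map for extensions, assuming hypothetical fetch() and handler table declarations:

    #include <stdint.h>

    typedef void (*handler_t)(void);

    extern handler_t core_table[256];           /* original 6502 semantics; gaps trap */
    extern handler_t extension_table[64];       /* one handler per xxxxxx11 pattern */
    extern uint8_t fetch(void);                 /* read one byte at the program counter */

    void step(void)
    {
        uint8_t op = fetch();
        if ((op & 0x03) == 0x03)
            extension_table[op >> 2]();         /* 64 slots unused by the base set */
        else
            core_table[op]();                   /* unmodified 6502 behavior */
    }

The legacy path pays only a mask and compare for the extension hook.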

I propose a continuation of 6502 dialects. Initially, each dialect will implement the original 151 opcodes of 6502 in full. Eventually, it is possible to re-define deprecated opcodes. This is common practice on x86. For example, on 8086, opcode 0x0F was POP CS. From 80286 onward, it became the escape prefix for extended instructions, including SIMD and atomic operations; it is most notably implicated in the Pentium's 0xF00F bug. If applied to 6502, how far could this process be taken? Potentially, it is possible to implement a hybrid 6502/Thumb encoding or a hybrid 6502/RISC-V encoding. ARM and RISC-V have 32 bit instructions and an optional 16 bit short encoding. ARM has a 4 bit conditional execution field and the complement to unconditional execution is used for a compact representation of 4096 instructions within 16 bits. With the benefit of hindsight, RISC-V uses a 2 bit field to provide 49152 compact instructions in the same space. Some bit-fields may have to be shuffled around but it is possible to splice 6502 semantics with ARM's short encoding and/or RISC-V's long encoding. I don't suggest this as a practical exercise (although the hardware definitions are publicly available). However, it demonstrates where a proto-RISC architecture from the 1970s can be taken with the hindsight of MIPS, ARM, RISC-V and other processor architectures.

I strongly recommend that downward compatibility is maintained. In particular, the vast majority of code should run in a 16 bit address-space or small extension thereof. This is for multiple reasons. Firstly, an unmodified compiler only supports core opcodes running in the smallest address-space. Secondly, it eliminates bound checks and exception handling for the poor micro-controller which will be running 6502 emulation. This saves a significant amount of processing per memory cycle. In this scenario, each application gets one or more 64KB segments and cannot stray outside of it. Thirdly, use of 16 bit pointers is part of a strategy to save cost, save energy, save space and save time while increasing reliability and trustworthiness.
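
A sketch of the second point, assuming the emulator holds each application's segment as a flat array: with 16 bit pointers, the type system does the bound checking.

    #include <stdint.h>

    static uint8_t segment[0x10000];            /* one 64KB segment per application */

    uint8_t read_byte(uint16_t addr)            /* a uint16_t cannot stray outside */
    {                                           /* the segment, so no range check  */
        return segment[addr];                   /* or exception handling is needed */
    }

    void write_byte(uint16_t addr, uint8_t value)
    {
        segment[addr] = value;
    }

Address arithmetic wraps modulo 2^16, exactly as 6502 hardware behaves, so emulation costs nothing extra per memory cycle.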

Just Add Cruft

I've taken my ideal processor architecture (7 bit extensible instructions, 8-64 bit data with automatic SIMD, 8 symmetric registers, no user flags, 3 aligned stacks, signed multiply with optional accumulate, 1 operand optionally sourced from aligned memory) and glommed it to 6502 (8 bit instructions, 8 bit data, 2 asymmetric index registers, user flag register, barely performs addition, 1 operand optionally sourced from unaligned memory). The result is like ARMv8 in Thumb mode with Crossfire's SIMD and prefixed 6502 instructions to provide memory operations. This retains a surprising amount of functionality. It also has good instruction density. In particular, 8 bit operations use 8 bit instructions. Longer operations use longer instructions. This is a property which has made x86 successful. I prefer RISC with clean, symmetric register files and flat, linear memory. Apparently, the market prefers cruft. Why? It is a mix of Amdahl's law of diminishing returns and Kay's law. ("Simple things should be simple. Complex things should be possible.") This applies to increased sizes of address and data. When this is not the case, a processor architecture is retired. In general:-

  • A 1-address or 2-address design is sufficient for business tasks, such as date calculation.
  • A 3-address RISC design is preferable for heavy tasks, such as FFT, matrix multiplication and sorting.
  • A crufty design can do both with relative efficiency.
  • A RISC design with compact encoding (ARM, RISC-V, Xtensa) is a poor approximation of crufty design.

I am not pleased that 8 bit cruft is essential to a competitive processor architecture. However, 6502 is a relatively small piece of cruft which is widely understood and widely supported. Furthermore, it is possible to migrate partially or fully to contemporary instruction encodings which also have compiler support. Overall:-

  • It is Arduino compatible. Arbitrary extensions to the 6502 instruction set can be written in C/C++ and/or linked with blinky light libraries and suchlike.
  • It is more than 20 times faster than my previous proposal.
  • It is less than 1/20 of the price of my previous proposal.
  • It is significantly easier to program.
  • It has better instruction density than ARM.
  • It is faster to emulate than ARM.
  • The emulator is more compact than ARM.
  • It works with CC65, the 6502 C compiler descended from Small C.
  • It runs Acorn BASIC, Apple BASIC and Microsoft BASIC.
  • It also runs Space Invaders, PacMan and - to quote Micro Men - Jet Set F*cking Willy.

Hardware

I am now working on a board which allows a 6502 to be replaced with an Arduino Nano or similar. This is for the development of a system which is entirely detached from the ongoing clusterf*cks of Wintel, GNU/Linux, systemd and Android. As an example, the Feb 2020 flap involving systemd-crond could have been averted with simple-cron, which I published in Jun 2016. Yeah, guys. Keep downloading random binaries. One day, you'll get one which works.

I spent an inordinate amount of time devising a state machine. However, the result was worthwhile. Indeed, it may be of general interest to anyone who wants to support or extend a variety of processor architectures on a micro-controller using about 14 I/O pins. This includes 6502, 6507, 6510, 65816, 6800, 6809, 6811, 8008, 8051, 8080, 8085, 8088, Z80, 68008 or similar. The critical part of the design is 3-4 multiplexed control lines which are upwardly compatible. This allows migration of processor emulation from legacy host to standalone card bus system. This can be achieved using one implementation of firmware. If downward compatibility is broken and firmware varied, it is possible to re-purpose some legacy states of the multiplexed control lines and implement a card bus system with a significantly larger address-space. However, it remains vastly preferable for each application to run within a highly restricted address-space.
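
As a sketch of the pin budget, the 4 bit state patterns in the diagram below can be driven from four pins through one or two 74x138 demultiplexers. Pin assignments here are hypothetical; digitalWrite() is the stock Arduino call.

    #include <stdint.h>

    #define PIN_S0 2                            /* hypothetical pin assignments */
    #define PIN_S1 3
    #define PIN_S2 4
    #define PIN_S3 5                            /* selects the second 74x138 */

    extern void digitalWrite(uint8_t pin, uint8_t value);  /* Arduino core */

    void set_state(uint8_t state)               /* state is a 4 bit pattern */
    {                                           /* from the diagram below   */
        digitalWrite(PIN_S0, state & 1);
        digitalWrite(PIN_S1, (state >> 1) & 1);
        digitalWrite(PIN_S2, (state >> 2) & 1);
        digitalWrite(PIN_S3, (state >> 3) & 1);
    }

A minimal legacy-bus board decodes only the low three bits with one 74x138; the fourth bit and second demultiplexer are only needed for the card bus states.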

A board using one common version of firmware may have:-

  • Optional legacy bus interface.
  • Optional I2C bus with optional boot ROM, clock, thermometer and suchlike.
  • Optional card bus interface. A system without card bus interface is indistinguishable from a system with no cards.

The migration path is as follows:-

  • Minimal implementation with one demultiplexer chip. State machine allows interrupt poll, bus arbitration and a cycle of six states to perform read or write of one byte from legacy host. Backtracking through states allows descending/ascending sequential read/write or atomic operations.

  • Implementation with boot ROM, clock and persistent memory. Introduces a legacy boot phase where legacy host is identified (Acorn, Apple, Atari, Commodore, other) and boot menu allows configuration. Optional breakpoints allow a range of compatibility versus acceleration options. For example, breakpoint on legacy host's floating point multiply routine allows acceleration of all Taylor approximations and Newton-Raphson approximations. It is also possible to implement an instruction cache and/or hold some or all of a 6502's infamous zero page within a micro-controller. Unfortunately, many Commodore computers use memory location zero for a 4 bit program/4 bit data page banking latch. This allocation is incompatible with Apple's SWEET16 virtual machine. Furthermore, every manufacturer has incompatible page banking. Anyhow, default configuration varies by host.

  • Implementation with extended memory. Requires two demultiplexer chips. Allows text mode windowing and co-operative multi-tasking in separate address-spaces.

  • Implementation with legacy host interface and card bus interface. Provides access to improved audio, video, storage and network. This may be laughably quaint by contemporary standards but it is highly secure. For example, it is suitable to trunk and route encrypted AMR-WB+.

  • Implementation with card bus interface only. May support 144 hot-swap cards in a 19 inch rack. May support multi-core cross-bar implementation. May support Contiki, Collapse OS, a POSIX kernel and/or the X Window System. May be prototyped and shrunk to an SoC implementation; possibly with cards implemented as chips with through vias. In sufficient volume production, this would be smaller, cheaper, faster, more energy efficient and more stable than a Raspberry Pi.

State Diagram

Bit patterns for states may be shuffled and/or flipped to facilitate wiring. Columns 1-2 provide states for the legacy host. Columns 2-4 provide states for the card bus interface. Column 5 allows re-use of legacy host states and therefore uses duplicate numbering.

+-------------+     +-------------+
|             |     | State 0000: |
|    Begin    |---->| Read        |
|             |     | Interrupts  |
+-------------+     +-------------+
                           |
                           |
                           v
                    +-------------+
                    | State 0001*:|
                    | Write Addr  |
                    | Bits 16-23† |
                    +-------------+
                           ^
                           |
                           v
                    +-------------+     +-------------+     +-------------+     +-------------+
                    | State 0011: |     | State 1011: |     | State 1010: |     | State 0010‡:|
                    | Write Addr  |<--->| Write Addr  |<--->| Write Addr  |<--->| Write Addr  |
                    | Bits 8-15†  |     | Bits 16-23† |     | Bits 24-31  |     | Bits 48-55  |
                    +-------------+     +-------------+     +-------------+     +-------------+
                           ^                   ^                   ^                   ^
                           |                   |                   |                   |
                           v                   v                   v                   v
+-------------+     +-------------+     +-------------+     +-------------+     +-------------+
| State 0110: |     | State 0111: |     | State 1111: |     | State 1110: |     | State 0110: |
| Bus         |<--->| Write Addr  |<--->| Write Addr  |<--->| Write Addr  |<--->| Write Addr  |
| Turnaround  |     | Bits 0-7†   |     | Bits 8-15†  |     | Bits 32-39  |     | Bits 40-47  |
+-------------+     +-------------+     +-------------+     +-------------+     +-------------+
       ^                   ^                   ^
       |                   |                   |
       v                   v                   v
+-------------+     +-------------+     +-------------+     +-------------+
| State 0010‡:|     | State 0101‡:|     | State 1101: |     | State 1001: |
| Read Data   |     | Write Data  |<--->| Write Addr  |<--->| Bus         |
| Bits 0-7    |     | Bits 0-7    |     | Bits 0-7    |     | Turnaround  |
+-------------+     +-------------+     +-------------+     +-------------+
       |                   |                   ^                   ^
       |                   |                   |                   |
       |                   v                   v                   v
       |            +-------------+     +-------------+     +-------------+
       |            | State 0100: |     | State 1100: |     | State 1000: |
       |            | Bus         |<----| Write Data  |<----| Read Data   |
       |            | Turnaround  |     | Bits 0-7    |     | Bits 0-7    |
       |            +-------------+     +-------------+     +-------------+
       |                   |                                       |
       |                   |                                       |
       |                   v                                       |
       |            +-------------+                                |
       |            |             |                                |
       +----------->|     End     |<-------------------------------+
                    |             |
                    +-------------+

\                                 /     \                                 /     \             /
 \______________   ______________/       \______________   ______________/       \____   ____/
                \ /                                     \ /                           \ /
                 V                                       V                             V
         Legacy Bus States                   Optional Card Bus System           More Addresses
       (Requires one 74x138)               (Requires additional 74x138)

* May be sacrificial state in some modes of operation. May be card select in some modes of operation.
† Address bit range differs in some modes of operation.
‡ Legacy bus read is mutually incompatible with 64-88 bit addressing. Legacy bus write is mutually incompatible with 80-88 bit addressing.