
SoylentNews is people



The Internet

Posted by The Mighty Buzzard on Saturday January 27 2018, @07:34PM (#2953)
7 Comments
/dev/random

I think I won it today. I managed to single-handedly fend off the communist and socialist hordes with logic and reason, garnering only a very few disagreements even remotely based in reason and none that could not be refuted.

I'm done with that article now though. I can't be spending all day educating the ignorant. That and it's nap time.


Posted by takyon on Friday January 26 2018, @03:38PM (#2950)
2 Comments
/dev/random

Found this comment while I was searching for NASA GOLD stuff.

Zuma!

Grope Central

Posted by takyon on Wednesday January 24 2018, @11:58AM (#2946)
15 Comments

Fun For the Whole Family

Posted by The Mighty Buzzard on Sunday January 21 2018, @09:18PM (#2942)
79 Comments
/dev/random

You know what's fun? Making regressive looter shitheads lose their entire mind by asking them to rationally and logically explain their position without trying to claim "muh feelz" as a valid argument. Ninety-nine out of a hundred of them won't be able to do it and will lose their shit on the spot. The one left over will be able to but more than half the time they'll have some foundational assumption that cannot be chalked up to anything but feelz.

Reminder for Intel Apologists

Posted by takyon on Sunday January 21 2018, @12:37PM (#2941)
3 Comments
Business

Intel Has a Big Problem. It Needs to Act Like It

During the six months Intel was quietly working to try to fix the vulnerabilities, Krzanich sold $24 million in company shares. Intel says the stock sale was part of a plan that had been in place before anyone there knew about Meltdown or Spectre, but the day after Krzanich’s CES speech, two U.S. senators sent letters to the Securities and Exchange Commission and the Department of Justice demanding investigations. Consumer and shareholder lawyers have filed a dozen class actions against Intel, and there are few signs the pressure will let up on Krzanich anytime soon. In a research note, an analyst for Sanford C. Bernstein & Co. called the stock sale “indefensible.”

My Ideal Alarm Clock, Part 1

Posted by cafebabe on Tuesday January 16 2018, @08:15PM (#2932)
4 Comments
Hardware

After thinking about micro-controllers and high availability using a clock as an example, I realised that I'm in the unusual position of being able to make my ideal alarm clock. This is only slightly more advanced than the generic quiz game buzzer implementation with the addition of interrupts, LED multiplexing and switch de-bounce. This is all within my ability. Or, at least, it should be. Indeed, I've seen a founder of the Arduino Project struggle with an 8×8 LED matrix. This person also described a frying pan technique for surface mount soldering. Well, I've seen their software and I've seen their hardware and I conclude that making an alarm clock can be achieved with similar perseverance. Specifically, I have:-

  • Derived an H-Bridge from first principles.
  • Likewise for MOSFET rectification.
  • Put a rectified sine wave through a D-Class amplifier to obtain bi-directional motor movement with more control than an H-Bridge.
  • Used a combination of tilt switches, analog integrator, VCO [Voltage Controlled Oscillator] and servos to make an approximation of the Nekko EEG Cat Ears.
  • Arranged decimal counter chips to light 2×10 LEDs in sequence.
  • Written PIC assembly, 6502 assembly, Arduino C and similar.
  • Written signal handling software in C and Perl.

However, I:-

  • Struggle to deploy compiled code due to the opaque protocols being used.
  • Dropped modulo operator functionality somewhere while avoiding patronizing user interfaces.
  • Failed to get DACs working but this may be a Voltage level incompatibility.

I have reached the stage, which is common among electronic engineering students, where making a clock is the test of ability. Success is not certain but much else remains theoretical if this does not happen. Anyhow, let's do this properly. We may even get a MISRA C alarm clock.

Specification

Functionality For Version 1:-

  • Buttons for inputs. Probably about six.
  • Multiplexed, seven segment LED digits for output. May be jumbo size. Probably about six. Specifically, HH:MM:SS in 24 hour format only.
  • One modulated square wave in audio range to drive speaker. Can be derived from software or multiple oscillators.
  • One digital output which stays high when alarm triggers. Can be used to make clock radio. Or, because we're geeks, drive a relay for coffee maker or similar.
  • Serial output in format compatible with gpsd. In combination with ntpd, this allows home network time to be set from alarm clock.

Functionality For Version 2: Three channel PWM output for dawn light prior to alarm.

Functionality For Version 3: Battery back-up RTC [Real-Time Clock]. This allows clock to count time when unpowered.

Much of this can be implemented with decimal counter chips. Counter chips with decoded outputs are preferable for Nixie Tubes while BCD outputs are preferable for seven segment displays.

Full functionality requires software. Due to laziness, I suggest using an Arduino Nano (Atmel AVR micro-controller with 8KB ROM). This also allows clock to be powered from USB. For example, a wall-wart phone charger. With or without RTC, date handling is awkward. Will conceptually reserve two places of LED multiplexing for century. So, display may be CCYY/MM/DD or YY/MM/DD or HH:MM:SS. This requires either 8×7 outputs or 6×7 outputs. With either a four digit year or a two digit year, it is possible to calculate day of week and have a seven day alarm clock. The trivial use case is setting the alarm one hour later for Saturday and Sunday.

With some changes to software, it is possible to make joke clocks which don't have Monday. It may also be possible to make Julian/Gregorian, Chinese or Mayan mode. Nominally, day of week output uses LED seven segment decimal point or similar. Internal calculation may or may not start from Sunday being zero. If possible, day of week calculation should extend beyond Year 2099.
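The day-of-week calculation can be sketched with Sakamoto's method, which stays valid for any Gregorian date and therefore extends well beyond Year 2099 (the helper names are mine, not the project's):

```c
#include <assert.h>

/* Sakamoto's method: day of week for any Gregorian date.
 * Returns 0 = Sunday .. 6 = Saturday. */
static int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (m < 3) {
        y -= 1;
    }
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Weekend test for the "alarm one hour later on Saturday and Sunday" case. */
static int is_weekend(int y, int m, int d)
{
    int dow = day_of_week(y, m, d);
    return (dow == 0) || (dow == 6);
}
```

The century terms (`- y / 100 + y / 400`) are what keep this correct past 2099, since 2100 is not a leap year.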

After thinking about this problem for three days, I wrote most of the software in one programming session. Buttons are defined as "prev", "next", "up", "down", alarm "toggle", "snooze", "start" and "stop". From this, it is possible to set year, month, day, hour, minute, second and seven alarms to hour, minute and second. Like the buzzer game, all buttons may be wired high or low. This also allows "toggle" to be a momentary switch or a latching switch. "Snooze" suppresses alarm for 180 seconds plus random jitter. (The fixed 300 seconds of Symbian 70 allows me to fall asleep 18 times or more over 90 minutes or more.)
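The snooze arithmetic can be sketched as follows (the jitter bound is an assumption; the text above only says "random jitter"):

```c
#include <stdlib.h>

/* Snooze interval: 180 seconds plus random jitter.
 * SNOOZE_JITTER_S is an assumed upper bound, not from the project. */
#define SNOOZE_BASE_S   180u
#define SNOOZE_JITTER_S 30u

static unsigned snooze_seconds(void)
{
    return SNOOZE_BASE_S + (unsigned)rand() % (SNOOZE_JITTER_S + 1u);
}
```

The jitter stops the snooze period from locking into a fixed rhythm the way a constant 300 seconds does.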

Pressing "stop" once cancels the alarm sound but leaves the relay active. Pressing "stop" again cancels the relay. Pressing "start" enables the relay but not the alarm sound. Unfortunately, without the serial functionality, the compiled program requires 7KB of the 8KB ROM. I've not exhausted the I/O of Arduino Due but have exhausted I/O of Arduino Nano with 16 multiplex outputs, 8 digital inputs and 3 digital outputs. I considered mixing common cathode and common anode seven segment displays. Effectively, half of the LEDs would be wired backwards in an attempt to get more output. Unfortunately, this is a subset of the infamous Charlieplexing where LEDs are lit by a difference of Voltage between I/O pins. Although it is possible to control n(n-1) segments for large n, it works best when the total lit segments is 10 or less. There are also current limits. Furthermore, wiring arrangements may be fragile.

It is possible to use external shift registers but multiplexing the multiplexing will make the software particularly obtuse and I may have difficulty getting this working. In the most extreme case, it is possible to reduce I/O to seven lines: input stream, output stream, clock, latch, speaker output, relay output and dedicated output to gpsd. I'm not controlling a mains relay via shift registers. Nor am I sending serial via a shift register.

I checked code quality and the majority of warnings derive from #include <Arduino.h>. Regardless, the case statements to set time are too long. This may also be why compiled output is 7KB. That's too much for a clock and significantly more functionality is planned. Switch de-bounce also requires improvement.
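One possible improvement is a counter-based de-bounce, sampled from a timer tick, as a sketch (not the project's code):

```c
#include <stdint.h>

/* Counter-based de-bounce: call once per timer tick. A raw state change
 * is only accepted after it has been stable for DEBOUNCE_TICKS
 * consecutive samples. */
#define DEBOUNCE_TICKS 5

typedef struct {
    uint8_t stable;   /* last accepted (de-bounced) state */
    uint8_t counter;  /* consecutive samples differing from stable */
} debounce_t;

static uint8_t debounce(debounce_t *db, uint8_t raw)
{
    if (raw == db->stable) {
        db->counter = 0;
    } else if (++db->counter >= DEBOUNCE_TICKS) {
        db->stable = raw;    /* change held long enough: accept it */
        db->counter = 0;
    }
    return db->stable;
}
```

Because the counter resets whenever the raw input matches the accepted state, isolated glitches shorter than DEBOUNCE_TICKS samples are discarded for free.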

After a US$8 retail purchase of a mixed bag of LEDs and LCDs, which, crazily, includes a 1 row × 16 column LCD text display, I'm also considering a really fancy power-on light test. Something like a PWM snaking figure-of-eight across each digit.

What Kind Of Shonky Software Do People Write?

Posted by cafebabe on Tuesday January 16 2018, @08:12PM (#2931)
2 Comments
Software

I've been curious about programming standards and, now that I'm actively writing software for process control (rather than wrangling other people's software without success), this would be a good time to make the software as reliable as possible. Arguably, writing in C or C++ is the first mistake but it is hard to avoid when, for example, I2C devices only have library code written in C. Even a person who refused to deploy C would be required to read it and use it for reference tests.

I found an old, unauthorized copy of the MISRA C 2004 Specification. This was originally developed for vehicle safety and is now used in medical, aerospace and other industries. And now you can add the space cadets in the hydroponics industry.

I feared looking at MISRA because I was under the false impression that it involved bone-headed stuff like writing conditions backwards to prevent accidental assignment. Feel free to do that but the preferred method is static analysis. Overall, the MISRA guidelines are very good to the extent that I'm considering a license for the current version.

Reading the standard proceeds as follows. A right-minded person initially thinks "What kind of shonky code are people writing?" Sooner or later, this is followed by "Ah, I need to change something in my current project." This is followed by "I need to change something in a previous project." This is followed by "I need to change everything I've ever written." This is followed by the later realization that not even a typical example of "Hello World" is written correctly. (Specifically, no function prototype or library call return code check.)

Writing C to MISRA standard is an attainable goal but almost everything falls short. I mentioned this to some programmers. No-one thought any of it was idiotic but some parts of conversation began with "I don't do that because ..." and omissions included declarations, error checks and default cases. I defend multiple returns from a function under specific circumstances in which execution is effectively a try ... catch construct across two functions. However, that works best in a server environment for the purpose of avoiding memory leaks. For MISRA's focus, embedded applications, even the use of dynamic memory allocation is discouraged. And that removes one of the main uses for multiple returns. An improvised try ... catch can also be used to fail fast. This seems problematic until the difference between library code and application code is considered. Library code throws errors which cannot be handled due to lack of context. Application code catches errors because the context for handling them is known. Unfortunately, "Never test for an error which you cannot handle" combined with an embedded system in which all unhandled exceptions are tied to "reboot" may lead to a device failure mode with a constant cycle of reboots and no further diagnostic information.

So, what kind of shonky software do people write? I let a Lisp hacker look at the code for a virtual processor. The code review can be summarized as "Your code embodies everything that's wrong about C." followed by particular ire for a portable attempt to define a 16 bit signed integer which is clearly labelled as working on gcc and clang but possibly not further. After "Well, at least you didn't use recursive macros." I noted the part where 4 byte BER fetch is implemented as a macro which is nested within itself.

That's a problem. Some embedded C is stuck in a timewarp between Kernighan & Ritchie's traditional C and the ISO C 1999 Standard. My style is closest to the ISO C 1990 Standard and that works fine for embedded software. However, between 1969 (when C was being developed on a 16 bit, two's complement computer) and 1999 (when int16_t was standardized), it wasn't possible to reliably define a signed 16 bit integer and, after almost 50 years of C, it remains problematic in some cases.
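A guarded definition along those lines might look like this (a sketch; the pre-C99 fallback assumes short is a 16 bit two's complement type, which the preprocessor test checks for range but cannot fully guarantee):

```c
#include <limits.h>

/* C99 and later: exact-width type comes straight from the standard. */
#if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#include <stdint.h>
typedef int16_t i16;
/* Pre-C99 fallback: guess, guarded by a range check on short.
 * Two's complement representation is assumed, not proven. */
#elif SHRT_MAX == 32767
typedef short i16;
#else
#error "no 16 bit signed integer type found"
#endif
```

This is exactly the kind of scaffolding that draws ire in code review: it works on gcc and clang but the `#error` branch is an admission that portability is only ever partial.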

This is part of a wider problem where implicit assumptions and conflations are common reasons for software being inadequate. Things like currency handling, date formats (especially two digit years), character encoding, disability support - and the magnitude of numbers.

It also fits the observation that the installed base of technology is half of the age of the technology unless steps are taken to actively skew the age. This was certainly the case for web browsers before automatic updates became common. However, it seems to apply to programmers. Effort to drop C entirely (often by people encouraging WebAssembly) is an over-reaction (and contradictory). However, tests and standards have to be greatly improved.

I wrote some regular expressions in Perl4 to test quality of C. Yes, the old joke is "Now I've got two problems." More seriously, dodgy scaffolding is allowed by MISRA C 2004 but not in deployed software. In this case, you can verify that nothing I've previously written will pass all of these tests. The worrying part is that I only wrote five easy tests out of about 100.

After unpacking the archive on a Unix system, make quality applies tests to hello.c and finds a missing function prototype but doesn't find the missing return check. Flags on the compiler find these problems and produce a warning that argc and argv aren't used. Technically, these should also be const. So, that's six problems with "Hello World". gcc and my crude checks find an overlapping subset but important problems are missed.

begin 644 code-quality-check-dist-20180112-104505.tar.gz
M'XL("+&16%H"`V-O9&4M<75A;&ET>2UC:&5C:RUD:7-T+3(P,3@P,3$R+3$P
M-#4P-2YT87(`[1IK4]PXDJ_QKU#(D'D0C^UYAA!SQ1)RH0ZRV4`J=8O98&QY
MQH=?*WM@"+"__;HEOP8F9)*09*O6316VI%:KN]4OR6.%-I7_G)B>FUS(UIA:
MI[+MQHG<4;6GJJ9U9$WM]=6^LF>>4L?UZ-)7@`HP[/?QJ0W[:OF)T!OV.DM:
MM],9J`.U/QPNJ5J_W^DM$77I!\`D3DQ&R-(_%!XIDY@I)VZ@T.",^+#+DO2(
M;(%5D-^$59`MM`JR/37]R*,D,P3`:FPUT4S(P9C"C"`.6>)._+;T",8R^R&H
MWD22]C;_LZUS\B]V]@_TM]N;+_:VR<[K_8/-W=V<*!$6:,:Q.PK:D9>V3[S0
M.BV:SB2P$C<,9(^6D"(6)F'19#29,#X\IIX7MBUI:TL?69:TO[WU3I?ER&2F
M3\S8#&1@T3K5->FW=YN[NCPB\GO3\^`_G2;,)')$;3-(7(L`HJV[<;CV=&WM
MF;:VM@8X\=BTPW-XB4(W2"B33>8F8VA;)G@1.E;V#LH<!3@C8:Z5"':3BXC&
MT.>[('$PFNVTPN",LA@$E=[L[&[G/(]&ENR[@3RF9A2['ZG>[0P'3\G-83J-
MS,#6NP-IZ]?=7W79<9D)6AD!/3T,J/3ZW1[0=%)V0EC*\5`2)PAE5#^UY8^4
MA7'1`Q(`CX$9I'V@G2C"+M],QM+^SN\%C[XYE=T`\"D\XB"643R/ZCV8:5,'
M]!2%$;R'OIO(#LR@<JI`09J:S+L0%&`B]*7$/!<F<"H^92,J@XY@]X($.1I9
M,2#XV5N<OWEF/FPZN$>,>J%IIX*!N=AR'%'K9EL62"6)QB&&1IM&R5@?D#FR
M,FI-8,?.:(JD2OMOME'+C(;,AI6Y)2,W64=FR[E**0@U+C6\,(QX2&;NR00Q
M2V-G%':.(Z2&(L;<R)2M2+8\V.9\IRS*I*W-K5?`#+?$8F5]@/H4G4@LUK6B
M;9Y0K]SQOXD?X0SIY>[FO_5:`[VI26H-]!U\HJ7B$VT.GVAE^$3KX$_0!Q\'
M5IJ2!([V3'H`S2WL1)K-S%^)'(I724KS$V#*`E7>OH5]1=K*;"!8##N++8MA
MYY%I,?0\""V&?B.P29+E43-`L9DOT$EG@R@V/5."B0?CZ#$C"#@7'UP5T,!+
MP&`Q?A`'HA7$!K*J0@!C@*?".E.3C<#05`)1*Y[X9`,W:!,VZ"_I@64F>0M0
M,9@3^91T"B3I`7"13RC6AH4AUB`^#NEYOWQLFPEPL/+?%7_%EE=>K>RM[!^W
MDVE"'C\&0I@2FJ0L`M@#LSZX&E!L6#9IMQ'1/[5=1NJUQHN=MTW0XOZ[EVJS
MUCC8`3;JP)05H83DV`/);()H^VKSF-3;;67^G#)QS/VR'$P@FKB6')X'&(`P
M$GQBP39,J'^*&:75)%=7Q+BY@/6-U,#U8LJU+S/G3L%&'UW0QAKG_U-H8E%!
M^03P.U\R8?KQ2[`3-J'IGJJ%E<`D??DX.K?1S*A-ZO&TW5*FTWK1%C7HH2JO
M'?%_K4-5.\J:7?&&PYV\LY^_#<3;++UO)76\C)N9"H!R@@3SK'NY;-K"F+D"
M"NE15WJ)&)JK7F_5;TT$;T`MW)C)%7-K.K?Z@^9<(G?7?]:"]?^-\'>?];\Z
MU$3]W]-ZW7Y_`/7_8-@;5/7_#ZG_'^8'@(@R+%9Y_<]+_I<A(YM\QWV*)=9.
M@&6^[?*JX8[Z/R__U4%6_C\B8>!=D/.00?5SCA4R5"P7A)=SX**8!(+$Y.5-
MG%S`00!J'>+&)$S&E)V[,27QQ'%<RP4T6.!\#(>%!A224'39C1K67OKS_8,7
M.Z\WFDUR*1'B.FGW7\H?AYOR[T>'?Q@-<'YB)$>K#71KZ/QPM-KD/2VC@>-&
M$]ZRGDO%#04M0FH\,^LU;1V:U[/D&ZYSQ?DID[J$I>"_\?#YAGZDXZM^9#25
MG"!/R00XWG[[EBS7U&?DW&18ZCXC61%`N'A<V9`B>&^=0-XF9KXE;2-8%AQ=
M?\[/[\?_LV+I7L__VJ"KI?[?[W0'7?3_?E^K_/_OX/];Y"6>"L&]$Q9ZY/W8
MA=/)3DQ>APF,0T5Z#N[[RP79)+^@;?RSHT)*.U:,$SO,)K1;_S*N18,SF)/F
M3S5M-A5E%*[/LFB<-)R07>7QQ3C)YA:!)%_RT##JAK%\E-$I<V/4D0FC/G=L
MF8\M%V-"C3D3)1T4RY8IE!%R(M?\_P+"'/*Y9<JW8R-E+&2?B8QH%I,D=FU*
M0H><!E#+DQ$<SWV396$2@4[=I*%.>UHSY9/`^39C\^$=.K^\F\7%PK=GXNF?
MW[-8J4>)D)IS>/UMX?P[Q_\;!]3[B?^=/KR+^]^AVAWVEM2.J@ZJ^]^?'O\A
M]F?&O$N#43)>)+IK6AK=RW?`YAF]]V@O?.J#%4Z"1%?7I9H5>A,_N-T16V%$
M>0?Z>&F<-XO1_.WS>:0(?O`'AU<EC-=GNA_-Z<1`8@0&.UJM96,0>/(`V88`
M\_BQX"*/[F695G6/[X*8D<:O&2'G8Q1BKJ[.J5_O.Q%"4VA2F\/A9Q@4<V9B
M,G)H7->*Z)L-B!D;`_4>PC*0B]'42H1G0W(F8[H3JV7.T\TIS5V?QVB)SUI"
M_0@ORAHSRE%*Z,TL82$51-\8=@H"7R\J>"(S1Y2(=8702+Z<(Z_+8M]T(I&>
MD*N20C+A2H(5JE'*F,VLQN$R%9MWASP9R[E<Q7:5.<_8*A:>PU79GY02ZBQ;
MA:J_A*U/J?1'IO/OG/^SJ_5[/?]UU'YV_]/MPW\\_PVT?I7_?W[^?Y-_C_RI
M![MX<D).*8V$-_.O;GH\=IV$YU>1?(HV\ELTT:G%#$@D#1R#L'A%IPEEP97X
M/G,%Z2:]O2G2C!@B]7HY9?)/*_1/4E^OY[G/IM:EX.%:%RO-Y+!\RF5YBC-O
MBC2SZK7T)85((\WBZYBFC4OC>K%DGG8U#M>-RZ.F\ABUW*AI3VJ=)[5N4Z$N
MUBG`"9R)J`DG_QJ%S6KPCT.G]"(F*R!]?K1^F'/*!034Z^:777A9'B@C35@P
MO4[L$")]$"9D#.&VE-#*L?4.WIRYO%E?Q5LIE\YE+>=]0=X*O=VA-K2@G&$2
M4%+"6:3LL5W'H0R\:+Z6S<`N2P;<IM+]O`-I!7_C_)]_S[[7^]]N?YCE_V&O
MB_>_0U6MSO]_B_O?O8F7N/B[K[<B,^U#VJ;%YZ#\AN#G%@BU.&'F!3_%NP'>
M`2YVC/^^I^""D]M'6N-$N%)Z!PI9NU1ZI#/3T_KMP_#L65@@;VC??@S&CTLI
M.0);[T/N2@N2^-9I>$:XF]?6MV1KM^X6F&]?=CL!N0:I\3Z!<8=`?F:?*:.0
MPR8!&E!*5!P6!?M5$OOZ^)_^6.FKUL`@/^CU/OG]7QOT\M__]O#^5^MWM>K[
M_P\!I45>\1^6O0^99Y.6`K'9#2QO`C'@>9S8;M@>;_`?>Q'?=(,&OIAL9#VQ
MQB9KM?#]K.2D3F-9D#M'<@_![?C-BG#.AMJLG+"""BJHH((**JB@@@HJJ*""
C"BJHH((**JB@@@HJJ*"""BJHH((**JC@GN'_5VX!IP!0````
`
end

(Usual instructions for uudecode process.)

My Ideal Processor, Part 10

Posted by cafebabe on Tuesday January 16 2018, @08:10PM (#2930)
0 Comments
Hardware

(This is the 61st of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I've been working on a virtual processor and I've got benchmark figures. The virtual processor is intended for process control but it may have general use or use on FPGA, GPU, optical computer or mainframe. In particular, there is a market for object code portability and high uptime across x86, ARM, AVR and other processor architectures.

I've been working on the instruction set and I've worked through a long list of decisions which are loosely connected. These aren't choices which can be made in isolation and without consequence elsewhere in a design. However, each choice makes minimal progress because a sensible set of choices may lead to something impractical. For example, 64 registers × 64 bit requires a 4 kilobit context switch. Numerous registers are great. Wide registers are great. But fast interrupts are also great.

After going through this process repeatedly for more than three years, I've devised a fairly conventional processor architecture (3-address, Harvard architecture) which provides cross-compatibility for 8 bit, 16 bit, 32 bit, 64 bit or possibly larger registers and address pointers.

I wrote a fetch-execute instruction interpreter and then called it from a loop which tries all permutations of the 2^28 instructions. This crashed repeatedly. The optional floating point encountered divide by zero. After #include <signal.h> so that SIGFPE (floating point exception) could be set to SIG_IGN (ignore signal), it was trivial to set SIGUSR1 and SIGUSR2 to trigger virtual processor interrupts. It optionally outputs a PID file to make signal invocation easier. That's great fun. With one Unix command, it is possible to send virtual interrupts to a virtual processor.
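The signal plumbing can be sketched as follows (flag and handler names are mine, not the project's):

```c
#include <signal.h>

/* Virtual interrupt flags, raised from the host OS via SIGUSR1/SIGUSR2.
 * sig_atomic_t keeps the reads/writes async-signal-safe. */
static volatile sig_atomic_t virq0 = 0;
static volatile sig_atomic_t virq1 = 0;

static void on_usr1(int sig) { (void)sig; virq0 = 1; }
static void on_usr2(int sig) { (void)sig; virq1 = 1; }

static void install_handlers(void)
{
    /* Optional floating point: don't let divide-by-zero kill the run. */
    signal(SIGFPE, SIG_IGN);
    /* Map Unix signals onto virtual processor interrupts. */
    signal(SIGUSR1, on_usr1);
    signal(SIGUSR2, on_usr2);
}
```

With the PID file in place, `kill -USR1 $(cat vm.pid)` is the one Unix command that raises a virtual interrupt; the fetch-execute loop polls the flags between instructions.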

Further testing found further problems in approximate order of difficulty. Apparently, I missed a bound check on a virtual stack. After these problems, the test ran smoothly. I expected the first full run of 256 million instructions to require about 30 minutes. It finished within five minutes. That's approximately one million instructions per second. After brief disassembly and seeing 140 byte stack references, I applied the standard flags. This reduced the frequency of stack references and their severity to 40 bytes. I also obtained the usual doubling of performance. That's approximately two million virtual instructions per second for integer performance and 0.4 MIPS for a mixed integer and float workload. Optimistically, on a Raspberry Pi Model B Revision 2 with 750MHz clock and 697 BogoMIPS (approximately 14 instructions per 15 clock cycles), it takes a minimum of 132 clock cycles to go around the fetch-execute loop but many of the instructions exit early due to bogus opcodes, bound check failures or similar.

Despite being a 64 bit SIMD implementation which is vastly more powerful than a 16 bit computer from the 1980s, this arrangement captures less than 1% of the efficiency of the host architecture. I looked at the disassembly more closely. My code has addition of 8 bit integers and this involves a loop with eight fixed iterations. gcc correctly unrolls this loop and converts it to SIMD instructions. However, this doesn't occur for floating point instructions. This may be a limitation of the compiler or the target architecture. (ARMv6 with Neon.)

What am I trying to achieve? Optimistically, it is possible to achieve 1/3 efficiency. That would be one clock cycle to fetch a virtual instruction, one clock cycle for branch fan-out to each case statement and one clock cycle to implement a virtual instruction on native hardware. There is no need to return to start because duplicate fetch and decode can be placed at the end of each case. Therefore, bytecode interpreter execution flow may bounce around unstructured code as required.
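The duplicated fetch-and-decode can be sketched with the GCC/clang computed-goto extension (hypothetical three-opcode set; a real instruction set would be far larger):

```c
#include <stdint.h>

/* Hypothetical opcodes for illustration only. */
enum { OP_INC, OP_DEC, OP_HALT };

static long run(const uint8_t *code)
{
    /* Labels-as-values dispatch table (GCC/clang extension). */
    static void *dispatch[] = { &&op_inc, &&op_dec, &&op_halt };
    long acc = 0;
    const uint8_t *pc = code;

    goto *dispatch[*pc++];       /* initial fetch + decode */

op_inc:
    acc++;
    goto *dispatch[*pc++];       /* fetch + decode duplicated at case end */
op_dec:
    acc--;
    goto *dispatch[*pc++];       /* ...so control never returns to a loop top */
op_halt:
    return acc;
}
```

Each handler branches directly to the next handler, which is what lets execution flow "bounce around unstructured code" rather than funnelling every instruction through one central branch.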

Nowadays, I presume that fetch and branch incurs instruction pipeline stall of at least two clock cycles. Also, I'm attempting to implement 8 register × 64 bit on hardware which is 8 register × 32 bit. Furthermore, operations on arbitrary virtual registers (nominally R0-R7) requires transfer to internal registers (nominally A, B and C) where cases may perform A=B+C or A+=B*C. This particular step would be easier with a stack virtual machine but difficulties occur elsewhere.

Current efficiency is a mystery. Maximum efficiency is a mystery. My implementation won't match the execution speed of Dalvik which is deprecated in favor of ART [Android RunTime] which is claimed by Oracle to have less than 1% of the efficiency of HotSpot in some circumstances. That's a familiar figure.

My interpreter and registers may exceed the size of the native processor cache. Like Dalvik, the opcode is in the least significant bits. This avoids bit shift prior to branch. That sounds clever and planned but it isn't possible to put an opcode elsewhere and get a broad range of 1 byte BER packed instructions. However, when testing instruction patterns sequentially, the cases of the instruction interpreter are called in rotation. This may incur significant cache churn. If this is the case, I might get significantly more throughput by testing a more realistic distribution of opcodes.

If this was intended to run games and apps, it would be fast enough to run Leisure Suit Larry but not fast enough to run Fruit Ninja.

My Ideal Processor, Part 9

Posted by cafebabe on Tuesday January 16 2018, @08:08PM (#2929)
0 Comments
Hardware

(This is the 60th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

By using some downtime productively, I devised a virtual processor architecture which is a fairly inoffensive blend of 3-address RISC processor architectures. It is most similar to an early ARM or early Thumb architecture. However, it uses BER to compact instructions. This is much like UTF-8 encoding. Although it only has eight general purpose registers, they hold integers and floating point values and all operations on small data types are duplicated across the width of the register. This is much like the SIMD as defined in FirePath. I hoped to implement something like the upward compatibility of the Transputer architecture. Specifically, 16 bit software is upwardly compatible with 32 bit registers or possibly more. My hope has been greatly exceeded because it is feasible to implement 8 bit registers in an 8 bit address space and individual bit addressing is possible but less practical. At the other end of the scale, 64 bit is practical and 1024 bit or more is possible.
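The BER-style packing can be sketched as seven payload bits per byte with a continuation bit, the same idea as UTF-8 (function names are mine, not the project's):

```c
#include <stdint.h>
#include <stddef.h>

/* Encode a value as BER-style bytes: most significant group first,
 * top bit set on every byte except the last. Small values pack into
 * one byte; a 32 bit value needs at most five. */
static size_t ber_encode(uint32_t value, uint8_t *out)
{
    uint8_t tmp[5];
    size_t n = 0;
    size_t i;

    do {
        tmp[n++] = (uint8_t)(value & 0x7Fu);
        value >>= 7;
    } while (value != 0);

    for (i = 0; i < n; i++) {
        out[i] = tmp[n - 1 - i] | (uint8_t)(i + 1 < n ? 0x80u : 0x00u);
    }
    return n;
}

/* Decode: accumulate seven bits per byte until the continuation bit
 * clears. *len reports how many bytes were consumed. */
static uint32_t ber_decode(const uint8_t *in, size_t *len)
{
    uint32_t value = 0;
    size_t i = 0;

    do {
        value = (value << 7) | (uint32_t)(in[i] & 0x7Fu);
    } while (in[i++] & 0x80u);

    *len = i;
    return value;
}
```

This is why frequent instructions can stay at one byte while rarer ones grow, and (as noted below for the interpreter) why the opcode bits must sit where a single-byte fetch can reach them.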

It has irked me that processor architectures are advertised as having 2^r general purpose registers and d addressing modes. I then discover that one (or typically two) of the "general purpose" registers is a program counter and micro-coded call stack - and that many of the addressing modes rely on these registers being mapped within a register set. For example, branch gets implemented as a register move to program counter and a relative branch is addition to program counter. Stack manipulation similarly relies on re-use of generic addressing modes. Furthermore, one stack with a mix of data and return addresses is encouraged because it uses the least registers. Anything which deviates from this group-think requires additional hardware, additional instructions and incurs a slight impedance mis-match with historical code. However, the use of generic addressing modes relies upon a conflation of data register size and address pointer size. With heavy processing applications, such as multi-media, we have GPUs with a bus width of 1024 bits and an Xtensa processor architecture with register width up to 1024 bits.

Beyond 16 bits, binary addition becomes awkward. 32 bit flat memory models and 64 bit flat memory models have been accommodated by smaller circuits and the resulting increase in execution speed. However, rather than targeting a 32 bit, "bare metal" RISC architecture (or RISC pretending to be CISC), in the general case, it would have been preferable to target a 64 bit virtual machine and allow more varied implementation. (Yes, I sound like a curmudgeon who is ready to work in a mainframe environment.)

Virtual machines vary widely. For example, Perl and Python have bytecode which includes execution of regular expressions. Java has an infamous "invoke virtual method" although that hasn't discouraged Sun's Java hardware or ARM's Jazelle. What I'm suggesting is keeping it real. 100% execution on a hypothetical processor and only invoking unknown instruction exception handling on lesser implementations, such as hardware without FPU. I expect this will "gracefully degrade" like Sun's and SGI's handling of SPARC and MIPS operating system updates. In practice this becomes unusable because heavy use of the latest instructions increases the proportion of time spent emulating instructions. This occurs on legacy hardware which is already under-powered. I presume that Apple has been doing similar in addition to restricting battery usage. Unless a conscious effort is made to drop out of a cycle of consumption, for example, by fixing hardware and operating system during development, most users of closed-source and open-source applications and operating systems are bludgeoned into accepting updates to maintain a pretense of security. Users have a "choice" along a continuum of fast and secure. However, the end points are rather poor and this encourages cycles of hardware consumption and license upgrades. This is "Good, fast, cheap. Choose any two." combined with planned obsolescence.

Fortunately, we can use the consequences of miniaturization to our advantage. Specifically, 2010s micro-controllers exceed the capabilities of 1960s mainframes. They have more memory and wider data paths. They invariably have better instruction sets and more comprehensive handling of floating point numbers. They have about 1/100000 of the execution latency, about 1/100000 of the cost and about 1/100000 of the size and weight. Unfortunately, security and reliability have gone backwards. A car analogy is a vehicle which costs US$1, has the mass of a sheet of paper, travels one million miles on one tank of hydrocarbons and causes one fatal crash every three miles - if you can inexplicably get it to start or stay running.

This is the problem. Security and reliability are unacceptably low. Consumers don't know that it is possible to have a computer with one year of uptime or significantly more. Proper, dull practice is to fix bugs before adding features. However, market economics rewards the opposite behavior. Microsoft was infused with neighboring Boeing's approach to shaking out bugs. I'm not sure that bugs introduced by programmers should be handled like defects in metallurgy. In software, major causes of failure are bit flips and your own staff. Regardless, Microsoft, a firm proponent of C++, adopted a process of unit testing, integration testing, smoke testing, shakedown testing and other methodologies which were originally used to assemble airplanes. However, while Boeing used this to reduce fatality and the associated poor branding, there is a perception that Microsoft deliberately stops short of producing "low defect" software. That's because an endless cycle of consumption is severely disrupted by random dips in reliability. Users (and our corporate overlords) accept a system with an uptime of two hours if it otherwise increases productivity. For scientific users, a 1960s punch card job or a 1970s interactive session on a mini-computer saved one month of slide rule calculations and often provided answers to six decimal figures. Contemporary users want an animated poo emoji so that they can conspicuously signal their consumption of blood minerals.

I've spent more than eight months trying to compile open source software in a manner which is secure and reliable. The focus was process control on ARM minus ECC but the results are more concerning. I didn't progress to any user interface software beyond a shell or text editor. Specifically, bash and vim. I was unable to bootstrap anything and I was only able to compile one component if numerous other components were already present. An outline of the mutually dependent graph is that clang and llvm require another compiler to provide functionality such as a mathematics library. Invariably, this comes from gcc. Historically, gcc had a mutual dependency with GNU make. Both now have extensive dependencies upon other GNU packages. One package has a hard dependency on Perl for the purpose of converting documentation to a human readable format without creating a mutual loop within GNU packages. In addition to making core GNU packages directly or indirectly dependent upon Artistic License software, this only throws the mutual loop much wider. I was able to patch out the hard dependency for one version of software but I was using Perl and GNU make to do it. Even if I had taken another path to do it, I would, at best, be dependent upon a GNU or BSD license kernel compiled with a GNU or BSD license compiler. Market economics mean that GNU and BSD kernel distributors include thousands of other packages. Via Perl repositories, Python repositories, Ruby repositories, JavaScript repositories, codec repositories, dictionaries and productivity scripts, there may be indirect support for 35000 packages or more. Any component within a package may require more than 100 other components and the dependencies are often surprising because few are aware of the dependency graph. We also have the situation where some packages are gutted and re-written on a regular basis and some open source projects incur management coups. This rarely concerns distributors.
That would be acceptable if everyone were competent and trustworthy. This is far from the case and we seem to have lost our way. In particular:-

  • Layered implementation is taught but not practiced.
  • Coupling between software components is too high and it is increasing.
  • We have completely lost the ability to bootstrap mainstream software from the toggle switches.
  • Our software has no provenance.
  • We cannot trust our compilers.
  • We cannot trust any other software to work as described.
  • We cannot trust hardware.
  • A movement of citizens can capture a project.
  • A false movement can capture a project.
  • Covert infiltration is known to be widespread and occurring at every level.

This made me despondent. Was there any other outcome? When the new rich use novelty to signal wealth, was there any other outcome? When governments and corporations routinely abuse individuals, was there any other outcome? With industrial espionage and dragnet snooping, was there any other outcome? I considered contributing to BSD. I like the empirical approach of fixing bugs found on your own choice of hardware combined with the accumulated knowledge of open source software. However, in practice, it involves decades of Unix cruft layered over decades of x86 cruft. I also considered buying a typewriter to get some privacy. However, I'd probably use it with someone else's network printer/scanner/photocopier and/or many of the recipients would use larger scanning systems. Taking into account the acoustic side-channel of a typewriter, I'd probably have less security than using pen and paper while inconveniencing myself.

However, the answer to security and reliability is in a statement of the problem. We have to bootstrap from the toggle switches. We have to regain provenance of our compilers. Forget about network effects, economic pressure or social pressure. The official report on the Internet Worm of 1988 recommended that no implementation should be used in more than 20% of cases. That means we need a minimum of five processor architectures, five compilers, five operating systems without shared code, five databases, five web servers, five web browsers and five productivity packages. That's the minimum. And what have we got? Two major implementations of x86, one major source for ARM and almost everything else is niche and/or academic research. Compilers are even worse. Most embedded compilers use a fork of gcc which lacks the automatic security features of the main branch. clang and the proprietary compilers aren't sufficient to establish compiler trust. Ignoring valid concerns about hardware and kernels, anyone outside of a proprietary compiler vendor has difficulty getting three sets of mutually compatible compiler source code in one place for the purpose of cross-compiling all of them and checking for trustworthy output. If that isn't easy to replicate then everything else is fringe.

I envision SPIT: the Secure, Private Internet of Things. Or perhaps SPRINT: the Secure, Private, Reliable InterNet of Things. (This is why you shouldn't let geeks name things.) I start with a dumb bytecode interpreter. This consists of case statements in a loop which implement the fetch-execute cycle of a processor. Or a shell. Or an interpreted language's interactive prompt. Although the principle is general, the intention is to make something which can be implemented as hardware without supporting software. Initially, it uses an untrusted host operating system and an untrusted compiler. That is sufficient for design, benchmarking and instrumentation, such as counting branches and memory accesses. An assembler allows a developer to get an intuitive understanding of the processor architecture. This allows idioms to be developed, which is useful for efficient compiler output. A two-pass assembler also provides forward reference resolving for a trivial one-pass compiler. (By chance, I've been given a good book on this topic: Compilers: Principles, Techniques, and Tools by Princeton doctorates Alfred V. Aho, Ravi Sethi and Jeffrey D. Ullman.)
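The "case statements in a loop" interpreter can be sketched in a few lines. This is a minimal illustration of the fetch-execute cycle only; the three opcodes (PUSH, ADD, HALT) and their encodings are hypothetical, chosen for brevity rather than taken from any real instruction set.

```python
# Minimal fetch-execute loop: fetch a byte, dispatch on it, repeat.
# Opcode names and values are illustrative assumptions.
PUSH, ADD, HALT = 0x01, 0x02, 0x00

def run(program):
    pc, stack = 0, []
    while True:
        op = program[pc]              # fetch
        pc += 1
        if op == PUSH:                # decode/execute: one case per instruction
            stack.append(program[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == HALT:
            return stack
        else:
            raise ValueError(f"illegal opcode {op:#x} at {pc - 1}")

print(run([PUSH, 2, PUSH, 3, ADD, HALT]))  # → [5]
```

The same skeleton serves as instrumentation: counting branches or memory accesses is a matter of incrementing a counter inside the relevant cases.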

From this point, it is possible to build circuits and/or deploy virtual machines which perform practical tasks. It is also possible to implement functionality which is not universal, such as parity checking on registers and memory, backchecking all computation and/or checkpoints and high availability. Checkpoints can be implemented independently of the assembler or compiler but it may be beneficial for them to provide explicit checkpoints.
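As one example of such non-universal functionality, parity checking on registers can be emulated in software when the target hardware lacks it. This is a sketch under the assumption of a simple even-parity scheme; real implementations would protect memory and the register file wholesale.

```python
# Software parity protection for a register value: store a parity bit on
# write, verify it on read, and treat any mismatch as corruption.
def parity(value):
    bits = 0
    while value:
        bits ^= value & 1
        value >>= 1
    return bits

class CheckedRegister:
    def write(self, value):
        self.value, self.check = value, parity(value)

    def read(self):
        if parity(self.value) != self.check:
            raise RuntimeError("parity error: register corrupted")
        return self.value

r = CheckedRegister()
r.write(0b1011)
print(r.read())        # 11
r.value ^= 0b0100      # simulate a single bit flip
# r.read() would now raise RuntimeError
```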

There are a large number of choices which are not constrained by this process:-

  1. Does the processor architecture work in binary? Probably yes, but other choices include balanced ternary, decimal, binary-coded decimal and the Chinese Remainder Theorem.
  2. Does it support negative numbers? Probably yes.
  3. Are negative numbers represented as sign and magnitude, one's complement, two's complement or other?
  4. Does it have opcodes and/or a program counter? Probably yes.
  5. Is it a pure Harvard architecture?
  6. What are the word sizes?
  7. Does it have stacks and/or registers?
  8. How many stacks?
  9. How many registers?
  10. Does it have registers for constants?
  11. Does it have direct access to memory?
  12. Does it have unrestricted access to memory?
  13. Does it have paging, segments, flat memory or other?
  14. Is there a constants segment?
  15. Does it allow unaligned memory access?
  16. Does it allow unaligned stack access?
  17. Does it have separate data registers and address registers and are they the same width?
  18. Does it support integer multiplication?
  19. Does it support integer division?
  20. Does it support integer modulo?
  21. Does it have support for floating point data?
  22. Does it support SIMD?
  23. Is there a conflation between SIMD, floating point, integers and/or pointers?
  24. Does it have one accumulator, multiple accumulators or registers which can be ganged?
  25. If it has a flag register then which flags are present?
  26. Which opcodes change which flags?
  27. If it doesn't have a flag register then what mechanism is provided for conditional execution?
  28. Is it a 2-address machine, 3-address machine, 4-address machine or other?
  29. Is it a pure load and store architecture?
  30. Does it take multiple operands from stack or memory?
  31. Does it allow memory-to-memory operations?
  32. Does it provide string operations?
  33. Does it provide arbitrary precision operations?
  34. Does it provide arbitrary size vector operations?
  35. Does it have support for multiple processors?
  36. Does it have support for co-processors?
  37. Does it have support for custom instructions?
  38. How does it handle interrupts?
  39. How does it handle exceptions?
  40. Is there conflation between interrupts and exceptions?
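To make question 3 concrete, here is the same value, -5, under the three named signed representations in an assumed 8 bit word. The helper names are illustrative.

```python
# -5 in three 8-bit signed representations (question 3 above).
def sign_magnitude(n, bits=8):
    return (1 << (bits - 1)) | -n if n < 0 else n

def ones_complement(n, bits=8):
    return (~(-n)) & ((1 << bits) - 1) if n < 0 else n

def twos_complement(n, bits=8):
    return n & ((1 << bits) - 1)

print(f"{sign_magnitude(-5):08b}")   # 10000101
print(f"{ones_complement(-5):08b}")  # 11111010
print(f"{twos_complement(-5):08b}")  # 11111011
```

Two's complement is the overwhelming contemporary choice because addition hardware needs no special cases, but the question remains open for a clean-sheet design.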

After deciding all of this and more, few constraints have been placed upon ALU, FPU or instruction format. Specifically, almost every ALU and FPU function can be specified as an instruction prefix. For example, I devised a 3-address processor architecture where the default instruction was a 2-address move between eight registers, encoded within 1 byte as 00-sss-ddd. Escape sequences convert this instruction into addition, subtraction and bit operations. This is encoded as 01-000-aaa for common ALU functions and 01-001-bbb for rare ALU functions. Instructions may optionally use a third register and/or a wider range of registers. This is encoded as 10-ppp-rrr. Unfortunately, this exercise is far less efficient than 8080 derivatives, such as Z80 and x86. In particular, the heavy use of prefixes leads to a combinatorial explosion of duplicate encodings. Regardless, escapes have their place. For example, rather than providing conditional branches, PIC micro-controllers have instructions which conditionally skip the next instruction. This allows conditional branches, conditional moves, conditional arithmetic and conditional subroutine calls. Whereas, VAX mini-computers implement addressing modes as suffix codes. The difficulty with escapes is that they generally reduce execution speed. Although it is possible to implement instruction decode at the rate of 4 or 5 words per clock cycle, it is more typical for an implementation to be the inverse: 4 or more clock cycles per word. This is greatly complicated when variable length instructions may cross an instruction cache boundary or virtual memory page boundary.
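The prefix-escape decode described above can be sketched as a loop that carries one byte of pending state. The field layout (00-sss-ddd moves, 01-000-aaa common-ALU escapes) follows the text; the ALU mnemonic table is an assumption for illustration.

```python
# Decoder for the hypothetical 1-byte move/escape encoding described above.
# An escape byte does not execute; it modifies the following move.
COMMON_ALU = ["add", "sub", "and", "or", "xor", "shl", "shr", "not"]  # assumed

def decode(code):
    out, alu = [], None            # alu holds a pending 01-000-aaa escape
    for byte in code:
        top = byte >> 6
        if top == 0b00:            # 00-sss-ddd: move, or ALU op if prefixed
            src, dst = (byte >> 3) & 7, byte & 7
            out.append((alu if alu is not None else "mov", src, dst))
            alu = None
        elif top == 0b01 and (byte >> 3) & 7 == 0:
            alu = COMMON_ALU[byte & 7]   # escape: remember, consume no operands
        else:
            raise ValueError(f"unhandled encoding {byte:#04x}")
    return out

# mov r1->r2, then an escaped add r3->r4
print(decode([0b00_001_010, 0b01_000_000, 0b00_011_100]))
```

The hidden `alu` variable is exactly the implicit state the article later warns about: it must survive until the next instruction, which complicates interrupts.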

RISC solves these problems and others but the code density is poor. 32 bit instructions are impractical. 16 bit encodings (mine included) are poor unless longer instructions are allowed. Xtensa takes the unusual approach that instruction opcode implicitly determines a 16 bit instruction or 24 bit instruction. I've found that an explicit bit per byte allows code density to match or exceed the code density of 32 bit ARM instructions and 16 bit Thumb instructions.

Starting with 32 bit ARM instructions, the 4 bit condition field is rarely used and the remainder is 28 bits which can be packed into 1, 2, 3 or 4 bytes using BER format where the top bit of each byte indicates a subsequent byte. This is similar to UTF-8 but without the need to be self-synchronizing. Some of the saving can be spent on lost conditional functionality. Yes, this may require additional clock cycles. But the overall saving on a system with an instruction cache would be worthwhile. 16 bit Thumb instructions make the saving less pronounced but having additional lengths (and no additional escape bits in total) means that a trivial encoding in BER format is more efficient than Thumb encoding. If fields are arranged so that the least frequent encodings occur at one end then the efficiency gain can be amplified. BER format optionally allows longer instructions and therefore it becomes relatively trivial to reach or exceed the peak 40 bit per clock cycle instruction decode of an Intel Xeon.
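The BER-style packing is essentially the variable-length integer scheme used elsewhere (e.g. LEB128 in DWARF): seven payload bits per byte, top bit flagging a continuation. A sketch, assuming a 28 bit instruction word and least-significant-bits-first ordering:

```python
# Pack a 28-bit instruction word into 1-4 bytes; the top bit of each byte
# indicates that another byte follows (set on all but the last byte).
def encode(word):
    assert 0 <= word < (1 << 28)
    out = []
    while True:
        out.append(word & 0x7F)
        word >>= 7
        if word == 0:
            break
    return bytes(b | 0x80 for b in out[:-1]) + bytes(out[-1:])

def decode(data):
    word, shift, used = 0, 0, 0
    for byte in data:
        word |= (byte & 0x7F) << shift
        shift += 7
        used += 1
        if not byte & 0x80:
            break
    return word, used

packed = encode(0x123)            # a small instruction packs into 2 bytes
assert decode(packed) == (0x123, 2)
```

If the fields most likely to be zero are placed in the high bits, frequent instructions collapse to one or two bytes, which is where the claimed density gain over fixed 32 bit or 16 bit Thumb encodings comes from.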

So, any conventional 3-address RISC instruction set with suitably arranged fields may be a practical implementation with practical code density and practical execution speed. The trick is to not screw it up. Don't do anything too radical. Indeed, use of BER instruction packing allows other changes to be de-coupled with confidence. Knowing that the optimal instruction has not been chosen, it remains possible to implement a virtual processor, implement an assembler, implement a trivial compiler, implement an optimized compiler and then improve the instruction format without creating or modifying any compiler. This can be achieved by making cursory changes to the virtual machine to obtain statistics. Then changes can be made to the assembler and virtual machine to accommodate changes to instruction encode and instruction decode.

Some inefficiency may be beneficial for some implementations. Split fields are of minimal consequence for a hardware implementation or FPGA but may be very awkward for a general purpose computer. Likewise, some opcode arrangements may be beneficial for GPU but awkward for FPGA. In particular, omission of a flag register is very beneficial for software implementation but detrimental to composability of instructions and therefore detrimental to code density. However, use of conditional prefix instructions is also detrimental. They require implicit state to be held for one instruction cycle. This adversely affects interrupt response. Hidden state complicates a design. In this case, hidden flags cannot be dumped anywhere so the instruction sequence must be executed atomically.

There is the less immediate concern of running a 32 bit and/or 64 bit virtual processor on more modest hardware. This is a classic mainframe technique. Motorola, Tandem and others had less success replicating this on cheaper hardware but there are minimal complaints about Atmel AVR support for 32 bit numbers and 64 bit numbers on processors with 8 bit registers and 16 bit registers. I'm one of the worst offenders but it isn't that bad and I'm a relatively rare corner case. I've made extensive effort to optimize code size and execution speed across common architectures. Sometimes these objectives align and sometimes they don't. In addition to that, I've made extensive effort to avoid Arduino's patronising interface and instead use GNU make for compilation across Atmel SAM ARM and Atmel AVR. This is mostly a publicly available script plus compiler flags, configuration settings taken from previous work and archiving taken from previous work. Unfortunately, it drops the AVR library routine for arbitrary modulo. I presume division is similarly affected. Most people don't notice how this is implemented. Powers of two are reduced to bit shifts and bit masks but it is an irritation to not have, for example, modulo 10 and that's why I haven't published it.
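Wide arithmetic on narrow registers works by chaining byte-wide operations through a carry flag; on AVR this is an ADD followed by three ADCs. A sketch of the principle (the function name is illustrative):

```python
# 32-bit addition performed as four 8-bit additions chained through a carry,
# as an 8-bit machine such as the AVR would do it.
def add32_on_8bit(a, b):
    result, carry = 0, 0
    for i in range(4):                       # least significant byte first
        byte_sum = ((a >> (8 * i)) & 0xFF) + ((b >> (8 * i)) & 0xFF) + carry
        carry = byte_sum >> 8                # carry flag into the next byte
        result |= (byte_sum & 0xFF) << (8 * i)
    return result, carry                     # final carry = 32-bit overflow

total, overflow = add32_on_8bit(0x00FFFFFF, 1)
print(hex(total), overflow)                  # carry ripples through three bytes
```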

For many applications, such as process control, a virtual processor would be indistinguishable from current practice while offering advantages. A 40MHz, 16 bit micro-controller is considered to be slow and under-powered. Regardless, many applications, such as hydroponic pump control, only require response within 15 seconds. I presume that systems such as chemical mixing and gas pipelines have similar bounds. Faster is better but reliable is best. If it is possible to increase reliability by a factor of 10 but it reduces speed by a factor of 1000 then there are circumstances where it is very worthwhile to trade a surplus resource.

My Ideal Processor, Part 8

Posted by cafebabe on Tuesday January 16 2018, @08:06PM (#2928)
0 Comments
Hardware

(This is the 59th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I defined a provisional instruction set. The final instruction set will be defined after empirical testing. Fields may be split, moved, removed and/or expanded. Opcodes may be re-arranged. However, there is enough to begin implementation and testing.

The basic concept for implementing a virtual machine is a set of case statements within a while loop. This implements the fetch-execute cycle. One instruction is fetched at the beginning of the while loop and it is decoded via a switch statement (or nested switch statements) and then each case implements an instruction or related functionality. Instructions can be as abstract as desired and may include "make robot walk", "set light to half brightness" or "fetch URL". In my case, I'm trying to define something which could be implemented as a real micro-processor. Even here, people have implemented hardware with instructions such as "calculate factorial", "find string length", "copy string" or "add arbitrary precision decimal number". Even this is too complicated. I'm intending to implement a virtual machine which would compare favorably with hardware from 25 years ago. This is partly for my sanity given how computers have become encapsulated to the extent that almost no-one understands how a serial stream is negotiated or encoded over USB. There is also the security consideration. I cannot secure my desktop. I certainly cannot secure a larger system. Anyone who claims that they can secure a network is ignorant or a liar. I'm not even sure that my storage or network interface run their supplied firmware.

I'd like something which targets Arduino, FPGA, GPU or credit card size computer (or smaller). We've got quad-core watches and a watch with RAID is imminent. With this level of miniaturization, we can apply mainframe reliability techniques to micro-controllers. For example, we can run computation twice and check results; insert idempotent, hot-failover checkpoints between all observable changes of micro-controller state; or implement parity checking when it is not available on target hardware. These techniques have obvious limitations. However, embedded systems often have surplus processing power. When EPROM cost more than a week's wages, micro-controllers would decode Huffman compressed Forth bytecode or similar. Now we can use the surplus to increase reliability. The alternative is too awful to contemplate.
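The "run computation twice and check results" technique can be sketched as a wrapper: evaluate the same step twice (ideally on separate cores or devices) and refuse to act on any disagreement. The decorator and the threshold below are illustrative assumptions.

```python
# Duplex execution: run the step twice and suppress output on mismatch,
# treating disagreement as a fault rather than a result.
def duplexed(step):
    def checked(*args):
        first, second = step(*args), step(*args)
        if first != second:
            raise RuntimeError("lockstep mismatch: suppressing output")
        return first
    return checked

@duplexed
def pump_on(level):
    return level < 30        # hypothetical hydroponic threshold

print(pump_on(10))           # both runs agree, so the result is released
```

This halves throughput, which is exactly the "trade a surplus resource for reliability" bargain argued for earlier.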

It is possible to have standardized object code which resumes on replacement hardware. This is like VMware for micro-controllers. In a trivial case, a LCD panel may run a clock application. The clock application checkpoints to a house server. When the panel fails, it would be possible to purchase a replacement panel and restore the application on the new panel. It may now run on a slower system with smaller display but within a minute or so, it should find its time source, adjust to the new display size and otherwise display time to your preferences.
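The clock example reduces to serializing the application's observable state and restoring it on different hardware. A sketch, with json standing in for whatever wire format the real system would use and the class and field names assumed for illustration:

```python
import json

# Checkpoint/restore of an application's observable state, so a replacement
# panel can resume where the failed one left off.
class ClockApp:
    def __init__(self, timezone="UTC", brightness=50):
        self.state = {"timezone": timezone, "brightness": brightness}

    def checkpoint(self):
        return json.dumps(self.state)        # ship this to the house server

    @classmethod
    def restore(cls, blob):
        app = cls()
        app.state = json.loads(blob)         # resume on the new hardware
        return app

old = ClockApp(timezone="Europe/London", brightness=80)
new = ClockApp.restore(old.checkpoint())
print(new.state["brightness"])               # preferences survive replacement
```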

A more practical example would be a hydroponic controller. People are developing I/O libraries which allow relay control over Ethernet with minimal authentication, no error checks (between devices with no memory integrity) and no hardware interlocks. For your own safety, please don't do this. A more sensible approach is to run two instances of the firmware. One instance runs locally in a harsh environment where humidity may reach 100% and temperature may fall below 0°C. The other instance runs in a controlled environment which is always dry and at room temperature. Both instances run integrity checks but no relays get triggered unless both instances compute the same results. Alternatively, the instance local to the relays may continue with less oversight while the server container sends alerts about the lack of monitoring. For a large environment, it is possible to use the standard database technique of a double or triple commit to ensure that servers have a consistent state prior to server hot-failover. This can work in conjunction with console access, graphical remoting and centralized OLAP over low-bandwidth networks.

Sun Microsystems said "The network is the computer." and then gave us Java. This only provided hot-failover within the Enterprise Java Bean environment. I'm proposing a system where low-bandwidth process control runs in two or more places, has the convenience of Android, the reliability of an IBM mainframe, the openness of p-code and the security of erm, erm. That part has been lacking for quite a while.