My Ideal Processor, Part 7

Posted by cafebabe on Tuesday January 16 2018, @08:04PM (#2927)
0 Comments
Hardware

(This is the 58th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I had some time to kill in a hospital so I took some paper and solved a problem which had been annoying me for more than three years. I wanted to implement an instruction set with good code density while maintaining an efficient hardware implementation. This requires presenting one or more useful instruction fields at approximately the frequency with which they are required. It also requires using the majority of bit permutations within each field. Presenting useful fields is trivial. Popular processor architectures, such as IBM 360, DEC VAX and Intel x86, make instruction length implicit as part of an instruction. RISC processor architectures insist on one size of instruction (usually with exceptions for constants). ARM Thumb re-uses a (rarely used) conditional field as a method of compacting many of its 32 bit instructions to 16 bit representations. However, the Xtensa processor architecture shows that good code density can be achieved by straddling a golden ratio with 2 byte instructions and 3 byte instructions where the length is implicit to the instruction.

I've found that similar results can be achieved by making instruction size explicit rather than implicit. Specifically, through the use of BER. With this encoding, bytes are split into a one bit field and a seven bit field. The top bit indicates whether another byte follows. This allows numeric values of any size to be encoded in multiples of seven bits. In particular, it allows processor instructions to be encoded as 7 bits, 14 bits, 21 bits, 28 bits or more. It is also possible to choose representations other than octets, or representations in which some escape bits are skipped. So, for example, a scheme with 2 byte, 3 byte and 5 byte encodings would only require two or three escape bits among 37 bits or more. This allows detailed instructions to be defined while programs match or exceed the code density of AVR or ARM. Explicit escapes appear to reduce code density but they provide more options for compaction while simplifying instruction decode.

Within an unpacked instruction, fields can be arranged (or split) to minimize average instruction length. Where fancy functionality is rarely used, such as ARM's pre-scaler, the default bit permutation of zero requires no space when packed as BER. Another trick, most useful within a 3-address machine, is that one reference to a register may be relative to another. This allows compact representation for common cases, such as 2-address instructions. An example would be g+=h, which can also be written as g=g+h or g=h+g. An instruction which references g more than once may use zero in subsequent references. Obviously, this requires all references to be skewed but this can be achieved with XOR or addition such that common cases have zero in the desired fields. This allows an instruction to be compacted more efficiently.

A worked example would be 5 bits for opcode, 2 bits for addressing mode and 3 bits for each of three register references. Ordinarily, this would require 16 bits. I've found 16 bit encoding to be quite restrictive. The conventional approach is an implicit sequence of additional 16 bit words. However, with BER and skewed register references, 2-address instructions only require 13 bits. This fits into 2 bytes while retaining fast decode of significantly longer instructions. In practice, instruction size is similar to x86's VEX prefix but implemented cleanly rather than as bolt-on cruft. This arrangement also satisfies the criterion of an instruction-space as a binary fraction, where there is a 1 byte breakpoint instruction and numerous useful instructions with 1 byte and 2 byte encodings. Although there is no explicit bias within a set of general registers, if operand references cross encoded byte boundaries then use of some registers leads to a more compact representation. This runs faster, even within a virtual machine. This provides opportunities to tweak code efficiency within a conceptually simple model but it isn't essential.

This was all known to me but the piece that had been annoying me for more than three years was instruction set representation of a fully functional processor. It is easy to take specific cases, like g=h-i, or common optimizations, such as a-=c, but it is more difficult to devise an ALU, stack and main memory interface with realistic code density. A designer must ensure that there is a practical set of instructions which are sufficiently over the break-even point. A designer must also ensure that each bit pattern is allocated to only one task. When I worked on a rigidly fixed 16 bit instruction set in which the opcode could appear within each of four fields, I had numerous false successes in which the same bit patterns represented two or more operations. A more conventional division of opcode and operands reduces this problem but leads to the opposite problem of underwhelming allocation. This may occur directly, where a common instruction requires a long encoding, or indirectly, where idioms require an inordinately long sequence of instructions. There are two examples here.

x86 is generally regarded as achieving the best code density among widely deployed systems. The method by which this is achieved is questionable market economics. It is also achieved by re-cycling obsolete bit patterns. For example, the INT3 breakpoint instruction. Or 0x0F, which was a 16 bit stack operation (POP CS) and is now a prefix for MMX and suchlike. However, the legacy 16 bit mode has a very curious wrinkle. 20 bit addressing (1MB) is achieved by pre-scaling 16 bit segment pointers by four bits. In 2018, an armchair critic can easily suggest that pre-scaling by eight bits would have given a 24 bit address-space (16MB). However, this was devised in an era when systems shipped with 1KB RAM or less. (The 1976 pre-release documentation for the 6502 describes implementation of systems with 256 bytes or less.) Wasting up to 15 bytes between each of four segments was considered positively profligate. However, the astounding part comes from the rare case when pointers between segments are tested for equality. In this case, Intel's macro assembler generated a 44 byte sequence. That's excessive.

We've mostly dumped that historical baggage but the problems of addition and multiplication remain. This is the most pertinent example. When implementing multiplication, typically upon signed, two's complement binary integers, there are several choices:-

  1. Signed multiplication only.
  2. Unsigned multiplication only.
  3. Two variants. Both inputs signed or unsigned.
  4. Four variants. Any input signed or unsigned.
  5. Three variants. This is from the observation that multiplication is commutative. However addressing modes on available operands may be:-
    1. Restricted to registers.
    2. Either operand may source data from memory.
    3. Both operands may source data from memory.

There is also the consideration that multiplying m bits × n bits requires approximately m+n bits of answer to avoid overflow. This leads to choices:-

  1. Store result in two specified registers.
  2. Store result in two sequential registers.
  3. Store result in two registers or one memory location.
  4. Store result in two registers or two memory locations.
  5. Allow overflow.
  6. Restrict input to m/2 bits to preclude overflow.

There are also options where one or more pieces of data are discarded. For example, by "writing" to a null register. This may be dedicated constant register or it may be a register which is (otherwise) unremarkable. There are a similar set of choices for the quotient and remainder of integer division.

The initial release of the MC68000 provided MULS and MULU where the source operand could be register or memory. However, for very sound reasons, 32 bit arithmetic was initially implemented as two rounds of 16 bit operations and this led to the very obvious design decision to only implement 16 bit multiplication. Despite the unsigned capability and flexible addressing modes, the width limitation was deeply unpopular and led many people to falsely regard the entire range of MC680x0 processors as being 16 bit or only fully compatible with 16 bit implementations.

In RISC design, unsigned multiplication appears to be the obvious choice on the basis that, for example, 32 bit × 32 bit unsigned multiplication can be used to implement 31 bit × 31 bit signed multiplication. (Where do the sign bits go? They get held implicitly in the program counter when the signed multiplication macro executes one of four paths for positive and negative inputs.)

My personal preference is unsigned multiplication with restricted input range and no integer division because this provides the most implementation options and the most upward compatibility while the minimal implementation requires the fewest gates and the least energy consumption. However, this is within the context of a design which (provisionally) may source one aligned operand from memory and pre-scales forward branches to cache-line boundaries. (Nominally, 64 bytes.) So, synthesizing signed multiply requires execution paths which form a two tier binary tree. This requires four cache-lines. Each execution path requires two conditional branches (and possible execution pipeline stalls) or backward branches to reduce the number of cache-lines required. The most stream-lined version requires a temporary register and 256 bytes of instruction-space. This provides less functionality than a MC68000 while incurring far more clock cycles than most RISC designs. That's unworkable.

We have to make a cut somewhere. We either have multiple memory access cycles, two inputs, two outputs and special 4-address multiplication instructions or some functionality must be discarded.

So, I'll try again. No integer division, signed multiplication with unrestricted input range and no handling of overflow. This allows all variants to be synthesized without branching but each unsigned input over the full range doubles the number of multiplications required. Temporary registers may also be required to assemble the pieces. Overhead can be reduced with moderate abuse of type casting. For example, it is possible to multiply and accumulate 20 bit unsigned × 8 bit unsigned in one instruction (and possibly one clock cycle). The design is otherwise fairly agnostic about signed and unsigned data. At worst, comparison of unsigned integers can be synthesized from signed comparison via use of two temporary registers and zero extra branches.

However, after resolving this ALU functionality (and previously resolving FPU and SIMD functionality), stack and memory addressing modes are completely unresolved. My preference is bit addressable memory, or possibly nybble addressable, but anything smaller than bytes would be completely incompatible with the vast majority of RAM. Bit addressing also bloats constants and my designs are relatively bad in this regard. Bit addressable offsets don't affect pre-decrement or post-increment but they fail horribly when accessing an element within a structure or when skipping through an array of fixed length structures. From this, a friend made the observation that a stack only has to be aligned to the width of a general register and stack offsets can be implemented portably for any width of register. Specifically, 32 bit code can work with a 64 bit stack while encouraging good code density.

However, a trend popularized by VAX, MIPS, ARM and other processor architectures has been to steal two of the general purpose registers and replace them with a call stack and program counter. This allows MOV to PC to implement branch and post-increment to implement stack pop. (My preference for a zero register allows OR with zero to implement MOV while allowing indirect addressing modes to access a stack hidden under the zero register.) To maintain historical compatibility, performance and code density, some addressing modes don't make sense and some are detrimental. In particular, program counter relative emerges from a lack of position independent code and a failure to separate code and data. While it is possible to define meaningful semantics for the majority of bit permutations, this requires CISC micro-code or extra decode logic. This slows execution. Addressing modes can be offloaded to dedicated load and store instructions but this messes with code density more than ALU throughput. There is also the observation that x86 allows the maximum tricksiness of unaligned, little-endian access. This covers atomic, memory-to-memory conditional swap operations and, in most cases, doesn't cause the Turing complete memory interface to deadlock.

While it has never been advisable to store a 2 byte pointer on an odd memory boundary, use of RISC compatible, open source compilers has allowed aligned storage to become the norm. From Intel Core processors onwards, unaligned access is implemented with minimal penalty, without lawyers suing for deceptive practice and, in most cases, performs read and write to the requested memory locations. (See Intel Core Errata for reasons why NASA uses older Intel processors.)

Among schemes which work correctly, the trend has been to break historical instructions into micro-ops which make one memory access. Where access crosses a word, cache or page boundary, two accesses may require two micro-ops. For new designs, opcodes may source one aligned piece of data while performing computation. Dedicated instructions may also load or store aligned data in one instruction or unaligned data in two instructions. For efficiency, code density and historical compatibility, these instructions may implicitly set atomic locks and/or perform post-increment or suchlike. Intel's Itanium only offers pre-decrement in similar circumstances. This may be more useful or may be related to other details of the processor architecture. Regardless, it is trivial to make a register which is also a ripple counter. It is also trivial to make either an incrementing counter or, by inverting the inputs and outputs, a decrementing counter. However, it is more cumbersome to make a latch which increments and decrements in a range of steps. Therefore, asymmetry, such as pre-decrement only, is occasionally seen.

I'd resolved much of this already. I just had to be certain that memory-to-memory addressing, such as MC68000's MOV (A0)+,(A1)+ is not a template to follow slavishly. Indeed, a suitable RISC design allows optional unaligned access which exceeds the historical functionality without slowing the common case.

In the spirit of Fred Brooks Junior ("Show me your data structures and I shall be enlightened. Show me flowcharts and I shall remain mystified."), for eight registers, three stacks (system, USR1, USR2) and a stack of program counters, instructions have:-

  • 1 accumulate bit.
  • 1 float bit.
  • 3 other opcode bits.
  • 2 bits for data path width where:-
    • 00: byte.
    • 01: word.
    • 10: long.
    • 11: double.
  • 3 bit destination operand.
  • 3 bit source operand.
  • Another 3 bit source operand.
  • 3 bit pre-scale.
  • 7 bit immediate or offset value.

The first 5 bits define opcodes in groups of four such that:-

  1. Integer multiply.
  2. Integer multiply with accumulate.
  3. Floating point multiply.
  4. Floating point multiply with accumulate.
  5. Integer subtraction.
  6. Integer subtraction with accumulate.
  7. Floating point subtraction.
  8. Floating point subtraction with accumulate.
  9. Unused.
  10. Unused.
  11. Floating point divide.
  12. Floating point divide with accumulate.
  13. Aligned load and store instructions.
  14. Unaligned load and store instructions.
  15. USR1 push and pop instructions.
  16. USR2 push and pop instructions.
  17. Arithmetic shift right.
  18. Arithmetic shift right with accumulate.
  19. Trigonometric functions.
  20. Trigonometric functions with accumulate.
  21. Arithmetic shift left.
  22. Arithmetic shift left with accumulate.
  23. Hyperbolic functions.
  24. Hyperbolic functions with accumulate.
  25. Unused.
  26. Unused.
  27. Roots and rounding.
  28. Roots and rounding with accumulate.
  29. Flow control.
  30. Flow control.
  31. Unused.
  32. Unused.

An ALU or FPU instruction performs one of:-

  1. a = b op (imm << pre)
  2. a = b op ((imm + c) << pre)
  3. a = b op (([imm + c]) << pre)
  4. a = b op (([imm + c++]) << pre)
  5. a += b op (imm << pre)
  6. a += b op ((imm + c) << pre)
  7. a += b op (([imm + c]) << pre)
  8. a += b op (([imm + c++]) << pre)

You may notice several omissions from the instruction set. For example, as is often defined in C, logical shifts can be derived from arithmetic shifts. Also, OR with zero provides register-to-register MOV. Most importantly, where is addition? There is no 3-address addition which can source from memory. However, register-to-register multiplication is commutative and therefore some addition can be hidden in the multiply instruction. Specifically:-

  • a = b + ((imm + c) << pre)
  • a += b + ((imm + c) << pre)

It is useful to implement:-

  • a = 0
  • a = const
  • a += const
  • a -= const
  • a += a
  • a = b + c
  • a = b * b
  • a += b * b

Of these, handling constants is the most awkward. However, it may be possible to amortize this difficulty if constants are sourced from stack. Fixed quantity left shifts are provided with all ALU operations. With some byte manipulation, this also provides fixed quantity right shifts. See definition of ntohs() for example. (This idiom may be replaced with byte swap or word swap, if available.) Unused bit patterns may be used for AND, OR and XOR. Division and modulo which cannot be implemented as bit operations are implemented with a subroutine. This is standard practice on AVR and ARM. At present, there are no instructions to cast between integers and floating point. However, immediate values are cast appropriately and this may form the basis of conversion subroutines. Epsilon values over a large range can be obtained with use of pre-scaler and floating point division. "Branch never" means call subroutine. "Branch never nowhere" means return from subroutine.

Although the instruction set is not completely orthogonal, omissions can be written as unconditional sequences of instructions which may use temporary variables. These local variables compete for register allocation or stack allocation like any other local variable. Ignoring the asymmetry of register-to-register addition with pre-scale, 2^r registers are completely orthogonal. Register allocation can be shuffled with impunity. However, program length may differ slightly when packed encodings change. When combined with function inlining and a callee-saves, stack-only calling convention, register allocation has no dependencies between functions.

Blunder Woman has Died

Posted by turgid on Saturday January 13 2018, @12:53PM (#2918)
4 Comments
/dev/random

Bella Emberg, aka Blunder Woman (Cooperman's faithful sidekick), has died at the age of 80.

Trump Censors His Own Transcript, Again

Posted by DeathMonkey on Wednesday January 10 2018, @08:06PM (#2915)
17 Comments
Code

As part of his ongoing effort to prove he’s “a very stable genius” and “like, really smart” following the release of a book that portrays him as the exact opposite, on Tuesday afternoon President Trump held a televised meeting with members of Congress on the topic of immigration.

This did not go as planned. The most notable moment was when Trump responded with enthusiasm to Democratic senator Dianne Feinstein’s suggestion that they pass a “clean” bill making the Deferred Action for Childhood Arrivals program permanent. House Majority Leader Kevin McCarthy quickly jumped in to remind Trump that Republicans don’t want to protect DACA recipients without getting some border-security measures (or maybe even a big, beautiful wall) in return.

But that’s not what one might take away from the exchange if, for some reason, they opted to read the transcript released by the White House. The Washington Post’s Ashley Parker noticed that the line where Trump agrees with Feinstein’s proposal — saying, “Yeah, I would like to do it” — is “curiously missing” from the document.

Guess Which Line Was Missing From the Transcript of Trump’s Immigration Meeting

Apparently this isn't the first time.

Trump plays golf 3 times as much as Obama, costing $43 mil.

Posted by DeathMonkey on Tuesday January 02 2018, @07:01PM (#2900)
31 Comments
Code

Donald Trump has spent 81 days on the golf course in his first year as President, racing past his predecessors.

Mr Trump, after a weekend at his Trump International Golf Club in West Palm Beach, Florida, has spent more time on the green than George W Bush did during eight years in office.

The President has also been on the golf course almost three times as much as Barack Obama did during his first year.

The American public spent at least $43 million in order to support President Donald Trump‘s considerable golf habit in 2017.

The American Public Reportedly Spent $43 Million Last Year So Trump Could Play Golf

Donald Trump plays golf almost three times as much as Barack Obama after one year in office

Virginians react to House race decided by 1-vote

Posted by DeathMonkey on Wednesday December 20 2017, @09:08PM (#2879)
1 Comment
News

Every Vote Counts!

RICHMOND, Va. -- Among the holiday hustle and bustle of Carytown, Virginia voters react to what some might call a "Christmas Miracle" for Democrats in the Commonwealth.

"That's sort of amazing," voter Scott Williams said.

"That's pretty amazing," voter Ariel Furler said.

In a stunning turn of events, Democrat Shelly Simonds gained eleven votes in a recount to beat the Republican incumbent in the 94th District by just one vote.

The final tally: 11608 votes to 11607 votes.

UPDATE: Apparently it's a tie now. By state law, the winner of the tie will be determined "by lot."

Using A Cheap Audio Amplifier For Motor Control, Part 2

Posted by cafebabe on Tuesday December 19 2017, @09:50PM (#2877)
4 Comments
/dev/random

A terrible event has happened. The person who inspired me to use audio amplifiers as cheap motor controllers is very likely to be dead. He may be younger than 40 and it may be suicide. All of his family live in another country and they may have missed his funeral. If he is not dead then it is possible that he is stuck in a mental asylum for an extended period. At best, he may get out of this situation with no pets, no possessions and no home.

Either way, I blame his girlfriend for the situation. He considered their relationship to be very exclusive but she treated their relationship in much more casual manner. I believe that she was unfaithful before I met either of them. She may think that her boyfriend was unfaithful when he was just doing geek things with other people. That includes the time we spent together making minimal progress with robots.

She had easy access to drugs and she may have chemically messed with his sanity. Apparently, the police concluded their investigation without suspicion and gave his phone to her. This would be consistent with gaining all of his possessions from a will or gaining power of attorney.

Origami Cube: Map Projection, Four-Way Rotational Symmetry

Posted by cafebabe on Tuesday December 19 2017, @09:47PM (#2876)
0 Comments
Code

After writing software to make bitmaps for origami cubes and then making dozens of random designs, including cartoon characters, celebrities, memes (some from shock-sites) and fancy dress costumes (there's no shortage of that on the Inter-tubes), a friend asked if I could make designs which join at the edges. For example, a continuous pattern of water waves or a cube projection of a map of Earth. Actually, my first suggestion was placing Dress-Up Jesus onto the net of a cube. You might think that is blasphemous but the comedian Bill Hicks asked "You think when Jesus comes back, he really wants to see a cross? That's like going up to Jackie Onassis with a rifle pendant on." On that basis, what do you think his reaction would be to an organized religion which uses it as its primary symbol?

Anyhow, I've been working on bitmap designs which join on all four edges. Examples include:-

  • Octagons and squares. I thought this would look like floor tiling but it looks more like a misshapen sportsball.
  • A solved Rubik cube with the slightly dodgy 1980s palette of #FFF, #F00, #F70, #FF0, #3F0, #03F.
  • Diagonal lines separating squares of random colors. This looks like one of the more difficult Rubik cube variants.
  • Inspired by My Ideal House, it took less than five minutes to make an abstract Mondrian style design as follows:-
    +---------------+---------------+-------+---------------+
    |               |               |       |               |
    +-----------+---+---------------+-------+    Yellow     |
    |   Blue    |   |               |       |               |
    +-----------+---+---------------+-------+---------------+
    |   Blue    |   |               |       |               |
    +-----------+---+---------------+  Red  |               |
    |           |   |               |       |               |
    +-----------+---+---------------+-------+---------------+
    |                               |       |               |
    |                               |       |               |
    |                               |       |               |
    +-------+-------+---------------+-------+               |
    |       |       |               |       |               |
    |       |       |               |       |               |
    |       |       |               |       |               |
    |Yellow |       |     Blue      |       |               |
    |       |       |               |       |               |
    |       |       |               |       |               |
    |       |       |               |       |               |
    +-------+-------+---------------+-------+-------+---+---+
    |       |       |               |               |   |   |
    |       |       +---------------+      Red      |   |   |
    |       |       |               |               |   |   |
    |       |       +---------------+---------------+---+---+
    |       |       |               |               |   |   |
    |       |       |               |               |   |   |
    |       |       |               |               |   |   |
    +-------+-------+---------------+---------------+---+---+

I may attempt to convert a Mercator projection of Earth to an origami cube. Obtaining the sections around the equator is easy. Just take the mid section of the map and divide it into four strips. The arctic regions only require a little more work via GIMP's polar co-ordinate distortion dialog box. Unfortunately, sections may have to be rotated and shuffled to ensure that they assemble correctly. However, this doesn't require additional software and, for the final step, inputs don't have to be square or the same size.

I'm going to be off-line until next year but if anyone wants to troll, feel free to send a suitable bitmap and origami folding instructions to the Flat Earth Society.

Origami Cube: Bug Fix For Dec 2017

Posted by cafebabe on Tuesday December 19 2017, @09:45PM (#2875)
0 Comments
Code

In the spirit of bug fixes before features, in the origami cube program:-

rotate(im0,im5,3);

should be changed to:-

rotate(im0,im5,1);

This does not affect designs with two-way rotational symmetry. Nor does it adversely affect designs which have similar colors in opposite corners.

Optical Computing, Part 1

Posted by cafebabe on Tuesday December 19 2017, @09:43PM (#2874)
1 Comment
Hardware

(This is the 57th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

The quest for faster computing and reduced energy consumption may lead to widespread use of optical computing. This has been predicted since at least 1971 but hasn't occurred. This has been due to numerous difficulties.

The most significant difficulty is the scale of integration. While it is possible to manufacture electronic transistors at 22nm (or various exaggerations of 15nm), the use of infra-red lasers limits optical computing to a theoretical minimum feature size of 600nm. Use of gallium in one or more guises may reduce this to 380nm. Significantly reducing this limit would require development and safe handling of extremely small X-ray sources or similar. As a matter of practicality, I'm going to assume that optical computing is powered by a 410nm gallium blue laser diode.

While etching of electronic transistors has received considerable funding and development, optical computing competes at an increasing disadvantage. If optical circuits are manufactured using two dimensional etching then we will not be able to have optical processors with 100 million transistors unless the optical substrate is very large, or is manufactured in very small pieces and then stacked and connected in a manner which has not been developed. Alternatively, some form of holographic etching may have to be developed.

I'm going to assume that an optical CPU uses no more than 10000 gates. At this point, we're at the same scale of integration as 1970s CPU designs. An optical CPU may run at 20THz but it will have no more gates than a Z80 and scope to increase gates may be very limited. For example, electronic inter-connections may be significantly slower than optical connections. We may have the foreseeable situation of an optical 8086 (or similar) emulating a much more recent iteration of x86. This would include optical emulation of cache tiers and wide registers. Specifically, the old wisdom of processing data in 4 bit chunks or 16 bit chunks may be re-established as a matter of necessity.

Emulation will allow optimizations which are infeasible in hardware. For example, if four-way SIMD is used to calculate three dimensional co-ordinates, an emulator can peep ahead in an instruction stream and see which pieces are used before a register is over-written. Then it is possible to only calculate the three pieces required and then leave 1/4 of a register unchanged. In hardware, this would require more circuitry and energy than it would save. In software, this could make a program run faster while saving energy.

I presume that main memory will remain electronic. This would maintain expected storage density. However, DRAM may be superseded by persistent memory.

Generic, multi-core designs will become increasingly desirable. For neural simulation, decompression of dendrite weights and subsequent floating point calculations will all be performed by small integer processors. The dabblers who make their own image format, filing system or init system will move to instruction set design. This will create a profusion of incompatible applications, compilers and execution environments which will make Android look quaint. Despite increasing I/O bottlenecks, theoreticians will continue to advocate pure message passing.

I Come From The Planet Gallium!!!!!

Posted by cafebabe on Tuesday December 19 2017, @09:35PM (#2873)
1 Comment
Hardware

I have a friend who is convinced that Gallium is not an element in the periodic table and is a planet in a science fiction series. I understand his concern because I'm quite convinced that chives is a medical condition. ("Have you got chives?" "No, it's just the way I walk.")

Anyhow, enough of this silliness. I've been researching gallium nitride. Actually, I don't think it would be a huge revelation to say that I've been working on software to implement a network protocol on low-power, low-cost devices and the primary purpose of the system is to securely and reliably control and monitor perimeter security, beer brewing and hydroponics. Well, it is mostly about hydroponics and that's where I've found multiple references to gallium nitride.

Blue LEDs have become fairly ubiquitous. Well, it is a slightly purple shade of blue, corresponding to gallium nitride's 410nm spectral peak. However, now that experimental LEDs exceed 50% efficiency and manufacturing quality has improved, blue LEDs have a particularly piercing quality. In particular, lights on emergency vehicles have become blindingly bright. A friend noted that lights used by traffic police have become dangerously bright and that if anyone needed the lights so bright then they shouldn't be driving.

Laser diodes and other spectral peaks are also widely used. Green LEDs have switched to exploiting a different spectral peak of gallium compounds. That accounts for the change from a moderate green to a slightly blue shade of green. The short wavelength from blue laser diodes allows more data to be retrieved from optical disc storage. Ultra-violet LEDs are used for forgery detection and hair removal.

LED distribution forms a geographic monopoly. Nichia in Japan, Philips in Europe and Cree in North America are market leaders. Of these, Cree is significantly smaller but you'd never guess from their extensive advertising. Hydroponic innovation occurs disproportionately in arid parts of North America. However, when top tier hydroponic lights contain top tier components from the local geographic monopoly, it leads to co-branding which disadvantages any competitor outside of North America.

However, the development of blue LEDs has a checkered (colorful?) history. After many years of development, begun at the direction of Nichia's founder and continued without support, Shuji Nakamura was the primary recipient of the 2014 Nobel Prize For Physics. He also got a 20000 Yen (about US$200) bonus for his work. Most of my knowledge about Japanese business culture comes from anime, gangster films and Niall Murtagh's book The Blue-Eyed Salaryman. From this, I immediately knew that a 20000 Yen bonus was insultingly small. People get more than this for improving TPS Reports. He sued, won the largest bonus in Japan and then lost most of it when his employer appealed. After that, he started his own company and began working on gallium nitride on gallium nitride. After he completed the triple (Triad?) of red, green and blue LEDs and got a Nobel Prize for his efforts, I thought this was just a chemist's scientific investigation. However, after seeing electron microscope pictures of gallium nitride on a silicon substrate, I fully appreciate the quest to continue development. My reaction was "Well, that's like trying to turf the White Cliffs Of Dover." After gallium nitride has been applied to a silicon substrate, the cracks are deep and craggy. The mis-match in lattice spacing causes the layers to shear apart. Of course, this wouldn't happen if doped gallium nitride could be applied to a substrate of gallium nitride. That's currently under development.

My investigation came after attempting to switch 300W of LEDs at 48VDC with MOSFETs. This shouldn't be a huge challenge. 1000W MOSFETs are available and 600W LED systems are available. So, 300W is entirely reasonable. However, V = IR and P = IV; therefore, P = I²R. I performed a calculation to determine the heat that would be dissipated by my chosen MOSFETs and found that I required a heatsink which would dissipate 18W. That would make MOSFET switching about 94% efficient. It also required an elaborate arrangement of heatsinks or a sacrificial area of circuit board to work as a less effective heatsink. From my calculations, I found that I could improve efficiency by increasing voltage and decreasing current. Unfortunately, 48V is at the limit of wiring regulations which allow LEDs to be exposed. Higher voltage would reduce efficiency because LEDs would have to be placed behind glass or plastic. This could also be a moisture trap.
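The arithmetic behind that estimate can be reproduced; the on-resistance below is an assumed figure chosen to match the quoted 18W, not a measured value from any particular MOSFET:

```python
# Worked example of the conduction-loss estimate in the text.
# R_on is an assumed total switch on-resistance; real parts vary widely.

V = 48.0          # supply voltage (V)
P_load = 300.0    # LED load power (W)
R_on = 0.46       # assumed on-resistance (ohms)

I = P_load / V                       # I = P / V = 6.25 A
P_diss = I**2 * R_on                 # P = I^2 * R, about 18 W
efficiency = P_load / (P_load + P_diss)

print(f"current:    {I:.2f} A")
print(f"dissipated: {P_diss:.1f} W")
print(f"efficiency: {efficiency:.1%}")

# Doubling the voltage halves the current and quarters the I^2*R loss:
I_96 = P_load / (2 * V)
print(f"at 96 V:    {I_96**2 * R_on:.1f} W dissipated")
```

The last line shows why higher voltage is tempting despite the wiring regulations: conduction loss falls with the square of the current.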

I remember that Google co-ran a competition a while back where the objective was to do efficient power conversion. Apparently, two teams exceeded the specification by a factor of three. I didn't understand the details at the time but I've now learned more about MOSFETs. The general approach seems to be the use of an asymmetric "five legs topology" to perform switching. I presume this works like a Commodore 64 PSU's voltage sense output. This allows a Commodore 64 to automatically determine if it is running from 50Hz or 60Hz mains electricity. From this, it can automatically output a PAL or NTSC signal. The Google Little Box Challenge winners seem to be using something similar (plus a neural network) to optimize switching times. They also use counter-wound coils, capacitors which change value under load and gallium nitride FETs.

Oh. It might be possible to raise the switching efficiency of gallium nitride LEDs by using gallium nitride transistors. This appears to be a material of the future.

The best introduction on this topic (for someone already familiar with gallium nitride LEDs and MOSFETs) was from (the rather plucky) Efficient Power Conversion's Application Note AN002: Fundamentals Of Gallium Nitride Power Transistors by Stephen L. Colino and Robert A. Beach, PhD, which explains the manufacture and characteristics of the EPC1001 and EPC1010 GaN FETs. These are just example products and they'll make anything you want. They freely admit that they're bootstrapping from the existing infrastructure. That means all of the products use a silicon substrate. From Shuji Nakamura's work, that's known to be highly inefficient. However, within these constraints, they'll modify their manufacturing process to meet the characteristics of your choice. Specifically, "Where this does not allow compliance with safety agency creepage distance requirements, underfill can be used." If you're willing to use this experimental technology then manufacturers are willing to adapt products to your requirements.

I only investigated this because I thought that 94% efficiency was unreasonable. However, I accidentally found something which may solve multiple problems. Specifically, EPC's Application Note AN002 suggests that GaNFETs can be used in Class-D amplifiers. That means GaNFETs can be used for audio amplifiers, quadcopters and robots. (My current choice for quadcopter control is misuse of a TDA7379 audio amplifier. That's about 95% efficient and requires a heatsink. Raising the efficiency would reduce or eliminate the heatsink. This would be particularly welcome for a quadcopter.)
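A Class-D stage is essentially pulse-width modulation of the supply rail: the output FETs are either fully on or fully off, which is why switch quality dominates efficiency. A minimal sketch of the modulation step (the carrier and signal frequencies are arbitrary choices for illustration):

```python
import math

# Minimal Class-D-style modulator sketch: compare an audio signal
# against a triangle carrier to produce the on/off gate-drive stream
# that the output FETs (MOSFET or GaNFET) reproduce at rail voltage.

def triangle(t, f_carrier):
    """Triangle wave in [-1, 1] at the carrier frequency."""
    phase = (t * f_carrier) % 1.0
    return 4 * abs(phase - 0.5) - 1

def modulate(signal, f_carrier, f_sample):
    """Return the 0/1 gate-drive stream for a sampled signal in [-1, 1]."""
    return [1 if s > triangle(i / f_sample, f_carrier) else 0
            for i, s in enumerate(signal)]

# One cycle of a 1 kHz sine, sampled at 1 MHz, with a 100 kHz carrier.
f_sample, f_carrier = 1_000_000, 100_000
sig = [0.5 * math.sin(2 * math.pi * 1000 * i / f_sample)
       for i in range(f_sample // 1000)]
gates = modulate(sig, f_carrier, f_sample)

# The local duty cycle tracks the instantaneous signal level; over a
# whole cycle of a zero-mean signal it averages about 50%.
duty = sum(gates) / len(gates)
print(f"average duty cycle: {duty:.2f}")
```

A low-pass filter on the switched output recovers the audio; the faster and cleaner the switch, the higher the usable carrier frequency and the smaller that filter can be.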

GaNFETs are already suitable for switching domestic mains electricity at 1MHz. Indeed, GaNFETs outperform MOSFETs in all characteristics except gate leakage current - and I suspect this deficiency will be resolved when gallium nitride on gallium nitride is perfected by a Nobel physicist or one of his rivals. Even without this, there is talk among experts of GaNFET switching at 1GHz or even 1THz. I foresee GaNFETs, CRT [Chinese Remainder Theorem], PWM [Pulse Width Modulation] and/or SDR [Software Defined Radio] converging. That would allow micro-controllers to shape and modulate radio waves over a very wide range of frequencies while only using one leaky power component. The leak will get fixed and the switching frequencies will continue to rise. Within 30 years, it may be possible to identify a person's sex and race from the resonant frequencies of their DNA. Within another 30 years, it will be possible to accurately diagnose illness by bringing a device close to a patient. That sounds very much like a medical tricorder. That would be great if it was only used for good but there was a proposal to make a smart bomb which only triggers when it is within range of one or more passports with RFID chips of a chosen nationality. A similar result can be obtained with facial recognition and, one day, it will also be possible by remote sensing of DNA.

From my reading of Nexus Magazine, Volume 24, Issue 6 and Nexus Magazine, Volume 25, Issue 1 (current issue), EMF [Electro-Magnetic Fields], Wi-Fi and dirty mains are all very unpopular concepts. I think they'd be even less impressed about MIMO phased-array beam-steering.

EMF is unavoidable. Wi-Fi is definitely avoidable. (Thank you, super-powers, for letting us plebs use frequencies which are useless for long-range military applications, such as 2.45GHz [a resonant frequency of water] and 60GHz [a resonant frequency of oxygen].) Dirty mains is a real problem. Very few people can design power switching circuitry properly. I can't do it. However, I'm not under time pressure and I have the luxury of asking questions. The typical scenario is a dabbling amateur who is an "expert" within a company. Under time pressure, the "expert" designs some square wave switching circuitry. That wastes about 30% of the energy but, hey, the market rewards first-to-market and the customer pays for the externalities. The design might be sent to a standards laboratory which has a commercial incentive to obtain repeat business. A modified design might be made to specification by a manufacturing sub-sub-contractor. After a device is stocked in a warehouse, it will get bashed around by a next-day delivery courier who competes on speed and price. Then it will get further abuse during daily use. The 30% excess energy radiates along any available cables. Historically, the power and frequency range of dirty signals were relatively capped. But with GaNFETs, it is now possible to switch thousands of Watts at unprecedented frequencies. And software control of GaNFETs is a particular problem if software can be compromised. From Edward Snowden's documents and other sources, we know that:-

  • The NSA can compromise almost every smartphone ever made.
  • The NSA can compromise smart televisions.
  • The NSA has made passive bugs which can be powered remotely by up to 1kW of microwave energy. (There is a conspiracy theory that use of this system killed Hugo Chávez.)
  • An increasing proportion and quantity of devices have one or more digital radio interfaces.
  • Many of these devices are already able to generate radio signals up to 5GHz. 60GHz will be commonly available.
  • The baseband processor on many phones runs code sent from a telephone company and has access to main memory.
  • Almost every device with Wi-Fi is insecure.
  • There is no Internet Of Things, Privacy And Security.
  • It is increasingly common to modulate radio signals from software.
  • A battery powered device may be used in close proximity.
  • A mains powered device may generate powerful signals in short bursts without overheating its circuitry.

I don't know enough to propose a solution. However, I know that one of the regular advertisers in Nexus Magazine won't help you. The Polarix Disc is a rather pretty two inch diameter circle of single-sided, copper-clad fiberglass etched with concentric hoops. However, its main effect appears to be the transfer of US$30.