(This is the 17th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
For a speaker array, the basic problems of communication between a host computer and a micro-controller can be overcome. An outline solution is host -> USB2.0 -> device -> SPI -> DACs. Blocks of sound are transferred over USB. Each block nominally represents 48kHz sound for up to 1/24 second (2000 samples or so). However, without exceeding the 12Mb/s limit of full-speed USB2.0, it is possible to transfer:-
Each block of samples is sent with a type, a length and one or more checksums. When this data is placed into a triple-buffering system, the micro-controller may seamlessly switch type when processing the next buffer.
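The framing and buffering described above might be sketched as follows. This is only an illustration: the field names, field widths, the additive checksum and the helper names (block_header_t, block_checksum, block_is_valid, triple_t, triple_rotate) are all assumptions and not part of any defined protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical block header. Names and widths are assumptions. */
typedef struct {
    uint8_t  type;      /* sample format carried by this block */
    uint16_t length;    /* payload length in bytes */
    uint16_t checksum;  /* checksum over the payload */
} block_header_t;

/* Assumed checksum: sum of payload bytes, modulo 65536.
   A real design may prefer a CRC. */
uint16_t block_checksum(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + data[i]);
    }
    return sum;
}

/* Returns 1 when the payload matches the checksum in its header. */
int block_is_valid(const block_header_t *h, const uint8_t *payload)
{
    return block_checksum(payload, h->length) == h->checksum;
}

/* Triple buffering: each field holds a buffer index (0, 1 or 2).
   One buffer is filled from USB, one waits, one is being played. */
typedef struct {
    uint8_t filling;
    uint8_t ready;
    uint8_t playing;
} triple_t;

/* Called when playback of the current buffer finishes. The block type
   is read again from the new playing buffer's header, which is where
   a change of type would take effect. */
void triple_rotate(triple_t *t)
{
    uint8_t freed = t->playing;
    t->playing = t->ready;
    t->ready   = t->filling;
    t->filling = freed;
}
```

Because the type travels with each block, a change of format only takes effect at a buffer boundary and never in the middle of a block.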
Selection of cost-effective components is an art that I haven't mastered. My technique is to obliquely search eBay by functionality. This gives an overview of surplus components and cloned components. From this, it is trivial to find official datasheets. This invariably encounters warnings from manufacturers to not use legacy components in new designs and instead use components which, back on eBay, are up to 10 times more expensive. Obviously, I could use comparison functionality on the more advanced retail websites but searching eBay provides a broader overview.
After reading many datasheets, I'm not much further ahead. What DACs should be used? Maybe Analog Devices AD1952? Linear Technology LTC2664 16 channel I2S DAC? Maxim MAX5318 18 bit SPI DAC? Or one of the many other choices?
After staring at I2S for a long time, it appears that, yes, it has a passing similarity to I2C or SPI with the exception that:-
Some components very obviously follow the technique pioneered by Dallas Semiconductor where the device is made with different modes of operation. In this case, different interfaces are notched out with a laser according to market demand. Given that DACs may be laser tuned, this is one of the most obvious places to increase margin on commodity components.
Some DACs interfacing with SPI or I2S may be connected to a serial stream in parallel and then selectively slurp data via a hand-over signal. This allows DACs to scale without incurring bit errors from, for example, the typical SPI practice of daisy-chaining devices in series.
I considered the possibility of performing I2S (or suchlike) without a dedicated interface. This would provide the most design flexibility because the serial format would be defined entirely in software. If one DAC is discontinued then it would be possible to modify software (and board wiring) and continue with a different DAC. However, 32 × 16 bit samples at 48kHz is a bit-rate of almost 25Mb/s (24.576Mb/s). To raise and lower one clock signal per bit from software requires roughly 50 MIPS. This excludes processing power to perform any other functionality. Toggling can be amortized by ganging eight or more serial streams. However, this requires an intermediary, such as a shift register - or a chunky-to-planar bit matrix transpose, such as performed by a Commodore Amiga Akiko Chip. 4014 parallel-to-serial shift registers are too slow (and cumbersome).
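The arithmetic behind these figures can be checked with a few lines of C. This is a sketch under one stated assumption: each clock pulse costs exactly two instructions (one to raise the pin and one to lower it). The function names bit_rate and toggle_ips are made up for illustration.

```c
#include <stdint.h>

/* Raw serial bit-rate for the whole array, in bits per second. */
uint32_t bit_rate(uint32_t channels, uint32_t bits_per_sample,
                  uint32_t sample_hz)
{
    return channels * bits_per_sample * sample_hz;
}

/* Instructions per second spent only on toggling the clock pin,
   assuming two instructions per clock pulse and `streams` serial
   streams sharing each pulse. */
uint32_t toggle_ips(uint32_t bits_per_second, uint32_t streams)
{
    return 2u * (bits_per_second / streams);
}

/* bit_rate(32, 16, 48000)  is 24576000 bits per second.
   toggle_ips(24576000, 1)  is 49152000, roughly 50 MIPS.
   toggle_ips(24576000, 8)  is  6144000, roughly 6 MIPS. */
```

Ganging eight streams divides the toggling cost by eight, but the figure excludes the cost of preparing the data for each clock pulse.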
The task of interest is to take eight bytes of data and output, for example, the bottom bit of each byte to a micro-controller's parallel port. Then one pin can be toggled. This acts as a clock for eight separate serial streams but only requires two instructions to signal a change of state to all downstream devices. Unfortunately, the transpose which precedes output is processor intensive. If a CPU has suitable bit rotate operations through a carry flag or suchlike, it may be possible to zig-zag in 64 clock cycles or so. 64 conditional tests would require two or three clock cycles for each test. Is there a faster method? The benefit would be a greater volume of output and possibly more channels. (Something akin to VGA Mode X graphics popularized by Quake.) Or reduce power consumption. Or a reduced hardware specification.
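For reference, the slow approach with 64 conditional tests might look like the sketch below. The fake_port_t type is a hypothetical stand-in for a memory-mapped port and merely records what would be driven onto eight data pins and one shared clock pin. Sending the most significant bit first is also an assumption, because the order depends on the chosen DAC.

```c
#include <stdint.h>

/* Naive 8x8 bit transpose. in[i] holds the next eight bits of stream i.
   Afterwards, bit i of out[k] equals bit k of in[i], so out[k] can be
   written to an 8-bit port as one bit plane. Costs 64 tests. */
void transpose8x8_naive(const uint8_t in[8], uint8_t out[8])
{
    for (int k = 0; k < 8; k++) {
        uint8_t plane = 0;
        for (int i = 0; i < 8; i++) {
            if (in[i] & (1u << k)) {
                plane |= (uint8_t)(1u << i);
            }
        }
        out[k] = plane;
    }
}

/* Hypothetical port model: a log of data and clock pin states. */
typedef struct {
    uint8_t data[16];
    uint8_t clock[16];
    int     count;
} fake_port_t;

static void port_step(fake_port_t *p, uint8_t data, uint8_t clk)
{
    if (p->count < 16) {
        p->data[p->count]  = data;
        p->clock[p->count] = clk;
        p->count++;
    }
}

/* Clock out eight bit planes, most significant first (an assumption).
   Each plane costs one data write and one clock pulse for all streams. */
void emit_planes(fake_port_t *p, const uint8_t planes[8])
{
    for (int k = 7; k >= 0; k--) {
        port_step(p, planes[k], 0);  /* present data, clock low  */
        port_step(p, planes[k], 1);  /* clock high latches data  */
    }
}
```

On real hardware, port_step would collapse into one or two register writes, which leaves the transpose as the dominant cost.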
The simple software transform requires one or more instructions per bit - and that assumes sufficient registers and flags. When I first encountered this problem, I considered a chain of rotates via one flag register. However, after consideration of quadtrees and matrix multiplication optimization, it is "obvious" to me that a matrix of 2^n×2^n bits can be transposed in n iterations. For 8×8 bits, three iterations are required. The first iteration swaps two opposing 4×4 blocks. The second iteration swaps two opposing 2×2 blocks in each quadrant. The third iteration swaps individual bits. If bytes are held in separate variables, this requires eight registers to hold the data and more registers for bitmasks and intermediate values. This works poorly on many micro-controllers. For example, ARM Thumb mode only has eight general, directly addressable registers. Thankfully, values can be ganged into 16 bits, 32 bits or even 64 bits. This significantly reduces the quantity of registers required. It also greatly reduces the number of instructions (and clock-cycles) required for a transpose operation.
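The three iterations can be sketched in C with all eight bytes ganged into one 64-bit value. This uses a common masked-XOR swap idiom (sometimes called a delta swap) and is one possible realisation, not necessarily the exact sequence I would use on a given micro-controller. The helper pack8 and the layout (byte i of the word holds stream i) are assumptions for illustration.

```c
#include <stdint.h>

/* Gang eight bytes into one word: byte i occupies bits 8*i .. 8*i+7. */
uint64_t pack8(const uint8_t in[8])
{
    uint64_t x = 0;
    for (int i = 0; i < 8; i++) {
        x |= (uint64_t)in[i] << (8 * i);
    }
    return x;
}

/* Transpose an 8x8 bit matrix held in a 64-bit word.
   Bit (8*r + c) of the input becomes bit (8*c + r) of the output.
   Each step swaps the masked bits with the bits `shift` places above. */
uint64_t transpose8x8(uint64_t x)
{
    uint64_t t;

    /* Iteration 1: swap the two off-diagonal 4x4 blocks. */
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;
    x ^= t ^ (t << 28);

    /* Iteration 2: swap the off-diagonal 2x2 blocks in each quadrant. */
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;
    x ^= t ^ (t << 14);

    /* Iteration 3: swap the off-diagonal bits in each 2x2 block. */
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AAULL;
    x ^= t ^ (t << 7);

    return x;
}
```

After the transpose, byte k of the result holds bit k of every stream and can be written directly to the port. On a 32-bit core, a compiler will split the 64-bit operations into pairs of 32-bit operations, so the instruction count is higher than the source suggests but still far below 64 conditional tests.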
The overall result is that 25Mb/s can be bit-banged with less than 15 MIPS of processing power. However, this only applies if eight streams are bit-banged in parallel. Other functionality, including 6 million multiplies per second, may remain within a 40MHz processing budget.
(Score: 3, Funny) by Pino P on Saturday July 15 2017, @01:44PM
So in regular RAM, the data is carried on the red (on-beat) arrows, and in East German [wikipedia.org] Dance [wikipedia.org] RAM [wikipedia.org], it's carried on both the red arrows and the blue (between beats) arrows. Am I right?