SoylentNews is people

Journal by cafebabe

(This is the 17th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

For a speaker array, basic problems between host computer and a micro-controller can be overcome. An outline solution is host -> USB2.0 -> device -> SPI -> DACs. Blocks of sound are transferred over USB. Each block nominally represents 48kHz sound for up to 1/24 second (2000 samples). However, without exceeding the USB2.0 full-speed bandwidth limitation of 12Mb/s, it is possible to transfer:-

  • Silence.
  • For monophonic sound:-
    • 8 bit at 48kHz, 96kHz, 192kHz, 384kHz or 768kHz.
    • 16 bit at 48kHz, 96kHz, 192kHz or 384kHz.
    • 24 bit at 48kHz, 96kHz or 192kHz.
    • 32 bit at 48kHz, 96kHz or 192kHz.
  • For stereophonic sound:-
    • 8 bit at 48kHz, 96kHz, 192kHz or 384kHz.
    • 16 bit at 48kHz, 96kHz or 192kHz.
    • 24 bit at 48kHz or 96kHz.
    • 32 bit at 48kHz or 96kHz.
  • For Ambisonic WX format (one dimensional sound-field):-
    • 8 bit at 48kHz, 96kHz, 192kHz or 384kHz.
    • 16 bit at 48kHz, 96kHz or 192kHz.
    • 24 bit at 48kHz or 96kHz.
    • 32 bit at 48kHz or 96kHz.
  • For Ambisonic WXY format (two dimensional sound-field):-
    • 8 bit at 48kHz, 96kHz or 192kHz.
    • 16 bit at 48kHz or 96kHz.
    • 24 bit at 48kHz.
    • 32 bit at 48kHz.
  • For Ambisonic WXYZ format (three dimensional sound-field):-
    • 8 bit at 48kHz, 96kHz or 192kHz.
    • 16 bit at 48kHz or 96kHz.
    • 24 bit at 48kHz.
    • 32 bit at 48kHz.
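
The arithmetic behind the list above can be sketched as follows. This is a minimal illustration, not part of any existing firmware: the function names are made up for this example, and the check only compares raw payload against the nominal 12Mb/s figure, ignoring USB protocol overhead.

```c
#include <stdbool.h>
#include <stdint.h>

/* Nominal USB bus rate quoted in the text, in bits per second. */
#define USB_BUS_BITS_PER_SEC 12000000u

/* Raw payload bit-rate of an uncompressed PCM stream. */
static uint32_t pcm_bits_per_sec(uint32_t channels, uint32_t bits, uint32_t rate_hz)
{
    return channels * bits * rate_hz;
}

/* Samples per channel in one block of 1/24 second. */
static uint32_t samples_per_block(uint32_t rate_hz)
{
    return rate_hz / 24u;
}

/* True if the raw payload stays within the nominal bus rate. */
static bool fits_usb(uint32_t channels, uint32_t bits, uint32_t rate_hz)
{
    return pcm_bits_per_sec(channels, bits, rate_hz) <= USB_BUS_BITS_PER_SEC;
}
```

For example, pcm_bits_per_sec(1, 8, 768000) and pcm_bits_per_sec(4, 32, 48000) both come to 6,144,000 bits per second. No entry in the list exceeds that figure, which leaves roughly half of the nominal bus rate as headroom.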

Each block of samples is sent with a type, a length and one or more checksums. When this data is placed into a triple-buffering system, the micro-controller may seamlessly switch type when processing the next buffer.
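
One possible shape for such a block and its buffering is sketched below. Everything here is an assumption made for illustration: the field layout, the type codes, the simple additive checksum (a stand-in for whatever checksum is eventually chosen) and the index-swapping scheme are hypothetical rather than a defined protocol.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes, one per format listed above. */
enum block_type {
    BLOCK_SILENCE, BLOCK_MONO, BLOCK_STEREO,
    BLOCK_WX, BLOCK_WXY, BLOCK_WXYZ
};

/* Hypothetical header sent ahead of each block of samples. */
struct block_header {
    uint8_t  type;      /* one of enum block_type */
    uint8_t  bits;      /* 8, 16, 24 or 32 bits per sample */
    uint16_t length;    /* payload length in bytes */
    uint16_t checksum;  /* 16-bit additive sum of the payload */
};

/* Stand-in checksum: sum of payload bytes, modulo 65536. */
static uint16_t sum16(const uint8_t *data, size_t len)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum = (uint16_t)(sum + data[i]);
    return sum;
}

static int block_valid(const struct block_header *h, const uint8_t *payload)
{
    return h->type <= BLOCK_WXYZ && sum16(payload, h->length) == h->checksum;
}

/* Slot numbers for three buffers: one filling, one waiting, one playing. */
struct triple_index {
    uint8_t fill, ready, play;
    uint8_t fresh;  /* non-zero when 'ready' holds an unplayed block */
};

/* Receiver side: a verified block becomes the waiting block. */
static void publish(struct triple_index *t)
{
    uint8_t tmp = t->ready;
    t->ready = t->fill;
    t->fill = tmp;
    t->fresh = 1;
}

/* Player side: take the waiting block if there is one.
   Returns 0 on under-run, in which case the caller could output silence. */
static int advance(struct triple_index *t)
{
    if (!t->fresh)
        return 0;
    uint8_t tmp = t->play;
    t->play = t->ready;
    t->ready = tmp;
    t->fresh = 0;
    return 1;
}
```

Because the type travels with each block, the player only needs to read the header of whichever slot it takes in advance() to switch format at a block boundary. A 16 bit length is sufficient here because 1/24 second at 6.144Mb/s is 32000 bytes.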

Selection of cost-effective components is an art that I haven't mastered. My technique is to obliquely search eBay by functionality. This gives an overview of surplus components and cloned components. From this, it is trivial to find official datasheets. This invariably encounters warnings from manufacturers to not use legacy components in new designs and instead use components which, back on eBay, are up to 10 times more expensive. Obviously, I could use comparison functionality on the more advanced retail websites but this provides an overview.

After reading many datasheets, I'm not much further ahead. What DACs should be used? Maybe Analog Devices AD1952? Linear Technology LTC2664 16 channel I2S DAC? Maxim MAX5318 18 bit SPI DAC? Or one of the many other choices?

After staring at I2S for a long time, it appears that, yes, it has a passing similarity to I2C or SPI with the exception that:-

  • Components are invariably stereophonic.
  • Left and right channel data is double-clocked on positive edge and negative edge. This works like some iterations of Dance RAM.
  • Components require an unwavering clock signal because this is used with frequency-doubling techniques to obtain a stable master frequency for PWM functionality.
  • External dependencies reduce component cost but are more fiddly.

Some components very obviously follow the technique pioneered by Dallas Semiconductor where the device is made with different modes of operation. In this case, different interfaces are notched out with a laser according to market demand. Given that DACs may be laser tuned, this is one of the most obvious places to increase margin on commodity components.

Some DACs interfacing with SPI or I2S may be connected to a serial stream in parallel and then selectively slurp data via a hand-over signal. This allows DACs to scale without incurring bit errors from, for example, typical SPI daisy-chaining of devices in series.

I considered the possibility of performing I2S (or suchlike) without a dedicated interface. This would provide the most design flexibility because the serial format would be defined entirely in software. If one DAC is discontinued then it would be possible to modify software (and board wiring) and continue with a different DAC. However, 32 × 16 bit samples at 48kHz is a bit-rate of almost 25Mb/s (24.576Mb/s). To raise and lower one clock signal from software, at one instruction per edge, requires about 50 MIPS. This excludes processing power to perform any other functionality. Toggling can be amortized by ganging eight or more serial streams. However, this requires an intermediary, such as a shift register - or a chunky-to-planar bit matrix transpose, such as performed by a Commodore Amiga Akiko Chip. 4014 parallel-to-serial shift registers are too slow (and cumbersome).
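
The figures in this paragraph can be checked with a few lines of arithmetic. The sketch below assumes one instruction to raise the clock and one to lower it, and ignores the cost of preparing the data; the function names are invented for the example.

```c
#include <stdint.h>

/* Total serial bit-rate for a bank of DAC channels. */
static uint32_t serial_bits_per_sec(uint32_t channels, uint32_t bits, uint32_t rate_hz)
{
    return channels * bits * rate_hz;
}

/* Instructions per second spent only on toggling the clock pin,
   assuming two instructions per clock period (raise, then lower)
   and 'streams' serial streams sharing that clock. */
static uint32_t toggle_instr_per_sec(uint32_t bits_per_sec, uint32_t streams)
{
    return (bits_per_sec / streams) * 2u;
}
```

With 32 channels of 16 bit samples at 48kHz, serial_bits_per_sec() gives 24,576,000. A single stream then costs 49,152,000 toggle instructions per second, whereas eight ganged streams cost 6,144,000, before counting the transpose.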

The task of interest is to take eight bytes of data and output, for example, the bottom bits of each byte to a micro-controller's parallel port. Then one pin can be toggled. This acts as a clock for eight separate serial streams but only requires two instructions to signal a change of state to all downstream devices. Unfortunately, the transpose which precedes output is processor intensive. If a CPU has suitable bit rotate operations through a carry flag or suchlike, it may be possible to zig-zag in 64 clock cycles or so. 64 conditional tests would require two or three clock cycles for each test. Is there a faster method? The benefit would be a greater volume of output and possibly more channels. (Something akin to VGA Mode X graphics popularized by Quake.) Or reduced power consumption. Or a reduced hardware specification.
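
For reference, the slow approach described above might look like the sketch below, written in portable C rather than with rotates through a carry flag. The function name and the convention that in[s] holds the next byte for stream s are assumptions for this example.

```c
#include <stdint.h>

/* Naive 8x8 bit transpose: one shift-and-mask per bit, 64 in total.
   in[s]  holds one byte destined for serial stream s.
   out[b] gathers bit b of every input byte; bit s of out[b] comes from in[s],
   so each out[b] can be written directly to an 8 bit parallel port. */
static void transpose8_naive(const uint8_t in[8], uint8_t out[8])
{
    for (int b = 0; b < 8; b++) {
        uint8_t v = 0;
        for (int s = 0; s < 8; s++)
            v |= (uint8_t)(((in[s] >> b) & 1u) << s);
        out[b] = v;
    }
}
```

Each out[b] would then be written to the port and followed by a clock toggle. Whether the bottom or top bit goes out first depends on the DAC's serial format.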

The simple software transform requires one or more instructions per bit - and that assumes sufficient registers and flags. When I first encountered this problem, I considered a chain of rotates via one flag register. However, after consideration of quadtrees and matrix multiplication optimization, it is "obvious" to me that a matrix of 2^n×2^n bits can be transposed in n iterations. For 8×8 bits, three iterations are required. The first iteration swaps two opposing 4×4 blocks. The second iteration swaps two opposing 2×2 blocks in each quadrant. The third iteration swaps individual bits. If bytes are held in separate variables, this requires eight registers to hold the data and more registers for bitmasks and intermediate values. This works poorly on many micro-controllers. For example, ARM Thumb mode only has eight general, directly addressable registers. Thankfully, values can be ganged into 16 bits, 32 bits or even 64 bits. This significantly reduces the quantity of registers required. It also greatly reduces the number of instructions (and clock-cycles) required for a transpose operation.
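
The three iterations can be sketched with all eight bytes ganged into one 64 bit value. This follows the widely published masked-swap form of the block transpose and is an illustration rather than tuned micro-controller code. It assumes byte r of the word is row r of the matrix, with row 0 in the least significant byte, and the function names are made up for the example. On a 32 bit core, the compiler would split each step into a pair of 32 bit operations.

```c
#include <stdint.h>

/* Pack eight bytes into one word, in[0] in the least significant byte. */
static uint64_t pack8(const uint8_t in[8])
{
    uint64_t x = 0;
    for (int i = 7; i >= 0; i--)
        x = (x << 8) | in[i];
    return x;
}

/* Transpose an 8x8 bit matrix held in a 64 bit word.
   Bit c of byte r moves to bit r of byte c. */
static uint64_t transpose8x8(uint64_t x)
{
    uint64_t t;

    /* Iteration 1: swap the two off-diagonal 4x4 blocks. */
    t = (x ^ (x >> 28)) & 0x00000000F0F0F0F0ULL;
    x ^= t ^ (t << 28);

    /* Iteration 2: swap the off-diagonal 2x2 blocks in each quadrant. */
    t = (x ^ (x >> 14)) & 0x0000CCCC0000CCCCULL;
    x ^= t ^ (t << 14);

    /* Iteration 3: swap the off-diagonal bits in each 2x2 block. */
    t = (x ^ (x >> 7)) & 0x00AA00AA00AA00AAULL;
    x ^= t ^ (t << 7);

    return x;
}
```

Each iteration is a shift, two exclusive-ors and a mask to find the bits that differ, followed by a shift and two exclusive-ors to flip them. That is roughly eighteen 64 bit operations for the whole transpose, compared with 64 or more single-bit operations. After the transpose, byte b of the result holds bit b of every input byte and can be written to the port.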

The overall result is that almost 25Mb/s can be bit-banged with less than 15 MIPS of processing power. However, this only applies if eight streams are bit-banged in parallel. Other functionality, including 6 million multiplies per second, may remain within a 40MHz processing budget.

 
