SoylentNews is people

Journal of cafebabe (894)

The Fine Print: The following are owned by whoever posted them. We are not responsible for them in any way.
Monday July 10, 17
03:30 AM
Hardware

(This is the 15th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I started with the observation that computing mostly consists of paper simulation rather than structured information. I started describing a URL-space to overcome this limitation. Then I mentioned problems with network addressing and packet payload size which affect a multi-cast streaming server. After describing the outline of streaming audio hardware and software (part 1, part 2, part 3, part 4, part 5) and some speaker design considerations, we've spanned the extent of this project. The remainder is detail, in-filling and corollaries.

The first detail is how to interface a speaker array to a host computer. For simplicity, we'll assume one source of WXYZ Ambisonic sound-field at 48kHz within an .AVI file. This is four-channel audio. As previously described, that's one channel of omnidirectional sound and three channels of directional sound (left-minus-right, front-minus-back, top-minus-bottom). For each time-step (48000 times per second), a four-element vector is multiplied by a 4×32 matrix to obtain the output for each speaker in a 32-speaker array. That is 128 multiplications per time-step, or about 6.1 million multiplications per second. But what hardware processes this data? A host computer? A dedicated processor? Some kind of analog process?
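A minimal sketch of the per-time-step operation, written in Python for readability rather than as firmware. It assumes the input order W, X, Y, Z and a matrix stored as 4 rows (one per input channel) by 32 columns (one per speaker); the names `decode_timestep`, `N_IN` and `N_OUT` are illustrative, not from the article.

```python
# Sketch: one Ambisonic time-step through a 4x32 decode matrix.
# Assumptions (illustrative): input order is W, X, Y, Z and the matrix is
# stored as 4 rows (input channels) x 32 columns (speakers).

SAMPLE_RATE = 48_000  # time-steps per second
N_IN = 4              # W, X, Y, Z
N_OUT = 32            # speakers in the array


def decode_timestep(wxyz, matrix):
    """Return one output per speaker: out[s] = sum over c of wxyz[c] * matrix[c][s]."""
    if len(wxyz) != N_IN or len(matrix) != N_IN:
        raise ValueError("expected 4 input channels")
    if any(len(row) != N_OUT for row in matrix):
        raise ValueError("expected 32 coefficients per row")
    out = [0.0] * N_OUT
    for c in range(N_IN):
        row = matrix[c]
        for s in range(N_OUT):
            out[s] += wxyz[c] * row[s]  # one multiply-accumulate
    return out


# Multiplications per time-step and per second.
MULS_PER_STEP = N_IN * N_OUT                   # 128
MULS_PER_SECOND = MULS_PER_STEP * SAMPLE_RATE  # 6,144,000
```

`MULS_PER_SECOND` evaluates to 6,144,000, which is the "about 6.1 million" figure above. The inner loop is the part a micro-controller would have to run 48000 times per second.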

Analyze the situation and choose suitable interfaces. Matrix input is 48kHz × 4 channels × 32 bits, which is about 6.1Mb/s. Assuming a daisy-chain of 16-bit SPI DACs, matrix output is 48kHz × 32 channels × 16 bits, which is about 25Mb/s. Weighing suitable host interfaces (Ethernet, USB, FireWire, SPI, SCSI) against availability and cost, 10Mb/s Ethernet and 12Mb/s USB 2.0 (Full Speed) provide sufficient bandwidth for the input, and the latter would be the most conventional.
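The bandwidth arithmetic can be checked with a short sketch. The interface rates below are nominal signalling rates, so they are only an upper bound on usable payload throughput; the helper names are hypothetical.

```python
# Sketch: compare the raw audio bit rates against nominal interface rates.
# Nominal rates ignore protocol overhead, so real throughput is lower.

SAMPLE_RATE = 48_000


def bit_rate(channels, bits_per_sample, sample_rate=SAMPLE_RATE):
    """Raw payload bit rate in bits per second."""
    return sample_rate * channels * bits_per_sample


INPUT_BPS = bit_rate(4, 32)    # WXYZ at 32 bits: 6,144,000 b/s
OUTPUT_BPS = bit_rate(32, 16)  # 32 x 16-bit SPI DACs: 24,576,000 b/s

# Nominal signalling rates in bits per second.
HOST_INTERFACES = {
    "10Mb/s Ethernet": 10_000_000,
    "USB 2.0 Full Speed": 12_000_000,
}


def candidates(required_bps, interfaces=HOST_INTERFACES):
    """Names of interfaces whose nominal rate exceeds the required rate."""
    return sorted(name for name, bps in interfaces.items() if bps > required_bps)
```

Both interfaces clear the 6.1Mb/s input, but neither could carry the 25Mb/s output, which is consistent with doing the matrix multiplication on the device and keeping the wide side on local SPI. Overhead matters in practice: a single full-speed isochronous endpoint carries at most 1023 bytes per 1ms frame, about 8.2Mb/s, which still exceeds the input rate.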

How much RAM is required (and does this affect packet size or type)? We assume the .AVI has 24Hz, 25Hz, 30Hz, 50Hz or 60Hz video only. For each frame, this requires transfer of 2000, 1920, 1600, 960 or 800 time-steps respectively, where each time-step is 4 channels × 32 bits (16 bytes). The largest case, 2000 × 16 bytes, gives 32000 byte buffers, and triple buffering these requires 96000 bytes. So, a micro-controller or DSP of the following specification is required:-

  • USB2.0.
  • 6.1 million multiplications per second.
  • SPI above 25Mb/s.
  • 96KB RAM or more.
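The sizing behind this list can be reproduced with a small sketch. It also estimates a rough cycle budget for a candidate clock rate, such as the 40MHz considered next, ignoring everything except the multiplications; the function names are hypothetical.

```python
# Sketch: buffer sizing for one video frame of WXYZ audio, plus a rough
# cycle budget for a candidate clock rate.

SAMPLE_RATE = 48_000
BYTES_PER_STEP = 4 * 4  # 4 channels x 32 bits
FRAME_RATES_HZ = (24, 25, 30, 50, 60)


def steps_per_frame(frame_rate_hz):
    """Audio time-steps per video frame; exact for the listed frame rates."""
    steps, remainder = divmod(SAMPLE_RATE, frame_rate_hz)
    if remainder:
        raise ValueError(f"{frame_rate_hz}Hz does not divide 48kHz evenly")
    return steps


def buffer_bytes(frame_rate_hz):
    """Bytes needed to hold one video frame's worth of audio."""
    return steps_per_frame(frame_rate_hz) * BYTES_PER_STEP


# The slowest frame rate needs the largest buffer; three of them are needed.
LARGEST_BUFFER = max(buffer_bytes(f) for f in FRAME_RATES_HZ)  # 32,000 bytes
TRIPLE_BUFFER = 3 * LARGEST_BUFFER                            # 96,000 bytes


def cycles_per_multiply(clock_hz, muls_per_second=6_144_000):
    """Clock cycles available per multiplication, ignoring all other work."""
    return clock_hz / muls_per_second
```

At 40MHz, `cycles_per_multiply(40_000_000)` is roughly 6.5 cycles, which must also cover loads, stores, USB and SPI handling. Sizing each buffer for a 2048-sample message instead gives 3 × 32,768 = 98,304 bytes, exactly 96KiB.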

So, a 40MHz micro-controller with USB, SPI, hardware multiply and 128KB RAM would be sufficient. It may even be possible to perform 64-bit multiplication on such hardware. However, this specification doesn't have much headroom. In particular, multiple sound sources must be mixed by the host computer, which is particularly awkward if one sound source is Ambisonic while another is stereo. Regardless, communication from host to micro-controller should include the following:-

  • Method to identify device version.
  • Method to set matrix multiplication constants.
  • Method to send up to 2048 samples of monophonic, stereophonic or Ambisonic audio at 48kHz only.
  • Method to silence output.
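The article lists these operations but not a wire format. The following is a minimal sketch of one possible encoding, built with Python's `struct` module. Everything about the layout is hypothetical: the command codes, the 4-byte little-endian header (command, channel count, item count), Q16.16 fixed point for matrix constants and signed 32-bit interleaved samples.

```python
import struct

# Hypothetical command codes; the article does not define an encoding.
CMD_GET_VERSION = 0x01
CMD_SET_MATRIX = 0x02
CMD_SEND_AUDIO = 0x03
CMD_SILENCE = 0x04

FORMAT_CHANNELS = {"mono": 1, "stereo": 2, "ambisonic": 4}
MAX_STEPS = 2048
N_OUT = 32

# Hypothetical header: command (u8), channel count (u8), item count (u16).
HEADER = struct.Struct("<BBH")


def get_version():
    return HEADER.pack(CMD_GET_VERSION, 0, 0)


def silence():
    return HEADER.pack(CMD_SILENCE, 0, 0)


def set_matrix(rows):
    """Encode a 4 x 32 matrix as little-endian Q16.16 signed 32-bit integers."""
    if len(rows) != 4 or any(len(r) != N_OUT for r in rows):
        raise ValueError("matrix must be 4 rows of 32 coefficients")
    flat = [round(v * 65536) for r in rows for v in r]
    return HEADER.pack(CMD_SET_MATRIX, 4, len(flat)) + struct.pack(f"<{len(flat)}i", *flat)


def send_audio(fmt, samples):
    """Encode 48kHz audio; `samples` is a list of per-time-step tuples of int32."""
    channels = FORMAT_CHANNELS[fmt]
    if not 0 < len(samples) <= MAX_STEPS:
        raise ValueError("between 1 and 2048 time-steps per message")
    if any(len(step) != channels for step in samples):
        raise ValueError(f"{fmt} needs {channels} channel(s) per time-step")
    flat = [v for step in samples for v in step]
    return HEADER.pack(CMD_SEND_AUDIO, channels, len(samples)) + struct.pack(f"<{len(flat)}i", *flat)
```

A full 2048-step Ambisonic message is 4 + 2048 × 16 = 32,772 bytes, so a USB transfer would split it across many packets; the framing above is independent of that.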

Meanwhile, the host computer software should include the following:-

  • Text and graphical interface to set speaker position and relative volume.
  • Method of rotating sound-stage.
  • Method to set master volume.
  • Method to send all of this information as matrix multiplication constants.
  • Method to mix one or more sound sources to monophonic, stereophonic or Ambisonic audio at 48kHz only.
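A minimal host-side sketch of how speaker positions, relative volumes, sound-stage rotation and master volume can all be folded into one set of matrix constants, plus panning and mixing of mono sources. It assumes first-order FuMa-style B-format (W scaled by 1/√2), X = front, Y = left, Z = up, angles in radians, and a basic projection decoder in which each speaker picks up a cardioid aimed at its own direction; practical decoders often use other weightings. The function names are hypothetical.

```python
import math

# Assumptions (not from the article): FuMa-style B-format, X = front,
# Y = left, Z = up, radians, and a basic projection (cardioid) decoder.


def decode_matrix(speakers, rotation=0.0, master=1.0):
    """Build a 4 x N matrix (rows W, X, Y, Z) from (azimuth, elevation, gain) tuples.

    Rotating the sound-stage by `rotation` about the vertical axis is the same
    as rotating every speaker by -rotation, so it costs nothing at run time.
    """
    rows = [[], [], [], []]
    for azimuth, elevation, gain in speakers:
        a = azimuth - rotation
        g = gain * master  # relative speaker volume times master volume
        rows[0].append(g * math.sqrt(2))
        rows[1].append(g * math.cos(a) * math.cos(elevation))
        rows[2].append(g * math.sin(a) * math.cos(elevation))
        rows[3].append(g * math.sin(elevation))
    return rows


def encode_mono(sample, azimuth, elevation=0.0):
    """Pan a mono sample to first-order B-format (W, X, Y, Z), FuMa weighting."""
    return (
        sample / math.sqrt(2),
        sample * math.cos(azimuth) * math.cos(elevation),
        sample * math.sin(azimuth) * math.cos(elevation),
        sample * math.sin(elevation),
    )


def mix(*wxyz_steps):
    """Sum several B-format time-steps; B-format is linear, so mixing is addition."""
    return tuple(sum(ch) for ch in zip(*wxyz_steps))


def apply(matrix, wxyz):
    """Speaker outputs for one time-step (what the micro-controller computes)."""
    return [sum(wxyz[c] * matrix[c][s] for c in range(4)) for s in range(len(matrix[0]))]
```

With eight speakers in a horizontal ring, a mono source panned to azimuth 0 is loudest on speaker 0; passing `rotation=math.pi / 2` moves the peak to the speaker at 90°. Because rotation and volume only change the constants, the per-time-step work on the device is unchanged.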

A board with this functionality would cost about US$2 per channel, excluding power, connectors and a box. Because SPI allows open-loop control, the same firmware can be kept while making versions of this system with fewer audio outputs.
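One way the open-loop SPI arrangement could keep firmware identical across board sizes, sketched under hypothetical assumptions: each DAC accepts a bare signed 16-bit big-endian word with no command bits, and the chain behaves as one long shift register, so the first word clocked out ends up in the DAC furthest from the micro-controller. Real SPI DACs usually add command or address bits, so the word format is illustrative only.

```python
import struct

# Hypothetical DAC word format: bare signed 16-bit big-endian, no command bits.
N_OUT = 32


def to_dac_word(value):
    """Truncate to an integer and saturate to the signed 16-bit DAC range."""
    return max(-32768, min(32767, int(value)))


def pack_frame(outputs):
    """Pack 32 channel values into a 64-byte frame, last channel first.

    Channel 0 is sent last so it lands in the DAC nearest the micro-controller.
    If only k DACs are fitted, the first 32 - k words shift out of the far end
    of the chain and channels 0..k-1 still reach the fitted DACs.
    """
    if len(outputs) != N_OUT:
        raise ValueError("expected 32 outputs")
    words = [to_dac_word(v) for v in reversed(outputs)]
    return struct.pack(f">{N_OUT}h", *words)


def shift_through_chain(frame, n_dacs):
    """Simulate which words stay latched in an n-DAC chain (index 0 = nearest DAC)."""
    words = struct.unpack(f">{len(frame) // 2}h", frame)
    if not 1 <= n_dacs <= len(words):
        raise ValueError("n_dacs must be between 1 and the frame length")
    kept = words[-n_dacs:]      # only the last n words clocked in remain
    return list(reversed(kept))  # the nearest DAC holds the very last word
```

Under these assumptions a 32-channel frame clocked into an 8-DAC board leaves channels 0 to 7 on the fitted DACs, and the micro-controller never needs to know how many are present.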

Connectors may be two bare wire terminals per speaker, one phono connector per speaker or one headphone socket per speaker pair. The latter is the most compact and cost-effective.

How does this system differ from Dolby Atmos? Dolby Atmos permits up to 128 point sound sources to be mixed for 500 people. That's a particularly ambitious sweet-spot. It potentially requires multiplication by a 128×50 matrix (or taller) per audio time-step, in contrast to a 4×32 matrix (or shorter) per audio time-step.
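The scale of the difference follows from the same arithmetic as before, taking the 128×50 figure above as a rough estimate rather than a Dolby specification. The helper name is hypothetical.

```python
SAMPLE_RATE = 48_000


def muls_per_second(n_in, n_out, sample_rate=SAMPLE_RATE):
    """Multiplications per second for an n_in x n_out matrix applied every time-step."""
    return n_in * n_out * sample_rate


THIS_DESIGN = muls_per_second(4, 32)    # 6,144,000
ATMOS_LIKE = muls_per_second(128, 50)   # 307,200,000
RATIO = ATMOS_LIKE // THIS_DESIGN       # 50
```

Even on this crude measure, the Atmos-scale case needs about 50 times the multiplication rate, roughly 307 million per second, before counting anything else a renderer does.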

  • (Score: 0) by Anonymous Coward on Monday July 10 2017, @04:09PM (#537150)

    We're feeding this into 2 ears at most, possibly with a subwoofer felt in the gut.

    Common situation #1 is headphones. You get stereo.

    Common situation #2 is people moving about instead of sitting nicely in the sweet spot. Mono is required in order to avoid bad spots.

    Really uncommon situation #1 is VR with head tracking and known ear-to-ear distance.

    Really uncommon situation #2 is somebody willing to sit in the sweet spot. WHO DOES THAT???? You can't lean over. You can't even really turn your head, because your ears are not co-located. (rotation implies translation for at least 1 ear, except rotation around the axis that passes through the ears) You are prisoner to the music.
