Journal of cafebabe (894)

Friday July 07 2017, @04:31AM (Software)

(This is the ninth of many promised articles, each of which explains an idea in isolation. It is hoped that the ideas may be adapted, linked together and implemented.)

A well-known trick with streaming media is to give priority to audio. For many viewers, video can glitch and the frame rate can drop significantly as long as audio quality is maintained. Within this constraint, monophonic audio can provide some continuity even when surround sound or stereo fails. This requires audio to be encoded as a monophonic channel plus a left-minus-right channel, but MLP (Meridian Lossless Packing) demonstrates that mono, stereo and other formats can be encoded together in this form and decoded as appropriate. How far does this technique go? Well, Ambisonics is the application of Laplace spherical harmonics to audio rather than to chemistry, weather simulation or a multitude of other purposes.
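As a rough illustration of the mono-plus-difference idea, here is a minimal mid/side sketch in Python (the function names and the 0.5 scaling are illustrative choices, not anything specified above). A degraded stream can drop the side channel and still play the mid channel as plain mono.

```python
def ms_encode(left, right):
    # Mid is the mono sum; side is the left-minus-right difference.
    mid = 0.5 * (left + right)
    side = 0.5 * (left - right)
    return mid, side

def ms_decode(mid, side):
    # Full stereo needs both channels; dropping `side` leaves usable mono.
    return mid + side, mid - side
```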

After getting through all the fiddly stuff (cardioids, A-format, B-format, UHJ format, soundfield microphones, higher order Ambisonics, or just what the blazes is Ambisonics?), we get to the mother of all 3D sound formats and why people buy hugely expensive microphones, record ambient sounds and sell the recordings to virtual reality start-ups, who apply trivial matrix rotations to obtain immersive sound.

Yup. That's it. Record directional sound. Convert it into one channel of omnidirectional sound and three channels of directional sound (left-minus-right, front-minus-back, top-minus-bottom). Apply sines and cosines as required or mix like a pro. The result is a four channel audio format which can be streamed as three dimensional sound, two dimensional sound, one dimensional sound or zero dimensional sound and mapped down to any arrangement of speakers.
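To make that concrete, here is a minimal sketch (Python with NumPy) of panning a mono source into those four channels. The 1/sqrt(2) scaling of the omnidirectional channel follows the traditional B-format convention; the angle conventions are assumptions for illustration, not anything specified above.

```python
import numpy as np

def encode_b_format(mono, azimuth, elevation):
    # Pan a mono signal into first-order B-format (W, X, Y, Z).
    # azimuth: radians anticlockwise from straight ahead;
    # elevation: radians above the horizontal plane.
    w = mono / np.sqrt(2.0)                          # omnidirectional
    x = mono * np.cos(azimuth) * np.cos(elevation)   # front-minus-back
    y = mono * np.sin(azimuth) * np.cos(elevation)   # left-minus-right
    z = mono * np.sin(elevation)                     # top-minus-bottom
    return np.stack([w, x, y, z])
```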

Due to technical reasons, a minimum of 12 speakers (and closer to 30 speakers) is required for full fidelity playback. This can be implemented as a matrix multiplication with four inputs and 30 outputs for each time-step of audio. The elements of the matrix can be pre-computed for each speaker's position, to attenuate recording volume and to cover differences among mis-matched speakers. (Heh, that's easier than buying 30 matched speakers.) At 44.1kHz (Compact Disc quality), that is roughly 1.3 million output samples per second, each a dot product of four coefficients with the four input channels. At 192kHz, almost six million output samples per second are required for immersive three dimensional sound.
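A toy sketch of that pre-computed matrix (a naive projection decoder with one row of four coefficients per speaker, and any per-speaker gain trim folded in; a real decoder would derive the coefficients more carefully and normalise overall level):

```python
import numpy as np

def decode_matrix(speaker_az, speaker_el, trim=None):
    # One row of four coefficients per speaker; angles in radians.
    az = np.asarray(speaker_az, dtype=float)
    el = np.asarray(speaker_el, dtype=float)
    m = np.stack([np.full_like(az, 1.0 / np.sqrt(2.0)),  # weight for W
                  np.cos(az) * np.cos(el),               # weight for X
                  np.sin(az) * np.cos(el),               # weight for Y
                  np.sin(el)], axis=1)                   # weight for Z
    if trim is not None:
        m *= np.asarray(trim, dtype=float)[:, None]  # fold speaker mismatch into the matrix
    return m

def decode(b_format, matrix):
    # b_format: 4 x N samples; result: speakers x N feeds.
    # Each output sample costs four multiply-accumulates.
    return matrix @ b_format
```

Decoding a block of audio is then a single matrix multiply per block, which is the per-sample cost estimated above.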

For downward compatibility, it may be useful to encode 5.1 surround sound or 7.1 surround sound alongside Ambisonics. Likewise, it may be useful to arrange speakers such that legacy 5.1 surround sound, 7.1 surround sound, 11.1 surround sound or 22.2 surround sound can be played without matrix mixing.

Using audio amplifiers, such as the popular PAM8403, it is possible to put 32×3W outputs in a 1U box. This is sufficiently loud for most domestic environments.

  • (Score: 0) by Anonymous Coward on Friday July 07 2017, @12:42PM (#536087)

    At 192kHz, almost six million multiplies per second are required for immersive three dimensional sound.

    Why would you do this? Unless you're using ultrasound speakers, the output from a DAC at that sample rate would need to be low-passed to prevent IMD products in the tweeter. No?

    • (Score: 1, Informative) by Anonymous Coward on Friday July 07 2017, @04:33PM (#536165)

      Couple of reasons.

      First reason: studio-level mastering. You always master with as many bits and hertz as you can scrounge up, and downsample for production. This is because many studio processes effectively suck up bits and hertz. For a result that makes no mathematical difference to the audience, you have to go over human auditory levels of precision and accuracy in pre-production.

      Second reason: When you have multiple speakers, it is possible for otherwise inaudible frequencies to interact in ways that are audible. Minimising theoretically inaudible aliasing artifacts reduces that concern.

      Third reason: Amps and speakers (especially high end designs) work hard to get the speaker cones in the right place at the right time, and damp extraneous movement, but realistically they're not perfect, and the more aliased the signal the jerkier their motion will be (obviously DA converter quality plays a role here) and this can add an actually audible layer of noise. Reducing that at levels that are completely inaudible in theory can have practical results. In headphone speakers this is basically a non-issue, but at the theatre level it's a major design concern.

      • (Score: 0) by Anonymous Coward on Friday July 07 2017, @09:02PM (#536263)

        First reason: studio-level mastering. You always master with as many bits and hertz as you can scrounge up, and downsample for production. This is because many studio processes effectively suck up bits and hertz. For a result that makes no mathematical difference to the audience, you have to go over human auditory levels of precision and accuracy in pre-production.

        Converters and plugin effects oversample. I asked why one would output to (non-ultrasound) loudspeakers at a sample rate that would require additional filtering to prevent introduction of intermodulation distortion due to limited bandwidth of the tweeter.

        Second reason: When you have multiple speakers, it is possible for otherwise inaudible frequencies to interact in ways that are audible. Minimising theoretically inaudible aliasing artifacts reduces that concern.

        True, but there would be no aliasing with a steep enough reconstruction filter at 48kHz, and since DACs already oversample...

        Third reason: Amps and speakers (especially high end designs) work hard to get the speaker cones in the right place at the right time, and damp extraneous movement, but realistically they're not perfect, and the more aliased the signal the jerkier their motion will be (obviously DA converter quality plays a role here) and this can add an actually audible layer of noise. Reducing that at levels that are completely inaudible in theory can have practical results. In headphone speakers this is basically a non-issue, but at the theatre level it's a major design concern.

        All DACs oversample, so the reconstruction filters can be just as steep at 48kHz as at 192kHz, with the same effect on aliasing. Again, why send a signal containing program content two octaves above the highest frequency we can hear when common ribbon tweeters will not reproduce audio above ~35kHz?

        • (Score: 0) by Anonymous Coward on Friday July 07 2017, @10:26PM (#536292)

          You're absolutely dead right. Inevitably, indubitably, implacably right.

          For home audio purposes.

          If you're genuinely sending an audio stream to Joe Beergut watching his 3D presentation of Backdoor Sluts 9, then there's no point whatsoever in going above 44.1, or maybe 48kHz.

          However, that's not what's under discussion here. What's under discussion is the mission of sending data online - and you have no idea a priori of the context of that data sending. It's entirely possible that the needs of Joe Beergut are utterly negligible compared to those of bat researchers. Or someone else. I have no idea - and in advance of their researches, neither does anybody else.

          In the studio, needs are clearly advanced beyond what other people would require at point of consumption, and in research labs, the sky could be the limit. In modern studios, IP transmission is becoming increasingly common (parenthetically, it's becoming common for MIDI signals as well).

          If cafebabe is trying to figure out an arbitrary streaming media system, the sky should be taken to be the limit.

    • (Score: 2) by cafebabe (894) on Saturday July 08 2017, @09:29PM (#536647)

      A Nyquist sampling frequency is a minimum, not a recommendation. Exceeding the minimum allows phase information to be represented more accurately. Phase information becomes increasingly important as the number of speakers increases.

      High quality amplifiers generally provide linear response up to 20kHz and then taper response above this frequency. Without this tapering, the majority of energy would be dumped into ultra-sonic sound. However, this efficiency is largely de-coupled from sampling frequency or sampling bit-depth.

      A related issue with sampling and reproduction occurs with scanning and printing. Pixels are scanned in a range of shades but printing tends to be much more binary. (Ink or no ink.) This creates a typical situation where scanning resolution greatly exceeds print quality.

      --
      1702845791×2
      • (Score: 0) by Anonymous Coward on Sunday July 09 2017, @03:41PM (#536841)

        A Nyquist sampling frequency is a minimum, not a recommendation. Exceeding the minimum allows phase information to be represented more accurately. Phase information becomes increasingly important as the number of speakers increases.

        The phase of out-of-band harmonics and extraneous noise? Historically, low-pass reconstruction filters introduced a linear phase roll-off. In an age of oversampling converters and linear-phase DSP, why use the higher sample rate?

        High quality amplifiers generally provide linear response up to 20kHz and then taper response above this frequency. Without this tapering, the majority of energy would be dumped into ultra-sonic sound.

        So we're feeding 192kHz for better phase response when power amplifiers have a non-linear response above 20kHz? Even the best ribbon tweeters only respond up to 50kHz, and having low frequencies down at 5Hz present is going to reduce overall headroom. It's the low frequencies that require energy - the amount of air to be moved doubles per octave down.

        A related issue with sampling and reproduction occurs with scanning and printing. Pixels are scanned in a range of shades but printing tends to be much more binary. (Ink or no ink.) This creates a typical situation where scanning resolution greatly exceeds print quality.

        Digital audio is more akin to vector art: you do not lose "resolution" when you zoom in, because the discrete samples already have all the required information to reproduce the original signal. Oversampling and optical low-pass filters are used extensively in imaging devices to avoid aliasing, and there it makes sense to capture at higher resolution, but again it doesn't make sense to distribute material beyond the limits of human visual acuity.

        • (Score: 2) by cafebabe (894) on Monday July 10 2017, @05:16AM (#537036)

          So we're feeding 192kHz for better phase response when power amplifiers have a non-linear response above 20kHz?

          Yes. The sample rate may be about 10 times the frequency of the highest sound.

          --
          1702845791×2