High Quality Audio, Part 2
(This is the 10th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
Audio and video streaming benefits from stream priorities, and three dimensional sound can be streamed in four channels. But how can this data be packed to avert network saturation while providing the best output when data fails to arrive in full?
The first technique is to de-correlate audio. This involves cross-correlation and auto-correlation. Cross-correlation has a strict hierarchy. Specifically, all of the audio channels directly or indirectly hinge upon a monophonic audio channel. It may be useful for three dimensional sound to be dependent upon two dimensional sound, which is dependent upon stereophonic sound, which is dependent upon monophonic sound. Likewise, it may be useful for 5.1 surround sound and 7.1 surround sound to be dependent upon the same stereophonic sound. Although this allows selective streaming and decoding within the constraints of available hardware, it creates a strict pecking-order when attempting to compress common information across multiple audio channels.
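As a minimal sketch of such a hierarchy, the mid/side transform used by codecs such as FLAC derives a stereo pair from a monophonic "mid" channel plus a "side" residual; the article does not fix a particular transform, so mid/side is offered here only as one standard, lossless example:

```python
# Sketch: cross-de-correlation of a stereo pair into mid/side channels,
# assuming integer PCM samples. The "mid" channel is the monophonic base
# that dependent channels hinge upon; "side" holds only the residual
# difference, which is typically much smaller than either channel.

def stereo_to_mid_side(left, right):
    """Lossless integer mid/side transform."""
    mid = [(l + r) >> 1 for l, r in zip(left, right)]   # rounded-down average
    side = [l - r for l, r in zip(left, right)]         # difference residual
    return mid, side

def mid_side_to_stereo(mid, side):
    """Exactly invert the transform. The dropped low bit of (l + r) is
    recovered from the parity of side, since l + r and l - r share parity."""
    left = [m + ((s + (s & 1)) >> 1) for m, s in zip(mid, side)]
    right = [l - s for l, s in zip(left, side)]
    return left, right
```

A stereo decoder consumes both channels; a monophonic decoder can simply take the mid channel and ignore the side residual, which is exactly the selective decoding described above.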
Cross-de-correlated data is then auto-de-correlated, and the resulting audio channels are reduced to a few bits per sample per channel. In the most optimistic case, the residual is one bit per sample per channel, regardless of sample quality. For muffled, low quality input, it won't be much higher. However, for high quality digital mastering, expect a residual of around five bits per sample per channel. So, on average, it is possible to reduce WXYZ three dimensional Ambisonics to 20 bits per time-step. However, that's just an average, so we cannot simply arrange this data into 20 priority levels where, for example, vertical resolution dips first.
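The effect of auto-de-correlation can be sketched with first-order linear prediction, where each sample is predicted from its predecessor and only the difference is kept. (The article does not specify a predictor order; first-order is assumed here purely for illustration.)

```python
import math

def predict_residuals(samples):
    """Auto-de-correlate: replace each sample with the difference
    from its predecessor (predictor seeded with zero)."""
    return [s - p for s, p in zip(samples, [0] + samples[:-1])]

def reconstruct(residuals):
    """Exactly invert the prediction with a running sum."""
    out, acc = [], 0
    for r in residuals:
        acc += r
        out.append(acc)
    return out

# Demo: a smooth 100 Hz tone near 16-bit amplitude, sampled at 48 kHz.
tone = [round(30000 * math.sin(2 * math.pi * 100 * n / 48000))
        for n in range(480)]
res = predict_residuals(tone)
# The residuals stay in the low hundreds even though the samples
# span roughly +/-30000, so far fewer bits per sample are needed.
```

Smooth, band-limited audio yields small residuals; noisy, high-bandwidth audio yields larger ones, which is why well-mastered material leaves a larger residual than muffled input.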
Thankfully, we can split data into frequency bands before or after performing cross-de-correlation. By the previously mentioned geometric series, this adds a moderate overhead but allows low frequencies to be preserved even when bandwidth is grossly inadequate.
It also allows extremely low frequencies to be represented with ease.
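The band splitting above can be sketched with a recursive two-band split (a Haar-style integer wavelet, assumed here since the article does not name a filter bank). Each pass separates a half-rate low band from a half-rate high band; re-splitting only the low band produces the geometric series of band sizes (n/2 + n/4 + ...), and under congestion the coarsest low band can be kept while the high bands are dropped:

```python
# Sketch: lossless two-band split of an even-length integer signal.
# This reuses the same rounded-average/difference arithmetic as the
# mid/side transform, applied to adjacent sample pairs.

def split(samples):
    """One low/high split over adjacent pairs."""
    low = [(a + b) >> 1 for a, b in zip(samples[0::2], samples[1::2])]
    high = [a - b for a, b in zip(samples[0::2], samples[1::2])]
    return low, high

def merge(low, high):
    """Exactly invert split(); parity of the difference recovers
    the low bit dropped by the rounded average."""
    out = []
    for m, d in zip(low, high):
        a = m + ((d + (d & 1)) >> 1)
        out.extend([a, a - d])
    return out

def band_tree(samples, levels):
    """Repeatedly split the low band, collecting high bands
    from finest to coarsest."""
    bands = []
    for _ in range(levels):
        samples, high = split(samples)
        bands.append(high)
    return samples, bands  # coarsest low band plus the high bands
```

Because each successive low band is half the rate of the previous one, even a deeply split low band remains cheap to transmit, which is how extremely low frequencies stay representable and survivable when bandwidth collapses.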