
High Quality Audio, Part 3

Posted by cafebabe on Friday July 07 2017, @10:41PM (#2478)
4 Comments
Software

What is the minimum frequency response which should be represented by an audio codec? Well, what is the best analog or digital system and is it worthwhile to approach, match or exceed its specification?

Analog record players have an exceptionally good minimum frequency response. Technically, the minimum frequency response is determined by the duration of the recording. So, a 45 minute long-play record has a fundamental frequency of 1/2700Hz - which is about 0.0004Hz. Does this make any practical difference? Most definitely yes. People lampoon analog purists or state that LP and CD are directly equivalent. Some of the lampooning is justified but the purists' core argument holds: most implementations of Compact Disc Audio don't achieve the format's useful low-frequency potential.

Many Compact Disc players boldly state "1-Bit DAC Oversampling" or suchlike. The 1-Bit DAC is simplistically brilliant and variants are used in many contexts. However, oversampling is the best of a bad bodge. The optics of a Compact Disc are variable and, even with FEC in the form of Reed-Solomon encoding, about 1% of audio sectors don't get read. What happens in this case? Very early CD players would just output nothing. This technique was superseded by repeating the previous output. (Early Japanese CD players had an impressive amount of analog circuitry and no other facility to stretch audio.)

Eventually, manufacturers advanced to oversampling techniques. In the event that data cannot be obtained optically from disc, gaps are smoothed with a re-construction from available data. Unfortunately, there is a problem with this technique: nothing below 43Hz can be re-constructed. 2KB audio sectors hold 1024×16 bit samples and samples are played at exactly 44.1kHz, so audio sectors are played at a rate of approximately 43Hz. Any technique which re-constructs audio from within one sector therefore has a fundamental frequency of 43Hz. Given that drop-outs occur with some correlation at a rate of 1%, this disrupts any reproduction of frequencies below 43Hz. For speech, this would be superior to GSM's baseline AMR which has 50Hz audio blocks. For music, this is a deal-breaker.

Remove the intermittent reading problems and the fundamental frequency is the reciprocal of the recording's length. So, a 16 bit linear PCM .WAV of 74 minutes has a fundamental frequency of approximately 0.0002Hz. The same data burned as audio sectors on a Compact Disc only has a fundamental frequency of 43Hz. (I'll ignore the reverse of this process due to cooked sectors.)
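As a sanity check, here is the arithmetic in a few lines of Python, using only the figures quoted above (2KB sectors, 16 bit samples, 44.1kHz playback):

    # Figures from the article: 2KB audio sectors of 16 bit samples at 44.1kHz.
    SAMPLE_RATE_HZ = 44100
    SAMPLES_PER_SECTOR = 2048 // 2                   # 2 bytes per 16 bit sample

    print(SAMPLE_RATE_HZ / SAMPLES_PER_SECTOR)       # ~43.07 sectors per second

    def fundamental_hz(minutes):
        """Lowest representable frequency is the reciprocal of the duration."""
        return 1.0 / (minutes * 60)

    print(round(fundamental_hz(45), 4))              # 45 minute LP:  0.0004 Hz
    print(round(fundamental_hz(74), 4))              # 74 minute WAV: 0.0002 Hz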

So, it is trivial to retain low frequencies within a digital system. Just ensure that all data is present and correct. Furthermore, low frequencies require minimal storage and can be prioritized when streaming.

A related question is the minimum frequency response for a video codec. That's an equally revealing question.

High Quality Audio, Part 2

Posted by cafebabe on Friday July 07 2017, @05:56AM (#2477)
0 Comments
Software

(This is the 10th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Audio and video streaming benefits from stream priorities, and three dimensional sound can be streamed in four channels. But how can this be packed to avert network saturation while providing the best output when data fails to arrive in full?

The first technique is to de-correlate audio. This involves cross-correlation and auto-correlation. Cross-correlation has a strict hierarchy. Specifically, all of the audio channels directly or indirectly hinge upon a monophonic audio channel. It may be useful for three dimensional sound to be dependent upon two dimensional sound, which is dependent upon stereophonic sound, which is dependent upon monophonic sound. Likewise, it may be useful for 5.1 surround sound and 7.1 surround sound to be dependent upon the same stereophonic sound. Although this allows selective streaming and decoding within the constraints of available hardware, it creates a strict pecking-order when attempting to compress common information across multiple audio channels.

Cross-de-correlated data is then auto-de-correlated and the resulting audio channels are reduced to a few bits per sample per channel. In the most optimistic case, the residual is one bit of data per sample per channel, and this floor applies regardless of sample quality. For muffled, low quality input, it won't be much higher. However, for high quality digital mastering, expect a residual of five bits per sample per channel. So, on average, it is possible to reduce WXYZ three dimensional Ambisonics to 20 bits per time-step. However, that's just an average. So, we cannot arrange this data into 20 priority levels where, for example, vertical resolution dips first.

Thankfully, we can split data into frequency bands before or after performing cross-de-correlation. By the previously mentioned geometric series, this adds a moderate overhead but allows low frequencies to be preserved even when bandwidth is grossly inadequate.

It also allows extremely low frequencies to be represented with ease.
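To make the band-splitting idea concrete, here is a deliberately crude sketch: a moving-average low band plus a residual high band. It is illustrative only - a real codec would use a proper filter bank - and the window length is an arbitrary assumption.

    def split_bands(samples, window=64):
        """Split one channel into a coarse low band plus a residual high band."""
        low = []
        for i in range(len(samples)):
            start = max(0, i - window + 1)
            low.append(sum(samples[start:i + 1]) / (i + 1 - start))  # moving average
        high = [s - l for s, l in zip(samples, low)]                 # residual
        return low, high

    # The low band varies slowly, so it can be decimated heavily and sent at
    # the highest priority; the high band is what gets dropped first when
    # bandwidth is grossly inadequate.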

High Quality Audio, Part 1

Posted by cafebabe on Friday July 07 2017, @04:31AM (#2476)
7 Comments
Software

(This is the ninth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

A well-known trick with streaming media is to give priority to audio. For many viewers, video can glitch and frame rate can reduce significantly if audio quality is maintained. Within this constraint, monophonic audio can provide some continuity even when surround sound or stereo sound fails. This requires audio to be encoded as a monophonic channel and a left-minus-right channel but MLP demonstrates that mono, stereo and other formats can be encoded together in this form and decoded as appropriate. How far does this technique go? Well, Ambisonics is the application of Laplace Spherical Harmonics to audio rather than chemistry, weather simulation or a multitude of other purposes.
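For the stereo case, the mono-plus-side encoding mentioned above is just a pair of sums and differences. The sketch below is the textbook transform rather than MLP's actual bitstream format:

    def ms_encode(left, right):
        mid  = [(l + r) / 2 for l, r in zip(left, right)]   # monophonic channel
        side = [(l - r) / 2 for l, r in zip(left, right)]   # left-minus-right
        return mid, side

    def ms_decode(mid, side):
        left  = [m + s for m, s in zip(mid, side)]
        right = [m - s for m, s in zip(mid, side)]
        return left, right

    # A receiver which only obtains the mid channel still has usable mono;
    # the side channel can be degraded or dropped first.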

After getting through all the fiddly stuff - cardioids, A-format, B-format, UHJ format, soundfield microphones, higher order Ambisonics, or just what the blazes Ambisonics actually is - we get to the mother of all 3D sound formats and why people buy hugely expensive microphones, record ambient sounds and sell the recordings to virtual reality start-ups who apply trivial matrix rotation to obtain immersive sound.

Yup. That's it. Record directional sound. Convert it into one channel of omnidirectional sound and three channels of directional sound (left-minus-right, front-minus-back, top-minus-bottom). Apply sines and cosines as required or mix like a pro. The result is a four channel audio format which can be streamed as three dimensional sound, two dimensional sound, one dimensional sound or zero dimensional sound and mapped down to any arrangement of speakers.

For technical reasons, a minimum of 12 speakers (and closer to 30 speakers) are required for full fidelity playback. This can be implemented as a matrix multiplication with four inputs and 30 outputs for each time-step of audio. The elements of the matrix can be pre-computed for each speaker's position, to attenuate recording volume and to cover differences among mis-matched speakers. (Heh, that's easier than buying 30 matched speakers.) At 44.1kHz (Compact Disc quality), that's roughly 5.3 million multiplies per second. At 192kHz, roughly 23 million multiplies per second are required for immersive three dimensional sound.
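A naive sketch of that matrix decode follows. It assumes one common first-order convention (W weighted by roughly 0.707, X/Y/Z projected onto each speaker direction); real decoders add normalization and psychoacoustic weighting, and the four speaker positions are placeholders.

    import math

    def decode_matrix(speakers, w_weight=0.707):
        """Pre-compute one row of coefficients per speaker (azimuth, elevation)."""
        return [(w_weight,
                 math.cos(az) * math.cos(el),
                 math.sin(az) * math.cos(el),
                 math.sin(el))
                for az, el in speakers]

    def decode_sample(matrix, w, x, y, z):
        """One 4-input matrix multiply per time-step, one output per speaker."""
        return [cw * w + cx * x + cy * y + cz * z for cw, cx, cy, cz in matrix]

    # Example: a square of four horizontal speakers. A real rig needs far more.
    speakers = [(math.radians(a), 0.0) for a in (45, 135, 225, 315)]
    matrix = decode_matrix(speakers)
    outputs = decode_sample(matrix, w=0.5, x=0.1, y=-0.2, z=0.0)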

For downward compatibility, it may be useful to encode 5.1 surround sound, 7.1 surround sound with Ambisonics. Likewise, it may be useful to arrange speakers such that legacy 5.1 surround sound, 7.1 surround sound, 11.1 surround sound or 22.2 surround sound can be played without matrix mixing.

Using audio amplifiers, such as the popular PAM8403, it is possible to put 32×3W outputs in a 1U box. This is sufficiently loud for most domestic environments.

Test Data For Codecs

Posted by cafebabe on Friday July 07 2017, @12:30AM (#2475)
6 Comments
Software

Test data for audio codecs:-

  • Boston: More Than A Feeling - Tom Scholz invented a sound engineering technique where vocals, lead guitar and bass guitar are each recorded as four separate sessions. Two recordings are played hard left and two recordings are played hard right. This creates a phased sound which "pops" - striking enough to build a band and a tour around.
  • Alice Cooper: Poison - Noise and high dynamic range.
  • Foo Fighters: This Is A Call - Track 1 of the first album has some of Dave Grohl's expert drumming. Cooler than the Phil Collins drumming used as test data for MLP.
  • Trailer for Step Up 4 - Urban in a very Disneyfied manner. However, has a tortuous mix of voice, music, sirens and rumble in one track. May be available with stereoscopic video.

Test data for video codecs:-

UDP Servers, Part 2

Posted by cafebabe on Thursday July 06 2017, @09:41PM (#2474)
0 Comments
Software

(This is the eighth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

How can the most data be sent to the most users using the least hardware and bandwidth? Minimize transient state. TCP has a large, hidden amount of state; typically more than 1MB per connection for bulk transfers. This hidden cost can be eliminated with ease but it introduces several other problems. Some of these problems are solved, some are unsolved.

The most obvious reason to use TCP is to obtain optimal PMTU utilization. Minimizing packet header overhead is a worthy goal but, in practice, it doesn't work: multi-path TCP, or just an increasingly dynamic network, makes optimal PMTU discovery a rapidly shifting task. Ignoring this, is it really worth stuffing payloads with the optimal number of bytes when all opportunity to multi-cast is discarded? That depends upon workload but it only takes a moderate number of scale-out cases to skew the general case. One such case is television.

Optimal PMTU utilization also fails when an ISP uses an unusual PMTU. PMTU discovery is significantly more challenging when IPv6 has no in-network packet fragmentation and many IPv6 devices implement the minimum specification of a 1280 byte MTU. I'd be inclined to ignore that but tunneling and inter-operability mean the intersection, rather than the union, of IPv4 and IPv6 features has to be considered. (It has been noted that IPv6 raises the specified minimum payload but triple-VLAN IPv4 over Ethernet has a larger MTU and is the worst case in common use.)

UDP competes at a disadvantage within a POSIX environment. The total size of UDP buffers may be severely hobbled. Likewise, POSIX sendfile() is meaningless when applied to UDP. (This is the technique which allows lighttpd to serve thousands of static files from a single-threaded server. However, sendfile() only works with unencrypted connections. Netflix has an extension which allows TLS session state to be shared with the FreeBSD kernel so that sendfile() can still be used, but the requirement to encrypt significantly erodes TCP's advantage.)

Some people have an intense dislike of UDP streaming quality but most of that experience pre-dates BitTorrent or any Kodi plug-in which utilizes BitTorrent in real-time. No-one complains about the reliability or accuracy of BitTorrent, although several governments and corporations would love to permanently eliminate anything resembling BitTorrent plus Kodi.

From techniques in common use, a multi-cast video client has several options when a packet is dropped and there is sufficient time for re-send:-

  • Wait for FEC to overcome minor losses (see the XOR-parity sketch after this list).
  • Use a mono-cast back-channel to obtain missed pieces.
  • Exchange data with peers. This typically requires signing or some other fore-knowledge of trustworthy data.
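The FEC option in the first bullet can be as simple as one XOR parity packet per group of packets, which lets a client rebuild any single missing packet without a re-send. The sketch below shows only that idea; it is not the scheme used by any particular broadcaster:

    def parity(packets):
        """XOR equal-length packets together into one parity packet."""
        out = bytearray(len(packets[0]))
        for packet in packets:
            for i, byte in enumerate(packet):
                out[i] ^= byte
        return bytes(out)

    def recover(survivors, parity_packet):
        """Rebuild the single missing packet from the survivors plus parity."""
        return parity([*survivors, parity_packet])

    group = [bytes([i] * 8) for i in range(4)]   # four 8 byte packets
    p = parity(group)
    lost = group.pop(2)                          # one packet is dropped in transit
    assert recover(group, p) == lost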

When time for re-send is insufficient, there are further options:-

  • Establish stream priority. In decreasing priority: maintain captions, maintain monophonic audio, maintain stereophonic audio, maintain surround sound, maintain video key frames, maintain six-axis movement, maintain full video stream.
  • Switch to lower bandwidth stream.
  • Pause and allow buffering to occur. Each time this occurs, latency from live broadcast increases but it is adaptive until all re-sends are satisfied.
  • Display data with holes. Historically, this has been poor. However, this was prior to Dirac diff trees and other techniques.
  • Back-fill data. In this case, live-display is low quality but any recording of a stream is high quality.

For a practical example of low quality live-streaming with high quality recording, see FPV drones. In this scenario, remote control airplanes may broadcast low quality video. Video may be monochromatic and/or analog NTSC. Video may be stereoscopic and possibly steerable from an operator's headset. Several systems super-impose altitude, bearing, temperature, battery level and other information. With directional antennas, this is sufficient to fly 10km or more from a launch site. Meanwhile, 1920×1080p video is recorded on local storage. The low quality video is sufficient for precision flying while the high quality video can be astoundingly beautiful.

Anyhow, UDP video can be as live and crappy as a user chooses. Or it may be the optimal method for distributing cinema quality video. And a user may choose either case when viewing the same stream.

UDP Servers, Part 1

Posted by cafebabe on Thursday July 06 2017, @06:16AM (#2473)
10 Comments
Software

(This is the seventh of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Using UDP for streaming video seems idiotic and is mostly deprecated. However, it has one huge advantage. It works really well with a large number of concurrent users. Within reason, it is worthwhile to maximize this number and minimize cost. Network games can be quite intensive in this regard, with SecondLife previously rumored to run two users per server and WorldOfWarcraft 20 users per server. At the other extreme, Virtudyne (Part 1, Part 2, Part 3, Part 4) aimed to run productivity software with 20 million users per server. That failed spectacularly after US$200 million of investment.

Maintaining more than 10000 TCP connections requires a significant quantity of RAM; often more than 1MB per connection. A suitably structured UDP implementation doesn't have this overhead. For example, it is possible to serve 40 video streams from a single-threaded UDP server with a userspace of less than 1MB and kernel network buffers less than 4MB. All of the RAM previously allocated to TCP windowing can be re-allocated to disk caching. Even with a mono-casting implementation, there are efficiency gains when multiple users watch the same media. With stochastic caching techniques, it is easy for network bandwidth to exceed storage bandwidth.

There is another advantage with UDP. FEC can be sent speculatively or such that one resend may satisfy multiple clients who each lack differing fragments of data. It is also easy to make servers, caches and clients which are tolerant to bit errors. This is particularly important if a large quantity of data is cached in RAM for an extended period.

So, what is a suitable structure for a UDP server? Every request from a client should incur zero or one packets of response. Ideally, a server should respond to every packet. However, some communication is obviously corrupted, malformed or malicious and isn't worth the bandwidth to respond. Also, in a scheme where all communication state is held by a client and all communication is pumped by a client, resends are solely the responsibility of a client.
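A minimal sketch of that structure in Python follows. The port number, request format and media file are placeholder assumptions; the point is that the server keeps no per-client state and emits at most one datagram per request:

    import socket

    def serve(port=5004, payload_size=1024):
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind(("", port))
        with open("media.bin", "rb") as media:          # hypothetical content file
            while True:
                request, client = sock.recvfrom(64)
                if len(request) != 8:                   # malformed: no response
                    continue
                offset = int.from_bytes(request, "big") # the client names the piece
                media.seek(offset)
                chunk = media.read(payload_size)
                if chunk:
                    sock.sendto(chunk, client)          # zero or one packets out

    # Re-sends are the client's responsibility: if a response is lost, the
    # client simply asks for the same offset again.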

From experience, it is wrong to implement a singleton socket or a set of client sockets mutually exclusive with a set of server sockets. Ideally, library code should allow multiple server sockets and multiple client sockets to exist within userspace. This facilitates cache or filter implementation where one program is a client for upstream requests and a server for downstream requests.

"Streaming" Video Versus Streaming Video

Posted by cafebabe on Wednesday July 05 2017, @11:15PM (#2471)
5 Comments
Software

(This is the sixth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I've now arrived at the practical problems with resource naming and the practical problems with networking. These problems interact badly with video over the Internet.

In the 1990s, video on the Internet was typically sent to users as lossy UDP. Apple's QuickTime3 had an option to optimize .mov files to aid this type of distribution. This functionality seems to be deprecated and we've devolved to the situation where "streaming" video involves mono-casting small chunks of video over HTTP/1.1 over TCP. There is a top-tier of companies which likes this arrangement. This includes Google, Apple, Microsoft, Netflix and a few parties who provide infrastructure, such as Akamai and CloudFlare. Many of them prefer the analytics which can be gained from mono-casting. To differing degrees, each knows where and when you've paused a video. Big Data provides the opportunity to discover why this occurs. Perhaps the content is violent, challenging or difficult to understand. Perhaps you had a regular appointment to be elsewhere. It should be understood that mono-casting over TCP provides the least amount of privacy for users.

TCP is completely incompatible with multi-cast and therefore choosing TCP excludes the option of sending the same packets to different users. The investment in infrastructure would be completely undermined if anyone could live-stream multi-cast video without centralized intermediaries running demographic profiling advert brokers on sanitized, government approved content.

Admittedly, UDP has a reputation for packet flood but TCP congestion control is wishful thinking which is actively undermined by Google, Microsoft and others. Specifically, RFC3390 specifies that unacknowledged data sent over TCP should be capped. In practice, the limit is about 4KB. However, Microsoft crap-floods 100KB over a fresh TCP connection. Google does similar. If you've ever heard an end-user say that YouTube is reliable or responsive, that's because YouTube's servers are deliberately mis-configured to shout over your other connections. Many of these top-tier companies are making a mockery of TCP congestion control.

Ignoring the packet floods, there are some fundamentals that cannot be ignored. Lossy video uses small packets. The scale of compression which can be attained within an IPv6 1280 byte packet is quite limited, especially when encryption headers are taken into account. The big guns of compression (lz4, lzo, lzma, Burrows-Wheeler transform) are still warming-up in the first 1MB. That dove-tails nicely with HTTP/1.1 but restricts UDP to old-school compression, such as LZ77, Shannon-Fano or arithmetic compression.

However, if we define video tiles in a suitable manner, incorporate degradation in the form of a Dirac codec style diff tree and make every video tile directly addressable then everything works for multiple users at different sites. The trick is to make video tiles of a suitable form. I'm glad that quadtree video has finally become mainstream with HEVC because this demonstrates the practical savings which can be made by amortizing co-ordinates. Further savings can be made via Barnsley's collage theorem. The net result is that disparate tile types can be butted together and the seams will be no less awful than JPEG macro-blocks.

Unfortunately, if we wish to be fully and practically compatible with embedded IPv6 devices, we have a practical UDP payload of 1024 bytes or 8192 bits. This determines maximum video quality when using different tile sizes:-

  • 32×32 pixel blocks permits 8 bits per pixel.
  • 64×64 pixel blocks permits 2 bits per pixel.
  • 128×128 pixel blocks permits 0.5 bits per pixel.

There is a hard upper-bound on payload size but are these figures good or bad? Well, H.264 1920×1080p at 30 frames per second may encode from 4GB per hour to 6GB per hour. That's 224 billion pixels and an upper-bound of 48 billion bits - or roughly 0.2 bits per pixel.
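The same figures, as throwaway Python (the 1024 byte payload is the assumption carried over from the previous paragraph):

    PAYLOAD_BITS = 1024 * 8                              # assumed UDP payload

    for side in (32, 64, 128):
        print(side, PAYLOAD_BITS / (side * side))        # 8.0, 2.0, 0.5 bits per pixel

    pixels_per_hour = 1920 * 1080 * 30 * 3600            # ~224 billion pixels
    h264_bits_per_hour = 6e9 * 8                         # 6GB per hour upper bound
    print(h264_bits_per_hour / pixels_per_hour)          # ~0.21 bits per pixel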

So, what type of tiles would be encoded in the quadtree? Well, enabling different subsets of tiles would provide a codec which ranged from symmetric lossless to highly asymmetric lossy. At a minimum, an H.261 style DCT is essential. On its own, it would provide symmetric lossy encoding which matches MJPEG for quality while only incurring a small overhead for (rather redundant) quadtree encoding. It would also be useful to add texture compression and some rather basic RLE. This would be particularly useful for desktop windowing because it allows, for example, word-processing and web browsing to be rendered accurately and concisely.

What is the maximum overhead of quadtree encoding? Well, for every four leaves, there is one branch and for every four branches, there is another branch. By the geometric series, the ratio of branches to leaves is 1:3 - and branches are always a single byte prior to compression. So, quadtree overhead is small.
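A quick check of that ratio (the depth here is arbitrary; any reasonably deep full quadtree gives the same answer):

    def quadtree_nodes(depth):
        leaves = 4 ** depth
        branches = sum(4 ** d for d in range(depth))     # 1 + 4 + 16 + ...
        return leaves, branches

    leaves, branches = quadtree_nodes(8)
    print(branches / leaves)                             # ~0.333: one branch per three leaves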

So, we've established that it is possible to make a video system which is:-

  • Lossy or lossless.
  • Provides symmetric or asymmetric encode/decode time.
  • Provides low-latency encoding modes.
  • Works with directly addressable video tiles of 64×64 pixels or larger.
  • Works with embedded devices which implement a strict 1280 byte packet limit.
  • Works with 70% round-trip packet loss.
  • Allows frame rate degradation.
  • Can be configured to incur worst-case storage or bandwidth cost which is only marginally worse than MJPEG.
  • May be suitable as transport for a windowing system.
  • Allows content to be viewed on an unlimited number of devices and/or by an unlimited number of users at any number of sites.
  • Allows implementation of video walls of arbitrary size and resolution.

It won't match the bit-rate of H.264, WebM or HEVC but it works over a wide range of scenarios where these would completely fail.

My 4th of July Coup

Posted by The Mighty Buzzard on Wednesday July 05 2017, @01:52PM (#2469)
20 Comments
Soylent

In case anyone's been wondering why a hefty plurality if not an outright majority of the stories pushed over the weekend and the first couple days of the week came from yours truly, it's because I hopped on my cavalry bear, rode all over the country, and beat all the Editors except martyb (who I saved for last and only had to threaten into submission) upside the head with a double-barrel chainsaw.

Or it's because I saw there was pretty much nothing in the submission queue except partisan hack jobs and bloody stupid garbage that I sincerely hope the Eds never publish, so I quickly subbed everything remotely interesting that I found in my feed reader.

Believe whichever amuses you the most.

Benefits Of Internet Protocol 6 Greatly Over-Stated

Posted by cafebabe on Wednesday July 05 2017, @02:52AM (#2468)
13 Comments
Software

(This is the fifth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

What are the properties usually associated with Internet Protocol 6? Multi-cast? A huge address space? A cleaner implementation with more packet throughput? Whatever.

Multi-Cast

Internet Protocol 4 addresses from 224.0.0.0 to 239.255.255.255 are for multi-cast, as defined in RFC1112. Unfortunately, multi-cast is incompatible with TCP, so that's 2^28 Internet Protocol 4 addresses and 2^120 Internet Protocol 6 addresses which don't work with YouTube or QuickTime.

Address Space

Internet Protocol 6 has 128 bit addresses. That's about 3.4×10^38 addresses. However, there are edge cases where that's insufficient. Internet Protocol 4 has 32 bit addresses (by default) and that was considered vast when it was devised - especially when the total human population was less than 2^32 people. Superficially, it was possible to give every person a network address.

Several address extension schemes have been devised. The best is RFC1365 which uses option blocks to optionally extend source and destination fields in a manner which is downwardly compatible. So, what size is an Internet Protocol 4 address? 32 or more bits, as defined by RFC1365.

Header Size

Internet Protocol 4 is often described as having a 20 byte (or larger) header while Internet Protocol 6 is often described as having a header which is exactly 40 bytes. This is false. IPv6 has option blocks just like IPv4 and therefore both have variable length headers. The difference is that IPv6 headers are usually 20 bytes larger.

Packet Size

IPv4 typically supports a PMTU of 4KB or more. Admittedly, there are no guarantees but Ethernet without packet fragmentation provides about 1500 bytes. With PPPoA or PPPoE over AAL5 over ATM, 9KB payloads only fragment over the last hop. This is ideal for video delivery. IPv6 only guarantees 1280 bytes. How common is this? Numerous variants of micro-controller networking only support 1280 byte buffers. This is especially true for IPv6 over IEEE802.15.4 implementations. This is especially bad for video.
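For reference, the headroom left for UDP inside that 1280 byte minimum is easy to compute (fixed headers only; extension headers, tunnels and VLAN tags eat further into it):

    IPV6_MIN_MTU = 1280
    IPV6_HEADER = 40
    UDP_HEADER = 8

    print(IPV6_MIN_MTU - IPV6_HEADER - UDP_HEADER)       # 1232 bytes of UDP payload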

Packet Fragmentation

IPv6 routers never fragment packets. An IPv6 packet which exceeds the MTU of a hop is simply dropped (at best, the sender receives an ICMPv6 Packet Too Big message).

Packet Throughput

Compared to IPv4, IPv6 generally has longer headers, longer addresses and shorter payloads. On this basis, how would you expect packet throughput of IPv6 to match or exceed IPv4?

Summary

The introduction of IPv6 provides no particular benefit to end-users. IPv6 is detrimental to payload size and this is particularly detrimental to video delivery.

Problems With Names, Part 3

Posted by cafebabe on Wednesday July 05 2017, @01:02AM (#2467)
0 Comments
Software

(This is the fourth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Via a fairly arbitrary process, I've got to a stage of explaining pragmatic problems with naming stuff in a computer.

Many naming schemes require a name resolution process. The process used by DNS is overly complicated. Although it would be easy to poke fun at systemd's repeated failures in this field, djbdns has been attacked with cache poisoning and steps have been taken to mitigate timing attacks by implementing UDP port randomization. An observation which could be made from djb's SYN cookies, a similar effort, is that it is insufficient to implement baseline TCP or DNS because the Internet has become too hostile.

I investigated the implementation of Project Xanadu transclusion using DNS. The use of UDP to assemble small fragments of data in a fault-tolerant manner has obvious appeal. DNS's maximum fragment length of 255 bytes is unfortunate but can be overcome. However, popular DNS implementations, such as BIND, don't cache unknown response types. This means A records, TXT records or suchlike would have to be overloaded. Ideally, this should be handled carefully due to compressed encoding workarounds and record permutation compatible with POSIX's legacy singleton record interface.
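As an illustration of working within the 255 byte limit, the sketch below chunks an arbitrary payload into base64 fragments small enough for TXT character-strings. Record naming, queries and caching behaviour are deliberately out of scope:

    import base64

    def to_txt_fragments(payload, raw_chunk=189):
        """Split binary data into base64 fragments of at most 252 bytes each."""
        fragments = []
        for i in range(0, len(payload), raw_chunk):
            encoded = base64.b64encode(payload[i:i + raw_chunk])
            assert len(encoded) <= 255                   # fits one TXT character-string
            fragments.append(encoded)
        return fragments

    fragments = to_txt_fragments(b"\x00" * 1000)
    print(len(fragments), max(len(f) for f in fragments))  # 6 fragments, 252 bytes max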

It is possible to ignore these pleasantries and transport raw data as A records but this has two problems. Firstly, to obtain raw records, the local resolver has to be located without the benefit of the host's DHCP or suchlike. Secondly, administrators baulk at the volume of data transported via their DNS infrastructure - and rightly so because this doesn't degrade gracefully. One user's browsing would be sufficient to cause intermittent or sustained host resolution failure within a local network. Video streaming over DNS is feasible to implement but would be anti-social to anyone sharing upstream infrastructure.

An erroneous assumption made by Project Xanadu is that multi-media can be handled almost as a corollary of handling text. I presume the logic is that multi-media is traditionally serialized and therefore can be subjected to similar constraints as paragraphs of text. However, I believe this is backward. If multi-media can be handled as three dimensional data (horizontal, vertical, frame number) or two dimensional data (audio channel, position) then one dimensional data (text) is the corollary.

So, an outline of requirements is as follows:-

  • Simplified DNS is desirable.
  • Traffic priority should be lower than legacy DNS implementation.
  • Should allow server fail-over.
  • Should allow 8 bit clean queries.
  • Should allow 8 bit clean responses.
  • Should have a compact representation.
  • Should provide authentication.
  • Should provide encryption.
  • Should provide server delegation.
  • Should provide a correct implementation of cache invalidation.
  • Data should be served as UDP or similar.
  • Should be optimized for small payloads but allow arbitrary volumes of data.
  • Should permit addressing of one frame of video.
  • If possible, should permit addressing of a range of video frames.

Although this looks like a system favoring a hierarchy of servers within one domain of trust, it remains possible to implement a federation of server hierarchies.