

High Quality Audio, Part 5

Posted by cafebabe on Saturday July 08 2017, @03:25AM (#2481)
4 Comments
Software

(This is the 13th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I sent similar text to a friend:-

It occurred to me that you've got no idea what I've been doing. Essentially, I'm going up the stack to increase value. The merit of doing this was demonstrated very succinctly by a friend who made US$800 per month on eBay by selling 3D printers and 3D printer parts. (Apparently, people pay US$50 for a print head which consists of a bolt with a hole through its length and a coil of wire which acts as a heating element.) My friend explained principles which could be applied widely and they seem to work. Specifically, adding another step in the chain often doubles value.

When this principle is applied to network protocols, the obvious move is to have content which can be delivered over a network. Even if this strategy fails, the content has value and alternative methods of distribution can be found. Following this principle, I suggested the development of content. I wish that I had emphasised this much more. Since suggesting this, companies such as Amazon have:-

  1. Formed streaming video divisions.
  2. Developed content in parallel with distribution systems.
  3. Gained subscribers for content.
  4. Obtained awards for content.

Indeed, it has been noted that traditional US broadcasters received no awards at the 2015 Golden Globes and this trend may continue.

Streaming video may be a saturated market but streaming audio has been neglected. With bandwidth sufficient for competitive streaming video, it is now possible to stream high-quality, 24 bit, surround-sound audio. Indeed, from empirical research, it appears that audio can be streamed over a connection with 70% packet loss and still retain more quality than a RealAudio or Skype stream with 0% packet loss.

From attempts to replicate the work of others, I've found a method to split audio into perceptual bands. If particular bands are retrieved with priority, it is possible to obtain approximately half of the perceptual quality in 5% of the bandwidth. The technique uses relatively little processing power, to the extent that it is possible to encode or decode CD quality audio at 500KB/s on an eight-year-old, single core laptop.

The technique uses the principle of sigma-delta coding where it is not possible to represent an unchanging level of sound. This limitation can be mitigated by having a hierarchy of deltas. (And where this fails, we follow Meridian Lossless Packing and provide a channel for residual data.) Ordinarily, most people would choose a binary hierarchy but this leaves us two techniques deep and still encountering significant technical problems. Specifically, a binary hierarchy of sigma-delta encodings practically doubles the size of the encoding and may increase the required processing power by a factor of 40 or more.
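
To make the idea concrete, here is a minimal sketch in C of a two-tier delta hierarchy over 16 bit PCM. It is not the codec described here; the group size of three, the structure and the names are only illustrative. Tier 0 delta-codes the mean of each group, so an unchanging level survives as a run of zero deltas, and tier 1 carries each sample's offset from its group mean.

    /* Illustrative two-tier delta hierarchy over 16 bit PCM (not the
     * codec described in the article).  Integer arithmetic keeps this
     * sketch exactly reversible, so no residual channel is needed here. */
    #include <stdint.h>
    #include <stddef.h>

    #define GROUP 3                     /* matches the 1:3 hierarchy below */

    typedef struct {
        int32_t tier0;                  /* delta between group means        */
        int32_t tier1[GROUP];           /* per-sample offset from the mean  */
    } group_code;

    size_t delta_encode(const int16_t *pcm, size_t n, group_code *out)
    {
        int32_t prev_mean = 0;
        size_t g = 0;
        for (size_t i = 0; i + GROUP <= n; i += GROUP, g++) {
            int32_t sum = 0;
            for (int k = 0; k < GROUP; k++)
                sum += pcm[i + k];
            int32_t mean = sum / GROUP;              /* coarse level           */
            out[g].tier0 = mean - prev_mean;         /* tier 0: delta of means */
            for (int k = 0; k < GROUP; k++)
                out[g].tier1[k] = pcm[i + k] - mean; /* tier 1: fine detail    */
            prev_mean = mean;
        }
        return g;                       /* number of groups written */
    }

A decoder reverses the process exactly: it accumulates the tier 0 deltas to recover each group mean and adds the tier 1 offsets. Quantising tier 1 is where a residual channel, as in MLP, would earn its keep.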

A consideration of buffers allows other hierarchical schemes to be considered. The buffer for encoding and decoding is w*x^y samples where w is always a multiple of 8 (to allow encodings to always be represented with byte alignment). After rejecting x=1 (the trivial implementation of sigma-delta encoding) and x=2 (binary hierarchy), other values were investigated. x=4, x=8 and x=9 resolve to other cases. The most promising case is x=3 which provides a good balance between choice of buffer size, minimum frequency response, packet requests and re-requests, perceptual quality in the event of packet loss, encoding overhead and processing power requirements.
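
As a rough illustration of the buffer arithmetic, assuming w=8, x=3 and a 44.1kHz sample rate, the candidate buffer sizes and their naive frequency floors (sample rate divided by buffer length) work out as follows. The little table-printer is purely illustrative; only the w, x, y scheme comes from the text.

    /* Prints w*x^y buffer sizes for w = 8, x = 3 and the frequency floor
     * (sample rate / buffer length) each would impose at 44.1kHz. */
    #include <stdio.h>

    int main(void)
    {
        const double rate = 44100.0;
        long buffer = 8;                    /* w = 8 keeps byte alignment   */
        for (int y = 0; y <= 9; y++) {
            printf("y=%d  buffer=%7ld samples  floor=%9.4f Hz\n",
                   y, buffer, rate / buffer);
            buffer *= 3;                    /* x = 3: the ternary hierarchy */
        }
        return 0;
    }

At y=9 the buffer is 157,464 samples, roughly 3.6 seconds at 44.1kHz, with a floor of about 0.28Hz.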

Unlike x=5 or x=7, x=3 also provides the most bias for arithmetic compression. Given that encoding is represented as 1:3 hierarchies of differences, an approximation at one level often creates three opposing approximations at the next level. Over one cascade, zero, one, two or three increases correspond with three, two, one or zero decreases and, in aggregate, a bias at one tier dovetails with an opposing bias in the preceding tier to the extent that arithmetic compression of 20% can be expected with real-world 16 bit audio data.

x=3 also provides a compact representation for URLs when requesting fragments of data from stateless servers. Many representations are functionally equivalent. One particularly graphic representation [not enclosed] shows how tiers of ternary data may be represented in one byte of a URL. Although the representation appears sparse, approximately half of the possible representations are used and therefore only one bit per byte is wasted. An alternative representation is pairs of bits, aa bb cc dd, ee ff gg hh, ii jj kk ll where each pair may be 01, 10 or 11 when traversing down the tree and 00 is a placeholder when the desired node has been reached. This creates the constraint that sequences of 00 must be contiguous and stem from one end. However, this also allows URLs to be abbreviated to reduce bandwidth. A further representation provides three sub tiers per byte rather than four but allows logging in printable ASCII.
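
Here is a sketch of the bit-pair packing described above, four tiers to a byte with 00 as the trailing placeholder. The function name and calling convention are placeholders, not part of the system.

    /* Pack a path of ternary branch choices (0, 1 or 2 per tier) into
     * bytes: each tier becomes the pair 01, 10 or 11, and 00 pads the
     * tail once the requested node has been reached ("aa bb cc dd",
     * most significant pair first). */
    #include <stdint.h>
    #include <stddef.h>

    size_t pack_ternary_path(const uint8_t *path, size_t tiers,
                             uint8_t *out, size_t out_len)
    {
        size_t bytes = (tiers + 3) / 4;          /* four tiers per byte */
        if (bytes > out_len)
            return 0;
        for (size_t b = 0; b < bytes; b++) {
            uint8_t v = 0;
            for (int k = 0; k < 4; k++) {
                size_t t = b * 4 + k;
                /* 00 = placeholder beyond the end of the path */
                uint8_t pair = (t < tiers) ? (uint8_t)(path[t] + 1) : 0;
                v |= (uint8_t)(pair << (6 - 2 * k));
            }
            out[b] = v;
        }
        return bytes;                            /* bytes written */
    }

Trailing bytes which are entirely placeholders could then be dropped, which is one way to read the abbreviation mentioned above.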

The BBC's Dirac video codec shows that it is possible to encode frames of video in an analogous manner. Specifically, frames of video can be encoded as a tree of differences. Trees may be binary trees, ternary trees or other shapes.

Overall, this follows a strategy in which:-

  1. Users have a compelling reason to use a system.
  2. Value is obtained by folding applications into a URL-space rather than fitting code to legacy constraints.
  3. Servers have low computational and interactive requirements.

High Quality Audio, Part 4

Posted by cafebabe on Friday July 07 2017, @11:33PM (#2480)
2 Comments
Software

Continuing from the previous article, this is an outline specification sent to a de-correlation expert:-

A system exists for transferring data. This system allows data to be transferred reliably outside of the parameters of TCP/IP with Window Scaling and DSACK. Development of the system began as an independent project but has subsequently been funded by [redacted]. At one point, the system was being developed by five programmers, one mathematician, one graphic designer and two administrative staff.

Regardless, the system does not process data as it is received but it is desirable to add such functionality as a library. This capability has been identified as a financially viable market. (See Janus Friis and Niklas Zennström's development of Kazaa, Skype, Joost and other ventures.) Furthermore, the market can be segmented into real-time streaming of low-quality audio and delayed delivery of high quality audio. In each case, it is possible to exceed the perceptual quality of Compact Discs and Long Playing Records respectively.

Many systems provide streaming below this quality (FM radio, digital radio, NICAM, MP3 delivery, RealAudio) but the ability to provide higher quality audio will improve as bandwidth increases. Even if this market is not broadly viable, it remains a lucrative niche for audiophiles with large disposable income. Even if this market is unviable, there remain applications for higher quality audio or a more compact representation of lossless audio. This may be with or without an accompanying video codec.

The benchmark for sound quality is Compact Disc "Red Book" audio; introduced in 1980. This specifies 2KB sectors with Reed-Solomon regenerative checksums. Each sector provides 1,024 16 bit PCM samples for one of two audio channels at a sample rate of 44.1kHz. Techniques to smooth the 1% of dropped sectors create a frequency floor of approximately 43Hz irrespective of data encoded on a disc. This creates a notable absence of bass frequencies and a perceptual comb of frequencies which are integer multiples of 43Hz.

Although modern audio codecs have good perceptual response and raise the nominal sampling frequency from 44.1kHz to 48kHz or higher, the encoded datarate of MP3, AAC and similar codecs is often less than 25KB/s over two channels. This is opposed to 176KB/s for CD audio.

For an extended period, one of the system developers has been aware of efforts by a company in Cambridge, England which may be regarded as a direct competitor. Meridian first came to our attention around 1982 when the development of [MLP] Meridian Lossless Packing was demonstrated in a particularly rigged manner on the BBC [British Broadcasting Corporation]'s programme Tomorrow's World. The demo involved the chassis of a CD player and a two-digit LED display. When playing a conventional Compact Disc, the display read "16" to indicate 16 bit PCM [Pulse Code Modulation] data. When playing an MLP-encoded disc, the display read "4" to indicate the bitrate of MLP. However, a cursory investigation of MLP, such as http://www.meridian-audio.com/w_paper/mlp_jap_new.PDF via http://en.wikipedia.org/wiki/Meridian_Lossless_Packing, reveals that the lossless encoding technique removes an *average* of 11 bits of data per sample - leaving approximately 5 bits per sample. Furthermore, the switch from 16 bit to 4 bit was performed during a cutaway shot and therefore the BBC may have been complicit in this rigged demo. Finally, any attempt to exhibit the perceptual quality of MLP over a television broadcast was futile because the channel capacity of television audio is lower than Compact Disc quality.

Further involvement with Meridian came in the form of a declined interview after a long discussion about digital speaker synchronisation and the computational requirements of wireless digital headphones.

One of the system developers is also aware of SACD [Super Audio Compact Disc], an audio encoding standard devised by Professor Jamie Angus and included in the first and second revision of the Sony PlayStation 3. Attempts to re-implement this technique in a digital system required an inordinate amount of bandwidth. Regardless, experiments led to a novel encoding technique and a greater appreciation of analog circuitry.

Regardless, Meridian's technique for reducing datarate is an ideal template for further development, although care has to be taken to avoid known patents. The basic MLP specification is that:-

  • An encoding technique should allow multiple, arbitrary 24 bit streams to be interleaved without change to the bit sequences.
  • An encoding technique may make assumptions about the presence of return-to-zero data and therefore compression may only occur when suitable audio data is presented.
  • Significant compression may be obtained by observing correlations between channels.
  • Finite Impulse Response filters can be used to reduce the volume of data to be encoded.
  • Lossy techniques and predictive systems may be deployed within an encoder and a decoder. A residual channel of data makes lossy, deterministic techniques into a lossless system.

Further requirements for streaming are as follows:-

  • It is desirable to split streams into components so that they may be prioritised to maximise perceptual quality.
  • Therefore, it is desirable to extract signal from streams prior to the application of techniques such as FIR.

In addition to streaming and otherwise exceeding the MLP specification, the following is desirable:-

  • It should be possible to decode sound on legacy hardware. An embedded device should be able to decode one channel of sound and output 6 bit quality or better. A legacy desktop should be able to decode two or more channels of sound at 8 bit quality.
  • It should also be possible to encode sound of limited quality on legacy hardware.
  • When data is missing, it should be possible to re-construct sound with frequencies below 43Hz. Ideally, it should be an option to encode and re-construct sound with frequencies which are multiples of 1nHz or less. (At sample rates exceeding 44.1kHz, this may require block sizes exceeding 44 billion samples. For 24 bit samples, this requires 120GB or more per channel. It may also require handling of RMS [Root Mean Square] errors with a magnitude of 82 bits or more.)
  • Entropy encoding may be more effective if there is a bias in the symbol frequencies. Therefore, blocks may be a multiple of x^y where x is odd rather than even.

Finally, all techniques are valid. A direct clone of SACD or MLP is not desirable due to licensing issues. However, simplified or novel techniques around SACD and MLP are highly desirable.

Further mention of Meridian, sigma-delta encoding and related topics appears in the other parts of this series.

High Quality Audio, Part 3

Posted by cafebabe on Friday July 07 2017, @10:41PM (#2478)
4 Comments
Software

What is the minimum frequency response which should be represented by an audio codec? Well, what is the best analog or digital system and is it worthwhile to approach, match or exceed its specification?

Analog record players have an exceptionally good minimum frequency response. Technically, the minimum frequency response is determined by the duration of the recording. So, a 45 minute long-play record has a fundamental frequency of 1/2700Hz - which is about 0.0004Hz. Does this make any practical difference? Most definitely yes. People lampoon analog purists or state that LP and CD are directly equivalent. Some of the lampooning is justified but the purists' core argument is true. Most implementations of Compact Disc Audio don't achieve useful functionality.

Many Compact Disc players boldly state "1-Bit DAC Oversampling" or suchlike. The 1-Bit DAC is simplistically brilliant and variants are used in many contexts. However, oversampling is the best of a bad bodge. The optics of a Compact Disc are variable and, even with FEC in the form of Reed-Solomon encoding, about 1% of audio sectors don't get read. What happens in this case? Very early CD players would just output nothing. This technique was superseded by duplicating the previous output. (Early Japanese CD players had an impressive amount of analog circuitry and no other facility to stretch audio.)

Eventually, manufacturers advanced to oversampling techniques. In the event that data cannot be obtained optically from disc, gaps are smoothed with a re-construction from available data. Unfortunately, there is a problem with this technique. Nothing below 43Hz can be re-constructed. 2KB audio sectors have 1024×16 bit samples and samples are played at exactly 44.1kHz. So, audio sectors are played at a rate of approximately 43Hz. However, any technique which smooths a missing sector by continuing the available audio waves has a frequency floor of 43Hz. Given that drop-outs occur with some correlation at a rate of 1%, this disrupts any reproduction of frequencies below 43Hz. For speech, this would be superior to GSM's baseline AMR which has 50Hz audio blocks. For music, this is a deal-breaker.
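
Written out, the arithmetic behind that floor is:

    \[ f_{\text{floor}} = \frac{44100\ \text{samples/s}}{1024\ \text{samples/sector}} \approx 43.07\ \text{Hz} \]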

Remove the intermittent reading problems and the fundamental frequency is the length of the recording. So, a 16 bit linear PCM .WAV of 74 minutes has a fundamental frequency of approximately 0.0002Hz. The same data burned as audio sectors on a Compact Disc only has a fundamental frequency of 43Hz. (I'll ignore the reverse of this process due to cooked sectors.)

So, it is trivial to retain low frequencies within a digital system. Just ensure that all data is present and correct. Furthermore, low frequencies require minimal storage and can be prioritized when streaming.

A related question is the minimum frequency response for a video codec. That's an equally revealing question.

High Quality Audio, Part 2

Posted by cafebabe on Friday July 07 2017, @05:56AM (#2477)
0 Comments
Software

(This is the 10th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Audio and video streaming benefits from stream priorities and three dimensional sound can be streamed in four channels but how can this be packed to avert network saturation while providing the best output when data fails to arrive in full?

The first technique is to de-correlate audio. This involves cross-correlation and auto-correlation. Cross-correlation has a strict hierarchy. Specifically, all of the audio channels directly or indirectly hinge upon a monophonic audio channel. It may be useful for three dimensional sound to be dependent upon two dimensional sound which is dependent upon stereophonic sound which is dependent upon monophonic sound. Likewise, it may be useful for 5.1 surround sound and 7.1 surround sound to be dependent upon the same stereophonic sound. Although this allows selective streaming and decoding within the constraints of available hardware, it creates a strict pecking-order when attempting to compress common information across multiple audio channels.

Cross-de-correlated data is then auto-de-correlated and the resulting audio channels are reduced to a few bits per sample per channel. In the most optimistic case, there will always be one bit of data per sample per channel. This applies regardless of sample quality. For muffled, low quality input, it won't be much higher. However, for high quality digital mastering, expect a residual of five bits per sample per channel. So, on average, it is possible to reduce WXYZ three dimensional Ambisonics to 20 bits per time-step. However, that's just an average. So, we cannot arrange this data into 20 priority levels where, for example, vertical resolution dips first.
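
As a toy illustration of the cross-de-correlation hierarchy (not the actual algorithm, and with illustrative naming), every channel above mono can be stored as a difference against the tier beneath it, so a decoder can stop at whichever tier its hardware or bandwidth allows:

    /* Toy hierarchical cross-de-correlation of WXYZ Ambisonics:
     * everything hinges on the mono (W) channel and each directional
     * component is stored as a difference against it. */
    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        int32_t mono;        /* tier 0: omnidirectional (W)      */
        int32_t side_lr;     /* tier 1: left-minus-right style   */
        int32_t side_fb;     /* tier 2: front-minus-back style   */
        int32_t side_tb;     /* tier 3: top-minus-bottom style   */
    } xd_sample;

    void cross_decorrelate(const int32_t *w, const int32_t *x,
                           const int32_t *y, const int32_t *z,
                           xd_sample *out, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            out[i].mono    = w[i];          /* everything hinges on mono */
            out[i].side_lr = y[i] - w[i];   /* stereo tier               */
            out[i].side_fb = x[i] - w[i];   /* 2D tier                   */
            out[i].side_tb = z[i] - w[i];   /* 3D tier                   */
        }
    }

A decoder which only wants stereo stops after the first two tiers; the remaining differences can simply be dropped or arrive later.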

Thankfully, we can split data into frequency bands before or after performing cross-de-correlation. By the previously mentioned geometric series, this adds a moderate overhead but allows low frequencies to be preserved even when bandwidth is grossly inadequate.

It also allows extremely low frequencies to be represented with ease.

High Quality Audio, Part 1

Posted by cafebabe on Friday July 07 2017, @04:31AM (#2476)
7 Comments
Software

(This is the ninth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

A well-known trick with streaming media is to give priority to audio. For many viewers, video can glitch and frame rate can reduce significantly if audio quality is maintained. Within this constraint, monophonic audio can provide some continuity even when surround sound or stereo sound fails. This requires audio to be encoded as a monophonic channel and a left-minus-right channel but MLP demonstrates that mono, stereo and other formats can be encoded together in this form and decoded as appropriate. How far does this technique go? Well, Ambisonics is the application of Laplace Spherical Harmonics to audio rather than chemistry, weather simulation or a multitude of other purposes.

After getting through all the fiddly stuff (cardioids, A-format, B-format, UHJ format, soundfield microphones, higher order Ambisonics and just what the blazes Ambisonics is), we get to the mother of all 3D sound formats and why people buy hugely expensive microphones, record ambient sounds and sell the recordings to virtual reality start-ups who apply trivial matrix rotations to obtain immersive sound.

Yup. That's it. Record directional sound. Convert it into one channel of omnidirectional sound and three channels of directional sound (left-minus-right, front-minus-back, top-minus-bottom). Apply sines and cosines as required or mix like a pro. The result is a four channel audio format which can be streamed as three dimensional sound, two dimensional sound, one dimensional sound or zero dimensional sound and mapped down to any arrangement of speakers.

Due to technical reasons, a minimum of 12 speakers (and closer to 30 speakers) are required for full fidelity playback. This can be implemented as a matrix multiplication with four inputs and 30 outputs for each time-step of audio. The elements of the matrix can be pre-computed for each speaker's position, to attenuate recording volume and to cover differences among mis-matched speakers. (Heh, that's easier than buying 30 matched speakers.) At 44.1kHz (Compact Disc quality), 1.3 million four-element dot products per second are required. At 192kHz, almost six million dot products per second are required for immersive three dimensional sound.
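
A sketch of that matrix multiplication, one four-element dot product per speaker per time-step. The gain matrix is assumed to have been pre-computed from each speaker's position, attenuation and calibration; the sizes and names are placeholders, not a real layout.

    /* Decode one time-step of first-order Ambisonics (W, X, Y, Z) into
     * per-speaker feeds via a pre-computed gain matrix. */
    #include <stddef.h>

    #define SPEAKERS 30
    #define ORDERS   4            /* W, X, Y, Z */

    void decode_bformat(const float wxyz[ORDERS],            /* one time-step */
                        const float gain[SPEAKERS][ORDERS],
                        float feeds[SPEAKERS])
    {
        for (int s = 0; s < SPEAKERS; s++) {
            float acc = 0.0f;
            for (int c = 0; c < ORDERS; c++)
                acc += gain[s][c] * wxyz[c];                  /* 4 multiplies */
            feeds[s] = acc;
        }
    }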

For downward compatibility, it may be useful to encode 5.1 surround sound and 7.1 surround sound alongside Ambisonics. Likewise, it may be useful to arrange speakers such that legacy 5.1 surround sound, 7.1 surround sound, 11.1 surround sound or 22.2 surround sound can be played without matrix mixing.

Using audio amplifiers, such as the popular PAM8403, it is possible to put 32×3W outputs in a 1U box. This is sufficiently loud for most domestic environments.

Test Data For Codecs

Posted by cafebabe on Friday July 07 2017, @12:30AM (#2475)
6 Comments
Software

Test data for audio codecs:-

  • Boston: More Than A Feeling - Tom Scholz invented a sound engineering technique where vocals, lead guitar and bass guitar are each recorded as four separate sessions. Two recordings are played hard left and two recordings are played hard right. This creates a phased sound which "pops" in a manner striking enough to form a band and tour.
  • Alice Cooper: Poison - Noise and high dynamic range.
  • Foo Fighters: This Is A Call - Track 1 of the first album has some of Dave Grohl's expert drumming. Cooler than the Phil Collins drumming used as test data for MLP.
  • Trailer for Step Up 4 - Urban in a very Disneyfied manner. However, has a tortuous mix of voice, music, sirens and rumble in one track. May be available with stereoscopic video.

Test data for video codecs:-

UDP Servers, Part 2

Posted by cafebabe on Thursday July 06 2017, @09:41PM (#2474)
0 Comments
Software

(This is the eighth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

How can the most data be sent to the most users using the least hardware and bandwidth? Minimize transient state. TCP has a large, hidden amount of state; typically more than 1MB per connection for bulk transfers. This hidden cost can be eliminated with ease but it introduces several other problems. Some of these problems are solved, some are unsolved.

The most obvious reason to use TCP is to obtain optimal PMTU utilization. Minimizing packet header overhead is a worthy goal but it doesn't work. Multi-path TCP or just an increasingly dynamic network makes optimal PMTU discovery a rapidly shifting task. Ignoring this, is it really worth stuffing payloads with the optimal number of bytes when all opportunity to multi-cast is discarded? That depends upon workload but it only takes a moderate number of scale-out cases to skew the general case. One such case is television.

Optimal PMTU utilization also fails when an ISP uses an unusual PMTU. PMTU discovery is significantly more challenging when IPv6 has no packet fragmentation and many IPv6 devices implement the minimum specification of a 1280 byte payload. I'd be inclined to ignore that but tunneling and inter-operability mean that the intersection, rather than the union, of IPv4 and IPv6 features has to be considered. (It has been noted that IPv6 raises the specified minimum payload but triple-VLAN IPv4 over Ethernet has a larger MTU and is the worst case in common use.)

UDP competes at a disadvantage within a POSIX environment. The total size of UDP buffers may be severely hobbled. Likewise, POSIX sendfile() is meaningless when applied to UDP. (This is the technique which allows lighttpd to serve thousands of static files from a single-threaded server. However, sendfile() only works with unencrypted connections. Netflix has an extension which allows SSL certificates to be shared with a FreeBSD kernel but the requirement to encrypt significantly erodes TCP's advantage.)

Some people have an intense dislike of UDP streaming quality but most of that experience occurred prior to BitTorrent or any Kodi plug-in which utilizes BitTorrent in real-time. No-one complains about the reliability or accuracy of BitTorrent although several governments and corporations would love to permanently eliminate anything resembling BitTorrent plus Kodi.

From techniques in common use, a multi-cast video client has several options when a packet is dropped and there is sufficient time for re-send:-

  • Wait for FEC to overcome minor losses.
  • Use a mono-cast back-channel to obtain missed pieces.
  • Exchange data with peers. This typically requires signing or some other fore-knowledge of trustworthy data.

When time for re-send is insufficient, there are further options:-

  • Establish stream priority. In decreasing priority: maintain captions, maintain monophonic audio, maintain stereophonic audio, maintain surround sound, maintain video key frames, maintain six-axis movement, maintain full video stream.
  • Switch to lower bandwidth stream.
  • Pause and allow buffering to occur. Each time this occurs, latency from live broadcast increases but it is adaptive until all re-sends are satisfied.
  • Display data with holes. Historically, this has been poor. However, this was prior to Dirac diff trees and other techniques.
  • Back-fill data. In this case, live-display is low quality but any recording of a stream is high quality.

For a practical example of low quality live-streaming with high quality recording, see FPV drones. In this scenario, remote control airplanes may broadcast low quality video. Video may be monochromatic and/or analog NTSC. Video may be stereoscopic and possibly steerable from an operator's headset. Several systems super-impose altitude, bearing, temperature, battery level and other information. With directional antennas, this is sufficient to fly 10km or more from a launch site. Meanwhile, 1920×1080p video is recorded on local storage. The low quality video is sufficient for precision flying while the high quality video can be astoundingly beautiful.

Anyhow, UDP video can be as live and crappy as a user chooses. Or it may be the optimal method for distributing cinema quality video. And a user may choose either case when viewing the same stream.

UDP Servers, Part 1

Posted by cafebabe on Thursday July 06 2017, @06:16AM (#2473)
10 Comments
Software

(This is the seventh of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Using UDP for streaming video seems idiotic and is mostly deprecated. However, it has one huge advantage. It works really well with a large number of concurrent users. Within reason, it is worthwhile to maximize this number and minimize cost. Network games can be quite intensive in this regard, with SecondLife previously rumored to have two users per server and WorldOfWarcraft having 20 users per server. At the other extreme, Virtudyne (Part 1, Part 2, Part 3, Part 4) aimed to run productivity software with 20 million users per server. That failed spectacularly after US$200 million of investment.

Maintaining more than 10000 TCP connections requires a significant quantity of RAM; often more than 1MB per connection. A suitably structured UDP implementation doesn't have this overhead. For example, it is possible to serve 40 video streams from a single-threaded UDP server with a userspace of less than 1MB and kernel network buffers less than 4MB. All of the RAM previously allocated to TCP windowing can be re-allocated to disk caching. Even with a mono-casting implementation, there are efficiency gains when multiple users watch the same media. With stochastic caching techniques, it is easy for network bandwidth to exceed storage bandwidth.

There is another advantage with UDP. FEC can be sent speculatively or such that one resend may satisfy multiple clients who each lack differing fragments of data. It is also easy to make servers, caches and clients which are tolerant to bit errors. This is particularly important if a large quantity of data is cached in RAM for an extended period.

So, what is a suitable structure for a UDP server? Every request from a client should incur zero or one packets of response. Ideally, a server should respond to every packet. However, some communication is obviously corrupted, malformed or malicious and isn't worth the bandwidth to respond. Also, in a scheme where all communication state is held by a client and all communication is pumped by a client, resends are solely the responsibility of a client.
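
A minimal sketch of this request/response discipline: a single-threaded, stateless UDP server which answers each sane request with at most one packet and stays silent otherwise. The request format and handle_request() are placeholders, not part of the system described.

    /* Stateless UDP request/response loop: at most one reply per request,
     * all retry/resend state lives in the client. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define MAX_PAYLOAD 1024   /* fits inside a 1280 byte IPv6 minimum MTU */

    /* Placeholder: parse the request and fill at most one reply packet.
     * Returns reply length, or 0 to stay silent (corrupt/malicious input). */
    static size_t handle_request(const unsigned char *req, size_t req_len,
                                 unsigned char *reply)
    {
        (void)req; (void)req_len;
        memcpy(reply, "pong", 4);
        return 4;
    }

    int main(void)
    {
        int fd = socket(AF_INET6, SOCK_DGRAM, 0);
        struct sockaddr_in6 addr = { 0 };
        addr.sin6_family = AF_INET6;
        addr.sin6_port   = htons(9000);
        addr.sin6_addr   = in6addr_any;
        if (fd < 0 || bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
            perror("bind");
            return 1;
        }
        for (;;) {
            unsigned char req[MAX_PAYLOAD], reply[MAX_PAYLOAD];
            struct sockaddr_in6 peer;
            socklen_t peer_len = sizeof peer;
            ssize_t n = recvfrom(fd, req, sizeof req, 0,
                                 (struct sockaddr *)&peer, &peer_len);
            if (n <= 0)
                continue;                    /* client owns all retry state */
            size_t m = handle_request(req, (size_t)n, reply);
            if (m > 0)
                sendto(fd, reply, m, 0, (struct sockaddr *)&peer, peer_len);
        }
    }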

From experience, it is wrong to implement a singleton socket or a set of client sockets mutually exclusive with a set of server sockets. Ideally, library code should allow multiple server sockets and multiple client sockets to exist within userspace. This facilitates cache or filter implementation where one program is a client for upstream requests and a server for downstream requests.

"Streaming" Video Versus Streaming Video

Posted by cafebabe on Wednesday July 05 2017, @11:15PM (#2471)
5 Comments
Software

(This is the sixth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I've got to the practical problems with resource naming and the practical problems with networking. These problems interact badly with video over the Internet.

In the 1990s, video on the Internet was typically sent to users as lossy UDP. Apple's QuickTime3 had an option to optimize .mov files to aid this type of distribution. This functionality seems to be deprecated and we've devolved to the situation where "streaming" video involves mono-casting small chunks of video over HTTP/1.1 over TCP. There is a top-tier of companies which likes this arrangement. This includes Google, Apple, Microsoft, Netflix and a few parties who provide infrastructure, such as Akamai and CloudFlare. Many of them prefer the analytics which can be gained from mono-casting. To differing degrees, each knows where and when you've paused a video. Big Data provides the opportunity to discover why this occurs. Perhaps the content is violent, challenging or difficult to understand. Perhaps you had a regular appointment to be elsewhere. It should be understood that mono-casting over TCP provides the least amount of privacy for users.

TCP is completely incompatible with multi-cast and therefore choosing TCP excludes the option of sending the same packets to different users. The investment in infrastructure would be completely undermined if anyone could live-stream multi-cast video without centralized intermediaries running demographic profiling advert brokers on sanitized, government approved content.

Admittedly, UDP has a reputation for packet flood but TCP congestion control is wishful thinking which is actively undermined by Google, Microsoft and others. Specifically, RFC3390 specifies that unacknowledged data sent over TCP should be capped. In practice, the limit is about 4KB. However, Microsoft crap-floods 100KB over a fresh TCP connection. Google does similar. If you've ever heard an end-user say that YouTube is reliable or responsive, that's because YouTube's servers are deliberately mis-configured to shout over your other connections. Many of these top-tier companies are making a mockery of TCP congestion control.

Ignoring the packet floods, there are some fundamentals that cannot be ignored. Lossy video uses small packets. The scale of compression which can be attained within an IPv6 1280 byte packet is quite limited, especially when encryption headers are taken into account. The big guns of compression (lz4, lzo, lzma, Burrows-Wheeler transform) are still warming-up in the first 1MB. That dove-tails nicely with HTTP/1.1 but restricts UDP to old-school compression, such as LZ77, Shannon-Fano or arithmetic compression.

However, if we define video tiles in a suitable manner, incorporate degradation in the form of a Dirac codec style diff tree and make every video tile directly addressable then everything works for multiple users at different sites. The trick is to make video tiles of a suitable form. I'm glad that quadtree video has finally become mainstream with HEVC because this demonstrates the practical savings which can be made by amortizing co-ordinates. Further savings can be made via Barnsley's Collage Theorem. The net result is that disparate tile types can be butted together and the seams will be no worse than JPEG macro-blocks.

Unfortunately, if we wish to be fully and practically compatible with embedded IPv6 devices, we have a practical UDP payload of 1024 bytes or 8192 bits. This determines maximum video quality when using different tile sizes:-

  • 32×32 pixel blocks permit 8 bits per pixel.
  • 64×64 pixel blocks permit 2 bits per pixel.
  • 128×128 pixel blocks permit 0.5 bits per pixel.

There is a hard upper-bound on payload size but are these figures good or bad? Well, H.264 1920×1080p at 30 frames per second may encode from 4GB per hour to 6GB per hour. That's 224 billion pixels and an upper-bound of 48 billion bits - or about 0.21 bits per pixel.
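
Spelling that comparison out:

    \[ 1920 \times 1080 \times 30\ \text{fps} \times 3600\ \text{s} \approx 2.24 \times 10^{11}\ \text{pixels}, \qquad \frac{6\ \text{GB} \times 8\ \text{bits/byte}}{2.24 \times 10^{11}\ \text{pixels}} \approx 0.21\ \text{bits/pixel} \]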

So, what type of tiles would be encoded in the quadtree? Well, enabling different subsets of tiles would provide a codec which ranged from symmetric lossless to highly asymmetric lossy. At a minimum, an H.261 style DCT is essential. On its own, it would provide symmetric lossy encoding which matches MJPEG for quality while only incurring a small overhead for (rather redundant) quadtree encoding. It would also be useful to add texture compression and some rather basic RLE. This would be particularly useful for desktop windowing because it allows, for example, word-processing and web browsing to be rendered accurately and concisely.

What is the maximum overhead of quadtree encoding? Well, for every four leaves, there is one branch and for every four branches, there is another branch. By the geometric series, the ratio of branches to leaves is 1:3 - and branches are always a single byte prior to compression. So, quadtree overhead is small.
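
For completeness, the series is:

    \[ \frac{\text{branches}}{\text{leaves}} = \frac{1}{4} + \frac{1}{16} + \frac{1}{64} + \cdots = \frac{1/4}{1 - 1/4} = \frac{1}{3} \]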

So, we've established that it is possible to make a video system which is:-

  • Lossy or lossless.
  • Provides symmetric or asymmetric encode/decode time.
  • Provides low-latency encoding modes.
  • Works with directly addressable video tiles of 64×64 pixels or larger.
  • Works with embedded devices which implement a strict 1280 byte packet limit.
  • Works with 70% round-trip packet loss.
  • Allows frame rate degradation.
  • Can be configured to incur worst-case storage or bandwidth cost which is only marginally worse than MJPEG.
  • May be suitable as transport for a windowing system.
  • Allows content to be viewed on an unlimited number of devices and/or by an unlimited number of users at any number of sites.
  • Allows implementation of video walls of arbitrary size and resolution.

It won't match the bit-rate of H.264, WebM or HEVC but it works over a wide range of scenarios where these would completely fail.

Benefits Of Internet Protocol 6 Greatly Over-Stated

Posted by cafebabe on Wednesday July 05 2017, @02:52AM (#2468)
13 Comments
Software

(This is the fifth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

What are the properties usually associated with Internet Protocol 6? Multi-cast? A huge address space? A cleaner implementation with more packet throughput? Whatever.

Multi-Cast

Internet Protocol 4 addresses from 224.0.0.0 to 239.255.255.255 are for multi-cast, as defined in RFC1112. Unfortunately, multi-cast is incompatible with TCP, so that's 2^28 Internet Protocol 4 addresses and 2^120 Internet Protocol 6 addresses which don't work with YouTube or QuickTime.

Address Space

Internet Protocol 6 has 128 bit addresses. That's more addresses than atoms in the visible universe. However, there are edge cases where that's insufficient. Internet Protocol 4 has 32 bit addresses (by default) and that was considered vast when it was devised. That was especially true when total human population was less than 2^32 people. Superficially, it was possible to give every person a network address.

Several address extension schemes have been devised. The best is RFC1365 which uses option blocks to optionally extend source and destination fields in a manner which is downwardly compatible. So, what size is an Internet Protocol 4 address? 32 or more bits, as defined by RFC1365.

Header Size

Internet Protocol 4 is often described as having a 20 byte (or larger) header while Internet Protocol 6 is often described as having a header which is exactly 40 bytes. This is false. IPv6 has option blocks just like IPv4 and therefore both have variable length headers. The difference is that IPv6 headers are usually 20 bytes larger.

Packet Size

IPv4 typically supports a PMTU of 4KB or more. Admittedly, there are no guarantees but Ethernet without packet fragmentation provides about 1500 bytes. With PPPoA or PPPoE over AAL5 over ATM, 9KB payloads only fragment over the last hop. This is ideal for video delivery. IPv6 only guarantees 1280 bytes. How common is this? Numerous variants of micro-controller networking only support 1280 byte buffers. This is especially true for IPv6 over IEEE802.15.4 implementations. This is especially bad for video.
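
For a sense of scale, the guaranteed room inside an IPv6 minimum-MTU packet carrying UDP is:

    \[ 1280 - 40\ (\text{IPv6 header}) - 8\ (\text{UDP header}) = 1232\ \text{bytes of payload} \]

Extension headers, tunnels and encryption eat into that further, which is consistent with the 1024 byte working payload assumed elsewhere in these articles.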

Packet Fragmentation

IPv6 has no packet fragmentation in transit. IPv6 packets which exceed the path MTU are simply dropped, at best with an ICMPv6 "Packet Too Big" message which is frequently filtered, so over-sized packets effectively disappear.

Packet Throughput

Compared to IPv4, IPv6 generally has longer headers, longer addresses and shorter payloads. On this basis, how would you expect packet throughput of IPv6 to match or exceed IPv4?

Summary

The introduction of IPv6 provides no particular benefit to end-users. IPv6 is detrimental to payload size and this is particularly detrimental to video delivery.