(This is the 27th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
An obscure topic on video compression forums is the merits of hpel versus qpel. Consensus is that qpel should only be enabled where the majority of content moves slowly. What is this and is it even halfway correct?
A video compression motion delta may be specified in different forms. MPEG1 allows half pixel per frame movement to be specified. MPEG4 allows quarter pixel per frame movement. These are known as hpel and qpel units respectively. Half pixel movement has a profound limitation. Consider a checkerboard of highly contrasting pixels. Half pixel movement and the associated averaging will convert every pixel to mid gray in one step. And that's just in one direction. An hpel applies to horizontal and vertical movement. Therefore, it is common for a chunk of screen to become a four-way average of nearby texture. A qpel offers the possibility of a 1:3 mix in one direction or a 1:3:3:9 mix in two directions. This preserves some texture.
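As a rough illustration (assuming plain bilinear filtering, which is approximately what these codecs do for sub-pixel motion; the function and parameter names below are mine, not from any standard), the mixing ratios fall directly out of the sub-pixel offsets:-
/* Sketch: bilinear interpolation at a sub-pixel offset. fx and fy are
   fractions of a pixel with denominator d, so d=2 covers hpel and d=4
   covers qpel. A quarter offset in both axes gives the 1:3:3:9 mix
   described above; a half offset in both axes gives the four-way average. */
unsigned interp(unsigned p00, unsigned p10, unsigned p01, unsigned p11,
                unsigned fx, unsigned fy, unsigned d)
{
    unsigned w00 = (d - fx) * (d - fy);
    unsigned w10 = fx * (d - fy);
    unsigned w01 = (d - fx) * fy;
    unsigned w11 = fx * fy;
    return (p00 * w00 + p10 * w10 + p01 * w01 + p11 * w11 + d * d / 2) / (d * d);
}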
However, qpel has a different limitation. If bits per delta are fixed, a qpel only covers 1/4 of the screen area of an hpel and therefore a worthwhile match is less likely to occur.
A reasonable compromise can be made by specifying one third pixel steps. This is a tpel. A tpel's best case for preserving contrast is a 1:2 mix, which is moderately worse than a qpel's 1:3 mix. However, a tpel never incurs a qpel's worst case: no one third step ever lands on the half pixel position where texture collapses to a 1:1 average. Furthermore, a tpel is able to texture match over a larger area than a qpel.
Perhaps we should search further into the realm of 1/n motion deltas? 1/5 provides minimal benefit. 1/6 provides much of the functionality of 1/2 and 1/3. And anything smaller than a qpel provides very little area in which to match texture. So, pel, hpel, tpel and qpel offer the most range and flexibility. If downward compatibility is ignored, specifying motion deltas in tpel units only would be a moderate choice. However, when transcoding legacy content, the failure to match features incurs either a loss of quality, an increased bit-rate (codec impedance mismatch) or a mix of these disadvantages.
Texture matching is a particularly asymmetric and processor intensive task. Where horizontal and vertical displacement are each defined as eight bit fields, there are more than 65000 potential matches and each may require a minimum of 16×16 unaligned memory accesses in each of three color-planes.
On some processor architectures, this invokes an edge case where an unaligned memory access across a virtual memory page boundary incurs a 3600 clock cycle delay.
Potential matches can be significantly reduced by setting a maximum radius. For real-time transcoding, the maximum radius may be dynamically adjusted up to a specified maximum.
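For illustration, the cost being evaluated for each candidate within that radius is little more than the following (a minimal sketch assuming an 8-bit luma plane with a given stride; the names are mine):-
#include <stdlib.h>
#include <stdint.h>

/* Sum of absolute differences for one 16x16 chunk of an 8-bit plane.
   This is evaluated for each candidate delta within the search radius. */
unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs((int)cur[x] - (int)ref[x]);
        cur += stride;
        ref += stride;
    }
    return sum;
}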
A further catch is that texture matching may be performed using the wrong quality metric. As noted in the defunct Diary Of An X264 Developer, if the quality metric seeks an approximate texture then the result will be an approximate texture, but if the quality metric seeks sharpness between pixels then the result will be sharpness between pixels. The latter requires computing the difference between horizontally adjacent pixels and the difference between vertically adjacent pixels and using those as additional inputs to the approximation. That will be slower, and especially so if code is written such that (2^n)-1 loop iterations prevent loops being unrolled properly or at all.
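A sketch of such a metric (my own construction, not x264's; it simply adds horizontal and vertical neighbour differences as extra inputs, and the 15-iteration gradient loops show where (2^n)-1 trip counts creep in):-
#include <stdlib.h>
#include <stdint.h>

/* Cost which also penalizes mismatched sharpness: compare each pixel
   and also the difference to its right and lower neighbours. Note the
   15-iteration loops on the gradient terms. */
unsigned cost16x16_sharp(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 16; y++) {
        for (int x = 0; x < 16; x++)
            sum += abs((int)cur[y * stride + x] - (int)ref[y * stride + x]);
        for (int x = 0; x < 15; x++)    /* horizontal gradients */
            sum += abs(((int)cur[y * stride + x + 1] - (int)cur[y * stride + x])
                     - ((int)ref[y * stride + x + 1] - (int)ref[y * stride + x]));
    }
    for (int y = 0; y < 15; y++)        /* vertical gradients */
        for (int x = 0; x < 16; x++)
            sum += abs(((int)cur[(y + 1) * stride + x] - (int)cur[y * stride + x])
                     - ((int)ref[(y + 1) * stride + x] - (int)ref[y * stride + x]));
    return sum;
}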
(This is the 26th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
For many years, the standard technique for video compression was to have one key-frame followed by a succession of differences. The standard technique became entrenched to the point that "key-frame every 16 frames" was almost mandatory. In extreme circumstances, I've stretched this to "key-frame every 600 frames" but I strongly recommend against repeating it because it is highly susceptible to corruption and seeking within the video is extremely unresponsive.
In theory, many video formats allow bi-directional playback. So, it should be equally easy to play a video forwards and backwards. This relies on motion deltas being specified in a bi-directional manner. However, this feature is usually ignored because it significantly increases size but only provides marginal benefit. Also, it provides no benefit at all when randomly seeking to a frame of video. When seeking to a frame before a key-frame, the preceding key-frame must be decoded in full and then the next 15 diffs must be applied. For any amount of processing power, there is a resolution of video at which this process cannot be performed rapidly. Even when reverse differences are available, a key-frame must be decoded in full and up to eight diffs may be applied.
The best feature of the BBC's Dirac codec greatly improves this situation. It simultaneously decreases the frequency (and bulk) of key-frames and increases the quality of the remaining frames. It also improves random access to arbitrary frames and this may be the reason for its development.
The BBC wishes to produce all video content from one common platform. This means dumping all raw camera footage to one repository and generating XML edit lists which reference video within the repository. It would also be useful if archived video and streamed video were in the same high quality format. Well, the BBC has had some success with edit lists. However, the remainder has been disappointing, with the exception that a codec with a very natty feature was developed.
Dirac arranges frames into a B+ tree. The root of the tree is a key-frame and the remainder are diffs. In theory, each tier of the tree may have a different fan-out. Unless you're doing something particularly odd, the tree will always be a binary tree. (Tiger trees used in BitTorrent are similar. Fan-out can be anything but is invariably two.)
Tree decoding requires multiple sets of video buffers. However, even for 3840×2160 RGB at 16 bits per channel, that requires about 50MB per tier. And there is scope to trade storage for processing power. Regardless, the advantage of this arrangement is considerable. For n tiers of tree, key-frame spacing is (2^n)-1 but diffs never exceed n-1. So, for eight tiers, key-frame spacing is 255 but a frame is never constructed from more than seven diffs. That means image quality and seek time are superior to MPEG1 while storage for key-frames is significantly reduced. Admittedly, there is more change between many of the diffs. A minimum of 1/3 of the diffs are to the immediately following frame. A minimum of 1/3 of the diffs are to the subsequent frame. And the remainder cover larger spans. This increases the size of the average diff but it is invariably smaller than frequent key-frames. It is also more resilient to corruption.
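The book-keeping for the binary-tree case is trivial. As a sketch (assuming the key-frame sits in the middle of a group of (2^n)-1 frames and every other frame is a diff from its parent; the exact layout of a real Dirac stream may differ):-
/* Number of diffs needed to decode frame i, where i is a 1-based index
   within a group of (2^n)-1 frames. The middle frame (depth zero) is
   the key-frame; the result never exceeds n-1. */
int diffs_needed(unsigned i, int n)     /* 1 <= i <= (1u << n) - 1 */
{
    int depth = n - 1;
    while ((i & 1) == 0) {              /* each trailing zero is one tier nearer the root */
        depth--;
        i >>= 1;
    }
    return depth;
}
For eight tiers, diffs_needed(128, 8) is zero (the key-frame itself) and no index returns more than seven.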
Oh, we might be in one of the odd cases where a binary tree isn't the obvious option. With ternary audio, it might be beneficial to match it with ternary video. Unfortunately, if we retrieve 8192 audio samples per request and play 2000, 1920, 1600, 960 or 800 samples per frame then it might not make a jot of difference.
(This is the 25th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
MPEG1 was a huge advance in video compression but there was one feature which struck me as idiotic.
Before MPEG1, schemes to encode video included CinePak and MJPEG. CinePak uses a very crude color-space matrix transform to reduce bandwidth and processing load. With the processing power available nowadays, this could be replaced with something more efficient. CinePak defines horizontal regions. Again, this could be replaced with tiles. CinePak also works in a manner in which it degrades into a rather organic stippling effect when it is overwhelmed. Overall, it has good features which are worth noting.
Unfortunately, it was overshadowed by MJPEG. This was a succession of JPEG pictures and had the distinct advantage that encode time was approximately equal to decode time. This made it suitable for low latency, real-time applications, such as video conferencing. Unfortunately, that is about the extent of MJPEG's advantages. The disadvantage is MJPEG only has the image quality of JPEG. The most significant limitation in JPEG is that the use of DCT over small regions leads to artifacts between regions. In the worst case, JPEG artifacts make a picture look like a collection of jigsaw pieces. When applied to MJPEG, the hard boundaries are unwaveringly in the same position in each frame. MJPEG also fails to utilize any similarity between successive frames of video.
MPEG1 changed matters drastically. Like MJPEG, MPEG1 uses a JPEG DCT. However, it is typically used in bulk only every 16 frames. The 15 frames in between are a succession of differences. In practice, this reduces the volume of data by a factor of three. However, the techniques used between key-frames are truly awful and there is a noticeable difference in sharpness when each key-frame is displayed. At a typical 24 frames per second, this occurs every 2/3 second.
That is an unfortunate side-effect of a scheme which otherwise allowed full audio and video to be played from CDROM at no more than 150KB/s. The idiotic part is that the techniques don't scale but that hasn't stopped people taking them to absurd extremes.
Between MPEG1 key-frames, a range of techniques can be applied. Horizontal or vertical strips of screen can be replaced in full. This is rarely applied. The exception is captions and titling, which typically cause significant change to a small strip at the bottom of a screen. Another technique is lightening or darkening of small regions or strips. However, it is the motion delta functionality which is most problematic and not just because movement is mutually exclusive with change in brightness.
Within MPEG1, it is possible to specify 16×16 pixel chunks of screen which move over a number of frames. There is an allocation of deltas and, in any given frame, movement can be started or stopped. Therefore, a chunk of screen which moves over eight frames requires no more encoding overhead than a chunk of screen which moves over one frame. Unfortunately, the best encoding quickly becomes a combinatorial explosion of possibilities to the extent that early MPEG1 encoding required hiring a super-computer.
Having regions of screen moving about autonomously isn't the worst problem. As screen resolution increases, motion deltas have to increase in size or quantity. This isn't a graceful process. If horizontal resolution doubles and vertical resolution doubles then motion deltas should quadruple in size and/or quantity. At 352×288 pixels (or less), hundreds of motion deltas are worthwhile but at 1920×1080 pixels other techniques are required.
The simplest technique to ensure scalability is to use quadtrees. Or, more accurately, to define a set of tiles where each tile is a separate quadtree. This technique has one obvious advantage. Each tile may be lightened, darkened, moved or replaced with very concise descriptions. Furthermore, each tile can be sub-divided, as required. Therefore, irregular regions can be lightened, darkened, moved or replaced. If an object spins across a screen, this could easily overwhelm MPEG1. The output would look awful. However, with a quadtree, each piece of an object can be approximated. Likewise for zoom. Likewise for camera shake.
(Use of quadtrees does not eliminate co-ordinates when describing regions. Use of quadtrees merely amortizes the common top bits of co-ordinates during recursive descent and eliminates bottom bits when action for a large region is described.)
The tricky part is to define an encoding where a collage of tile types can co-exist. When this is achieved, a very useful, practical property arises. It is possible to make an encoder in which the choice of tile type may be restricted. Therefore, one implementation of encoder and decoder may cover the range from low latency, symmetric time, lossless compression to high latency, asymmetric time, lossy compression. By the geometric series, a quadtree has (approximately) one branch for every three leaves. Where branches are always one byte and leaves are always larger, the overhead of branches never exceeds 1/6 of the video stream. Indeed, if tile types are restricted to only the smallest JPEG DCT tile, performance is only a little outside of MJPEG parameters. When more tile types are enabled, performance exceeds MPEG1 in both size and quality.
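A sketch of the descent (the tag values, the stop-at-8×8 policy and the callback types are mine, merely to illustrate the shape of the encoding rather than a defined bit-stream):-
#include <stdint.h>

/* Recursive encode of one tile. A branch is exactly one byte; a leaf is
   a tag byte followed by whatever payload its tile type needs. The
   co-ordinates never appear in the output: they are implicit in the
   order of descent. */
enum { TAG_BRANCH = 0, TAG_SOLID, TAG_MOVE, TAG_LIGHTEN, TAG_DCT };

typedef int  (*policy_fn)(void *ctx, int x, int y, int size);   /* returns a TAG_* */
typedef void (*emit_fn)(void *ctx, uint8_t byte);

void encode_tile(void *ctx, policy_fn policy, emit_fn emit, int x, int y, int size)
{
    int tag = policy(ctx, x, y, size);
    if (tag != TAG_BRANCH || size == 8) {   /* stop subdividing at 8x8 */
        emit(ctx, (uint8_t)tag);            /* leaf: tag byte, payload follows */
        return;                             /* (payload emission not shown) */
    }
    emit(ctx, TAG_BRANCH);                  /* branch: exactly one byte */
    int h = size / 2;
    encode_tile(ctx, policy, emit, x,     y,     h);
    encode_tile(ctx, policy, emit, x + h, y,     h);
    encode_tile(ctx, policy, emit, x,     y + h, h);
    encode_tile(ctx, policy, emit, x + h, y + h, h);
}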
However, the really good part is that quality can be maintained within known bounds even if a piece of a video frame is missing. Tolerance to error becomes increasingly important as the volume of data increases. It also provides options to watch true multi-cast video and/or watch video over poor network connections.
From empirical testing with the 4K trailer for Elysium, I've found that everything is awesome at 4K (3840×2160 pixels). When the encoder recurses from 64×64 pixel tiles down to 8×8 pixel tiles and incorrectly picks a solid color tile, the result still looks good because the effective resolution is 480×270 pixels. So, even when the codec under development goes awry, it often exceeds MPEG1's default resolution.
(This is the 24th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
(This description of color video excludes SECAM and many other details which are not relevant to explaining current or future techniques.)
Historically, television was broadcast as an analog signal. Various formats were devised. The two most commonly used formats were NTSC and PAL. NTSC was generally used in parts of the world where the mains electricity supply was 60Hz. PAL was generally used in parts of the world where the mains electricity supply was 50Hz. Both schemes use interlacing, which is a reasonably good compromise for automatically representing rapidly moving images at low quality and detailed images at higher quality.
NTSC has one odd or even field per 1/60 second. PAL has one odd or even field per 1/50 second. This matching of field rate and electricity supply ensures that any distortion on a CRT [Cathode Ray Tube] due to power fluctuation is minimized because it is steady and consistent across many frames of video.
NTSC and PAL differ in matters where PAL had the benefit of hindsight. NTSC uses logarithmic brightness. This works well in ideal conditions but otherwise creates additional harmonics. PAL uses linear brightness. Also, PAL [Phase Alternating Line] inverts the color phase on alternate lines of video such that imperfections in the video signal are self-compensating. NTSC doesn't have this feature and is cruelly known as Never Twice the Same Color.
Analog television was originally broadcast in monochrome. As you'd expect, NTSC and PAL were extended in a similar manner. It was a clever technique which was downwardly compatible with monochrome receivers. It also takes into account perceptual brightness of typical human vision.
Two high frequency signals were added to the base signal. These conceptually represent red minus brightness and blue minus brightness (the two color difference signals). When the signal is rendered on a legacy monochrome screen, objects are shown with the expected brightness. When the signal is rendered on a color screen, the additional data can be decoded into three primary colors: red, green and blue. Furthermore, the signal may be encoded in proportion to light sensitivity. Human vision is typically sensitive to broad spectrum brightness and three spectral peaks. The base signal plus modulated color provide good representation in both modes.
So, we have two schemes which minimize analog distortion and maximize perceptual detail. A full frame of video is broadcast at 25Hz or 30Hz. A field is broadcast at 50Hz or 60Hz. One line is broadcast at approximately 15.6kHz. Color may be modulated at approximately 3.58MHz for NTSC and 4.43MHz for PAL. The effective bandwidth of the signal was approximately 6MHz. All of this may be recorded to tape, sent between devices as composite video or broadcast over UHF.
Digital video follows many of these principles. Most significantly, digital video is typically encoded as a brightness and two components of color. Although the details vary, this can be regarded as a three dimensional matrix rotation. For eight bit per channel video, rotation creates horrible rounding errors but the output is significantly more compressible. An object of a particular color may have minor variation in brightness and color. However, across one object or between objects, brightness varies more than color. It is for this reason that human vision is typically more sensitive to brightness.
A practical, binary approximation of light sensitivity is a Bayer filter, in which green pixels occur twice as frequently as red or blue pixels. Differing arrangements of cells used by different camera manufacturers contribute to a proliferation of high-end "raw" still image formats. It can lead to an infuriating lack of detail about camera resolution. Technically, single CCD cameras don't have any pixels because they are all interpolated from the nearest cells in a Bayer filter. Regardless, use of a Bayer filter accounts for the switch from analog blue screen compositing to digital green screen compositing.
With or without interpolation, color is typically stored at half or quarter resolution. So, a 2×2 grid of pixels may be encoded as 4:4:4 data, 4:2:2 data, 4:1:1 data or other encoding. Typical arrangements include one average value for each of the two color channels or vertical subsampling for one color channel and horizontal subsampling for the other color channel. (Yes, this creates a smudged appearance but it only affects fine detail.)
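For example, the one-average-per-color-channel arrangement for a 2×2 grid (a minimal sketch assuming 8-bit channels already separated into brightness and two color planes):-
#include <stdint.h>

/* Subsample one 2x2 grid: brightness keeps all four samples, each color
   channel keeps a single rounded average. */
void subsample_2x2(const uint8_t y[4], const uint8_t u[4], const uint8_t v[4],
                   uint8_t y_out[4], uint8_t *u_out, uint8_t *v_out)
{
    for (int i = 0; i < 4; i++)
        y_out[i] = y[i];
    *u_out = (uint8_t)((u[0] + u[1] + u[2] + u[3] + 2) / 4);
    *v_out = (uint8_t)((v[0] + v[1] + v[2] + v[3] + 2) / 4);
}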
Channels may be compressed into one, two or three buffers. One buffer minimizes memory and compression symbol tables. Two buffers allow brightness and color to be compressed separately. Three buffers allow maximum compression but require maximum resources.
There are at least eight colorspace conversion techniques in common use. The simplest technique is used in CinePak video compression. This only requires addition, subtraction and single position bit shifts (×2, ÷2) to convert three channels into RGB data. This crude approximation of YUV to RGB conversion minimizes processor load and maximizes throughput. This allowed it to become widespread before other techniques became viable. Regardless, the encoding and decoding process is representative of other color-spaces.
For CinePak color-space decode, variables y, u, v are converted to r, g, b with the following code:-
r=y+v*2;
g=y-u/2-v;
b=y+u*2;
This is effectively a matrix transform and the encode process uses the inverse of the matrix. This is significantly more processor intensive and requires constants which are all integer multiples of 1/14. Specifically:-
y=(r*4+g*8+b*2)/14;
u=(r*-2+g*-4+b*6)/14;
v=(r*5+g*-4+b*-1)/14;
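Putting the two together (a sketch with 8-bit clamping added; a real Cinepak implementation may round differently):-
/* Encode then decode one pixel using the constants above. Integer
   division by 14 discards remainders, which is one source of the
   rounding error mentioned earlier. */
static int clamp8(int x) { return x < 0 ? 0 : x > 255 ? 255 : x; }

void roundtrip(int r, int g, int b, int *r2, int *g2, int *b2)
{
    int y = (r *  4 + g *  8 + b *  2) / 14;
    int u = (r * -2 + g * -4 + b *  6) / 14;
    int v = (r *  5 + g * -4 + b * -1) / 14;
    *r2 = clamp8(y + v * 2);
    *g2 = clamp8(y - u / 2 - v);
    *b2 = clamp8(y + u * 2);
}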
Other color-spaces use more complicated sets of constants and may be optimized for recording video in studio conditions, outside broadcast or perceptual bias. In the case of JPEG or MJPEG, the color-space may be implicit. So, although it is possible to encode and decode pictures and video as RGB, arbitrary software will decode it as if it were YCbCr color-space.
Images and video may be encoded in other color-spaces. One option may be palette data. In this case, arbitrary colors may be decoded from one channel via one level of indirection. Typically, a table of RGB colors is provided and a decoded value represents one color in the table. One or more palette values may have special attributes. For example, GIF allows one palette entry to represent full transparency. PNG allows arbitrary RGBA encoding as an alternative to palettes. (Furthermore, GIF and PNG each have an advanced interlace option.)
No concessions are made for the relatively common case of red-green color-blindness. In this case, DNA in a human X chromosome has instructions to make a cone opsin with a yellow spectral peak instead of a green spectral peak. With the widespread use of Bayer filters, digital compression and LCD or LED displays, it is possible to provide an end-to-end system which accommodates the most widespread variants of color response. At the very least, it should be considered a common courtesy to provide color-space matrix transforms to approximate the broader spectral response when using legacy RGB hardware. Even when a transform is implemented in hardware, two or more options should be given.
It may also be worthwhile to encode actinic (also known as thule). This is near ultra-violet which is perceived as a purplish white by people without an ultra-violet filtering eye lens. This may occur due to a developmental issue, injury or surgery. Early artificial lenses lacked UV filtering and therefore cataract patients commonly gained the ability to see near UV.
Alternatively, it may be worthwhile to define a number of arbitrary spectral peaks and optionally allow them to be grouped into two, three or four correlated channels.
One desirable task for a video codec is screen or window remoting. For best effect, this requires channels outside RGBA or equivalent. Other channels may include a blur map and a horizontal and vertical displacement map. The QNX Photon GUI allows this functionality. However, it is implemented as an event driven model where re-draw requests to an application may be re-emitted to windows further down the window stack. The returned bitmap may be arbitrarily processed before it is passed back up the stack. When the response to a re-draw request reaches the top of the stack, it is rendered to a display buffer which may or may not be virtualized. While this architecture allows arbitrary transparency effects, it also allows any malicious application to snapshot, OCR or otherwise leak data which is displayed by other applications.
There is a further limitation with the compositing event model. A window may only be in one place. It cannot be remoted to multiple devices or shared with multiple users. However, if each display performs compositing, a window may appear correctly on all of them.
In summary, a video codec should provide:-
(This is the 23rd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
There comes a point during network protocol development when someone decides to aggregate requests and/or acknowledgements. Don't do this. For acknowledgements, it will destroy the statistical independence of round-trips. It may also set implicit assumptions about the maximum rate of packet loss under which a protocol may work.
In the case of requests, aggregation may be a huge security risk. When I first encountered this problem, I didn't know what I was facing but, instinctively, it made me very uneasy. The problem was deferred but not resolved. A decision was made to implement a UDP server such that a one packet request led to zero or one packets in response. (This ignores IPv4 fragmentation and/or intended statistical dependence of multiple responses.)
In the simplest form, every request generates exactly one response. This greatly simplifies protocol analysis. It is only complicated by real-world situations, such as packet loss (before or after reaching a server) and crap-floods (accidental or intentional). Some of this can be handled with a hierarchy of Bloom filters but that's a moderate trade of time versus state.
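As a sketch of that idea (a single Bloom filter over request identifiers; the sizes, hash mixing and the notion of a 32-bit request identifier are all my own placeholders, and a real deployment would rotate a hierarchy of these filters over time):-
#include <stdint.h>

/* Flag repeated (or crap-flooded) requests: three hashes over a 64KB
   bit array. Returns non-zero if the identifier has possibly been seen. */
#define BLOOM_BITS (1u << 19)
static uint8_t bloom[BLOOM_BITS / 8];

static uint32_t mix(uint32_t x, uint32_t seed)
{
    x ^= seed;
    x *= 2654435761u;
    x ^= x >> 16;
    return x % BLOOM_BITS;
}

int bloom_check_and_set(uint32_t request_id)
{
    int seen = 1;
    for (uint32_t k = 0; k < 3; k++) {
        uint32_t bit = mix(request_id, k * 0x9e3779b9u);
        if (!(bloom[bit / 8] & (1u << (bit % 8))))
            seen = 0;
        bloom[bit / 8] |= (uint8_t)(1u << (bit % 8));
    }
    return seen;
}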
On the server, there was great concern for security. In particular, logging was extensive to the point that a particular code-path may set a bit within a response code which was logged but not sent to a client. Indeed, logging was extensive to the point that there was a collection of log utilities around the server, performing basic tasks such as teeing the primary text log and bulk inserting it into an indexed, relational OLAP database. (A task that systemd has yet to achieve with any competence or integrity.) Doing this correctly allowed text logs to be compressed and archived while simultaneously allowing real-time search and allowing a lack of bulk inserts to raise a warning over SMS.
However, the asymmetry between the size of request packets and response packets created pressure to aggregate requests. This was particularly pressing on a kernel, such as MacOSX 10.6, which limited UDP buffers to a total of 3.8MB.
The Heartbleed attack left me with extremely mixed feelings. I was concerned that my ISPs would be hacked. I was relieved that it didn't increase my workload. It also resolved my unease, although that was not immediately apparent. SSL negotiation begins with an escape sequence out of HTTP. From there, a number of round-trips allow common ciphers to be established and keys to be exchanged. Unfortunately, none of this process is logged. This was a deliberate design decision to maintain a legacy log format and implement ciphers with a loosely inter-operable third-party library which has no access to the server's log infrastructure.
This immediately brought to mind an old EngEdu Aspect Orientation presentation. Apparently, logging is a classic use case for aspect orientated programming. (People get stuck on efficient implementation of aspects but I believe techniques akin to vtable compression can be applied to aspects, such as Bloom filters. A more pressing problem is register allocation for return stack, exception stack and other state. Perhaps pushing the state of a Bloom filter of dynamic vtable addresses on a return stack can double as a guard value?)
Anyhow, the current HTTPS implementations completely fail to follow good practice. In particular, if each round-trip of negotiation was logged, it would be possible to find clients doing strange things and failing to connect. And with appropriate field formats, truncated strings would not be accepted. Ignoring all of this, an HTTPS log entry is a summary of a transaction whereas we want each stage. How would this apply to aggregated UDP requests?
An inline sequence of requests invites problems because the retrospective concatenation of requests may not be isolated in all cases. Having a request type which is a set of requests is no better. Firstly, it is not possible to prevent third-parties from ever implementing such a request type. Secondly, it is not possible to prevent third-parties from ever accepting nested request sets. Perhaps it would be better to make the core protocol handle a set of requests? Erm, why are we adding this bloat to every request when it does not preclude the previous cases?
And there's the core problem. It is absolutely not possible to prevent protocol extensions which are obviously flawed to anyone who understands software architecture. Nor is it possible to prevent more subtle cases.
(This is the 22nd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
I've noted that a small fixed length cell format is viable for communication between computers. What's the fascination with this task?
For these reasons, it'll get used even in circumstances where it seems like a really tortuous choice.
The first place where it will get used is between a host computer and a speaker array's sound processor. For development, this will use an Arduino Due. This oddly named device contains an 84MHz ARM Cortex M3 processor. Unfortunately, it also comes with a hateful development environment. Furthermore, support code (boot-loader, libraries) is supplied under differing licences. Most critically, the fast USB interface is largely undocumented and data transfer may be initially implemented over a virtual serial port.
Although it seems mad to send 32 byte, bit stuffed, fixed length cells over a virtual serial port over USB, it has the following advantages:-
There is one horrible exception to using the cell structure everywhere. By volume, the typical case for data transfer is a host computer sending sound samples to the sound array processor. This communication occurs in one direction over USB. When data is received by a USB interface, it may be transferred to main memory at a specified address. It may be an advantage to perform this such that:-
This should be sufficient to implement many of the basic requirements of a speaker array within the available processing power.
(This is the 21st of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
It is possible to tunnel arbitrarily long packets over fixed length cell networks. This works with conventional error detection and error correction schemes. Even if you dispute this, I'd like to describe my preferred, concise implementation.
A 24 byte fixed length cell is significantly smaller than ATM's 53 byte fixed length cell. However, 24 bytes is sufficient for reliable signaling, voice communication, slow-scan video and encapsulation of other network protocols. Furthermore, 24 bytes after 4B5B (or suchlike) bit stuffing is 240 bits. With the addition of a 16 bit cell frame marker, we have 256 bits. So, 24 bytes get encoded as 32 bytes. Over high speed networking, this can be implemented with an eight bit binary counter. Over low speed networking, multiple channels can be bit-banged in parallel.
There exists a low overhead method for bit stuffing. There exists a cell frame marker which always violates bit stuffing. Therefore, nothing in the payload can imitate a frame marker. Therefore, the system is fairly immune to packet-in-packet attack without further consideration.
Addressing may be performed with a routing tag within each cell and a source address and/or destination address within each packet. Partial decode of the bit stuffing allows cells to be routed without decoding or encoding contents in full. Combined with techniques such as triple-buffering, each channel requires no more than 96 bytes excluding pointers and one common decode buffer. Eight channels require less than 1KB including pointers and common state. Therefore, it may be possible to implement cell networking on very basic hardware. This includes an eight bit micro-controller with less than 1KB RAM. Furthermore, it is possible to perform routing of packets which exceed 1KB via such a device.
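As a sketch of the memory budget (the field names are mine; the 32-byte raw cell is the 24-byte cell after bit stuffing plus the frame marker, as described above):-
#include <stdint.h>

/* Per-channel state: three 32-byte slots of raw (bit-stuffed) cells for
   triple-buffering, which is the 96 bytes per channel noted above. One
   24-byte unstuffed decode buffer is shared by all channels. */
#define RAW_CELL_BYTES 32   /* 24 bytes -> 240 bits after 4B5B, plus a 16 bit marker */
#define CELL_BYTES     24

struct cell_channel {
    uint8_t raw[3][RAW_CELL_BYTES];  /* receive, route, transmit */
    uint8_t active;                  /* slot currently being filled */
    uint8_t fill;                    /* bits received into that slot */
};

struct cell_router {
    struct cell_channel channel[8];  /* eight channels, comfortably under 1KB */
    uint8_t decode[CELL_BYTES];      /* common decode buffer */
};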
However, more resources are required to perform security functions. In particular, key exchange and hash verification is very likely to require fields which exceed a 24 byte cell. Therefore, secure end-to-end communication with a leaf node requires a device which can unpack a payload which spans multiple cells. It remains desirable to implement triple-buffering at this level but it is also desirable to have an MTU which greatly exceeds 1KB. This is in addition to cryptography state, entropy state and application state. Despite these constraints, it is possible for a system with 64KB RAM to provide secure console and graphical services which are extremely tolerant to packet loss.
(This is the 20th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
While I await components for a toy robot, a somewhat toy 52 Watt quadcopter and a very serious 3D speaker array system, I'll explain some previous research which is not widely known. I'll start with an introduction, move to implementation details, limitations and a possible solution.
Cells
Local Area Networks typically use variable length packets. Our dependence upon wired Ethernet and wireless Ethernet is so widespread that people have difficulty imagining any other techniques. However, while Ethernet dominates short spans of network, long-distance connections invariably use fixed length cells. This includes the majority of digital satellite communication, cell-phones and broadband systems. The division between LAN and WAN [Local Area Networking and Wide Area Networking] remains very real for pragmatic reasons.
Variable length packets maximize bandwidth but fixed length cells maximize reliability. For long-distance communication, reliability is more important than bandwidth. Indeed, without reliability over a long span, bandwidth is zero.
Framing
Finding the start of a packet is difficult and cumbersome. Typically, there is a start sequence. This applies even in the trivial case of RS-232 serial communication, in which start bits, stop bits, payload bits and parity bits are all configurable. For Ethernet, the start sequence is a known pattern of eight bytes. For Zigbee, it is a shorter pattern of four bytes. This is inefficient but acceptably so within the span of a broadcast network. However, pre-amble is highly insecure because the patterns used in the start sequence are also valid patterns within a payload. This makes Ethernet, Zigbee and many other protocols vulnerable to packet-in-packet attack. I've described the danger of packet-in-packet attack applied to AFDX but this is often met with disbelief.
Cell structures don't have this problem because data is inserted between a regular spacing of boundary markers. Admittedly, this arrangement is superior for continuous point-to-point links; especially when a continuous stream of empty padding cells is sent to maintain a link. This isn't an option for burst protocols between many nodes but it is very useful between long-term partners.
Cell boundary markers work in a similar fashion to NTSC horizontal sync markers and allow transmitter and receiver to stay on track over long durations. In the very worst case, the marker pattern should never be off by more than one bit from its expected position. (Any difference occurs because the transmitter oscillator and the receiver oscillator run at very similar but not identical frequencies.) Furthermore, when this case occurs, the contents of a cell are known to be suspect.
Adaption
It is very useful to transfer packets over cells. This is achieved by fragmenting packets across multiple cells. In this case, a proportion of a cell is required to indicate the first fragment, the last fragment and/or a fragment number. For the remainder of the cell, each packet begins at a cell payload boundary. The last cell of a set may be padded with zeros. This ensures that the next packet begins at the next cell payload boundary.
A common case is also a worked example. If a single byte payload is sent over TCP/IPv4 over PPPoA over AAL5 [ATM Adaption Layer 5] then packet length is typically 41 bytes (20 byte IPv4 header, 20 byte TCP header, one byte payload) and cell size is 48 bytes. This would appear to fit into one cell. However, AAL5 encapsulation overhead is 6-8 bytes per packet plus one bit(!) within ATM's five byte header. Anyhow, in this scenario, a packet may be split at the end of its TCP header. In typical cases, TCP/IPv4 packets fragment over PPPoA over AAL5. In all cases, TCP/IPv4 packets fragment over PPPoE over AAL5. However, there is a large amount of inefficiency with this arrangement. It almost shouts "This is badly implemented! Fix me!"
(Technical details are taken from an expired patent application and are therefore public domain.)
Consideration of AAL5's large headers and of cell re-fragmentation across multiple cell networks led to the insight that cells within a set should be sent backwards with a cell number. This arrangement permits several tricks with buffers and state machines. In particular, it permits simple low-bandwidth implementations and optimized high-bandwidth implementations. In all cases, receipt of fragment zero indicates that the final cell within a set has been received. A counter may determine if fragments are missing. FEC may be performed and/or the next protocol layer may be informed of holes.
Where cells are re-fragmented, a decrementing cell offset allows arbitrary re-fragmentation without knowing the exact length of encapsulated payload. (This incurs up to one cell of additional padding per re-fragmentation.) In some cases, length of bridged headers may not be known either. This occurs at near-line-speed without receipt of a full packet.
Further consideration of payload size and resilience leads to an encapsulation header with two or three BER fields. The first field is the fragment number. This can be partially decoded without consideration of packet type. The second field is packet type. For trivial decoders, this acts as a rip-stop for a corrupted fragment number. In the case of frag=0, a third field is present. This is the exact payload length. Many protocols which could be encapsulated (Ethernet, IPv4, IPv6, TCP, UDP) already have one or more payload lengths. This field covers trivial protocols which don't specify their own length. It is typically no more than two bytes. (For BER, this allows values up to 2^14-1 - which is 16383 bytes.) It can be de-normalized with leading zeros but this introduces significant inefficiency in boundary cases. Specifically, a de-normalized byte may incur one extra cell. However, during particular cases of re-fragmentation, there may be boundary cases where the payload length field is itself an ambiguous length. In this case, it must be assumed to be maximum length. Foreseeably, this leads to inefficiency when handling particular packet lengths.
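A sketch of the header encode (assuming the usual base-128 BER form, seven value bits per byte with the top bit marking continuation, so two bytes reach (2^14)-1; the function names and byte ordering are mine):-
#include <stdint.h>
#include <stddef.h>

/* Write one BER field of up to 14 bits. Returns the number of bytes written. */
static size_t put_ber(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    if (value > 0x3FFF)
        return 0;                             /* outside this sketch's range */
    if (value > 0x7F)
        out[n++] = (uint8_t)(0x80 | (value >> 7));
    out[n++] = (uint8_t)(value & 0x7F);
    return n;
}

/* Encapsulation header: fragment number, packet type and, on the final
   (zero-numbered) fragment only, the exact payload length. */
size_t put_header(uint8_t *out, uint32_t frag, uint32_t type, uint32_t length)
{
    size_t n = put_ber(out, frag);
    n += put_ber(out + n, type);              /* rip-stop for a corrupt fragment number */
    if (frag == 0)
        n += put_ber(out + n, length);
    return n;
}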
This arrangement allows arbitrary packets to be tunnelled over protocols ranging from CANBus (8 bytes) to TeTRa (10-16 bytes) to ATM (48 byte) to USB (512-1024 bytes).
A variant of this arrangement permits Ethernet over a two byte payload. This is an extreme example which remains impractical even if you convince someone that it is possible. A 16 bit payload can be divided into a variable length fragment number field and a variable length payload field. If the first bit is zero then 10 bits represent fragment number and five bits represent payload. If the first bit is one then 11 bits represent fragment number and four bits represent payload. It is implicit that all 10 bit fragment numbers precede all 11 bit fragment numbers. Regardless, this is sufficient to send Ethernet over two byte payloads in a manner which is optimized for shorter Ethernet packets. Jumbo Ethernet over CANBus is suggested as an exercise.
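As a worked sketch of that two byte split (the helper names are mine):-
#include <stdint.h>

/* Bit 15 selects the layout: clear -> 10 bit fragment number and five
   payload bits; set -> 11 bit fragment number and four payload bits. */
uint16_t pack_wide(uint16_t frag10, uint8_t bits5)
{
    return (uint16_t)(((frag10 & 0x3FF) << 5) | (bits5 & 0x1F));
}

uint16_t pack_narrow(uint16_t frag11, uint8_t bits4)
{
    return (uint16_t)(0x8000 | ((frag11 & 0x7FF) << 4) | (bits4 & 0x0F));
}

void unpack(uint16_t cell, uint16_t *frag, uint8_t *payload, uint8_t *payload_bits)
{
    if (cell & 0x8000) {
        *frag = (cell >> 4) & 0x7FF;
        *payload = cell & 0x0F;
        *payload_bits = 4;
    } else {
        *frag = (cell >> 5) & 0x3FF;
        *payload = cell & 0x1F;
        *payload_bits = 5;
    }
}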
Trump Jr.’s Russia meeting sure sounds like a Russian intelligence operation (archive.is)
Donald Trump Jr. is seeking to write off as a nonevent his meeting last year with a Russian lawyer who was said to have damaging information about Hillary Clinton. “It was such a nothing,” he told Fox News’s Sean Hannity on Tuesday. “There was nothing to tell.”
But everything we know about the meeting — from whom it involved to how it was set up to how it unfolded — is in line with what intelligence analysts would expect an overture in a Russian influence operation to look like. It bears all the hallmarks of a professionally planned, carefully orchestrated intelligence soft pitch designed to gauge receptivity, while leaving room for plausible deniability in case the approach is rejected. And the Trump campaign’s willingness to take the meeting — and, more important, its failure to report the episode to U.S. authorities — may have been exactly the green light Russia was looking for to launch a more aggressive phase of intervention in the U.S. election campaign.
Emphasis mine.
On June 2, in a discussion about whether browsers ought to support JavaScript in the first place, I wrote a comment that cited the article "Please don't use Slack for FOSS projects" by Drew DeVault that recommended IRC over Slack, Skype, Discord, and other proprietary web chat platforms. I mentioned that IRC alone is incomplete for the job without a logging bouncer to keep a log and an attachment pastebin to hold pictures, documents, and the like.
In that comment, I mentioned having read a different article about why IRC is just as bad because real-time communication discriminates against users in minority time zones, who might miss the opportunity to participate due to being asleep or at work. But I couldn't dig it up at the time. Today I happened upon it again: "Why Slack is inappropriate for open source communications" by Dave Cheney recommends that projects instead use forum-like asynchronous communication, such as mailing lists and issue trackers, where each thought has its own URI and there's not as much shame in being a day behind.
It goes to show the value of trying different search engines. Google and DuckDuckGo gave different results for a query expressing the key concept that I took from Cheney's article (chat discriminates by time zone). DuckDuckGo pulled up Cheney's article first, while Google tried to second guess what I wanted: "Missing: chat discriminates"