Rectangular Tuit, Part 1

Posted by cafebabe on Friday July 21 2017, @02:42AM (#2518)
2 Comments
Hardware

(This is the 33rd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I'm working on a 3D surround sound speaker array. I received a clone Arduino Due about a week earlier than expected. This has put me in a moderate panic in which I have a depreciating asset which is not working. I've been jolted into resolving loose ends, such as voltage levels, API and object code deployment. From personal experience and with sincerity, I recommended acquisition of a vintage round tuit. Well, I'm going to consider an Arduino Due as an even more effective Rectangular Tuit.

This tuit is about the size, power and price of a Nokia 3210. Unlike many credit card computers, it does not require a storage card during operation. Instead of a screen, keys and a radio, there are about 50 I/O lines. Operating system overhead is optional. Therefore, it is expected that timer interrupts at 48kHz will be the highest priority interrupt (or second highest if USB communication takes precedence). 84MHz divided by 48kHz is exactly 1750 processor ticks per sound sample. Within this period, a processor must (see the sketch after the list):-

  • Context switch.
  • Maintain a large circular buffer of sound samples received from host.
  • Maintain a small circular buffer of sound samples taken from the large buffer for the purpose of compensating for differing speaker distance.
  • For each speaker, obtain a historical sample from the small buffer where time delay compensates for speaker distance.
  • For each speaker, for each channel, multiply and add each historical sample against values which are pre-computed to cover speaker angle, speaker distance and master volume.
  • Bit-bang values to 12 bit, 16 bit, 18 bit or 24 bit DACs. It may be worthwhile to interleave multiplication and bit-banging so that signals appear on outputs for maximum duration.
  • Bit-banging of cell networking may also occur.
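
To make the 1750 tick budget concrete, here is a minimal sketch in C of the mixing step. The names and sizes (NUM_SPEAKERS, NUM_CHANNELS, RING_SIZE, Q15 gains) are my own assumptions, the small per-speaker delay buffer is folded into the single ring for brevity, and the context switch, host receive path and DAC bit-banging are omitted.

#include <stdint.h>

#define NUM_SPEAKERS 8       /* assumption: small array for illustration        */
#define NUM_CHANNELS 4       /* assumption: first order Ambisonics (W, X, Y, Z) */
#define RING_SIZE    1024    /* power of two so wrap-around is a cheap mask     */

/* Large circular buffer of interleaved samples received from the host
   (filled elsewhere by the USB receive path). */
static int16_t host_ring[RING_SIZE][NUM_CHANNELS];
static uint32_t play_pos;

/* Per-speaker delay (in samples) compensating for speaker distance, and
   per-speaker, per-channel gains pre-computed from angle, distance and
   master volume, in Q15 fixed point. */
static uint16_t delay_ticks[NUM_SPEAKERS];
static int16_t  gain_q15[NUM_SPEAKERS][NUM_CHANNELS];

/* Called once per 48kHz timer interrupt; must stay well inside 1750 ticks. */
void sample_tick(int16_t dac_out[NUM_SPEAKERS])
{
    for (unsigned s = 0; s < NUM_SPEAKERS; s++) {
        /* Historical sample index chosen by this speaker's distance. */
        uint32_t idx = (play_pos - delay_ticks[s]) & (RING_SIZE - 1);

        /* Multiply and add every channel against the pre-computed gains. */
        int32_t acc = 0;
        for (unsigned c = 0; c < NUM_CHANNELS; c++)
            acc += (int32_t)host_ring[idx][c] * gain_q15[s][c];

        dac_out[s] = (int16_t)(acc >> 15);  /* back to 16 bit for the DAC */
    }
    play_pos = (play_pos + 1) & (RING_SIZE - 1);
}

In practice, the bit-banging of each DAC value would be interleaved with the next speaker's multiply-adds, as suggested above.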

It may be useful to vary this process for 192kHz monophonic sound or suchlike. Regardless, with available processing power and I/O, it may be possible to drive 64 speakers and/or multiple sets of headphones. Each set of headphones may have accelerometers and/or Hall effect devices for the purpose of maintaining Ambisonic sound-stage position.

Unfortunately, timing is complicated by the Arduino API, which only permits timers to be specified to the nearest microsecond (rather than nanosecond). Therefore, it may be easier to play sound in multiples of 50kHz. The alternative is to patch the Arduino libraries or to implement a system which plays samples slightly too fast. The pitch shift would be approximately 0.2% but I find it less tacky to omit or repeat roughly 1 in 600 samples than to endure a pitch shift.

There are further limitations. The tuit uses a different instruction set from other Arduino boards. Support is greatly restricted. The July 2017 release of Raspbian doesn't work with the tuit. Nor is support available from an Ubuntu Linux desktop which was configured to install everything. However, it appears that using a hateful IDE is definitely optional.

Typically, micro-controller development uses an IDE. This may be Eclipse, Atmel Studio, Arduino's IDE or something else. I vastly prefer to be unconstrained by choice of text editor. This should be the norm when using serial line programming (or virtual serial port programming tunnelled over USB). However, it is now customary to have configuration within an IDE to specify the virtual serial port name and to invoke an external chip programming utility from a menu item.

For Atmel AVR chips, avrdude appears to be the default choice. For Atmel ARM chips, bossac performs the same function - or fails to when distributions lack the required configuration. Either way, I'd prefer to perform these steps via a Makefile. In addition to simplifying deployment, it avoids problems with deployment speed.

The End Of "Piracy"

Posted by cafebabe on Thursday July 20 2017, @01:27PM (#2516)
10 Comments
Techonomics

(This is the 32nd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I'm promoting a set of ideas which rely upon multi-media and gain maximum value from ease of access to third-party multi-media. I would like to see people fairly rewarded for producing high quality, factually accurate content. Unfortunately, by mentioning "fair", "quality" and "factual", I'm already in a quagmire. Without resolving these issues, I'll extrapolate content distribution which is likely to occur with or without the legal permission of content producers.

Although patents and copyright should be considered separately, I have one patent pending with a failing nation state. Also, I believe that it is pertinent to mention that my beliefs differ from law. I believe that privacy should have more emphasis than free speech. I believe that rights should be extended to animals. This includes privacy. I believe that copyright, work-for-hire and privacy can be aligned towards maximum collective benefit rather than historical levels of effort. I assume that I'm a pragmatist who can overcome some bias to understand people who don't follow law or social norms.

I hope that people understand the distinction between theft of physical property and the "theft" of information. The former may include physical violence or may otherwise incur permanent injury or risk to life. The latter leaves people (mostly) with the same bit patterns but may incur loss of privacy and/or loss of expected future revenue. Physical loss and data "loss" can both be devastating but people affected by "theft" are very likely to be content rich, physically secure and unaffected by an illusory social safety net.

It should be obvious that large-scale audio copying preceded video copying because audio generally requires less resources. However, a friend and I were quite curious about a potential rise in book copying. Historically, the easiest method to copy a book was photocopying and this was generally more expensive than buying another, legitimate copy of a book. Therefore, photocopying was generally restricted to sections of a book, rare books or matters of urgency. However, technology has changed and it is now possible to copy thousands of books per minute.

My friend, an author, was concerned that discs with 20000 books were openly advertised on EBay and elsewhere. I assumed it was only books that were freely available. My friend was less certain. I was asked to investigate. So, I sent about US$5 to a relatively reputable party and received a disc. Indeed, it had about 20000 books and, indeed, much of it was a smattering of text files with Gutenberg legal notices. However, some of it was certainly violating copyright. I gave the disc to my friend without taking a copy. (What am I going to do with 20000 fiction books? Read them all?)

My friend has now come to the conclusion that it is a sign of merit to be included in a general corpus of literature. Indeed:-

You do anything in the world to gain a reputation. As soon as you have one, you seem to want to throw it away. It is silly of you, for there is only one thing in the world worse than being talked about, and that is not being talked about. -- The Picture Of Dorian Gray by Oscar Wilde.

However, a major question remains. Why did book copying gain popularity after music and film copying? Because it is large *sets* of books which are copied, and therefore the volume of data exceeds the average film. From minor extrapolation, there will be a subsequent round of music and film copying but each will be a set of media. So, expect stuff like 1960s-Motown.zip followed by French_Noir_(Subtitled).7z and finally Terran-Culture-(1650-2050).tar.xz

However, what happens when those fine pioneers of the digital frontier run out of third-party content? Will they live-stream themselves at increasingly detailed resolution? Or will they find other stuff to distribute? We've already seen US-Diplomatic-Cables-(1966-2010) (and the resulting chaos). Will we see US-Tax-Records-(2010-2022).zip? Maybe Medical-Records-UK-NHS-2035.zip? DNA-From-Iceland-2042.7z? What havoc will this cause? Will we see the Helvetican War where lawyers, bankers and sociopaths are hung from lamp-posts? Will this be followed by bio-engineered genocide? I don't know. However, I know that:-

  • Bad things happen when the bread and circuses stop.
  • Models are useless if they assume that everyone is rational.

Homage To Space Westerns

Posted by cafebabe on Thursday July 20 2017, @06:44AM (#2515)
0 Comments
Software

(This is the 31st of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

We've come to the end of computing as originally envisioned by Xerox, a photocopier company and, to quote one end-user, our tech is shit. Furthermore, it is directionless with the exception that we use science fiction as our template to a greater or lesser extent. Firefly and The Expanse offer a more pragmatic outlook than Star Wars or the utopia of Star Trek. But what gives us best bang-for-buck? How do we get from here to there while avoiding a deluge of foreseen consequences? I see a way which is broadly consistent, to the point that I encourage over-use to improve portability. However, it is infuriating that similar solutions aren't being widely sought.

The approach is to pretend that almost every technical development from Eternal September to present didn't occur. That would be an era without top-posting, PDFs or web browser DRM. It is an alternate path where Windows didn't dominate. If that scenario had actually occurred, I doubt that vertically integrated companies, such as Commodore, Apple or Sun, would have been a better outcome than the Microsoft monopoly. Regardless, we have the hindsight to pick and choose from a rich history (while being very careful about recent developments) and then apply it to the current scale of integration - or anything simpler.

Currently, consumer and small business computing is meeting in the middle, between Windows and Linux. Computing has become laughably insecure and it is getting worse. A chancer like John McAfee can suggest a "Privacy Phone" with physical switches for I/O. This is intended to be a stop-gap solution to improve security. However, look at the placement of the switches. They've never made anything close to a mock-up but none of the established interests dismiss it because they have no better solutions (and nothing to gain from highlighting the current situation).

Privacy looks like an illusion; a temporary aberration in a narrative from tribes to cyborgs. That's how major corporations and nation states like to portray the situation, especially when they have unwarranted access to servers, desktops, laptops, tablets and phones. The official recommendations which followed the 1988 Internet Worm were never heeded. (Hey, what happened to the author of the first Internet Worm? Did they put him in jail for 60 years? No, the Ivy League son of a spook became a venture capitalist.)

At a minimum, we should have five of everything in common use: processors, compilers, operating systems, databases, application suites. We are failing miserably. Rock's law may impinge on hardware but we have no excuse in any other category. And if we believe that North Korea attacked UK government infrastructure then we are in a theater of war where it is advantageous to have the low ground.

We have barely enough infrastructure to co-ordinate and disseminate sustainable solutions. This would be solutions which don't empower a nation state and don't enrich a Silicon Valley oligarch. This is difficult but I'll try.

I envision components which are mostly software or can be implemented as software rather than hardware. Components can be used in quadcopters, robots, hi-fi, sensors and actuators. This is technology which should be Christmas light divisible: it should do something sensible or it should be open to modification. Do you remember when Spock re-wired his flip-phone to get extra range? Or when Geordi LaForge gets rescued because Federation technology inter-operates so well? That's Christmas light divisible - and we should aspire to this because there are no silver bullets which cover every case.

Consider an environment of isolinear relays and LCARS terminals. (I'd love a system which is functionally equivalent to LCARS. By happenstance, I found that oversize text is an emergent property of one fairly efficient implementation.)

It may be in a mundane environment which waters your plants, feeds your fishes or brews you beer. You wouldn't object if surplus bandwidth provided perimeter security. However, you would object if each device cost US$50. So, how do we make this secure and cost effective? Cut the RAM and processing power to minimum. This also makes it economically viable. We compete on our terms by getting closer to the theoretical optimal solution. The pay-off for halving RAM, I/O hardware or processing power is huge. The goal is to have one or more implementations which are good enough to maintain a network effect of inter-operability. If we consider ITRON, there is the outside possibility that billions of devices may be manufactured over decades. (ITRON is for micro-controllers and is a portable interface like POSIX but is concerned with timers and event buffers rather than files and messages.)

I've tried really hard to not be a greenfield techie. I tried to graft more innovation onto the edifice of contemporary computing. However, it is becoming increasingly difficult and may be doing all of us a dis-service.

I spent about eight months attempting to compile software securely and efficiently. It is a lost cause. It is possible to re-compile one component if every other component remains stable. However, coupling between components is far too tight. This has two practical consequences. Firstly, it is not possible within one lifetime to solder a trustworthy CPU and I/O devices, run an audited kernel, run audited applications and access the majority of the contemporary Internet. NAND To Tetris is a rigged academic exercise which should be best practice but is wishful thinking. Secondly, there is no meaningful software provenance. When software components directly rely upon dozens of other components, a succession of re-compiled components may create hundreds of additional references to software versions. A global re-compilation of an open source operating system may extend the chain of provenance by hundreds of thousands of references. We cannot trust our processors, our kernels, our compilers, our text editors, our web browsers or anything else - and our collective chain of trust is hopelessly lost in the mist of time.

It is tempting to discard everything and follow NAND To Tetris rigidly. But what does that achieve? You'll have something which is slower than a Commodore 64 and you'll be the only user. Instead, apply Christmas light divisibility. What was the most worthwhile development from Eternal September to present? 23 years of popular culture: film, music and literature. Forget about Solaris, ICQ, Flash animation, Kazaa and Windows Vista. Instead, remember Bourne, Bond and Batman. Oh, and people built up a huge pile of proprietary documents and hyperlinks.

Rather than treat legacy content as a first-class case, treat everything which is secure and verifiable as the first-class case and treat all of the whiz-bang, high bandwidth multi-media as a second-class case. It would be great to provide 100% downward compatibility but cases such as Flash are unviable. You'll be glad to know that I've considered a large number of other cases and the vast majority can be salvaged. It requires one or more user interfaces which provide:-

  • High bandwidth video.
  • High quality audio.
  • Legacy application bridge.
  • Document processing.
  • Internet access.
  • Home automation.
  • Home security.
  • User functionality.

This may be based around an interface like Kodi - or Kodi itself. We can also run a subset of this functionality on a secure console.

Video Codec: Text Console

Posted by cafebabe on Tuesday July 18 2017, @06:16AM (#2511)
0 Comments
Software

(This is the 30th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

XWindows has a lovely feature. Actually, it has more than one lovely feature but I wish to concentrate on one feature in particular. XWindows has multiple types of windows. The most common is a rectangular bitmap which is maintained in response to application re-draw requests. There is a stereoscopic mode for windows. There is also an irregular bitmap. However, I wish to concentrate on text window mode.

I fear that text window mode, used by xterm, is being deprecated by gterm which uses an arbitrary bitmap for the purpose of implementing tabs. Text used to label each tab is drawn at a size specified by Gnome preferences and this rendering of tabs is sufficiently messy to cast a window from text to bitmap.

This led me to consider that if a video codec is used to implement a windowing system, and the video codec is a collection of disparate tile types within quadtrees, then it may be possible to unify text and bitmap windows - with the exception that it may be preferable to resize a text window in larger increments than arbitrary pixels. It might also be worthwhile to fix ANSI color and make foreground and background colors contrasting in the case that a user has a light or dark background color. It is also possible to implement eight bit color, 16 bit color or 24 bit color without the legacy of ncurses5.

With a quadtree, it is also possible to define characters which are 2×2 default size or 4×4 default size. This may be excessive and therefore it may be preferable to define double width cells (akin to Epson double width escape code) or double height cells (akin to Teletext double height escape code). However, I am definitely not implementing blink because this is not conducive to low-power display.

A quick method to bootstrap display of extended symbols is to allow multiple glyphs to be sent within one cell. In the case of diacritical marks, such as an umlaut, this is fairly easy to implement. In the absurd reduction, it should be possible to send a left ascender and a right ascender so that b and d can be rendered as glyphs.
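
As a rough illustration, a text cell inside such a tile might be laid out as follows in C; the field names, widths and flag values are my own assumptions rather than any defined format.

#include <stdint.h>

/* One character cell within a text-mode quadtree tile (hypothetical layout). */
struct text_cell {
    uint32_t glyphs[2];  /* base glyph plus an optional second glyph, e.g. an
                            umlaut, or left and right ascenders forming b or d */
    uint8_t  fg;         /* foreground color index                             */
    uint8_t  bg;         /* background color index                             */
    uint8_t  flags;      /* see below; deliberately no blink flag              */
};

enum {
    CELL_DOUBLE_WIDTH  = 1 << 0,  /* akin to the Epson double width code      */
    CELL_DOUBLE_HEIGHT = 1 << 1,  /* akin to the Teletext double height code  */
    CELL_AUTO_CONTRAST = 1 << 2   /* let the terminal pick a contrasting
                                     foreground for light or dark backgrounds */
};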

I already mentioned that it is possible to route quadtree packets over intermediate nodes with 1KB RAM and perform screen remoting to micro-controllers with 64KB RAM. It should now be apparent that such a system is going to be blocky and colorful in the manner of a Commodore 64.

Video Codec: Low Color Modes

Posted by cafebabe on Tuesday July 18 2017, @05:07AM (#2510)
2 Comments
Software

(This is the 29th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

In some circumstances, 10 bit per channel HDR color (1 billion colors) isn't enough. In other circumstances, 64 colors is more than enough. If only there was a way to switch to a different color scheme with an escape code or something. The benefit would be an instant reduction in bit-rate. Anyhow, this is a suggested list of color palettes which can be represented in eight bits:-

  • Monochrome.
  • 6×6×6 color cube.
  • Pastel palette.
  • Leafy green palette.
  • Sky palette.
  • Hair color palette.
  • Skin-tone palette.

However, my favorite scheme is a 16 bit representation. It allows four hexadecimal digits to be mixed in approximate ratios of 4:3:2:1. However, the ratios are slightly skewed so 3:2:1 and 2:1 ratios can also be approximated by repeating digits. The skew prevents strict repetition and therefore it is possible to obtain a smooth transition of eight colors by defining colors of the form below (a decode sketch follows the list):-

  • (red,red,red)
  • (red,red,blue)
  • (red,blue,red)
  • (red,blue,blue)
  • (blue,red,red)
  • (blue,red,blue)
  • (blue,blue,red)
  • (blue,blue,blue)
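
As a decode sketch of the 16 bit scheme in C, assuming a 16 entry base palette and ignoring the slight skew of the weights (the names are placeholders):

#include <stdint.h>

struct rgb { uint8_t r, g, b; };

/* Hypothetical 16 entry base palette; some entries may be reserved for
   user preferences (default foreground, background, highlight). */
extern struct rgb base_palette[16];

/* Decode a 16 bit value as four 4-bit palette indices mixed in the
   approximate ratio 4:3:2:1 (the skew is omitted for brevity). */
struct rgb decode_mix(uint16_t v)
{
    static const unsigned weight[4] = { 4, 3, 2, 1 };   /* sums to 10 */
    unsigned r = 0, g = 0, b = 0;

    for (unsigned i = 0; i < 4; i++) {
        struct rgb c = base_palette[(v >> (12 - 4 * i)) & 0xF];
        r += weight[i] * c.r;
        g += weight[i] * c.g;
        b += weight[i] * c.b;
    }
    return (struct rgb){ (uint8_t)(r / 10), (uint8_t)(g / 10), (uint8_t)(b / 10) };
}

Repeating the same digit in all four positions yields the base color exactly, and repeating digits in fewer positions gives the coarser 3:2:1 and 2:1 style mixes.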

My favorite feature of this scheme is that one or more colors can be reserved for user defined colors. This may include default foreground color, default background color and preferred highlight color. That technically means that an encoder using such representations doesn't know how it will be decoded. However, it permits basic styling to be performed. In a windowing environment, different preferences may be set for different servers. So:-

  • Windows for user applications may appear in corporate colors.
  • Windows for server administration may appear in "danger" colors.
  • Windows for personal stuff may appear in preferred colors.

In each case, a server sends mixes of reserved colors. So, for example, to obtain four shades between a reserved color and white, use the following:-

  • (reserve,reserve)
  • (reserve,white)
  • (white,reserve)
  • (white,white)

However, I have yet to explain why I find this scheme desirable.

Video Codec: Affine Deltas

Posted by cafebabe on Tuesday July 18 2017, @01:09AM (#2509)
0 Comments
Software

(This is the 28th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

In addition to a video compression motion delta, it is possible to specify differences between frames where a square of texture is shrunk and/or rotated.

Affine is a fancy term which often evokes fractal ferns or the rather hypnotic Electric Sheep screen saver. But an affine transform combines rotation and translation. You may change the color of something, shrink it or otherwise distort it but that process will include a rotation and a translation. Oddly, an affine may have zero rotation, zero translation plus miscellaneous stuff. So, technically, any action in a two dimensional space (or higher) is an affine. In practice, anything with a meaningful matrix operation is an affine.

For bitmaps and codecs, we can restrict rotation to four positions. This rotation can be represented very concisely in two bits. With another two bits, we can represent optional horizontal and vertical mirroring. With more bits, we can represent scaling factors, relative brightness and inversion of texture.
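
As an illustration of how concise such a description can be, here is a hypothetical descriptor in C; the field widths follow the bit counts above and everything else is my own assumption.

/* Hypothetical affine delta descriptor; a real encoder would pack these
   bits tightly into the stream rather than use a C struct. */
struct affine_delta {
    unsigned rotation : 2;   /* 0, 90, 180 or 270 degrees            */
    unsigned mirror_h : 1;   /* optional horizontal mirroring        */
    unsigned mirror_v : 1;   /* optional vertical mirroring          */
    unsigned invert   : 1;   /* invert the source texture            */
    unsigned bright   : 3;   /* small relative brightness adjustment */
    unsigned scale_dn : 1;   /* scale down by a fixed factor of two  */
};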

In practice, everything except translation is required, and a fixed scaling factor of two suffices. I'm sorry to disappoint anyone who thinks I'm numerologically obsessed with base three audio, base three video and 1/3 pixel motion deltas. In this case, scaling down by a factor of two is the most likely to match, the easiest to implement and incurs the least processing load. This is particularly important on a computer with a data cache hierarchy. Finding potential matches at a scaling factor of three incurs at least twice as much cache churn.

In either case, the binary representation of a motion delta and an affine delta may differ by only one bit. The code to perform matches may have common functionality and the access patterns may be similar. This is particularly true if there is a maximum radius for potential matches. Therefore, it is strongly encouraged to interleave motion delta and affine delta matching. This allows motion delta matching to occur with very little overhead when affine matching is also performed.

However, with very infrequent key-frames and affine quadtrees, switching between live video streams could be made deliberately rough. This would be partly to avoid bandwidth peaks when fetching data and partly to ease implementation. The result would look rather like a blocky, dissolving replacement algorithm. The nearest example I can find is the rather jingoistic parody adverts from the film Starship Troopers which generally end with "Would you like to know more?" (For a particularly simple video interface, see sequences from any episode of the comedy sketch show The Glam Metal Detectives, although the content ranges from surreal to disturbing.)

Video Codec: Fractional Pels

Posted by cafebabe on Monday July 17 2017, @06:41PM (#2508)
5 Comments
Software

(This is the 27th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

An obscure topic on video compression forums is the merits of hpel versus qpel. Consensus is that qpel should only be enabled where the majority of content moves slowly. What is this and is it even halfway correct?

A video compression motion delta may be specified in different forms. MPEG1 allows half pixel per frame movement to be specified. MPEG4 allows quarter pixel per frame movement. These are known as hpel and qpel units respectively. Half pixel movement has a profound limitation. Consider a checkerboard of highly contrasting pixels. Half pixel movement and the associated averaging will convert every pixel to mid gray in one step. And that's just in one direction. An hpel applies to horizontal and vertical movement. Therefore, it is common for a chunk of screen to become a four-way average of nearby texture. A qpel offers the possibility of a 1:3 mix in one direction or a 1:3:3:9 mix in two directions. This preserves some texture.

However, qpel has a different limitation. If bits per delta are fixed, a qpel only covers 1/4 of the screen area of an hpel and therefore a worthwhile match is less likely to occur.

A reasonable compromise can be made by specifying one third pixel steps. This is a tpel. A tpel's strongest averaging (a 1:2 mix) preserves moderately less contrast than a qpel's best case (a 1:3 mix). However, it never incurs a qpel's worst case (the half-pixel 1:1 mix). Furthermore, a tpel is able to texture match over a larger area than a qpel.
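
To make the contrast argument concrete, here is a one dimensional interpolation sketch in C: an hpel position is a 1:1 average, a qpel position a 1:3 mix and a tpel position a 1:2 mix (rounding and the two dimensional case are simplified).

#include <stdint.h>

/* Interpolate between two horizontally adjacent pixels a and b at a
   fractional offset num/den (e.g. 1/2 hpel, 1/4 qpel, 1/3 tpel). */
static uint8_t frac_sample(uint8_t a, uint8_t b, unsigned num, unsigned den)
{
    return (uint8_t)(((den - num) * a + num * b + den / 2) / den);
}

/* On a black/white checkerboard (0 and 255):
     frac_sample(0, 255, 1, 2) == 128   -- hpel: texture collapses to mid gray
     frac_sample(0, 255, 1, 4) ==  64   -- qpel: a 3:1 mix keeps some contrast
     frac_sample(0, 255, 1, 3) ==  85   -- tpel: a 2:1 mix, between the two   */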

Perhaps we should search further into the realm of 1/n motion deltas? 1/5 provides minimal benefit. 1/6 provides much of the functionality of 1/2 and 1/3. And anything smaller than a qpel provides very little area in which to match texture. So, pel, hpel, tpel and qpel offer the most range and flexibility. If downward compatibility is ignored, specifying a motion delta in tpel only would be a moderate choice. However, when transcoding legacy content, the failure to match features incurs either a loss of quality, an increased bit-rate (codec impedance mismatch) or a mix of these disadvantages.

Texture matching is a particularly asymmetric and processor intensive task. Where horizontal and vertical displacement are each defined as eight bit fields, there are more than 65000 potential matches and each may require a minimum of 16×16 unaligned memory accesses in each of three color-planes.

On some processor architectures, this invokes an edge case where unaligned memory access across a virtual memory page boundary incurs a 3600 clock cycle delay.

Potential matches can be significantly reduced by setting a maximum radius. For real-time transcoding, the maximum radius may be dynamically adjusted up to a specified maximum.
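
A minimal sum-of-absolute-differences search with a clamped radius might look like the following C; the plane layout, the 16×16 block size and the name sad16x16 are assumptions, and a real encoder would also search the fractional positions discussed above and use a better quality metric.

#include <stdint.h>
#include <stdlib.h>
#include <limits.h>

/* Sum of absolute differences for one 16x16 block in one color plane. */
static unsigned sad16x16(const uint8_t *cur, const uint8_t *ref, int stride)
{
    unsigned sad = 0;
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            sad += (unsigned)abs(cur[y * stride + x] - ref[y * stride + x]);
    return sad;
}

/* Full search within +/- radius pixels around (bx, by); returns the best SAD
   and writes the winning delta.  Cost grows as (2*radius+1)^2 blocks.
   The caller must keep the search window inside the reference frame.       */
static unsigned motion_search(const uint8_t *cur, const uint8_t *ref,
                              int stride, int bx, int by, int radius,
                              int *best_dx, int *best_dy)
{
    unsigned best = UINT_MAX;
    for (int dy = -radius; dy <= radius; dy++) {
        for (int dx = -radius; dx <= radius; dx++) {
            unsigned sad = sad16x16(cur + by * stride + bx,
                                    ref + (by + dy) * stride + (bx + dx),
                                    stride);
            if (sad < best) {
                best = sad;
                *best_dx = dx;
                *best_dy = dy;
            }
        }
    }
    return best;
}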

A further catch is that texture matching may be performed using the wrong quality metric. As noted in the defunct Diary Of An X264 Developer, if the quality metric is to obtain an approximate texture then the result will be an approximate texture but if the quality metric is to obtain sharpness between pixels then the result will be sharpness between pixels. That requires computing the difference between horizontally adjacent pixels and vertically adjacent pixels and using those as additional inputs for the approximation. That will be slower and especially so if code is written such that (2^n)-1 loop iterations prevent loops being unrolled properly or at all.

Video Codec: Key-Frames

Posted by cafebabe on Monday July 17 2017, @04:53AM (#2506)
0 Comments
Software

(This is the 26th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

For many years, the standard technique for video compression was to have one key-frame followed by a succession of differences. The standard technique became entrenched to the point that "key-frame every 16 frames" was almost mandatory. In extreme circumstances, I've stretched this to "key-frame every 600 frames" but I strongly recommend against repeating it because it is highly susceptible to corruption and seeking within the video is extremely unresponsive.

In theory, many video formats allow bi-directional playback. So, it should be equally easy to play a video forwards and backwards. This relies on motion deltas being specified in a bi-directional manner. However, this feature is usually ignored because it significantly increases size but only provides marginal benefit. Also, it provides no benefit at all when randomly seeking to a frame of video. When seeking to a frame before a key-frame, the preceding key-frame must be decoded in full and then the next 15 diffs must be applied. For any amount of processing power, there is a resolution of video at which this process cannot be performed rapidly. Even when reverse differences are available, a key-frame must be decoded in full and up to eight diffs may be applied.

The best feature of the BBC's Dirac codec greatly improves this situation. It simultaneously decreases the frequency (and bulk) of key-frames and increases the quality of the remaining frames. It also improves random access to arbitrary frames and this may be the reason for its development.

The BBC wishes to produce all video content from one common platform. This means dumping all raw camera footage into one repository and generating XML edit lists which reference video within the repository. It would also be useful if archived video and streamed video were in the same high quality format. Well, the BBC has had some success with edit lists. However, the remainder has been disappointing with the exception that a codec with a very natty feature was developed.

Dirac arranges frames into a B+ tree. The root of the tree is a key-frame and the remainder are diffs. In theory, each tier of the tree may have a different fan-out. Unless you're doing anything particularly odd, the tree will always be a binary tree. (Tiger trees used in BitTorrent are similar. Fan-out can be anything but is invariably two.)

Tree decode requires multiple sets of video buffers. However, even for 3840×2160 RGB at 16 bits per channel, that requires only about 50MB per tier. And there is scope to trade storage for processing power. Regardless, the advantage of this arrangement is considerable. For n tiers of tree, key-frame spacing is (2^n)-1 but diffs never exceed n-1. So, for eight tiers, key-frame spacing is 255 but a frame is never constructed from more than seven diffs. That means image quality and seek time are superior to MPEG1 while storage for key-frames is significantly reduced. Admittedly, there is more change between many of the diffs. A minimum of 1/3 of the diffs are to the immediately following frame. A minimum of 1/3 of the diffs are to the subsequent frame. And the remainder cover larger spans. This increases the size of the average diff but it is invariably smaller than frequent key-frames. It is also more resilient to corruption.
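
A tiny C sketch of that arithmetic (heap style numbering of tree nodes is my own assumption; the mapping from display order to tree position is glossed over):

#include <stdio.h>

/* Number of diffs needed to reconstruct tree node i (heap numbering:
   node 1 is the key-frame, node i's reference is node i/2). */
static unsigned diffs_needed(unsigned i)
{
    unsigned depth = 0;
    while (i > 1) {
        i /= 2;
        depth++;
    }
    return depth;
}

int main(void)
{
    const unsigned tiers = 8;                  /* 2^8 - 1 = 255 frames per key-frame */
    const unsigned frames = (1u << tiers) - 1;
    unsigned worst = 0;

    for (unsigned i = 1; i <= frames; i++)
        if (diffs_needed(i) > worst)
            worst = diffs_needed(i);

    /* Prints: key-frame spacing 255, worst-case diff chain 7 */
    printf("key-frame spacing %u, worst-case diff chain %u\n", frames, worst);
    return 0;
}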

Oh, we might be in one of the odd cases where a binary tree isn't the obvious option. With ternary audio, it might be beneficial to match it with ternary video. Unfortunately, if we retrieve 8192 audio samples per request and play 2000, 1920, 1600, 960 or 800 samples per frame then it might not make a jot of difference.

Video Codec: Motion Deltas

Posted by cafebabe on Monday July 17 2017, @03:13AM (#2505)
0 Comments
Software

(This is the 25th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

MPEG1 was a huge advance in video compression but there was one feature which struck me as idiotic.

Before MPEG1, schemes to encode video included CinePak and MJPEG. CinePak uses a very crude color-space matrix transform to reduce bandwidth and processing load. With processing power available nowadays, this could be replaced with something more efficient. CinePak defines horizontal regions. Again, this could be replaced with tiles. CinePak also works in a manner in which it degrades into a rather organic stippling effect when it is overwhelmed. Overall, it has good features which are worth noting.

Unfortunately, it was overshadowed by MJPEG. This was a succession of JPEG pictures and had the distinct advantage that encode time was approximately equal to decode time. This made it suitable for low latency, real-time applications, such as video conferencing. Unfortunately, that is about the extent of MJPEG's advantages. The disadvantage is MJPEG only has the image quality of JPEG. The most significant limitation in JPEG is that the use of DCT over small regions leads to artifacts between regions. In the worst case, JPEG artifacts make a picture look like a collection of jigsaw pieces. When applied to MJPEG, the hard boundaries are unwaveringly in the same position in each frame. MJPEG also fails to utilize any similarity between successive frames of video.

MPEG1 changed matters drastically. Like MJPEG, MPEG1 used a JPEG DCT. However, a full DCT frame is typically used only every 16 frames. The 15 frames in between are a succession of differences. In practice, this reduces the volume of data by a factor of three. However, the techniques used between key-frames are truly awful and there is a noticeable difference in sharpness when each key-frame is displayed. At a typical 24 frames per second, this occurs every 2/3 second.

That's an unfortunate effect which otherwise allowed full audio and video to be played from CDROM at no more than 150KB/s. The idiotic part is that the techniques don't scale but that hasn't stopped people taking them to absurd extremes.

Between MPEG1 key-frames, a range of techniques can be applied. Horizontal or vertical strips of screen can be replaced in full. This is rarely applied. The exception is captions and titling, which typically cause significant change to a small strip at the bottom of a screen. Another technique is lightening or darkening of small regions or strips. However, it is the motion delta functionality which is most problematic and not just because movement is mutually exclusive with change in brightness.

Within MPEG1, it is possible to specify 16×16 pixel chunks of screen which move over a number of frames. There is an allocation of deltas and, in any given frame, movement can be started or stopped. Therefore, a chunk of screen which moves over eight frames requires no more encoding overhead than a chunk of screen which moves over one frame. Unfortunately, the best encoding quickly becomes a combinatorial explosion of possibilities to the extent that early MPEG1 encoding required hiring a super-computer.

Having regions of screen moving about autonomously isn't the worst problem. As screen resolution increases, motion deltas have to increase in size or quantity. This isn't a graceful process. If horizontal resolution doubles and vertical resolution doubles then motion deltas should quadruple in size and/or quantity. At 352×288 pixels (or less), hundreds of motion deltas are worthwhile but at 1920×1080 pixels other techniques are required.

The simplest technique to ensure scalability is to use quadtrees. Or, more accurately, define a set of tiles where each tile is a separate quadtree. This technique has one obvious advantage. Each tile may be lightened, darkened, moved or replaced with very concise descriptions. Furthermore, each tile can be sub-divided, as required. Therefore, irregular regions can be lightened, darkened, moved or replaced. If an object spins across a screen, this could easily overwhelm MPEG1. The output would look awful. However, with a quadtree, each piece of an object can be approximated. Likewise for zoom. Likewise for camera shake.

(Use of quadtrees does not eliminate co-ordinates when describing regions. Use of quadtrees merely amortizes the common top bits of co-ordinates during recursive descent and eliminates bottom bits when action for a large region is described.)

The tricky part is to define an encoding where a collage of tile types can co-exist. When this is achieved, a very useful, practical property arises. It is possible to make an encoder in which the choice of tile type may be restricted. Therefore, one implementation of encoder and decoder may cover the range from low latency, symmetric time, lossless compression to high latency, asymmetric time, lossy compression. By the geometric series, a quadtree has (approximately) one branch for every three leaves. Where branches are always one byte and leaves are always larger, the overhead of branches never exceeds 1/6 of the video stream. Indeed, if tile types are restricted to only JPEG DCT of the smallest size, performance is only a little outside of MJPEG parameters. When more tile types are enabled, performance exceeds MPEG1 in both size and quality.
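
A recursive decode skeleton might look like the following C; the tile-type values, the bitstream type and the leaf handlers are assumptions used only to show how branches and leaves of mixed types can coexist in one stream.

#include <stdint.h>

/* Hypothetical tile types; one byte selects a branch or a leaf action. */
enum tile_type {
    TILE_SPLIT = 0,     /* branch: recurse into four quadrants          */
    TILE_COPY,          /* leaf: keep the previous frame's pixels       */
    TILE_MOTION,        /* leaf: copy from the previous frame + delta   */
    TILE_LIGHTEN,       /* leaf: adjust brightness of the region        */
    TILE_SOLID,         /* leaf: fill with a single color               */
    TILE_DCT            /* leaf: small JPEG-style DCT block             */
};

struct bitstream;                           /* assumed to exist elsewhere */
uint8_t read_byte(struct bitstream *);      /* assumed to exist elsewhere */
void decode_leaf(struct bitstream *, enum tile_type,
                 int x, int y, int size);   /* assumed to exist elsewhere */

/* Decode one tile at (x, y) of side 'size'; splits recurse down to the
   smallest block size.  Branch bytes amortize the top bits of x and y. */
void decode_tile(struct bitstream *bs, int x, int y, int size)
{
    enum tile_type t = (enum tile_type)read_byte(bs);

    if (t == TILE_SPLIT && size > 8) {
        int h = size / 2;
        decode_tile(bs, x,     y,     h);
        decode_tile(bs, x + h, y,     h);
        decode_tile(bs, x,     y + h, h);
        decode_tile(bs, x + h, y + h, h);
    } else {
        decode_leaf(bs, t, x, y, size);
    }
}

An encoder restricted to TILE_DCT alone behaves much like MJPEG; enabling the other leaf types is what pulls size and quality past MPEG1.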

However, the really good part is that quality can be maintained within known bounds even if a piece of a video frame is missing. Tolerance to error becomes increasingly important as the volume of data increases. It also provides options to watch true multi-cast video and/or watch video over poor network connections.

From empirical testing with the 4K trailer for Elysium, I've found that everything is awesome at 4K (3840×2160 pixels). When the encoder recurses from 64×64 pixel tiles down to 8×8 pixel tiles and incorrectly picks a solid color tile, the result still looks good because the effective resolution is 480×270 pixels. So, even when the codec under development goes awry, it often exceeds MPEG1's default resolution.

Video Codec: Color And Transparency

Posted by cafebabe on Monday July 17 2017, @12:18AM (#2504)
0 Comments
Hardware

(This is the 24th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

(This description of color video excludes SECAM and many other details which are not relevant to explaining current or future techniques.)

Historically, television was broadcast as an analog signal. Various formats were devised. The two most commonly used formats were NTSC and PAL. NTSC was generally used in parts of the world where the mains electricity supply was 60Hz. PAL was generally used in parts of the world where the mains electricity supply was 50Hz. Both schemes use interlacing, which is a reasonably good compromise for automatically representing rapidly moving images at low quality and detailed images at higher quality.

NTSC has one odd or even field per 1/60 second. PAL has one odd or even field per 1/50 second. This matching of frame rate and electricity supply ensures that any distortion on a CRT [Cathode Ray Tube] due to power fluctuation is minimized because it is steady and consistent across many frames of video.

NTSC and PAL differ in matters where PAL had the benefit of hindsight. NTSC uses logarithmic brightness. This works well in ideal conditions but otherwise creates additional harmonics. PAL uses linear brightness. Also, PAL [Phase Alternating Line] inverts lines of video such that imperfections in the video signal are self-compensating. NTSC doesn't have this feature and is cruelly known as Never Twice the Same Color.

Analog television was originally broadcast in monochrome. As you'd expect, NTSC and PAL were extended in a similar manner. It was a clever technique which was downwardly compatible with monochrome receivers. It also takes into account perceptual brightness of typical human vision.

Two high frequency signals were added to the base signal. These conceptually represent red minus brightness and blue minus brightness. When the signal is rendered on a legacy monochromatic screen, objects are shown with the expected brightness. When the signal is rendered on a color screen, the additional data can be decoded into three primary colors: red, green and blue. Furthermore, the signal may be encoded in proportion to light sensitivity. Human vision is typically sensitive to broad spectrum brightness and three spectral peaks. The base signal plus modulated color provide good representation in both modes.

So, we have two schemes which minimize analog distortion and maximize perceptual detail. A full frame of video is broadcast at 25Hz or 30Hz. A field is broadcast at 50Hz or 60Hz. One line is broadcast at approximately 15.6kHz. Color may be modulated at approximately 3.58MHz for NTSC and 4.43MHz for PAL. The effective bandwidth of the signal was approximately 6MHz. All of this may be recorded to tape, sent between devices as composite video or broadcast over UHF.

Digital video follows many of these principles. Most significantly, digital video is typically encoded as a brightness and two components of color. Although the details vary, this can be regarded as a three dimensional matrix rotation. For eight bit per channel video, rotation creates horrible rounding errors but the output is significantly more compressible. An object of a particular color may have minor variation in brightness and color. However, across one object or between objects, brightness varies more than color. It is for this reason that human vision is typically more sensitive to brightness.

A practical, binary approximation of light sensitivity is a Bayer filter, in which green pixels occur twice as frequently as red or blue pixels. Differing arrangements of cells used by different camera manufacturers contribute to a proliferation of high-end "raw" still image formats. This can lead to an infuriating lack of detail about camera resolution. Technically, single CCD cameras don't have any pixels because they are all interpolated from the nearest cells in a Bayer filter. Regardless, use of a Bayer filter accounts for the switch from analog blue screen compositing to digital green screen compositing.

With or without interpolation, color is typically stored at half or quarter resolution. So, a 2×2 grid of pixels may be encoded as 4:4:4 data, 4:2:2 data, 4:1:1 data or other encoding. Typical arrangements include one average value for each of the two color channels or vertical subsampling for one color channel and horizontal subsampling for the other color channel. (Yes, this creates a smudged appearance but it only affects fine detail.)
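
For example, the half-resolution case can be produced by averaging each color channel over a 2×2 block of pixels, roughly as in this C fragment (the plane layout is an assumption):

#include <stdint.h>

/* Average one color plane down to half resolution in both directions.
   'src' is width x height; 'dst' is (width/2) x (height/2).           */
void subsample_half(const uint8_t *src, uint8_t *dst, int width, int height)
{
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            unsigned sum = src[y * width + x]
                         + src[y * width + x + 1]
                         + src[(y + 1) * width + x]
                         + src[(y + 1) * width + x + 1];
            dst[(y / 2) * (width / 2) + (x / 2)] = (uint8_t)((sum + 2) / 4);
        }
    }
}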

Channels may be compressed into one, two or three buffers. One buffer minimizes memory and compression symbol tables. Two buffers allow brightness and color to be compressed separately. Three buffers allow maximum compression but require maximum resources.

There are at least eight colorspace conversion techniques in common use. The simplest technique is used in CinePak video compression. This only requires addition, subtraction and single position bit shifts (×2, ÷2) to convert three channels into RGB data. This crude approximation of YUV to RGB conversion minimizes processor load and maximizes throughput. This allowed it to become widespread before other techniques became viable. Regardless, the encoding and decoding process is representative of other color-spaces.

For CinePak color-space decode, variables y, u, v are converted to r, g, b with the following code:-

r=y+v*2;
g=y-u/2-v;
b=y+u*2;

This is effectively a matrix transform and the encode process uses the inverse of the matrix. This is significantly more processor intensive and requires constants which are all integer multiples of 1/14. Specifically:-

y=(r*4+g*8+b*2)/14;
u=(r*-2+g*-4+b*6)/14;
v=(r*5+g*-4+b*-1)/14;
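
These two snippets can be combined into a self-contained roundtrip check; this is only a sketch and the integer divides truncate exactly as above.

#include <stdio.h>

/* Forward transform: RGB to the CinePak-style y, u, v (constants /14). */
static void encode_yuv(int r, int g, int b, int *y, int *u, int *v)
{
    *y = (r *  4 + g *  8 + b *  2) / 14;
    *u = (r * -2 + g * -4 + b *  6) / 14;
    *v = (r *  5 + g * -4 + b * -1) / 14;
}

/* Inverse transform: y, u, v back to RGB (matches the decode above). */
static void decode_rgb(int y, int u, int v, int *r, int *g, int *b)
{
    *r = y + v * 2;
    *g = y - u / 2 - v;
    *b = y + u * 2;
}

int main(void)
{
    int y, u, v, r, g, b;

    encode_yuv(200, 120, 40, &y, &u, &v);
    decode_rgb(y, u, v, &r, &g, &b);

    /* The rounding errors from the integer divides are visible here. */
    printf("y=%d u=%d v=%d -> r=%d g=%d b=%d\n", y, u, v, r, g, b);
    return 0;
}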

Other color-spaces use more complicated sets of constants and may be optimized for recording video in studio conditions, outside broadcast or perceptual bias. In the case of JPEG or MJPEG, the color-space may be implicit. So, although it is possible to encode and decode pictures and video as RGB, arbitrary software will decode it as if it were YUV color-space.

Images and video may be encoded in other color-spaces. One option may be palette data. In this case, arbitrary colors may be decoded from one channel via one level of indirection. Typically, a table of RGB colors is provided and a decoded value represents one color in the table. One or more palette values may have special attributes. For example, GIF allows one palette entry to represent full transparency. PNG eschews palettes and allows arbitrary RGBA encoding. (Furthermore, GIF and PNG each have an advanced interlace option.)

No concessions are made for the relatively common case of red-green color-blindness. In this case, DNA in a human X chromosome carries instructions to make a cone photopigment with a yellow spectral peak instead of a green spectral peak. With the widespread use of Bayer filters, digital compression and LCD or LED displays, it is possible to provide an end-to-end system which accommodates the most widespread variants of color response. At the very least, it should be considered a common courtesy to provide color-space matrix transforms to approximate the broader spectral response when using legacy RGB hardware. Even when a transform is implemented in hardware, two or more options should be given.

It may also be worthwhile to encode actinic (also known as thule). This is near ultra-violet which is perceived as purple-white by people without ultra-violet filtering eye lenses. This may occur due to a developmental issue, injury or surgery. Early artificial lenses lacked UV filtering and therefore cataract patients commonly gained the ability to see near UV.

Alternatively, it may be worthwhile to define a number of arbitrary spectral peaks and optionally allow them to be grouped into two, three or four correlated channels.

One desirable task for a video codec is screen or window remoting. For best effect, this requires channels outside RGBA or equivalent. Other channels may include a blur map and a horizontal and vertical displacement map. The QNX Photon GUI allows this functionality. However, it is implemented as an event driven model where re-draw requests to an application may be re-emitted to windows further down the window stack. The returned bitmap may be arbitrarily processed before it is passed back up the stack. When the response to a re-draw request reaches the top of the stack, it is rendered to a display buffer which may or may not be virtualized. While this architecture allows arbitrary transparency effects, it also allows any malicious application to snapshot, OCR or otherwise leak data which is displayed by other applications.

There is a further limitation with the compositing event model. A window may only be in one place. It cannot be remoted to multiple devices or shared with multiple users. However, if each display performs compositing, a window may appear correctly on all of them.

In summary, a video codec should provide:-

  • Different resolutions.
  • Different frame rates.
  • Interlace.
  • Monochrome.
  • Arbitrary permutations of red, yellow, green and blue channels.
  • Color-planes outside of visible spectrum.
  • Transparency.
  • Other per-pixel meta-data channels.