UDP Servers, Part 1

Posted by cafebabe on Thursday July 06 2017, @06:16AM (#2473)
10 Comments
Software

(This is the seventh of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Using UDP for streaming video seems idiotic and is mostly deprecated. However, it has one huge advantage: it works really well with a large number of concurrent users. Within reason, it is worthwhile to maximize this number and minimize cost. Network games can be quite intensive in this regard, with SecondLife previously rumored to have two users per server and WorldOfWarcraft having 20 users per server. At the other extreme, Virtudyne (Part 1, Part 2, Part 3, Part 4) aimed to run productivity software with 20 million users per server. That failed spectacularly after US$200 million of investment.

Maintaining more than 10000 TCP connections requires a significant quantity of RAM; often more than 1MB per connection. A suitably structured UDP implementation doesn't have this overhead. For example, it is possible to serve 40 video streams from a single-threaded UDP server with a userspace footprint of less than 1MB and kernel network buffers of less than 4MB. All of the RAM previously allocated to TCP windowing can be re-allocated to disk caching. Even with a mono-casting implementation, there are efficiency gains when multiple users watch the same media. With stochastic caching techniques, it is easy for network bandwidth to exceed storage bandwidth.
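As a rough illustration of how small such a server can be, here is a minimal sketch of that single-threaded loop in Python. The port, buffer sizes and fixed-size request format are assumptions for the sketch, not measurements:

    import socket

    # Single-threaded UDP responder with explicitly small kernel buffers.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # ~1MB receive
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 3 << 20)  # ~3MB send
    sock.bind(("0.0.0.0", 5004))

    cache = {}  # chunk id -> payload bytes; shared by every viewer of the same media

    while True:
        request, client = sock.recvfrom(64)        # tiny, fixed-size requests
        chunk_id = int.from_bytes(request[:8], "big")
        payload = cache.get(chunk_id)
        if payload is not None:                    # zero or one packets of response
            sock.sendto(payload, client)

Because the per-client state is held by the clients, adding viewers adds no RAM here beyond the shared cache.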

There is another advantage with UDP. FEC can be sent speculatively or such that one resend may satisfy multiple clients who each lack differing fragments of data. It is also easy to make servers, caches and clients which are tolerant to bit errors. This is particularly important if a large quantity of data is cached in RAM for an extended period.
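As a sketch of how one resend can satisfy several clients: with a simple XOR parity packet over a group of fragments (about the simplest possible FEC, used here purely for illustration, and assuming fragments are padded to equal length), any client missing exactly one fragment of the group can rebuild it locally, so clients missing different fragments are all repaired by the same parity packet:

    def xor_parity(fragments):
        # Parity packet: byte-wise XOR of every fragment in the group.
        parity = bytearray(len(fragments[0]))
        for frag in fragments:
            for i, byte in enumerate(frag):
                parity[i] ^= byte
        return bytes(parity)

    def recover_missing(surviving_fragments, parity):
        # XORing the parity with the surviving fragments yields the lost one.
        missing = bytearray(parity)
        for frag in surviving_fragments:
            for i, byte in enumerate(frag):
                missing[i] ^= byte
        return bytes(missing)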

So, what is a suitable structure for a UDP server? Every request from a client should incur zero or one packets of response. Ideally, a server should respond to every packet. However, some communication is obviously corrupted, malformed or malicious and isn't worth the bandwidth to respond. Also, in a scheme where all communication state is held by a client and all communication is pumped by a client, resends are solely the responsibility of a client.
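A sketch of the client side of that arrangement, reusing the hypothetical fixed-size request from the earlier sketch: the client holds the state, pumps every request and simply re-sends on timeout, because the server never retransmits on its own initiative:

    import socket

    def fetch_chunk(server, chunk_id, timeout=0.25, retries=8):
        # All retransmission policy lives in the client; the server replies
        # with at most one packet per request and keeps no per-client state.
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        request = chunk_id.to_bytes(8, "big")
        for _ in range(retries):
            sock.sendto(request, server)
            try:
                payload, _addr = sock.recvfrom(2048)
                return payload
            except socket.timeout:
                continue                    # resends are the client's responsibility
        raise TimeoutError(f"chunk {chunk_id} unanswered after {retries} attempts")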

From experience, it is wrong to implement a singleton socket or a set of client sockets mutually exclusive with a set of server sockets. Ideally, library code should allow multiple server sockets and multiple client sockets to exist within userspace. This facilitates cache or filter implementation where one program is a client for upstream requests and a server for downstream requests.
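A minimal sketch of that shape, again assuming the hypothetical request format from above: one process owns a downstream server socket and an upstream client socket, multiplexes them with selectors, and either answers from its cache or forwards the miss upstream:

    import selectors
    import socket

    UPSTREAM = ("203.0.113.1", 5004)          # placeholder upstream address
    sel = selectors.DefaultSelector()

    down = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # we act as a server
    down.bind(("0.0.0.0", 5004))
    up = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)     # we act as a client

    sel.register(down, selectors.EVENT_READ, "downstream")
    sel.register(up, selectors.EVENT_READ, "upstream")

    cache = {}       # chunk id -> payload
    waiting = {}     # chunk id -> downstream clients awaiting that chunk

    while True:
        for key, _events in sel.select():
            if key.data == "downstream":
                request, client = down.recvfrom(64)
                chunk_id = request[:8]
                if chunk_id in cache:
                    down.sendto(cache[chunk_id], client)
                else:
                    waiting.setdefault(chunk_id, []).append(client)
                    up.sendto(request, UPSTREAM)      # forward the cache miss
            else:
                payload, _addr = up.recvfrom(2048)
                chunk_id = payload[:8]                # assumes the id is echoed back
                cache[chunk_id] = payload
                for client in waiting.pop(chunk_id, []):
                    down.sendto(payload, client)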

"Streaming" Video Versus Streaming Video

Posted by cafebabe on Wednesday July 05 2017, @11:15PM (#2471)
5 Comments
Software

(This is the sixth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

I've now got to the practical problems with resource naming and the practical problems with networking. These problems interact badly with video over the Internet.

In the 1990s, video on the Internet was typically sent to users as lossy UDP. Apple's QuickTime 3 had an option to optimize .mov files to aid this type of distribution. This functionality seems to be deprecated and we've devolved to the situation where "streaming" video involves mono-casting small chunks of video over HTTP/1.1 over TCP. There is a top tier of companies which likes this arrangement. This includes Google, Apple, Microsoft, Netflix and a few parties who provide infrastructure, such as Akamai and CloudFlare. Many of them prefer the analytics which can be gained from mono-casting. To differing degrees, each knows where and when you've paused a video. Big Data provides the opportunity to discover why this occurs. Perhaps the content is violent, challenging or difficult to understand. Perhaps you had a regular appointment to be elsewhere. It should be understood that mono-casting over TCP provides the least amount of privacy for users.

TCP is completely incompatible with multi-cast and therefore choosing TCP excludes the option of sending the same packets to different users. The investment in infrastructure would be completely undermined if anyone could live-stream multi-cast video without centralized intermediaries running demographic profiling advert brokers on sanitized, government approved content.

Admittedly, UDP has a reputation for packet floods but TCP congestion control is wishful thinking which is actively undermined by Google, Microsoft and others. Specifically, RFC3390 caps the amount of unacknowledged data which may be sent over a fresh TCP connection; in practice, the limit is about 4KB. However, Microsoft crap-floods 100KB over a fresh TCP connection. Google does similar. If you've ever heard an end-user say that YouTube is reliable or responsive, that's because YouTube's servers are deliberately mis-configured to shout over your other connections. Many of these top-tier companies are making a mockery of TCP congestion control.
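For reference, the cap being alluded to is RFC3390's initial congestion window, min(4*MSS, max(2*MSS, 4380 bytes)). A quick check shows where the "about 4KB" figure comes from for a typical Ethernet MSS:

    def rfc3390_initial_window(mss):
        # Initial congestion window in bytes, per RFC 3390.
        return min(4 * mss, max(2 * mss, 4380))

    print(rfc3390_initial_window(1460))   # 4380 bytes, i.e. roughly 4KB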

Packet floods aside, there are some fundamentals which cannot be ignored. Lossy video uses small packets. The scale of compression which can be attained within an IPv6 1280 byte packet is quite limited, especially when encryption headers are taken into account. The big guns of compression (lz4, lzo, lzma, Burrows-Wheeler transform) are still warming up in their first 1MB. That dove-tails nicely with HTTP/1.1 but restricts UDP to old-school compression, such as LZ77, Shannon-Fano or arithmetic coding.

However, if we define video tiles in a suitable manner, incorporate degradation in the form of a Dirac codec style diff tree and make every video tile directly addressable, then everything works for multiple users at different sites. The trick is to make video tiles of a suitable form. I'm glad that quadtree video has finally become mainstream with HEVC because this demonstrates the practical savings which can be made by amortizing co-ordinates. Further savings can be made via Barnsley's collage theorem. The net result is that disparate tile types can be butted together and the seams will be no worse than JPEG macro-block boundaries.

Unfortunately, if we wish to be fully and practically compatible with embedded IPv6 devices, we have a practical UDP payload of 1024 bytes or 8192 bits. This determines maximum video quality when using different tile sizes:-

  • 32×32 pixel blocks permit 8 bits per pixel.
  • 64×64 pixel blocks permit 2 bits per pixel.
  • 128×128 pixel blocks permit 0.5 bits per pixel.

There is a hard upper-bound on payload size but are these figures good or bad? Well, H.264 1920×1080p at 30 frames per second may encode to anywhere from 4GB per hour to 6GB per hour. That's roughly 224 billion pixels per hour against an upper-bound of 48 billion bits - or about 0.2 bits per pixel.
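The arithmetic behind both sets of figures, so they can be checked (decimal gigabytes are assumed for the 6GB figure):

    PAYLOAD_BITS = 1024 * 8                       # 1024 byte UDP payload

    for side in (32, 64, 128):
        print(side, PAYLOAD_BITS / (side * side)) # 8.0, 2.0 and 0.5 bits per pixel

    pixels_per_hour = 1920 * 1080 * 30 * 3600     # ~224 billion pixels
    bits_per_hour = 6 * 10**9 * 8                 # 6GB/hour upper-bound
    print(bits_per_hour / pixels_per_hour)        # ~0.21 bits per pixel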

So, what type of tiles would be encoded in the quadtree? Well, enabling different subsets of tiles would provide a codec which ranged from symmetric lossless to highly asymmetric lossy. At a minimum, an H.261 style DCT is essential. On its own, it would provide symmetric lossy encoding which matches MJPEG for quality while only incurring a small overhead for (rather redundant) quadtree encoding. It would also be useful to add texture compression and some rather basic RLE. This would be particularly useful for desktop windowing because it allows, for example, word-processing and web browsing to be rendered accurately and concisely.

What is the maximum overhead of quadtree encoding? Well, for every four leaves, there is one branch and for every four branches, there is another branch. By the geometric series, the ratio of branches to leaves is 1:3 - and branches are always a single byte prior to compression. So, quadtree overhead is small.
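As a sanity check on that series: for L leaves there are roughly L/4 + L/16 + L/64 + ... = L/3 branches, i.e. about one branch byte per three leaf tiles before compression.

    leaves = 4 ** 8                                      # a full quadtree, 8 levels deep
    branches = sum(4 ** level for level in range(8))     # 1 + 4 + 16 + ... + 4^7
    print(branches / leaves)                             # ~0.333, i.e. a 1:3 ratio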

So, we've established that it is possible to make a video system which is:-

  • Lossy or lossless.
  • Provides symmetric or asymmetric encode/decode time.
  • Provides low-latency encoding modes.
  • Works with directly addressable video tiles of 64×64 pixels or larger.
  • Works with embedded devices which implement a strict 1280 byte packet limit.
  • Works with 70% round-trip packet loss.
  • Allows frame rate degradation.
  • Can be configured to incur worst-case storage or bandwidth cost which is only marginally worse than MJPEG.
  • May be suitable as transport for a windowing system.
  • Allows content to be viewed on an unlimited number of devices and/or by an unlimited number of users at any number of sites.
  • Allows implementation of video walls of arbitrary size and resolution.

It won't match the bit-rate of H.264, WebM or HEVC but it works over a wide range of scenarios where these would completely fail.

@NPR condones liberal violence!

Posted by realDonaldTrump on Wednesday July 05 2017, @04:35PM (#2470)
1 Comment
Topics

So, NPR is calling for revolution. Interesting way to condone the violence while trying to sound "patriotic". Your implications are clear.

My 4th of July Coup

Posted by The Mighty Buzzard on Wednesday July 05 2017, @01:52PM (#2469)
20 Comments
Soylent

In case anyone's been wondering why a hefty plurality if not an outright majority of the stories pushed over the weekend and the first couple days of the week came from yours truly, it's because I hopped on my cavalry bear, rode all over the country, and beat all the Editors except martyb (who I saved for last and only had to threaten into submission) upside the head with a double-barrel chainsaw.

Or it's because I saw there was pretty much nothing in the submission queue except partisan hack jobs and bloody stupid garbage that I sincerely hope the Eds never publish, and quickly subbed everything remotely interesting that I found in my feed reader.

Believe whichever amuses you the most.

Benefits Of Internet Protocol 6 Greatly Over-Stated

Posted by cafebabe on Wednesday July 05 2017, @02:52AM (#2468)
13 Comments
Software

(This is the fifth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

What are the properties usually associated with Internet Protocol 6? Multi-cast? A huge address space? A cleaner implementation with more packet throughput? Whatever.

Multi-Cast

Internet Protocol 4 addresses from 224.0.0.0 to 239.255.255.255 are for multi-cast, as defined in RFC1112. Unfortunately, multi-cast is incompatible with TCP, so that's 2^28 Internet Protocol 4 addresses and 2^120 Internet Protocol 6 addresses which don't work with YouTube or QuickTime.
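The counts, for anyone who wants to check them (IPv4 multi-cast is the 224.0.0.0/4 block; IPv6 multi-cast is ff00::/8):

    ipv4_multicast = 0xEFFFFFFF - 0xE0000000 + 1   # 239.255.255.255 - 224.0.0.0, inclusive
    print(ipv4_multicast == 2 ** 28)               # True: a /4 prefix leaves 28 free bits
    print(2 ** (128 - 8))                          # ff00::/8 leaves 2^120 addresses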

Address Space

Internet Protocol 6 has 128 bit addresses. That's about 3.4×10^38 addresses - far more than the number of stars in the observable universe. However, there are edge cases where even that is insufficient. Internet Protocol 4 has 32 bit addresses (by default) and that was considered vast when it was devised. That was especially true when the total human population was less than 2^32 people. Superficially, it was possible to give every person a network address.

Several address extension schemes have been devised. The best is RFC1365 which uses option blocks to optionally extend source and destination fields in a manner which is downwardly compatible. So, what size is an Internet Protocol 4 address? 32 or more bits, as defined by RFC1365.

Header Size

Internet Protocol 4 is often described as having a 20 byte (or larger) header while Internet Protocol 6 is often described as having a header which is exactly 40 bytes. This is misleading. IPv6 has extension headers which play much the same role as IPv4 option blocks, and therefore both protocols have variable length headers in practice. The difference is that IPv6 headers are usually 20 bytes larger.

Packet Size

IPv4 typically supports a PMTU of 4KB or more. Admittedly, there are no guarantees but Ethernet without packet fragmentation provides about 1500 bytes. With PPPoA or PPPoE over AAL5 over ATM, 9KB payloads only fragment over the last hop. This is ideal for video delivery. IPv6 only guarantees 1280 bytes. How common is this limit? Numerous variants of micro-controller networking only support 1280 byte buffers. This is especially true for IPv6 over IEEE802.15.4 implementations. This is especially bad for video.

Packet Fragmentation

IPv6 has no in-network packet fragmentation. IPv6 packets which exceed the path MTU are simply dropped by routers; at best, the sender receives an ICMPv6 Packet Too Big message, and these are frequently filtered.

Packet Throughput

Compared to IPv4, IPv6 generally has longer headers, longer addresses and shorter payloads. On this basis, why would you expect the packet throughput of IPv6 to match or exceed that of IPv4?
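A crude way to quantify the claim, using only the header sizes discussed above (no options, extension headers or transport headers):

    def payload_fraction(packet_size, header_size):
        # Fraction of each packet which is payload rather than IP header.
        return (packet_size - header_size) / packet_size

    print(payload_fraction(1500, 20))   # IPv4 over Ethernet: ~0.987
    print(payload_fraction(1280, 40))   # IPv6 at its guaranteed minimum: ~0.969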

Summary

The introduction of IPv6 provides no particular benefit to end-users. IPv6 is detrimental to payload size and this is particularly detrimental to video delivery.

Problems With Names, Part 3

Posted by cafebabe on Wednesday July 05 2017, @01:02AM (#2467)
0 Comments
Software

(This is the fourth of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Via a fairly arbitrary process, I've arrived at the stage of explaining pragmatic problems with naming stuff in a computer.

Many naming schemes require a name resolution process. The process used by DNS is overly complicated. Although it would be easy to poke fun at systemd's repeated failures in this field, even djbdns has been attacked with cache poisoning, and steps have been taken to mitigate spoofing by implementing UDP source port randomization. A similar observation can be drawn from djb's SYN cookies: it is insufficient to implement baseline TCP or DNS because the Internet has become too hostile.

I investigated the implementation of Project Xanadu transclusion using DNS. The use of UDP to assemble small fragments of data in a fault-tolerant manner has obvious appeal. DNS's maximum fragment length of 255 bytes is unfortunate but can be overcome. However, popular DNS implementations, such as BIND, don't cache unknown response types. This means A records, TXT records or suchlike would have to be overloaded. Ideally, this should be handled carefully due to compressed encoding workarounds and record permutation, which must remain compatible with the POSIX legacy singleton record interface.

It is possible to ignore these pleasantries and transport raw data as A records but this has two problems. Firstly, to obtain raw records, the local resolver has to be located without the benefit of the host's DHCP or suchlike. Secondly, administrators baulk at the volume of data transported via their DNS infrastructure - and rightly so because this doesn't degrade gracefully. One user's browsing would be sufficient to cause intermittent or sustained host resolution failure within a local network. Video streaming over DNS is feasible to implement but would be anti-social to anyone sharing upstream infrastructure.

An erroneous assumption made by Project Xanadu is that multi-media can be handled almost as a corollary of handling text. I presume the logic is that multi-media is traditionally serialized and therefore can be subjected to similar constraints as paragraphs of text. However, I believe this is backward. If multi-media can be handled as three dimensional data (horizontal, vertical, frame number) or two dimensional data (audio channel, position) then one dimensional data (text) is the corollary.

So, an outline of requirements is as follows:-

  • Simplified DNS is desirable.
  • Traffic priority should be lower than legacy DNS implementation.
  • Should allow server fail-over.
  • Should allow 8 bit clean queries.
  • Should allow 8 bit clean responses.
  • Should have a compact representation.
  • Should provide authentication.
  • Should provide encryption.
  • Should provide server delegation.
  • Should provide a correct implementation of cache invalidation.
  • Data should be served as UDP or similar.
  • Should be optimized for small payloads but allow arbitrary volumes of data.
  • Should permit addressing of one frame of video.
  • If possible, should permit addressing of a range of video frames.

Although this looks like a system favoring a hierarchy of servers within one domain of trust, it remains possible to implement a federation of server hierarchies.
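As a purely hypothetical illustration of the "compact, 8 bit clean, frame-addressable" requirements above, a fixed-layout query could pack an object identifier plus a frame range into a fraction of one 1280 byte packet. The field names and widths below are invented for the sketch and are not a proposal:

    import struct

    # Hypothetical query layout: version, flags, 32 byte object id,
    # first frame, frame count. 40 bytes in total, 8 bit clean, one UDP packet.
    QUERY = struct.Struct("!BB32sIH")

    def build_query(object_id, first_frame, frame_count, version=1, flags=0):
        return QUERY.pack(version, flags, object_id, first_frame, frame_count)

    def parse_query(packet):
        return QUERY.unpack(packet)

    query = build_query(b"\x00" * 32, first_frame=30_000, frame_count=25)
    print(len(query))   # 40 bytes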

Problems With Names, Part 2

Posted by cafebabe on Tuesday July 04 2017, @02:38PM (#2466)
3 Comments
Software

(This is the third of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

How did we get to the current state of desktop computing and why is the future so uncertain? How did this veer so quickly into one of the deep topics of philosophy?

When Xerox PARC was founded in 1970, object programming languages, arbitrary bitmap displays, vector graphic storage formats, laser printing and ubiquitous desktop computer networking had yet to be developed. The development of a kid-safe computer which was no larger or heavier than a pad of paper was a futuristic dream. Nowadays, it seems laughably quaint. Unfortunately, continuous reference to paper has left us in a fascist dystopia of bureaucracy.

Our fundamental unit of information is not a bit, a factoid or a qubit. Our fundamental unit of information is a document, a form, a search, a sale or a recording. Unfortunately, with technology such as voice activated search, these units are often intertwined. In the broader case, this information is often used against a populace, whether or not it can be easily herded.

One future direction for technology is the full implementation of Ted Nelson's Project Xanadu. Ted Nelson is extremely irked by the current state of hypertext (a word he invented or popularized). However, his vision of orbiting satellites caching data paid for with micropayments (another word he invented or popularized) remains outlandish after more than 40 years.

However, it is becoming less outlandish and is within the realm of the tractable. Specifically, HTTP/1.0, as defined in RFC1945, reserved response code 402 for payment. When this was defined in May 1996, it was seen as a nod to our forefathers but, with crypto-currency now worth US$100 billion, implementation is a question which is being asked with increasing frequency.

It may be that Ted Nelson is fully vindicated and that the transition to DOI style tumblers and micropayments required a diversion via Zooko's triangle (however badly implemented). Prior to HTML and HTTP, hypertext was a knot of style, presentation, interface, content and transport. Systems such as Microsoft Help and AmigaGuide typically stored multiple documents in one annotated text file. HTTP/0.9 decisively cleaved storage, transport and presentation. However, it left a foreseeable trail of a trillion broken hyperlinks and an economic model which ate broadcast and print advertising.

If you've been following closely, you'll see that there are multiple possible formats for a reference to a resource. Typical formats include:-

  • foo://host.domain.tld/path/to/res.ext - URL.
  • bar:012345678abcdef - Hash of content.
  • baz:9.17.2.6.22-24 - Tumbler range.

Each has limitations. All function within a string namespace, although, historically, inter-operability usually fails. Ignoring this, questions remain. What do we wish to reference? How do we wish to reference it? (Source. Destination. Quantity. Frequency.)

There are further complications with references and I intend to describe them next.

Problems With Names, Part 1

Posted by cafebabe on Tuesday July 04 2017, @06:44AM (#2464)
2 Comments
Software

(This is the second of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Many of the limitations of computers occur because we have a poor simulation of paper popularized by a photocopying company.

However, there is another way. For this, I choose something which approximates a URL as a building block of information.

All problems in computer science can be solved by another level of indirection. -- Butler Lampson, Xerox.

By choosing a pointer in a namespace, we may reference legacy data (documents, multi-media files) in addition to data in the structure of our choice. If a URL, URN or URI were sufficient for this task then no further work would be required. However, a different approach is strictly necessary because:-

There are only two hard things in computer science: cache invalidation and naming things. -- Phil Karlton, Netscape.

In the current form, URLs, cookies and caching interact badly. For example, HTML requests made with different cookies cannot be shared among users. Whereas, image requests made with different cookies are handled as if the cookies were absent. This situation requires cache authors to implement a heuristic to ensure that most websites are compatible with one cache. Whereas, authors of popular websites implement a heuristic to ensure that most caches are compatible with their websites. The lack of fundamentals (no formal specification for URL caching) requires multiple parties to maintain complicated models. This is required so that a grammatically correct request for grammatically correct content behaves as expected.

A general problem with naming is Zooko's Triangle: Distributed, Secure, Human-Readable: Choose Two. However, even this would be an improvement. URLs incorporating DNS are not distributed. (DNS *servers* are distributed but a DNS namespace straddles one domain of trust.) URLs with or without SSL or TLS are not secure. 95% of users cannot read URLs.

Given an unconstrained choice, something like a Magnet URI would seem beneficial. However, this relies upon the chosen cryptographic hash function remaining effectively one-way. If a practical attack (quantum or otherwise) cracks the chosen hash function then we have names which are neither secure nor readable. On the basis that security within a name cannot be guaranteed, it may be preferable to err towards readable names and seek security elsewhere.
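For concreteness, a magnet-style name is just a content hash wrapped in a URI. A rough sketch with SHA-256 follows; the exact URN scheme string is illustrative rather than any particular standard:

    import hashlib

    def magnet_style_name(content: bytes) -> str:
        # The name is derived purely from the content, so it is self-certifying:
        # anyone holding the bytes can verify them against the name. The security
        # of the name is exactly the security of the chosen hash function.
        digest = hashlib.sha256(content).hexdigest()
        return f"magnet:?xt=urn:sha256:{digest}"

    print(magnet_style_name(b"hello, world"))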

There are further complications with names and I intend to describe them next.

A Post-Xerox World

Posted by cafebabe on Tuesday July 04 2017, @05:14AM (#2463)
0 Comments
Software

(This is the first of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)

Someone from my makerspace said "We're living in a post-Xerox world", and he didn't mean Xerox's business of photocopiers.

The original objectives of Xerox PARC are in widespread use. That includes object programming languages, arbitrary bitmap displays, vector graphic storage formats, laser printing, ubiquitous desktop computer networking and - most notably - ruggedized, kid-safe computers which are no larger or heavier than a pad of paper.

He believes that we're in a malaise because we've reached a widely known goal and there is no agreed continuation. Actually, he believes the problem is more fundamental. Collectively, we don't know that there was a set of ideas popularized by one organization nor do we know that we attained it.

I believe that we have a different problem. We've implemented digital paper rather than digital computing. This would be a problem if it stayed in a computer but it often spills onto paper too. A word-processor and rapid printing is contrary to a paperless workflow. Most significantly, it has continued the trend of increasing bureaucracy and forms. Even in the 18th century, people complained that there were too many forms. Now we have forms on paper, in HTML, PDF, apps and elsewhere.

This is tedious. It costs time and money. There is the obvious cost of time spent completing a form and time spent by administrators to process a form. Many types of fraud can be achieved merely by completing forms. As an example, a dental fraud cost UK taxpayers £1.4 million (US$2 million). It was achieved by completing about 30 forms per day describing fictitious dental treatment.

Bureaucracy is organized around institutional memory. This may be quipu storage, filing cabinets, file servers or databases. Bureaucrats perform ingress and egress filtering. Both may be infuriating. Ingress filtering should ensure that institutional memory is accurate but this rarely occurs when the cost of errors is externalized. For example, when hundreds of large companies have read/write access to credit checks, the majority of credit reports are wrong. Egress filtering should restrict disclosure of sensitive information but, again, costs are externalized. Obtaining any meaningful change of state may be difficult. Ensuring an accurate round-trip of data may be impossible. Likewise for any inter-organizational change of state.

Block-chain enthusiasts claim that smart-contracts will significantly reduce these problems; they could make inter-organizational state changes atomic. However, unwinding bad states may require an individual with the skills of a lawyer and a programmer. Furthermore, the reduced friction of transactions could make state changes more frequent. Essentially, by Jevons paradox, this encourages more bureaucracy. Meanwhile, the interface between person and bureaucracy may become increasingly quixotic.

Forms continue to proliferate without satisfactory user testing and each form remains an oblique signal for a bureaucracy to change state. That's increasingly irrelevant when nation-states are falling apart and corporations are increasingly untrustworthy. Indeed, while peons are completing each other's forms and contacting each other in telephone call centers, our overlords collect data in bulk and transform it into structured data. The most benign example is marketing analytics for the purpose of finding the local maxima of a business model. Another example is an opt-out, keyword aggregating marketplace which skims value from millions of parties. This is commonly called a search engine. Then there is the fascist, totalitarian panopticon of signals intelligence which treats every citizen as an enemy.

However, there is another way. It requires a more suitable building block of data. I intend to describe this next.

FakeNews CNN is now FNN, the Fraud News Network.

Posted by realDonaldTrump on Sunday July 02 2017, @06:33PM (#2459)
1 Comment
News

I am thinking about changing the name #FakeNews CNN to #FraudNewsCNN! Fraud News Network. #FNN pic.twitter.com/WYUnHjjUjg