There are now a lot more nerds in elected office. Seventeen candidates with STEM backgrounds ran their respective races Tuesday, from Virginia governor-elect Ralph Northam—a doctor—to Tiffany Hodgson, a neuroscientist who won a seat on the Wissahickon School Board in eastern Pennsylvania.
Many candidates decided to run only after President Donald Trump ushered in one of the most anti-science administrations in history. And a number of the campaigns sprang out of meetings with 314 Action, a political advocacy group that is helping scientists run for office.
“Voters are ready for candidates who are going to use their STEM training to base policy on evidence rather than intuition,” Shaughnessy Naughton, the founder of 314 Action, said in a press release. “Science will not be silenced.”
Theoretical Physicists Are Getting Closer to Explaining How NASA’s ‘Impossible’ EmDrive Works
Theoretical scientists are trying to understand why and how EmDrive propulsion works. The NASA paper suggests a tentative explanation based on a quantum physics theory, "a nonlocal hidden-variable theory, or pilot-wave theory for short."
A new research paper by a Portuguese scientist, titled "A Possible Explanation for the Em Drive Based on a Pilot Wave Theory", is now trending among EmDrive enthusiasts in the NasaSpaceFlight forum. The paywalled paper proposes a model similar to the NASA one (here's an open access preprint you can read).
Pilot-wave theories have been proposed since the 1920s by quantum physicists, notably Louis de Broglie and David Bohm, to make sense of the weird behavior of quantum matter. Recently, pilot-wave quantum theories have gained more popularity after it was discovered that pilot-wave quantum-like behavior can be reproduced in classical fluids and explained by classical (non-quantum) fluid dynamics.
Not enough meat on these bones for another story, but you might be interested.
Previously: Explanation may be on the way for the "Impossible" EmDrive
Finnish Physicist Says EmDrive Device Does Have an Exhaust
EmDrive Peer-Reviewed Paper Coming in December; Theseus Planning a Cannae Thruster Cubesat
It's Official: NASA's Peer-Reviewed EmDrive Paper Has Finally Been Published
Space Race 2.0: China May Already be Testing an EmDrive in Orbit
Physicist Uses "Quantised Inertia" to Explain Both EmDrive and Galaxy Rotation
EmDrive 3.0: Wait, Where's EmDrive 2.0?
Trump’s Legacy: Damaged Brains
The pesticide, which belongs to a class of chemicals developed as a nerve gas made by Nazi Germany, is now found in food, air and drinking water. Human and animal studies show that it damages the brain, reduces I.Q.s and causes tremors among children. It has also been linked [open, DOI: 10.1093/jnci/djh324] [DX] to lung cancer and Parkinson's disease [DOI: 10.1136/oemed-2013-101394] [DX] in adults.
[...] This chemical, chlorpyrifos, is hard to pronounce, so let's just call it Dow Chemical Company's Nerve Gas Pesticide. Even if you haven't heard of it, it may be inside you: One 2012 study found that it was in the umbilical cord blood of 87 percent of newborn babies tested. And now the Trump administration is embracing it, overturning a planned ban that had been in the works for many years.
The Environmental Protection Agency actually banned Dow's Nerve Gas Pesticide for most indoor residential use 17 years ago — so it's no longer found in the Raid you spray at cockroaches (it's very effective, which is why it's so widely used; then again, don't suggest this to Dow, but sarin nerve gas might be even more effective!). The E.P.A. was preparing to ban it for agricultural and outdoor use this spring, but then the Trump administration rejected the ban. That was a triumph for Dow, but the decision stirred outrage among public health experts. They noted that Dow had donated $1 million for President Trump's inauguration.
So Dow's Nerve Gas Pesticide will still be used on golf courses, road medians and crops that end up on our plate. Kids are told to eat fruits and vegetables, but E.P.A. scientists found levels of this pesticide on such foods at up to 140 times the limits deemed safe. "This was a chemical developed to attack the nervous system," notes Virginia Rauh, a Columbia professor who has conducted groundbreaking research on it. "It should not be a surprise that it's not good for people."
[...] Democrats sometimes gloat that Trump hasn't managed to pass significant legislation so far, which is true. But he has been tragically effective at dismantling environmental and health regulations — so that Trump's most enduring legacy may be cancer, infertility and diminished I.Q.s for decades to come.
Asked in April whether Pruitt had met with Dow Chemical Company executives or lobbyists before his decision, an EPA spokesman replied: "We have had no meetings with Dow on this topic." In June, after several Freedom of Information Act requests, the EPA released a copy of Pruitt's March meeting schedule which showed that a meeting had been scheduled between Pruitt and Dow CEO Andrew Liveris at a hotel in Houston, Texas, on March 9.[91] Both men were featured speakers at an energy conference. An EPA spokesperson reported that the meeting was brief and the pesticide was not discussed.[92]
In August, it was revealed that in fact Pruitt and other EPA officials had met with industry representatives on dozens of occasions in the weeks immediately prior to the March decision, promising them that it was "a new day" and assuring them that their wish to continue using chlorpyrifos had been heard. Ryan Jackson, Pruitt's chief of staff, said in a March 8 email that he had "scared" career staff into going along with the political decision to deny the ban, adding "[T]hey know where this is headed and they are documenting it well."[93]
Papa John's Blames the NFL for Hurting Pizza Sales 🍕📉🔥
Papa John’s International Inc. founder John Schnatter is going after NFL Commissioner Roger Goodell, saying weak handling of the league’s national-anthem controversy has hammered sales of his pizza.
“The NFL has hurt us by not resolving the current debacle to the players’ and owners’ satisfaction,” Schnatter, who serves as the pizza chain’s chairman and chief executive officer, said on a conference call. “NFL leadership has hurt Papa John’s shareholders.”
The remarks follow a controversy over NFL football players protesting during the national anthem, a movement that started last season. The demonstrations have sparked calls for a boycott and raised concerns among league sponsors. But Schnatter’s comments mark the highest-profile example of an NFL partner publicly blaming the outcry for hurting business.
(This is the 53rd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
I've been working on a video encoder and decoder which divides frames of video into square tiles and then sub-divides those tiles and applies a specified short-list of matching and compression algorithms to tiles of 256×256 pixels or significantly smaller. Want real-time conferencing? Stick to MJPEG tile-types. Want lossless video? Avoid affine tile-types. Want maximum resilience? Minimize use of key-frames. Want best compression? Use everything.
In addition to viewing real-time or pre-recorded video, this arrangement can be used to display contemporary or legacy software which is running elsewhere. Indeed, it is likely to be used extensively for remoting LibreOffice Writer, Calc and Draw.
So, the video codec is likely to be used in scenarios where the majority of a window is plain white and the exceptions have a hard edge. I presumed that the patent-just-expired texture compression algorithms would handle this with ease. However, an ex-housemate suggested inclusion of the simplest historical method. Well, I'm not a complete misanthropic curmudgeon and I do listen to my friends, so I implemented a Run Length Encoding tile-type. Although, I wish that I hadn't.
Run length encoding schemes differ significantly and, in the case of the Amiga's IFF picture compression, may be defined ambiguously. A typical byte encoding would allow up to 128 repetitions of a following value or up to 128 literal values to follow. However, the cut could be defined anywhere. So, it could define 240 repetitions and 16 literal values. Or the opposite. There's no point defining one repetition and one literal because that's the same thing. Likewise for zero. So, definitions may be shifted by one or two values.
Should Run Length Encoding start afresh on each line? Implementations vary. The next problem is more pernicious. What is a repetition? If there is a sequence of pixels and two pixels are the same, should an encoder algorithm cease encoding literals, encode a repetition of two values and then continue encoding literals? For 24 bit color, yes. For 8 bit monochrome, no. For 16 bit data, it depends. So, a dumb Run Length Encoder has to look ahead by one pixel. In some circumstances, a smart Run Length Encoder may have to look further ahead.
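As a concrete illustration, here is a minimal sketch of one such byte-oriented scheme in C. The control-byte split (0..127 for runs, 128..255 for literals) and the MIN_RUN threshold are assumptions chosen for the example, not any particular tile-type format; the point is the look-ahead decision about when a short repetition is worth interrupting a literal sequence.

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical control byte: 0..127 means a run of (c + 1) copies of the
     * next byte, 128..255 means (c - 127) literal bytes follow.  MIN_RUN is
     * the look-ahead threshold: for 8 bit data a run of two costs as much as
     * two literals, so only runs of three or more interrupt a literal
     * sequence; for wider pixels the threshold can drop. */
    #define MIN_RUN   3
    #define MAX_COUNT 128

    size_t rle_encode_line(const uint8_t *src, size_t len, uint8_t *dst)
    {
        size_t in = 0, out = 0;

        while (in < len) {
            /* Measure the run starting at src[in]. */
            size_t run = 1;
            while (in + run < len && run < MAX_COUNT && src[in + run] == src[in])
                run++;

            if (run >= MIN_RUN) {
                dst[out++] = (uint8_t)(run - 1);       /* 0..127: repetition */
                dst[out++] = src[in];
                in += run;
            } else {
                /* Gather literals until a worthwhile run appears. */
                size_t lit = 0;
                while (in + lit < len && lit < MAX_COUNT) {
                    size_t r = 1;
                    while (in + lit + r < len && r < MIN_RUN &&
                           src[in + lit + r] == src[in + lit])
                        r++;
                    if (r >= MIN_RUN)
                        break;
                    lit++;
                }
                dst[out++] = (uint8_t)(127 + lit);     /* 128..255: literals */
                for (size_t i = 0; i < lit; i++)
                    dst[out++] = src[in + i];
                in += lit;
            }
        }
        return out;
    }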
Stuff like this caused me to implement an assertion system inside my buffer structures. Specifically, there is a compilation directive which enables a shadow buffer with a type system. Arguably, it should be enabled by default but with intended usage of 3840×2160 pixel video divided into 64×64 pixel tiles and each of the 2040 tiles requiring multiple 4KB buffers, data-types on buffers would require a large amount of extra memory.
However, I've yet to get to the best part. I remember exactly why I didn't implement a Run Length Encoding tile-type. The RMS [Root Mean Square] error (or smarter variant) for Run Length Encoding is always zero. Therefore, when I include Run Length Encoding, the encoder invariably chooses Run Length Encoding for every part of a video frame. Even if the choice metric is set to error divided by encode length, the result remains zero.
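A minimal sketch of the selection problem, with hypothetical names and a hypothetical scoring rule: whether the score is plain RMS error or error divided by encoded length, a lossless candidate such as Run Length Encoding scores exactly zero and wins every tile.

    #include <stddef.h>

    /* Hypothetical per-candidate result: encoded size in bytes and the
     * reconstruction error against the source tile.  Any lossless tile-type,
     * such as Run Length Encoding, has rms_error == 0. */
    struct candidate {
        size_t encoded_bytes;
        double rms_error;
    };

    /* Pick the candidate with the lowest score.  Whether the score is plain
     * RMS error or, as here, error divided by encoded length, a lossless
     * candidate scores exactly zero and is always chosen, no matter how
     * large its encoding is. */
    static int choose_tile_type(const struct candidate *c, int n)
    {
        int best = 0;
        for (int i = 1; i < n; i++) {
            double a = c[i].rms_error    / (double)c[i].encoded_bytes;
            double b = c[best].rms_error / (double)c[best].encoded_bytes;
            if (a < b)
                best = i;
        }
        return best;
    }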
Run Length Encoding greatly improves quality but it also greatly increases encoded size. Attempts to restrict matches have had mixed results. I've tried setting a minimum tile size and a maximum number of tokens per tile. However, it is easier to exclude it from the encoding process. This experience has made me slightly more of a misanthropic curmudgeon and I'm less inclined to take advice from people who know very little about codecs.
(This is the 52nd of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
I posted example code to perform an 8×8 bit matrix transpose. This is intended for multiple applications including driving multiple serial lines for a network protocol, driving multiple serial lines for I2C, driving multiple serial DACs and bit-plane audio and video compression. For this reason, I want something which is fast, flexible and reliable. I don't want to keep coming back to it. I certainly don't want to break or fork it. Although the example code is supposed to be optimized for eight bit processors, 16 bit processors, 32 bit processors and 64 bit processors, disassembly of 32 bit ARM GCC output found that the results were dismal. I am also disappointed by the amount of effort required to correct this deficiency.
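For reference, one widely known portable formulation of the 8×8 bit transpose is the two-word shuffle from Hacker's Delight; the sketch below is that technique with a single input pointer and a single output pointer, not necessarily the code that was posted.

    #include <stdint.h>

    /* The two-word shuffle: pack 64 bits into two 32 bit words, swap 1-, 2-
     * and 4-bit groups, then unpack.  in[] and out[] are eight contiguous
     * bytes, one per row/column. */
    void transpose8x8(const uint8_t in[8], uint8_t out[8])
    {
        uint32_t x, y, t;

        /* Pack rows 0..3 into x and rows 4..7 into y. */
        x = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
            ((uint32_t)in[2] <<  8) |  (uint32_t)in[3];
        y = ((uint32_t)in[4] << 24) | ((uint32_t)in[5] << 16) |
            ((uint32_t)in[6] <<  8) |  (uint32_t)in[7];

        /* Swap bits within each 2x2 block. */
        t = (x ^ (x >> 7)) & 0x00AA00AAu;  x ^= t ^ (t << 7);
        t = (y ^ (y >> 7)) & 0x00AA00AAu;  y ^= t ^ (t << 7);

        /* Swap 2x2 blocks within each 4x4 block. */
        t = (x ^ (x >> 14)) & 0x0000CCCCu; x ^= t ^ (t << 14);
        t = (y ^ (y >> 14)) & 0x0000CCCCu; y ^= t ^ (t << 14);

        /* Swap 4x4 blocks between x and y. */
        t = (x & 0xF0F0F0F0u) | ((y >> 4) & 0x0F0F0F0Fu);
        y = ((x << 4) & 0xF0F0F0F0u) | (y & 0x0F0F0F0Fu);
        x = t;

        out[0] = (uint8_t)(x >> 24); out[1] = (uint8_t)(x >> 16);
        out[2] = (uint8_t)(x >>  8); out[3] = (uint8_t)x;
        out[4] = (uint8_t)(y >> 24); out[5] = (uint8_t)(y >> 16);
        out[6] = (uint8_t)(y >>  8); out[7] = (uint8_t)y;
    }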
I'm using a failing x86 laptop as a dumb terminal for a Raspberry Pi which is off-line and has a fixed configuration for the duration of the development cycle. Therefore, my first use of objdump -d to disassemble GCC output was applied to 32 bit ARM code on a Raspberry Pi. And the result is horrible. There are hidden functions prior to main(). That is known. But do they have to be trampoline functions? Is this related to ARM Thumb mode? Who knows. The compiled code is also quite laborious with stacks and stack frames. With hindsight, this would be the focus of my optimization. I'm aware of inlining and a large number of other compiler optimization techniques. However, with more streamlined stack conventions, there would be less code and less overhead to amortize when performing many of these optimizations. I suppose this is part of the "Why the blazes did the compiler do that???" response to optimization.
Anyhow, I'm optimizing code size on the basis that I should have unrolled loops and every instruction takes one clock cycle to execute. Therefore, smaller code means faster execution and less energy consumption. There are numerous cases where smaller code is contrary to faster execution. There are also numerous cases where instruction count is not linear with execution time. However, on this architecture, for this subroutine, it is possible to equate length with speed as a first approximation.
It helps if a desirable result is known. On 32 bit ARM, there is a four bit conditional execution field in every instruction. This allows ARM implementations to have a very simple execution unit and avoids instruction decode pipeline stalls when jumping over one or two instructions. It is faster and easier to take the hit of one wasted instruction slot. Unfortunately, the consequence of 32 bit instructions with a four bit conditional field is that code density is terrible.
Anyhow, one would assume that 32 bit ARM has a register bit test instruction or a method of performing an AND mask operation with a short, immediate value. Being a three-address machine, it would probably have a method of doing this non-destructively. Although, with a four bit conditional field and a four bit pre-scale field, short, immediate values may be mutually exclusive with a third register reference which would be required for a non-destructive test.
Even if the compiler's input was the dumb technique for an 8×8 bit transpose, the output should be a sequence of 128 instructions or maybe 192 instructions plus some instructions to clear registers, marshal data into two of the eight 32 bit registers plus all of that stack frame nonsense. So, what did the "optimized" code produce? About 262 instructions. At this point, I felt like the comedian, Doug Stanhope, when he looks in the mirror and says "Well, that ain't right."
I (thankfully) don't read ARM assembly but a large number of stack references and square brackets indicated that GCC documentation claims about register allocation are complete bunkum. Again, it helps if desirable operation is known. With 64 bits of input packed into two 32 bit registers, the same for output plus one or two working registers, the majority of stack references would be avoided if five or six general purpose registers were available. However, eight char pointers for input and eight char pointers for output, provided as function parameters, had taken precedence over the more frequent operations.
I really wanted to keep the input pointers because I wanted to use the 8×8 bit transpose as a building block for a 12×8 transpose or suchlike for 12 bit DACs. After a re-write (times four due to premature optimization), output performed post-increment on one pointer. This improved code size but wasn't sufficient. Reluctantly, input is also obtained via post-increment of one pointer. This frees sufficient general purpose registers and the compiled code minus stack overhead is about 80 instructions.
Unfortunately, I was less than halfway through this problem. After getting sensible output on 32 bit ARM, I repeated tests on 16 bit Thumb instructions, as used in an Arduino Due, and 16 bit AVR instructions, as used on the majority of Arduino micro-controllers. After about three days, I had something which compiled reliably on x86, 32 bit ARM instructions, 16 bit ARM instructions and AVR. I have no reason to believe that it performs badly on 6502, Z80, 68000, PowerPC, MIPS or any mainstream architecture. However, I'm astounded that it took so long. I got to the point where the default command was functionally equivalent to:-
objdump -d a.out | wc -l
with the occasional back-check to make sure that I wasn't inadvertently optimizing an unrolled loop. (Yes, many hours of fun there.)
After understanding the dumbness of a compiler in this particular process, I devised a method to implement the partial 8×8 bit transpose functions in a fairly efficient manner. Indeed, this can be reduced to a macro which defines a C function. The function has local byte buffers for input and output. Conceptually, the input buffer is cleared and then data is selectively copied into the local input buffer. A call is made to an 8×8 bit transpose and then data is selectively copied from the local output buffer. The compiler is able to unroll the known quantity of clear and copy operations to virtual slots on the data stack. It is also able to eliminate redundant clear operations. Most impressively, it is able to eliminate all operations involving dummy inputs and outputs. The partial bit transpose functions are proportionately smaller than the complete 8×8 bit transpose. Unfortunately, compilation is relatively slow and this is worsened when it is trivial to define many variants of bit transpose function.
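A minimal sketch of such a macro, with hypothetical names: the local buffers are the data trampoline, and the eliminations described above only happen if the 8×8 transpose itself can be inlined (static inline in the same translation unit, or link-time optimization).

    #include <stdint.h>
    #include <string.h>

    /* An 8x8 bit transpose with one input pointer and one output pointer
     * (the posted routine, or the sketch given earlier). */
    void transpose8x8(const uint8_t in[8], uint8_t out[8]);

    /* Hypothetical macro: defines a partial transpose that uses only ROWS
     * input rows and COLS output columns.  The local buffers are the data
     * trampoline: with the counts known at compile time and the transpose
     * inlinable, the compiler can keep the buffers in registers, drop the
     * redundant clears and delete every copy involving a dummy row or
     * column. */
    #define DEFINE_PARTIAL_TRANSPOSE(NAME, ROWS, COLS)                     \
        static void NAME(const uint8_t *src, uint8_t *dst)                 \
        {                                                                  \
            uint8_t in[8], out[8];                                         \
            memset(in, 0, sizeof in);        /* dummy rows stay zero */    \
            for (int i = 0; i < (ROWS); i++)                               \
                in[i] = src[i];              /* real rows copied in  */    \
            transpose8x8(in, out);                                         \
            for (int i = 0; i < (COLS); i++)                               \
                dst[i] = out[i];             /* wanted columns out   */    \
        }

    /* Example instantiation: a 6 row by 4 column partial transpose. */
    DEFINE_PARTIAL_TRANSPOSE(transpose6x4, 6, 4)

Each instantiation is one more function for the compiler to expand and prune, which is consistent with the slow compilation noted above.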
So, it is possible to optimize bit transpose functions by minimizing parameter count and by creating data trampolines which are optimized away by a compiler.
I had hoped that a full 8×8 bit transpose for ARM Thumb mode would compile to 40 instructions with minimal escapes to longer instructions. Unfortunately, the result was 80 instructions, mostly of the extended format. This is disappointing. At a minimum, this consumes twice the intended processing budget and this assumes there is no execution penalty for mis-aligned instructions. However, I'm targeting a device which has twice the requested processing power. So, I've soaked available resources and possibly eaten an extra 40% into the processing budget. It may remain desirable to write assembly implementations for one or more architectures. However, I've found a reliable but tedious method to avert this situation in some cases.
(This is the 51st of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
I've described an outline of my ideal filing system. My ideal database allows generalized use of the full-text search facility. This requires an outline of proposed volume storage. This borrows widely from ReiserFS, ZFS and MySQL Server storage engines such as InnoDB, Nitro and NDB Cluster.
Media is striped into 512MB stripes and each unit has a map of stripe allocation types, where one type is no allocation and another is bad stripe. It is envisioned that six out of 67 stripes perform parity functions and this is rotated in a procession across units. Each stripe has a bad sector map. For 1KB sectors, this requires 64KB. For 4KB sectors, this requires 16KB. If these sectors are bad, the stripe cannot be used. If the stripe map is bad, the unit cannot be used.
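The bad sector map arithmetic and one possible parity procession, sketched in C; the helper names and the exact rotation rule are assumptions for illustration.

    #include <stdint.h>

    #define STRIPE_BYTES     (512u * 1024u * 1024u)  /* 512MB stripes      */
    #define GROUP_STRIPES    67u                     /* 61 data + 6 parity */
    #define PARITY_PER_GROUP 6u

    /* Bad sector map: one bit per sector in the stripe.
     * 1KB sectors -> 512Ki bits = 64KB;  4KB sectors -> 128Ki bits = 16KB. */
    static uint32_t bad_map_bytes(uint32_t sector_bytes)
    {
        return (STRIPE_BYTES / sector_bytes) / 8u;
    }

    /* One possible procession: within group g, slots (g % 67) .. (g % 67 + 5)
     * modulo 67 carry parity, so the parity positions advance by one slot
     * per group and the parity load rotates across units. */
    static int is_parity_slot(uint32_t group, uint32_t slot /* 0..66 */)
    {
        uint32_t start = group % GROUP_STRIPES;
        return ((slot + GROUP_STRIPES - start) % GROUP_STRIPES) < PARITY_PER_GROUP;
    }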
The remainder of sectors within a stripe are available to an application. However, applications may not be expecting raw, disjoint storage and therefore a standard contiguous mapping with redundancy may be utilized. The full-text search for the filing system utilizes a specialized database in which search terms are striped by length with attributes such as capitalization and accents. Conceptually, "House++" would be stored as HOUSE-15-10000-00000 where digits represent punctuation, capitalization and accented characters. Sequential entries would be compressed into fragments occupying 13 or 21 sequential sectors and would be split or reconciled as required.
The general database storage requires three or more 512MB stripes per table. One or more stripes hold 13 sector fragments. One or more stripes hold 21 sector fragments. One or more stripes hold the index of fragments. All rows within a table are stored in N-dimensional Peano curve format and therefore the table is its own universal index. Sets of eight rows are bit transposed, Peano mixed and arithmetic compressed into one stream. If a fragment exceeds 13 sectors, it is placed into a 21 sector fragment. If a fragment exceeds 21 sectors, it is placed into two 13 sector fragments. All CHAR and VARCHAR fields which are longer than 13 bytes are stored in shadow tables which require their own 512MB stripes. Each definition of VARCHAR(255) requires several gigabytes of space. BLOB fields are stored in an unindexed Fibonacci filing system.
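A sketch of the placement rule as described, with hypothetical names; what happens to a compressed stream larger than two 13 sector fragments is not specified above and is left out.

    #include <stdint.h>

    /* Placement rule for a compressed row-group stream, as described above:
     * up to 13 sectors goes into a 13 sector fragment, up to 21 sectors into
     * a 21 sector fragment, and anything larger is split across two 13
     * sector fragments. */
    enum placement { ONE_13, ONE_21, TWO_13 };

    static enum placement place_fragment(uint32_t sectors)
    {
        if (sectors <= 13)
            return ONE_13;
        if (sectors <= 21)
            return ONE_21;
        return TWO_13;
    }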
If you doubt the wisdom of chopping data so finely then please investigate the sort utility used in a previous project.
(This is the 50th of many promised articles which explain an idea in isolation. It is hoped that ideas may be adapted, linked together and implemented.)
I have an idea which is based upon Google Search Cache and EMC storage units. If it is possible to de-duplicate files on a global scale then it may be possible to obtain a global scale search engine as a corollary. Unfortunately, my developments in this field have been amusingly dire. I've been unable to implement a de-duplicated filing system with write throughput exceeding 11KB/s. However, said filing system did not store a JPEG quantize table more than once. Nor did it store two copies of the same file in a PKZip archive.
A considerably simplified version may incur read fragmentation but would have vastly superior write and replication throughput. Even a trivial test system would scale beyond the affordability of many organizations. There are various techniques for filing system de-duplication. Some of these techniques even apply to memory de-duplication used in virtual desktop or virtual server systems.
The simplest technique is to not store any file with a duplicate checksum. Some legal and medical systems use this technique because it has the advantage of providing some tamper-proofing. If a file is modified then its checksum changes and therefore it is not the original file. Dropbox and MegaUpload used a similar technique and this explains the inability to maintain privileges within these systems. If you can obtain the checksum of a file by any means then revocation of access is contrary to HTTP caching.
Block-level de-duplication works on fixed boundaries; typically 2^n for suitably large n. Linux Volume Management defaults to 4MB blocks. Anything above 256MB is recommended to maintain video or database throughput. However, I discovered a weird result in which block size is immaterial. If block size is very large then it is indistinguishable from file de-duplication. If block size is very small then fragmentation may adversely affect read or write speed. However, in the general case, it does not matter.
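A minimal sketch of fixed-boundary block de-duplication, with hypothetical names: hash each block and only store it if the hash has not been seen. A real system would key on a strong cryptographic digest and verify the block bytes on a hit; FNV-1a is used here only to keep the sketch self-contained, and BLOCK_SIZE is the 2^n knob discussed above.

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE  4096u          /* the fixed 2^n boundary; a knob     */
    #define TABLE_SLOTS (1u << 20)

    static uint64_t seen[TABLE_SLOTS]; /* hypothetical in-memory block index */

    /* Illustrative hash only: a real de-duplicator keys blocks on a strong
     * cryptographic digest (e.g. SHA-256) and verifies the bytes on a hit. */
    static uint64_t fnv1a(const uint8_t *p, size_t n)
    {
        uint64_t h = 0xcbf29ce484222325ull;
        while (n--) {
            h ^= *p++;
            h *= 0x100000001b3ull;
        }
        return h;
    }

    /* Returns 1 if an identical block has been seen (store a reference),
     * 0 if the block is new and must be written out. */
    static int dedup_block(const uint8_t block[BLOCK_SIZE])
    {
        uint64_t h = fnv1a(block, BLOCK_SIZE);
        uint32_t slot = (uint32_t)(h & (TABLE_SLOTS - 1));
        if (seen[slot] == h)
            return 1;
        seen[slot] = h;
        return 0;
    }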
Byte-level de-duplication may use proprietary algorithms or dedicated hardware to perform O(n^2) matches over 1KB of data or more. This is quite energy intensive but it provides the tightest matches. It also provides the most devastating failure modes because the system has minimal redundancy.
After considering byte-level de-duplication techniques, Huffman compression and various other storage techniques (in memory and on media), I believe that Fibonacci lengths should be seriously considered. If we go through the standard Fibonacci sequence (variants exist and may be useful), we have: 1, 1, 2, 3, 5, 8, 13, 21 and 34. My proposal is that files can be stored exclusively in 13 byte fragments and 21 byte fragments. Furthermore, every fragment can be given an eight byte handle where that handle contains or is augmented with a file server cluster number and fragment checksum. With five bytes or more of contiguous fragment numbers, a malicious user who is trying to exhaust all allocations would require 13*2^40 bytes of storage. This requires a 13TB file quota. With a less rigorous checksum or checksums external to the fragment reference, the required storage is significantly larger.
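One possible layout of the eight byte handle, sketched with hypothetical field widths: 40 bits of fragment number (the "five bytes or more" above), a cluster number and a small checksum. The exact split is an assumption for illustration.

    #include <stdint.h>

    /* Hypothetical 8 byte fragment handle: 40 bits of fragment number, 8
     * bits of file server cluster number and 16 bits of fragment checksum. */
    struct frag_handle {
        uint64_t raw;
    };

    static struct frag_handle make_handle(uint64_t frag_no, /* < 2^40 */
                                          uint8_t cluster,
                                          uint16_t checksum)
    {
        struct frag_handle h;
        h.raw = (frag_no & 0xFFFFFFFFFFull)
              | ((uint64_t)cluster  << 40)
              | ((uint64_t)checksum << 48);
        return h;
    }

    static uint64_t handle_frag_no(struct frag_handle h)  { return h.raw & 0xFFFFFFFFFFull; }
    static uint8_t  handle_cluster(struct frag_handle h)  { return (uint8_t)(h.raw >> 40); }
    static uint16_t handle_checksum(struct frag_handle h) { return (uint16_t)(h.raw >> 48); }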
In typical usage, read and write operations will be large sequential operations with minor fragmentation for data which is more commonly stored. In the worst case, unique data incurs considerable overhead. Indeed, if you only have one or two copies of data then mis-alignment by one byte incurs a duplicate copy with additional fragmentation overhead. However, by random walks and birthday paradox, savings occur rapidly. Four or five copies are likely to achieve alignment. And 22 copies of data under any alignment must attain some savings. So, if you're in an office where 500 people get CCed then you can be certain that a maximum of 21 copies of the data will be stored on disk.
In a clustered environment, there is the possibility that two nodes discover the same fragment of data independently. In the general case, this is of no consequence. After replication, one reference takes precedence and the other falls into obsolescence. To ensure load balancing, precedence may be determined by checksum prior to node ID. Stripping out stale references is an interesting exercise in a clustered environment. Although, a general solution is likely to encourage wear-leveling.
However, the benefit is the ability to perform full-text indexing. Considering seven bit ASCII first, each 13 byte fragment and 21 byte fragment has an upper bound for the number of words which do not occur adjacent to a fragment boundary. Likewise, zero or one words span consecutive fragments. (Words which are longer than 13 bytes may be awkward.) Latin1, UTF-8 and UCS2 are more awkward, especially if the text contains long runs of Japanese or suchlike. Regardless, it is possible to index such text unambiguously after a file has been closed.
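A minimal sketch for the seven bit ASCII case, with hypothetical names: report the words that lie entirely inside a fragment, and flag whether the fragment starts or ends mid-word so that the at-most-one spanning word per boundary can be stitched together with the neighbouring fragment.

    #include <ctype.h>
    #include <stddef.h>

    /* Pass each word that lies entirely inside the fragment to emit(), and
     * flag whether the fragment starts or ends mid-word.  Those partial
     * pieces are joined with the neighbouring fragment, giving at most one
     * spanning word per boundary. */
    static void index_fragment(const char *frag, size_t len,
                               void (*emit)(const char *word, size_t wlen),
                               int *starts_midword, int *ends_midword)
    {
        size_t i = 0;

        *starts_midword = len > 0 && isalnum((unsigned char)frag[0]);
        *ends_midword   = len > 0 && isalnum((unsigned char)frag[len - 1]);

        while (i < len) {
            while (i < len && !isalnum((unsigned char)frag[i]))
                i++;
            size_t start = i;
            while (i < len && isalnum((unsigned char)frag[i]))
                i++;
            /* Interior words only: words touching either boundary are the
             * spanning pieces handled separately. */
            if (start > 0 && i < len)
                emit(frag + start, i - start);
        }
    }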
All of this can be dumped into a full-text index and searched in a manner which enforces file and directory privileges. It is possible (but not easy) to perform a database join in a manner which maintains partial ordering of resource IDs. It is also possible to perform this in a manner which allows significant fragment ID compression. This should scale beyond 1EB (2^60 bytes) but I may be optimistic.
If you doubt the wisdom of chopping data so finely then please investigate the sort utility used in a previous project.