https://blog.mattstuchlik.com/2024/07/21/fastest-memory-read.html
Summing ASCII Encoded Integers on Haswell at the Speed of memcpy turned out more popular than I expected, which inspired me to take on another HighLoad challenge: counting uint8s. I'm currently only #13 on the leaderboard, ~7% behind #1, but I've already learned some interesting things. In this post I'll describe my complete solution, including a surprising memory read pattern that achieves up to ~30% higher transfer rates on fully memory-bound, single-core workloads compared to naive sequential access, despite apparently not being widely known.
As before, the program is tuned to the input spec and for the HighLoad system: Intel Xeon E3-1271 v3 @ 3.60GHz, 512MB RAM, Ubuntu 20.04. It only uses AVX2, no AVX512.
The Challenge
"Print the number of bytes whose value equals 127 in a 250MB stream of bytes uniformly sampled from [0, 255] sent to standard input."
Nothing much to it!
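Stripped of all tuning, the task is a one-line loop. Here is a minimal scalar sketch (the helper name `count_byte` is mine, not from the post; the post's tuned solution performs the same comparison with AVX2, 32 bytes at a time):

```c
#include <stddef.h>
#include <stdint.h>

/* Count occurrences of `target` in `buf`. A straightforward scalar
 * baseline for the challenge; an optimized version would vectorize
 * this same comparison with AVX2. */
static size_t count_byte(const uint8_t *buf, size_t len, uint8_t target) {
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        count += (buf[i] == target);
    return count;
}
```

Reading the 250MB from standard input in large chunks and summing `count_byte` over each chunk gives the naive baseline that any optimized solution is measured against.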
(Score: 0) by Anonymous Coward on Saturday July 27 2024, @06:53PM (6 children)
Whatever happened to parallel ports? Why is serial all the rage?
(Score: 0) by Anonymous Coward on Saturday July 27 2024, @07:08PM
Copper prices went up.
(Score: 0) by Anonymous Coward on Saturday July 27 2024, @07:13PM
Cheaper to produce.
Imagine the cost of 8x SATA controllers all together, just to connect one drive. A 16-port SATA controller costs something like $500, new. The south bridge (or is it the north bridge?) typically has 4-6 ports total, and those might even be port-multiplied.
Also, everyone is all about "thin" lately. Imagine 8 SATA ports in a portable computer nowadays, just to connect one drive.
(Score: 2, Insightful) by anubi on Sunday July 28 2024, @12:23AM
"Why is serial all the rage?"
Connections are by far the biggest points of failure, as they are subject to real-world physical abuse and contamination. They are costly to implement. And you only get so many pins on a package...routing to and from them requires yet more precious PCB real estate. Physical size costs more money and decreases the functionality/(size, weight) ratio.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
(Score: 5, Interesting) by tekk on Sunday July 28 2024, @12:28AM
In addition to what other people said: lengths.
When stuff is clocked as high as it is these days, it's very hard, at scale, to get the wires exactly the same length so that you don't miss your window for the signal.
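The "window" argument can be made concrete with back-of-envelope numbers. Assuming signals propagate at roughly half the speed of light along a PCB trace, about 15 cm per nanosecond (a round figure I'm assuming here, not from the comment), the trace-length mismatch you can tolerate shrinks in direct proportion to the clock:

```c
/* Rough propagation speed on a typical PCB trace: about half the
 * speed of light, i.e. ~15 cm per nanosecond (assumed round figure). */
#define CM_PER_NS 15.0

/* Trace-length mismatch (in cm) that consumes a given fraction of one
 * bit period at the given clock rate. Illustrative arithmetic only. */
static double skew_budget_cm(double clock_mhz, double fraction) {
    double period_ns = 1000.0 / clock_mhz;   /* one clock period in ns */
    return fraction * period_ns * CM_PER_NS; /* ns of slack -> cm of trace */
}
```

Under these assumptions, a 10% skew budget at 33 MHz allows roughly 45 cm of length mismatch, while at 1 GHz it shrinks to about 1.5 cm, which is why fast parallel buses need painstaking length matching across every line.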
(Score: 4, Informative) by Unixnut on Sunday July 28 2024, @09:38AM
The faster you clock a parallel interface, the harder it is to keep all the lines in sync and free of interference. The higher your clock speed, the broader the RFI, which not only affects the signals in the cable but can radiate out and interfere with other components (or other equipment, if your machine's case is not a good Faraday cage).
This is one of the reasons they had to move from 40-conductor to 80-conductor IDE cables (calling them "80-pin" is a misnomer: both cables have 40-pin connectors, but one has 80 wires and the other 40). Once ATA speeds increased beyond UDMA/33 (33 MB/s), crosstalk became a big problem, so the 80-conductor cables interleave ground wires with the data signals to reduce interference enough to clock the bus higher reliably.
Due to the issues above it became harder and harder to raise the clock, so the way to increase performance back then was to add more signal wires instead. Hence buses kept getting wider, as was done with SCSI: the first SCSI bus was 8 bits wide (later dubbed "narrow SCSI"), then came 16-bit "wide SCSI", and a 32-bit variant was even specified, though it never caught on. Even then it was clear that you can't just keep adding wires to improve performance; at some point things get silly.
Serial interfaces don't suffer from inter-line crosstalk, they have lower RFI (which can be reduced further with differential signalling and good cable shielding), they are cheaper to produce (fewer wires needed) and they improve cooling inside the machine (less cabling to block airflow). However, silicon manufacturing had not yet reached the point where mass-produced, high-clock-speed serial interfaces were cheap and good enough.
Once the point was reached where we could design a serial interface with a high enough clock speed to match (or exceed) parallel interface performance for an acceptable cost, the writing was on the wall.
The venerable parallel port was the first to go, replaced by USB (RS-232 was replaced around the same time, though it lingers on in the embedded, automation and manufacturing worlds), and by now pretty much every interface in a computer is serial. In fact, I can't think of a single parallel interface on a modern PC. Everything is serial, including the CPU-to-CPU interconnects.
Some motherboards still expose a legacy parallel port as a pin header, but that's about it (and I doubt many boards made after 2020 even do that). Still, I miss the parallel port: it was excellent for quickly interfacing simple circuits to the PC for little hacks. Nowadays I have to use a microcontroller with a USB interface for the same experience, which is more complex to program.
(Score: 0) by Anonymous Coward on Sunday July 28 2024, @04:22PM
Parallel is only faster on paper. The more you increase the transmission speed, the harder it is to keep all those lines in sync.
(Score: 0) by Anonymous Coward on Saturday July 27 2024, @08:20PM
Somebody needs to make cshift faster - it SUCKS ASS.