I've written this code at work and it's not pretty. As usual, it was done in a hurry with a Grumpy Boss Man shouting and making Basil Fawlty appear calm and collected. It also uses code from a third party, unsuitable for our hardware but nonetheless required for integration and testing.
This is the code for an embedded system which was written by Windows people in C++ using a cross-platform GUI toolkit including a GUI but also using this toolkit's message-passing infrastructure to facilitate inter-thread communication. Yes, it's multi-threaded.
Grumpy Boss Man wouldn't let us put this GUI toolkit on our system even just to get this code up and running so I had to re-implement various select parts of said toolkit myself to get the useful buts of the code from the supplier working, fortunately with the GUI thrown out.
Working all hours, with my fingers on fire, my brain melting and all sorts of things I replaced the TCP/IP socket functionality and the thread classes (in a very cheap, Scottish, minimalist, parsimonious way).
Lo-and-behold it ran!
Now this system contains Secret Sauce(TM) that I'm not allowed to see because IP and all that. So I have a target build with a secret binary module provided by the suppliers. I have my own little stub module implementing it's API which I wrote so I could do a host (x86-64) build. Grumpy Boss Man never quite understood why anyone would want to run the code on the host as well as the target (Aarch64).
It's quite simple: expediency. I can compile, link and execute the code in a couple of seconds on the host. I have rigged up a little automated test harness, in addition to my unit tests, which runs the application and sends messages to it, and waits for and checks the replies. I can run it through various test scenarios just by typing make. Remember, this is asynchronous multi-threaded code with TCP/IP sockets. Every time I compile I get free tests. The same tests can be run on the target too (I've done it).
The second reason is that compiling and running (testing) on a different architecture shakes out certain bugs. Ideally, it would be on an architecture with a different endianness and a different OS but the world is becoming more homogeneous these days. Unless there's a SPARC box about, if it's x86-64 or Aarch64, it's going to be Little Endian.
However, x86 is CISC and ARM is RISC and we all know that CISC and RISC processors treat memory differently. Now here comes the fun part.
My host (x86-64) builds/tests were fine. So were my target Aarch64) builds and they ran fine when I put them on the target and ran my tests there.
Our suppliers produced a new version of their Secret Sauce that needed some reconfiguration inside my code. My code (actually, their example code but a bit modified) had a couple of arrays holding certain configuration data and these became twice as large and held more constants.
All the compiles worked. My host regression tests passed. Putting the target binary on the hardware and running it resulted in a crash. It was a nice crash in that my pthread_create() failed with an error code and I printed a nice error message and the rest of the program kept going.
As I said earlier, I had been re-implementing parts of this C++ library at breakneck pace and I was thinking about memory corruption and perhaps I'd made some mistakes in one of the C++ constructors for the thread class.
I instrumented the code six ways to Sunday and came to the conclusion that there was stack corruption somewhere because all the right addresses for the thread main routine and arguments were getting set in the object instances but when pthread_create() was getting called it was returning a nasty error.
Then I remembered the mighty Valgrind. So I installed it.
After about half an hour I had the answer to the problem. I had forgotten to initialise the attributes for the thread (pthread_attr_init()) and then initialise a mutex for a shared buffer (pthread_mutex_init()).
It just so happened that on x86-64, due to the layout of memory, and due to the random contents of that memory, the program was running correctly. On Aarch64 it was falling over in a smouldering pile.
The moral of the story is (1) Don't write code on your own. Get someone to review it. (2) Don't write code in a hurry even when there's a Grumpy Boss Man (3) Compile and test on at least two different architectures and (4) use Valgrind (5) I hate C++.
(Score: 1, Insightful) by Anonymous Coward on Friday May 17 2024, @03:01AM
I hate to say it, but I think your are missing a bit of history. Many languages started out with their reference implementations and then grew into a specification later. Even C started out that way with the C behavior being whatever Bell and later Unix did. A lot of languages and their features are like that but most people have forgotten about the times before they were specified in a way more independent from their reference implementation.
But to your other point. The layering you speak of is just the cycle repeating itself exacerbated by the large expansion of people in the field. Eventually things will stabilize again like the did the previous times. And then it will probably happen again. Such is the way of things. But once the field gets over this painful adolescence, not only will things be more stable but we will also benefit from its maturity.