
SoylentNews is people

Journal by turgid

I've written this code at work and it's not pretty. As usual, it was done in a hurry, with a Grumpy Boss Man shouting and making Basil Fawlty look calm and collected. It also uses third-party code that is unsuitable for our hardware but nonetheless required for integration and testing.

This is code for an embedded system. It was written by Windows people in C++ using a cross-platform GUI toolkit, not just for the GUI but also for the toolkit's message-passing infrastructure, which handles inter-thread communication. Yes, it's multi-threaded.
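The toolkit isn't named, so its message-passing API can't be reproduced here. As a rough illustration of the kind of infrastructure being relied on, here is a minimal blocking message queue in standard C++. The `Message` and `MessageQueue` names are hypothetical, not the toolkit's.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical message type: the toolkit's real event classes are not known.
struct Message {
    int id;
    std::string payload;
};

// Hypothetical minimal blocking queue for passing messages between threads.
class MessageQueue {
public:
    // Called by a producer thread; wakes one waiting consumer.
    void post(Message msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();
    }

    // Called by the consumer thread; blocks until a message is available.
    Message wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Message msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Message> queue_;
};
```

A producer calls `post()` and the consumer sits in `wait()` until something arrives; the predicate passed to `cv_.wait()` guards against spurious wakeups.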

Grumpy Boss Man wouldn't let us put this GUI toolkit on our system, even just to get this code up and running, so I had to re-implement select parts of the toolkit myself to get the useful bits of the supplier's code working (fortunately with the GUI thrown out).

Working all hours, with my fingers on fire, my brain melting and all sorts of other things going on, I replaced the TCP/IP socket functionality and the thread classes (in a very cheap, Scottish, minimalist, parsimonious way).
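The replacement thread class isn't shown, so this is only a sketch of how cheap such a wrapper can be, assuming POSIX threads on both host and target. The class name `Thread` and its `start()`/`join()` methods are hypothetical.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical minimal thread wrapper; not the toolkit's real class.
class Thread {
public:
    using Entry = void *(*)(void *);

    Thread(Entry entry, void *arg) : entry_(entry), arg_(arg) {}

    // Starts the thread with default attributes (nullptr).
    // Returns 0 or the error number from pthread_create().
    int start() {
        int rc = pthread_create(&handle_, nullptr, entry_, arg_);
        started_ = (rc == 0);
        return rc;
    }

    // Waits for the thread to finish and returns its result (nullptr if not running).
    void *join() {
        void *result = nullptr;
        if (started_) {
            pthread_join(handle_, &result);
            started_ = false;
        }
        return result;
    }

    // Joins on destruction so a running thread is never leaked.
    ~Thread() { join(); }

    Thread(const Thread &) = delete;
    Thread &operator=(const Thread &) = delete;

private:
    Entry entry_;
    void *arg_;
    pthread_t handle_{};
    bool started_ = false;
};
```

Passing `nullptr` as the attributes argument asks for the default attributes, so no `pthread_attr_t` has to be set up at all unless a non-default setting is needed.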

Lo and behold, it ran!

Now this system contains Secret Sauce(TM) that I'm not allowed to see because IP and all that. So I have a target build with a secret binary module provided by the suppliers. I also wrote my own little stub module implementing its API so that I could do a host (x86-64) build. Grumpy Boss Man never quite understood why anyone would want to run the code on the host as well as on the target (Aarch64).
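The supplier's API isn't given, so the functions `ss_init()` and `ss_process()` below are hypothetical placeholders, and a C-style API is assumed. The idea is that the host build compiles a stub with the same signatures instead of linking the secret binary module, while the target build links the real module; the Makefile can choose which one is built.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical API of the supplier's binary module (the "header" part).
extern "C" {
int ss_init(const std::uint32_t *config, std::size_t count);
int ss_process(const std::uint8_t *in, std::size_t len, std::uint8_t *out);
}

// Host-only stub implementations, compiled in place of the secret module.
extern "C" {

// Hypothetical: accept any non-empty configuration and report success (0).
int ss_init(const std::uint32_t *config, std::size_t count) {
    return (config != nullptr && count > 0) ? 0 : -1;
}

// Hypothetical: copy the input to the output unchanged; no secret sauce here.
int ss_process(const std::uint8_t *in, std::size_t len, std::uint8_t *out) {
    for (std::size_t i = 0; i < len; ++i)
        out[i] = in[i];
    return static_cast<int>(len);
}

}  // extern "C"
```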

It's quite simple: expediency. I can compile, link and execute the code in a couple of seconds on the host. In addition to my unit tests, I have rigged up a little automated test harness that runs the application, sends messages to it, and waits for and checks the replies. I can run it through various test scenarios just by typing make. Remember, this is asynchronous multi-threaded code with TCP/IP sockets. Every time I compile, I get free tests. The same tests can be run on the target too (I've done it).
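A minimal sketch of the core of such a harness, assuming a simple request/reply protocol in which each reply arrives in a single read: send a request on a connected socket, wait for the reply with a timeout using `poll()`, and compare it with what was expected. The helper `send_and_expect()` is hypothetical. It works on any connected stream socket, TCP or a Unix-domain `socketpair()`; a real harness would also need the protocol's framing, because TCP does not preserve message boundaries.

```cpp
#include <cassert>
#include <cstddef>
#include <poll.h>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <thread>
#include <unistd.h>

// Hypothetical test-harness helper: sends one request on a connected socket
// and waits up to timeout_ms for a reply, which it compares with `expected`.
// Assumes each reply arrives in a single read of at most 4 KiB.
bool send_and_expect(int fd, const std::string &request,
                     const std::string &expected, int timeout_ms) {
    if (send(fd, request.data(), request.size(), 0) !=
        static_cast<ssize_t>(request.size()))
        return false;

    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) <= 0)  // 0 means timed out, < 0 means error
        return false;

    char buf[4096];
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n <= 0)
        return false;
    return std::string(buf, static_cast<std::size_t>(n)) == expected;
}
```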

The second reason is that compiling and running (testing) on a different architecture shakes out bugs that one platform happens to hide. Ideally, the second architecture would have a different endianness and run a different OS, but the world is becoming more homogeneous these days. Unless there's a SPARC box about, whether it's x86-64 or Aarch64, it's going to be little-endian.
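On the endianness point, a minimal sketch: check the host byte order at run time, and serialise multi-byte values with shifts so the bytes produced are identical on any host. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true if the machine stores the least significant byte first.
bool host_is_little_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char bytes[sizeof probe];
    std::memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x02;
}

// Writes a 32-bit value in big-endian (network) order regardless of host
// order, so x86-64, Aarch64 and SPARC all produce the same four bytes.
void put_be32(unsigned char *out, std::uint32_t v) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// Reads a 32-bit big-endian value back, again independent of host order.
std::uint32_t get_be32(const unsigned char *in) {
    return (std::uint32_t{in[0]} << 24) | (std::uint32_t{in[1]} << 16) |
           (std::uint32_t{in[2]} << 8) | std::uint32_t{in[3]};
}
```

Code that instead `memcpy()`s a `uint32_t` straight into a buffer produces different bytes on a big-endian machine, which is exactly the kind of bug a second architecture would shake out.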

However, x86 is CISC and ARM is RISC, and the two treat memory differently: ARM is a load/store architecture, and Aarch64's memory-ordering model is much weaker than x86-64's. Now here comes the fun part.
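One concrete consequence of the memory-ordering difference, sketched with C++11 atomics: when one thread publishes data to another, the "ready" flag should be an atomic with release/acquire ordering. With a plain `bool` the code is a data race (undefined behaviour) that often appears to work on strongly ordered x86-64, while on weakly ordered Aarch64 the reader can see the flag before it sees the data. The `Mailbox` type and its helpers are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical one-shot mailbox for handing a value from one thread to another.
struct Mailbox {
    int payload = 0;
    std::atomic<bool> ready{false};
};

void publish(Mailbox &box, int value) {
    box.payload = value;                               // ordinary write
    box.ready.store(true, std::memory_order_release);  // publishes the write above...
}

int consume(Mailbox &box) {
    while (!box.ready.load(std::memory_order_acquire))  // ...to the thread that acquires it
        std::this_thread::yield();
    return box.payload;
}
```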

My host (x86-64) builds and tests were fine. So were my target (Aarch64) builds, and they ran fine when I put them on the target and ran my tests there.

Our suppliers produced a new version of their Secret Sauce that needed some reconfiguration inside my code. My code (actually their example code, slightly modified) had a couple of arrays holding configuration data, and these doubled in size to hold more constants.
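The supplier's constants aren't shown, so the tables below are hypothetical. The sketch shows one way to make array growth safe in C++17: let the compiler deduce the sizes, loop with `std::size()` rather than a hard-coded count, and use `static_assert` to check that parallel tables stay the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Hypothetical configuration tables. Sizes are deduced from the initialisers,
// so adding entries cannot silently overrun a hand-written length.
constexpr std::uint32_t kRegisterAddrs[] = {0x10, 0x14, 0x18, 0x1C, 0x20, 0x24};
constexpr std::uint32_t kRegisterValues[] = {1, 0, 3, 0, 5, 0};

// Compile-time check that the two parallel tables grew together.
static_assert(std::size(kRegisterAddrs) == std::size(kRegisterValues),
              "configuration tables must have the same length");

// Hypothetical callback standing in for whatever applies one setting.
using WriteFn = void (*)(std::uint32_t addr, std::uint32_t value);

// Applies every entry and returns how many were applied.
std::size_t apply_config(WriteFn write) {
    for (std::size_t i = 0; i < std::size(kRegisterAddrs); ++i)
        write(kRegisterAddrs[i], kRegisterValues[i]);
    return std::size(kRegisterAddrs);
}
```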

Everything compiled. My host regression tests passed. Putting the target binary on the hardware and running it resulted in a crash. It was a nice crash, in that pthread_create() failed with an error code, I printed a nice error message, and the rest of the program kept going.
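For reference, `pthread_create()` reports failure through its return value: it returns 0 or an error number such as `EAGAIN`, `EINVAL` or `EPERM`, and it does not set `errno`. A minimal sketch of a checked start-up helper (the name `start_thread()` is hypothetical) therefore passes the return code itself to `strerror()`.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Hypothetical helper: starts a thread and prints a readable message on failure.
// The error number comes from the return value, not from errno.
bool start_thread(pthread_t *tid, const pthread_attr_t *attr,
                  void *(*entry)(void *), void *arg) {
    int rc = pthread_create(tid, attr, entry, arg);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_create failed: %s (%d)\n",
                     std::strerror(rc), rc);
        return false;
    }
    return true;
}
```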

As I said earlier, I had been re-implementing parts of this C++ library at breakneck pace, so I suspected memory corruption, perhaps from a mistake in one of the C++ constructors for the thread class.

I instrumented the code six ways from Sunday and concluded that there was stack corruption somewhere: all the right addresses for the thread main routine and its arguments were being set in the object instances, yet when pthread_create() was called it returned a nasty error.

Then I remembered the mighty Valgrind. So I installed it.

After about half an hour I had the answer. I had forgotten to initialise the thread attributes (pthread_attr_init()) and to initialise a mutex for a shared buffer (pthread_mutex_init()).
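A minimal sketch of the corrected set-up, assuming a shared buffer guarded by a POSIX mutex and a worker thread that needs a non-default stack size: initialise the mutex before any thread can lock it, and initialise the attributes object before setting anything on it or handing it to `pthread_create()`. The names `SharedBuffer`, `init_shared_buffer()`, `worker()` and `start_worker()` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

// Hypothetical shared buffer; the mutex must be initialised before first use.
struct SharedBuffer {
    pthread_mutex_t lock;
    int count;
};

int init_shared_buffer(SharedBuffer *buf) {
    buf->count = 0;
    return pthread_mutex_init(&buf->lock, nullptr);  // 0 on success
}

// Hypothetical worker: bumps the counter under the mutex.
static void *worker(void *arg) {
    SharedBuffer *buf = static_cast<SharedBuffer *>(arg);
    pthread_mutex_lock(&buf->lock);
    ++buf->count;
    pthread_mutex_unlock(&buf->lock);
    return nullptr;
}

// Starts a worker with explicitly initialised attributes. Skipping
// pthread_attr_init() would hand pthread_create() indeterminate bytes,
// which may happen to work on one machine and fail on another.
int start_worker(pthread_t *tid, SharedBuffer *buf, std::size_t stack_size) {
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_setstacksize(&attr, stack_size);
    if (rc == 0)
        rc = pthread_create(tid, &attr, worker, buf);
    pthread_attr_destroy(&attr);
    return rc;
}
```

Destroying the attributes object as soon as `pthread_create()` returns is allowed; it has no effect on threads already created with it.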

It just so happened that on x86-64, thanks to the memory layout and whatever random contents that uninitialised memory held, the program ran correctly. On Aarch64 it fell over in a smouldering pile.

The moral of the story: (1) Don't write code on your own; get someone to review it. (2) Don't write code in a hurry, even when there's a Grumpy Boss Man. (3) Compile and test on at least two different architectures. (4) Use Valgrind. (5) I hate C++.
