"Remember that one bug that had you tearing your hair out and banging your head against the wall for the longest time? And how it felt when you finally solved it? Here's a chance to share your greatest frustration and triumph with the community.
One that I vividly recall occurred back in the early '90s at a startup that was developing custom PBX hardware and software. There was the current development prototype rack and another rack for us in Quality Assurance (QA). Our shipping deadline for a major client was fast approaching, and the pressure level was high as development released the latest hardware and software for us to test. We soon discovered that our system would not boot up successfully. We were getting all kinds of errors, and different errors each time. Development's machine booted just fine, *every* time. We swapped out our hard disks, the power supply, the main processing board, the communications boards, and finally the entire backplane in which all of these were housed. The days passed, and the system still failed to boot up successfully and gave us different errors on each reboot.
What could it be? We were all stymied and frustrated as the deadline loomed before us. It was then that I noticed the power strips on each rack, into which all the frames and power supplies were plugged. The power strip on the development rack was 12-gauge (i.e., it could handle 20 amps), but the one on the QA rack was only 14-gauge (15 amps). The power draw caused by spinning up the drives was just enough to leave the system board under-powered for bootup.
We swapped in a new $10 power strip and it worked perfectly. And we made the deadline, too! So, fellow Soylents, what have you got? Share your favorite tale of woe and success and finally bask in the glory you deserve."
Always feels good to dive into a problem, dig through some layers of misdirection, and find a quick and easy fix. About six months ago, we added a new suite of tests to the application I work on, using a relatively unproven testing framework. For some reason, after about a month, the step that parses the output from the test suite was taking so long (on the order of an hour) that the build machine would usually just fall over and spit back a warning. It was an awful pain -- every time the continuous build ran, you had to go look at the actual output from the tests yourself to make sure you hadn't broken anything -- and it wasted resources.
The fix, in the end, was one character -- I added a non-greedy-match token to the regex that parsed the test output. Turns out, a bunch of the test suites had the same name, and the parser didn't expect that to happen. With the greedy matching, the parser was matching the beginning of each suite with the end of all the others that followed. That, of course, didn't scale well as we added more test suites!
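For anyone who hasn't been bitten by this, here is a minimal sketch of the greedy vs. non-greedy difference in Python. The output format, the suite name, and the helper `suite_bodies` are all made up for illustration; the actual framework and its format aren't named above, so treat this as an assumption about what such a parser might look like, not the real one.

```python
import re

# Hypothetical test-runner output: three suites that all share one name.
OUTPUT = (
    "BEGIN suite=login\n"
    "ok test_a\n"
    "END suite=login\n"
    "BEGIN suite=login\n"
    "ok test_b\n"
    "END suite=login\n"
    "BEGIN suite=login\n"
    "FAIL test_c\n"
    "END suite=login\n"
)

# Greedy: ".*" runs to the LAST matching END marker for that suite name.
GREEDY = re.compile(r"BEGIN suite=(\w+)\n(.*)END suite=\1\n", re.DOTALL)

# Non-greedy: ".*?" stops at the FIRST matching END marker.
# The only difference between the two patterns is the single "?".
LAZY = re.compile(r"BEGIN suite=(\w+)\n(.*?)END suite=\1\n", re.DOTALL)


def suite_bodies(pattern, text):
    """Return the captured body of every suite block the pattern finds."""
    return [m.group(2) for m in pattern.finditer(text)]


greedy_bodies = suite_bodies(GREEDY, OUTPUT)
lazy_bodies = suite_bodies(LAZY, OUTPUT)

print(len(greedy_bodies), "block(s) with greedy matching")
print(len(lazy_bodies), "block(s) with non-greedy matching")
```

With this sample input, the greedy pattern swallows all three suites as a single block running from the first BEGIN to the last END, while the non-greedy pattern yields one block per suite. When many suites share a name, the greedy version has far more text to scan and backtrack over, which is at least consistent with the slowdown described above.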