"Remember that one bug that had you tearing your hair out and banging your head against the wall for the longest time? And how it felt when you finally solved it? Here's a chance to share your greatest frustration and triumph with the community.
One that I vividly recall occurred back in the early '90s at a startup that was developing custom PBX hardware and software. There was the current development prototype rack, and another rack for us in Quality Assurance (QA). Our shipping deadline for a major client was fast approaching, and the pressure was high as development released the latest hardware and software for us to test. We soon discovered that our system would not boot up successfully. We were getting all kinds of errors; different errors each time. Development's machine booted just fine, *every* time. We swapped out our hard disks, the power supply, the main processing board, the communications boards, and finally the entire backplane in which all of these were housed. The days passed, and the system still failed to boot successfully, giving us different errors on each reboot.
What could it be? We were all stymied and frustrated as the deadline loomed before us. It was then that I noticed the power strips on each rack into which all the frames and power supplies were plugged. The power strip on the dev server was 12-gauge (i.e. could handle 20 amps) but the one on the QA rack was only 14-gauge (15 amps). The power draw caused by spinning up the drives was just enough to leave the system board under-powered for bootup.
We swapped in a new $10 power strip and it worked perfectly. And we made the deadline, too! So, fellow Soylents, what have you got? Share your favorite tale of woe and success and finally bask in the glory you deserve."
I was green and fresh from college. Although I was a programmer, one of my first jobs was to support a reporting app we had. One day, we ran out of space, breaking not only our app but a bunch of others also running on the server. I spoke to the sysadmin and he gave us more space. Problem solved. Never having actually looked at how much space we were taking up, I reported what we found up the chain of command and moved on. Then it happened again. And again. The sysadmin accused us of storing too much data. I finally started keeping track of how much space we were using and how much free space was left on the drive... manually. (I kept running the "dir" command on the command line, redirecting its output to a file, then placed the numbers in a spreadsheet to track them.) As we got close to running out of disk space, I'd ask for more to prevent the apps from crashing.
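(For anyone tempted to do the same today, that kind of manual tracking is easy to automate. Here's a minimal sketch, assuming a modern Python environment; the log path and watched path are hypothetical, and you'd run it from cron or Task Scheduler rather than by hand.)

```python
import csv
import shutil
from datetime import datetime

LOG_PATH = "disk_usage_log.csv"  # hypothetical output file
WATCHED_PATH = "."               # hypothetical path on the drive to watch

def log_free_space():
    """Append a timestamped snapshot of total/used/free bytes to a CSV."""
    total, used, free = shutil.disk_usage(WATCHED_PATH)
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([datetime.now().isoformat(), total, used, free])

if __name__ == "__main__":
    # One snapshot per run; schedule this script hourly to build a history.
    log_free_space()
```

Graphing the resulting CSV over time makes a slow leak, like the one in this story, obvious at a glance.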
I noticed a couple of things. Our app would suck up a tremendous amount of the free space and then release it, which made tracking how much we were using very difficult. Strangely, our usage stayed fairly consistent, yet the amount of free space on the network drive kept getting smaller and smaller. I came to the conclusion that some other program I couldn't see was eating up the free space. I began to accuse the sysadmin of not being able to spot the problem. My boss eventually had to step in.
Long story short: Because of how critical it was becoming, others jumped on board to find the root cause of the problem. Someone found a cache our program kept that I had not been given rights to see (for whatever reason) and that the sysadmin didn't know about either. Once we set up a schedule to flush the cache, the problem was solved. I realized that day that, because of supposed security measures, neither the sysadmin nor I could do our jobs, and we wound up blaming each other for it. We'd both been victims of being siloed.