SoylentNews is people

posted by Cactus on Saturday March 08 2014, @02:00AM
from the is-it-plugged-in? dept.

martyb writes:

"Remember that one bug that had you tearing your hair out and banging your head against the wall for the longest time? And how it felt when you finally solved it? Here's a chance to share your greatest frustration and triumph with the community.

One that I vividly recall occurred back in the early '90s at a startup that was developing custom PBX hardware and software. There was the current development prototype rack and another rack for us in Quality Assurance (QA). Our shipping deadline for a major client was fast approaching, and the pressure level was high as development released the latest hardware and software for us to test. We soon discovered that our system would not boot up successfully. We were getting all kinds of errors, and different errors each time. Development's machine booted just fine, *every* time. We swapped out our hard disks, the power supply, the main processing board, the communications boards, and finally the entire backplane in which all of these were housed. The days passed, and the system still failed to boot up successfully and gave us different errors on each reboot.

What could it be? We were all stymied and frustrated as the deadline loomed before us. It was then that I noticed the power strips on each rack into which all the frames and power supplies were plugged. The power strip on the dev rack was 12-gauge (i.e., could handle 20 amps) but the one on the QA rack was only 14-gauge (15 amps). The power draw caused by spinning up the drives was just enough to leave the system board under-powered for bootup.

We swapped in a new $10 power strip and it worked perfectly. And we made the deadline, too!

So, fellow Soylents, what have you got? Share your favorite tale of woe and success and finally bask in the glory you deserve."

  • (Score: 3, Interesting) by Snotnose on Saturday March 08 2014, @03:53AM

    by Snotnose (1623) on Saturday March 08 2014, @03:53AM (#13085)

    End of the 90s I was working for a company that made cell phone base stations. We had cards to do all the DSP. Each card could handle 8 calls, each chassis held enough cards so the chassis could handle 4 T1 lines. Needless to say, reliability was critical and resetting a chassis was Not A Good Thing (tm). We would have a DSP die every few hours. Randomly. No idea why or when. Only way to get it back was to reboot the chassis. This went on for 6 months, and as the deadline approached its visibility got higher.

    Finally, my boss asked me to look into it. I knew nothing of DSP, worked for a different group, did not understand the code at all, but at least it was in C (compiled to some TI DSP chip). After about a week of reading the code and checking the manual to see what each library call did, I ran across an entry that said "Do not call this from an ISR". The code I was looking at was an ISR.

    Rewrote the code, compiled, and shazaam! Problem solved. Got lots of brownie points across the whole project for that one :)

    for (glee in 1..34) println("Guilty!")
  • (Score: 0) by Anonymous Coward on Saturday March 08 2014, @04:51AM

    by Anonymous Coward on Saturday March 08 2014, @04:51AM (#13096)

    I worked with one of the older TI DSP kits - the first time I had to deal with an out-of-order execution processor for a real-time workload. Not so good times, but at least the compiler generally seemed to work reasonably well.

  • (Score: 1) by dargaud on Monday March 10 2014, @10:29AM

    by dargaud (364) on Monday March 10 2014, @10:29AM (#13815)

    When writing driver code, what you can and cannot do in interrupt routines is always critical. And not always clearly documented. And very hard to debug indeed.
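
For readers who have not hit this class of bug: the usual way around a "do not call this from an ISR" restriction is to have the interrupt handler only record what happened and let the main loop do the real work. The sketch below is a minimal illustration in C, not the base-station code from the story. Every name in it (dsp_isr, drain_pending, library_call_not_isr_safe, QUEUE_SIZE) is hypothetical, and the handler takes the sample as a parameter only so the sketch can be exercised on a desktop machine; on real hardware it would read a device register. It assumes a single core with one interrupt source as producer and the main loop as the only consumer.

```c
#include <stdint.h>

/* Hypothetical names throughout; this is not the code from the story. */

#define QUEUE_SIZE 8u  /* power of two so the index wraps with a mask */

static volatile uint16_t sample_queue[QUEUE_SIZE];
static volatile uint32_t head;  /* written only by the ISR */
static volatile uint32_t tail;  /* written only by the main loop */

/* Stand-in for a library routine documented as unsafe inside an ISR. */
static uint32_t processed_total;
static void library_call_not_isr_safe(uint16_t sample)
{
    processed_total += sample;
}

/* Interrupt handler: queue the sample and return. No library calls here. */
void dsp_isr(uint16_t sample)
{
    uint32_t next = (head + 1u) & (QUEUE_SIZE - 1u);
    if (next != tail) {          /* queue full: drop the sample */
        sample_queue[head] = sample;
        head = next;
    }
}

/* Called from the main loop: drain the queue and do the real work there.
   Returns the number of samples handled. */
unsigned drain_pending(void)
{
    unsigned handled = 0;
    while (tail != head) {
        uint16_t sample = sample_queue[tail];
        tail = (tail + 1u) & (QUEUE_SIZE - 1u);
        library_call_not_isr_safe(sample);
        handled++;
    }
    return handled;
}
```

The main loop would call drain_pending() on each pass. Because one slot is kept empty to tell "full" from "empty", the queue holds at most QUEUE_SIZE - 1 samples, and anything beyond that is dropped. On a multi-core part or a CPU with weak memory ordering, volatile alone would not be enough and memory barriers or atomics would be needed.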