SoylentNews Comments | An Enterprise SSD Flaw Will Brick Hardware after Exactly 40,000 Hours

An Enterprise SSD Flaw Will Brick Hardware after Exactly 40,000 Hours

posted by janrinok on Friday March 27 2020, @09:03AM

from the you-don't-always-get-what-you-pay-for dept.

upstart writes in with an IRC submission for SoyCow8162:

An enterprise SSD flaw will brick hardware after exactly 40,000 hours:

Hewlett Packard Enterprise (HPE) has warned that certain SSD drives could fail catastrophically if buyers don't take action soon. Due to a firmware bug, the products in question will be bricked exactly 40,000 hours (four years, 206 days and 16 hours) after the SSD has entered service. "After the SSD failure occurs, neither the SSD nor the data can be recovered," the company warned in a customer service bulletin.
[...] The drives in question are 800GB and 1.6TB SAS models and storage products listed in the service bulletin here. It applies to any products with HPD7 or earlier firmware. HPE also includes instructions on how to update the firmware and check the total time on the drive to best plan an upgrade. According to HPE, the drives could start failing as early as October this year.
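The bulletin's conversion of 40,000 hours into calendar time can be checked with a few lines of Python. This is a minimal sketch that counts every year as 365 days, which is how the 206-day figure comes out. `hours_remaining` and the 35,000-hour reading are illustrative assumptions. The real power-on-hour value would come from whatever drive-query procedure HPE's instructions describe.

```python
FAILURE_HOURS = 40_000  # threshold stated in the HPE bulletin

def split_hours(hours: int) -> tuple[int, int, int]:
    """Return (years, days, hours), treating every year as 365 days."""
    days, rem_hours = divmod(hours, 24)
    years, rem_days = divmod(days, 365)
    return years, rem_days, rem_hours

def hours_remaining(power_on_hours: int) -> int:
    """Hours left before the 40,000-hour threshold (0 if already reached)."""
    return max(FAILURE_HOURS - power_on_hours, 0)

print(split_hours(FAILURE_HOURS))             # (4, 206, 16)
reading = 35_000  # hypothetical power-on-hours reading from a drive
print(split_hours(hours_remaining(reading)))  # (0, 208, 8)
```

If one leap day falls inside the span, the same 40,000 hours comes to four years, 205 days and 16 hours on the calendar. The drive's hour counter is unaffected either way.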


Re:Legendary (Score: 2) by VacuumTube on Saturday March 28 2020, @07:42PM (5 children)

by VacuumTube (7693) on Saturday March 28 2020, @07:42PM (#976732) Journal

Thanks for digging out this additional information, Takyon. At this point it does appear that HPE is one of the injured parties, and that they have done what they could to mitigate the problem. The one technical detail they gave out concerning a buffer index value doesn't say much to me, but they're probably constrained by NDAs.
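The only technical detail mentioned here is HPE's reference to a buffer index value, so the following is purely hypothetical. It shows a toy controller whose log buffer is indexed by power-on hours and never wrapped. `ToyController`, `BUFFER_SLOTS` and `HOURS_PER_SLOT` are invented names, and the numbers were chosen so the arithmetic lands on 40,000. The sketch shows how an index derived from a counter can fail at one exact hour, and how sweeping the counter, as a boundary test would, finds that hour.

```python
from typing import Optional

# Hypothetical illustration only, NOT HPE's firmware: both constants are
# invented so that the bad index first appears at exactly 40,000 hours.
BUFFER_SLOTS = 400     # hypothetical fixed buffer size
HOURS_PER_SLOT = 100   # hypothetical: one slot per 100 power-on hours

class ToyController:
    def __init__(self) -> None:
        self.buffer = [0] * BUFFER_SLOTS

    def log_event(self, power_on_hours: int) -> None:
        # Bug: the index grows with uptime and is never wrapped or clamped.
        index = power_on_hours // HOURS_PER_SLOT
        self.buffer[index] += 1  # raises IndexError once index == BUFFER_SLOTS

def first_failing_hour(limit: int) -> Optional[int]:
    """Sweep the hour counter from 0 to limit-1 and return the first hour that fails."""
    ctrl = ToyController()
    for hour in range(limit):
        try:
            ctrl.log_event(hour)
        except IndexError:
            return hour
    return None

print(first_failing_hour(50_000))  # 40000
```

Python raises `IndexError` on the out-of-range write. Firmware written in C would typically not check the index at all, so the write would silently land in whatever memory follows the buffer. That outcome is closer to the kinds of failure discussed later in this thread.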


Starting Score: 1 point

Karma-Bonus Modifier +1

Total Score: 2
Re:Legendary (Score: 2) by RS3 on Monday March 30 2020, @02:01AM (4 children)

by RS3 (6367) on Monday March 30 2020, @02:01AM (#977083)

It's conceivable, to me anyway, that a software bug could have any possible result, including writing to, erasing, or scrambling the block FLASH cell hash tables (thereby losing all data), etc.
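The following minimal sketch of a flash translation layer shows why a scrambled table means losing all data. It assumes a plain logical-to-physical mapping table can stand in for the "hash tables" mentioned above. Real controllers use vendor-specific structures, and `ToyFTL` is a hypothetical name. Because flash is rewritten out of place, the stale copy of a block stays in flash alongside the current one. Once the table is corrupted, even a raw dump of the chips can't tell which bytes belong to which block, or which version is current.

```python
class ToyFTL:
    """Toy flash translation layer: logical block address -> physical page."""

    def __init__(self, pages: int) -> None:
        self.flash = [b""] * pages            # physical pages
        self.mapping: dict[int, int] = {}     # logical block -> physical page
        self.next_free = 0

    def write(self, lba: int, data: bytes) -> None:
        # Flash is written out of place: each write takes a fresh page and the
        # mapping is repointed; the old page keeps its stale contents.
        self.flash[self.next_free] = data
        self.mapping[lba] = self.next_free
        self.next_free += 1

    def read(self, lba: int) -> bytes:
        return self.flash[self.mapping[lba]]

ftl = ToyFTL(pages=8)
ftl.write(0, b"boot")
ftl.write(1, b"payroll")
ftl.write(0, b"boot-v2")           # rewrite; page 0 still holds b"boot"
print(ftl.read(0), ftl.read(1))    # b'boot-v2' b'payroll'

# A stray write zeroes the mapping table. Every byte is still in flash,
# but reads now return the wrong data for both blocks.
ftl.mapping = {lba: 0 for lba in ftl.mapping}
print(ftl.read(0), ftl.read(1))    # b'boot' b'boot'
print(ftl.flash[:3])               # [b'boot', b'payroll', b'boot-v2']
```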

- Re:Legendary (Score: 2) by VacuumTube on Tuesday March 31 2020, @08:18PM (3 children)
  
  by VacuumTube (7693) on Tuesday March 31 2020, @08:18PM (#977744) Journal
  
  Really? A single bug that would do all that plus cause the hardware to permanently fail? That's what I find intriguing about the subject, and perhaps it's just that I'm not very familiar with the hardware. But I can't recall ever seeing a software bug that caused such a complex failure in a delivered product.
  
  - Re:Legendary (Score: 3, Interesting) by RS3 on Tuesday March 31 2020, @08:40PM (2 children)
    
    by RS3 (6367) on Tuesday March 31 2020, @08:40PM (#977756)
    
    Ahhh, you've never done assembly language?
    I'll pose a potential scenario: a programming error puts a bad value in the pc (program counter), so execution jumps somewhere it doesn't belong, like the routine that writes to the FLASH memory. But the pointers are not correct for this write, so the write_eeprom_now routine trashes the very code that runs the SSD, which then goes even more crazy, also writing to the main storage FLASH and trashing the stored data. Once code trashes itself, there's no fix unless you have external computers doing cross-checking, like the multiple redundant computers sometimes used in mission-critical systems such as the Space Shuttle, which obviously nobody does in an SSD.
    I'm not technically posing a true hardware failure here, and I didn't perceive that from TFS or TFA. However, to correct the bad control program (patch / update) on the SSD, the SSD has to be able to run well enough to receive and execute the controller's FLASH update routine. It's like a motherboard BIOS that goes bad and the MB is bricked.
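The scenario above can be played out on a toy machine in which code and data share one writable memory. Everything here is hypothetical. The instruction set is made up, and `STORE` stands in for a routine like the write_eeprom_now mentioned above. A stray jump enters the write routine with a stale pointer, and the routine overwrites the start of the program. On the next "power-on" the machine cannot even begin executing, which is the point at which no firmware update could be received.

```python
def run(memory: list, pc: int, ptr: int = 0, max_steps: int = 50) -> str:
    """Execute a toy program in which code and data share one writable memory."""
    for _ in range(max_steps):
        instr = memory[pc]
        if not isinstance(instr, tuple):
            # The CPU fetched a data byte where it expected an instruction.
            return f"crash: garbage {instr!r} at pc={pc}"
        op = instr[0]
        if op == "SETPTR":
            ptr = instr[1]
            pc += 1
        elif op == "STORE":          # the write routine: memory[ptr] = value
            memory[ptr] = instr[1]
            ptr += 1
            pc += 1
        elif op == "JMP":
            pc = instr[1]
        elif op == "HALT":
            return "halted normally"
        else:                        # NOP
            pc += 1
    return "ran out of steps"

def fresh_program() -> list:
    return [
        ("SETPTR", 8),    # 0: main sets ptr to the data area
        ("JMP", 4),       # 1: "calls" the write routine (no return stack, for brevity)
        ("HALT",),        # 2: normal end
        ("NOP",),         # 3
        ("STORE", 0xFF),  # 4: write routine
        ("STORE", 0xFF),  # 5
        ("JMP", 2),       # 6: back to main
        ("NOP",),         # 7
        0, 0,             # 8-9: data area
    ]

good = fresh_program()
print(run(good, pc=0), good[8:])   # halted normally [255, 255]

bad = fresh_program()
# Stray jump straight into the write routine while ptr still holds a stale 0.
print(run(bad, pc=4, ptr=0))       # halted normally
print(run(bad, pc=0))              # crash: garbage 255 at pc=0
```

The stray run itself still reports "halted normally". The damage only shows on the next start, which is part of what makes this class of bug hard to catch in testing.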
    All that said, you could conceivably remove and re-flash the chip that stores the controller's programming... unless it's internal to the controller's microprocessor, which is likely the case.
    Some microcontrollers have a pin which, when driven high or low (depending on the spec), tells the uP to load from an external ROM/FLASH chip and ignore the internal programming. Then it would be possible to re-flash the bad internal code, but obviously this would take a bit of hardware and a technician's time. And again, if the original code went berserk and trashed the main FLASH data (your stored files), then it's all moot (unless you want to repair the drive for future use...)
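The boot-select-pin idea can be sketched as the decision a boot ROM makes at reset. This is a simplified, hypothetical model: `ToyMCU`, `choose_boot_image` and `reflash_internal` are invented names. Real parts differ in pin polarity and loader protocol, and on an SSD the pin may not even be brought out on the board. The CRC check before overwriting internal FLASH is a design choice added here, not something the comment specifies.

```python
import zlib
from dataclasses import dataclass

@dataclass
class ToyMCU:
    internal_flash: bytes
    boot_pin_level: int       # level currently applied to the boot-select pin
    boot_pin_active: int = 1  # hypothetical: the level that selects external boot

def choose_boot_image(mcu: ToyMCU, external_flash: bytes) -> bytes:
    """Boot-ROM decision at reset: honour the pin before running internal code."""
    if mcu.boot_pin_level == mcu.boot_pin_active:
        return external_flash
    return mcu.internal_flash

def reflash_internal(mcu: ToyMCU, image: bytes, expected_crc: int) -> bool:
    """Overwrite internal FLASH only with an image whose CRC-32 matches."""
    if zlib.crc32(image) != expected_crc:
        return False
    mcu.internal_flash = image
    return True

good_fw = b"controller firmware, fixed"
mcu = ToyMCU(internal_flash=b"\xff" * 16, boot_pin_level=1)  # pin held at active level
running = choose_boot_image(mcu, external_flash=good_fw)      # boots from external chip
print(reflash_internal(mcu, running, zlib.crc32(good_fw)))    # True
mcu.boot_pin_level = 0                                         # release the pin
print(choose_boot_image(mcu, external_flash=b"") == good_fw)  # True
```

As the comment notes, this only restores the controller's own code. It does nothing for user data that the runaway firmware has already overwritten.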
    
    - Re:Legendary (Score: 3, Interesting) by VacuumTube on Tuesday March 31 2020, @10:12PM (1 child)
      
      by VacuumTube (7693) on Tuesday March 31 2020, @10:12PM (#977811) Journal
      
      Actually I used to love programming in assembler, but that was long before the days of SSDs and I guess I don't think in those terms any more. So thanks for the thought experiment. It brought back fond memories.
      
      - Re:Legendary (Score: 2) by RS3 on Tuesday March 31 2020, @10:36PM
        
        by RS3 (6367) on Tuesday March 31 2020, @10:36PM (#977821)
        
        You're quite welcome. Come to think of it, I don't do much assembler these days either and I'm itching to get back into it. Maybe.
        Yeah, I don't know how you could prevent these kinds of disasters without doing really good testing, code reviews, etc. Unfortunately companies are run by MBAs who see QC as being costly / overhead / loss. And that aside, egos are usually a pretty big moat to cross.
        BTW, I'm a vacuum tube hacker (too), and your username reminds me of a couple of projects that I could be working on while we wait for the world to hopefully return to normal, or whatever the new normal becomes...
        
