
SoylentNews is people

posted by janrinok on Friday March 27 2020, @09:03AM
from the you-don't-always-get-what-you-pay-for dept.

An enterprise SSD flaw will brick hardware after exactly 40,000 hours:

Hewlett Packard Enterprise (HPE) has warned that certain SSD drives could fail catastrophically if buyers don't take action soon. Due to a firmware bug, the products in question will be bricked exactly 40,000 hours (four years, 206 days and 16 hours) after the SSD has entered service. "After the SSD failure occurs, neither the SSD nor the data can be recovered," the company warned in a customer service bulletin.
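The "four years, 206 days and 16 hours" figure can be checked with a few lines of arithmetic. This is a minimal sketch that assumes 365-day years (no leap days); the function name `breakdown` is made up for illustration.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365  # assumption: 365-day years, leap days ignored


def breakdown(total_hours):
    """Split a count of hours into (years, days, hours)."""
    total_days, hours = divmod(total_hours, HOURS_PER_DAY)
    years, days = divmod(total_days, DAYS_PER_YEAR)
    return years, days, hours


print(breakdown(40_000))  # (4, 206, 16)
```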

[...] The drives in question are 800GB and 1.6TB SAS models and storage products listed in the service bulletin here. It applies to any products with HPD7 or earlier firmware. HPE also includes instructions on how to update the firmware and check the total time on the drive to best plan an upgrade. According to HPE, the drives could start failing as early as October this year.
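For planning an upgrade, the remaining time follows directly from the drive's reported power-on hours. The sketch below assumes the drive stays powered continuously from now on; the function name and the 36,000-hour reading are hypothetical, and how the reading is obtained is left to the tools described in the HPE bulletin.

```python
from datetime import datetime, timedelta

FAILURE_HOURS = 40_000  # threshold stated in the HPE bulletin


def projected_failure(power_on_hours, now):
    """Estimate when a continuously powered drive reaches the threshold."""
    remaining = max(FAILURE_HOURS - power_on_hours, 0)
    return now + timedelta(hours=remaining)


# Hypothetical drive reporting 36,000 power-on hours when the story was posted
print(projected_failure(36_000, datetime(2020, 3, 27, 9, 0)))
```

A drive that is powered down part of the time would reach the threshold later than this estimate.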



This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Interesting) by Booga1 on Friday March 27 2020, @09:35AM (9 children)

    by Booga1 (6333) on Friday March 27 2020, @09:35AM (#976238)

    Wow, "good job" HP.
    Because of the way servers are ordered by the pallet/rack, that would mean companies that rely on these will all face massive upgrade maintenance events, replacement efforts, or simultaneous clustered failures.
    This is one of those "all hands on deck" type problems that sysadmins working in "the cloud" have to deal with all the time. The better they handle it, the less you hear about it. What a huge breach of what faith people had left in HP. This makes it pretty clear they planned for your drives to fail if you buy from them.

    • (Score: 2) by Chocolate on Friday March 27 2020, @09:52AM (2 children)

      by Chocolate (8044) on Friday March 27 2020, @09:52AM (#976239) Journal

      At least they told people about it now. No doubt their test devices are dying one by one.

      --
      Bit-choco-coin anyone?
      • (Score: 3, Interesting) by Tokolosh on Friday March 27 2020, @01:04PM (1 child)

        by Tokolosh (585) on Friday March 27 2020, @01:04PM (#976277)

        The HP bulletin is copyrighted.

        • (Score: 2) by janrinok on Friday March 27 2020, @07:17PM

          by janrinok (52) Subscriber Badge on Friday March 27 2020, @07:17PM (#976416) Journal

          It's a good job that we are quoting Engadget then... We'll let HP take that issue up with them.

          And if HP thought that being copyrighted would stop the information from getting out - well, they can now see that they were mistaken.

          --
          We are always looking for new staff in different areas - please volunteer if you have some spare time and wish to help.
    • (Score: 0) by Anonymous Coward on Friday March 27 2020, @10:09AM (4 children)

      by Anonymous Coward on Friday March 27 2020, @10:09AM (#976241)

      The warranty is 5 years. Guess you should install anything as soon as you get it.

      • (Score: 5, Funny) by BsAtHome on Friday March 27 2020, @10:48AM (3 children)

        by BsAtHome (889) on Friday March 27 2020, @10:48AM (#976248)

        The warranty is 5 years.

        That simply means that someone made a mistake in the calculation. It should have been 43800+1 hours to get beyond the warranty period. Probably an MBA rounding numbers in an excel spreadsheet that messed it up.
        /s

        • (Score: 5, Touché) by Nuke on Friday March 27 2020, @11:03AM

          by Nuke (3162) on Friday March 27 2020, @11:03AM (#976249)

          Not sure if I should have modded this as funny or insightful.

        • (Score: 2) by deimtee on Friday March 27 2020, @08:49PM

          by deimtee (3272) on Friday March 27 2020, @08:49PM (#976448) Journal

          I wonder if they used the same math they use to say it's 800GB or 1.6TB

          --
          No problem is insoluble, but at Ksp = 2.943×10−25 Mercury Sulphide comes close.
        • (Score: 2, Touché) by Anonymous Coward on Friday March 27 2020, @11:55PM

          by Anonymous Coward on Friday March 27 2020, @11:55PM (#976511)

          This is modded funny but it should be modded "+5 Plausible."

    • (Score: 3, Interesting) by driverless on Friday March 27 2020, @02:12PM

      by driverless (4770) on Friday March 27 2020, @02:12PM (#976302)

      Good thing we put Samsung 86x SSDs in the HP server rather than whatever crap HP would have supplied us with.

      As it was they still managed to screw it up, installing two sticks of RAM with different capacities in AA BB configuration instead of the necessary AB AB, so the machine only used half the RAM it had in there. I know people joke about "support monkeys" but I think in this case they actually had chimpanzees setting up the hardware.

  • (Score: 2) by janrinok on Friday March 27 2020, @10:14AM (10 children)

    by janrinok (52) Subscriber Badge on Friday March 27 2020, @10:14AM (#976242) Journal
    [from TFA]

    HPE also includes instructions on how to update the firmware and check the total time on the drive to best plan an upgrade.

    I've never updated the firmware on a drive before - if I could have afforded to buy one of these 40,000 hours ago then this might be fun.

    --
    We are always looking for new staff in different areas - please volunteer if you have some spare time and wish to help.
    • (Score: 0) by Anonymous Coward on Friday March 27 2020, @11:29AM (4 children)

      by Anonymous Coward on Friday March 27 2020, @11:29AM (#976254)

      Updating firmware is really easy. Years ago I had to do it and at that time you had to hook it up to a Windows machine and run a program from the CMD line! It only took a minute to update. I hope the updater code has been updated itself. At that time you obviously needed to take the disk offline and physically remove it to update it.

      Given that these are "only" 800 GB and 1.6 TB drives, it will probably be more cost effective to just replace them with bigger drives instead of updating all the firmwares, at least for servers with really large bays. Small instances, like home users and small businesses, this shouldn't be that bad of a process, just a PITA.

      • (Score: 2) by janrinok on Friday March 27 2020, @01:21PM (3 children)

        by janrinok (52) Subscriber Badge on Friday March 27 2020, @01:21PM (#976287) Journal

        Windows machine and run a program from the CMD line!

        Let's hope that they have improved the system - many enlightened users will have Linux installed, and not Windows.... ;-)

        --
        We are always looking for new staff in different areas - please volunteer if you have some spare time and wish to help.
        • (Score: 2) by RS3 on Friday March 27 2020, @02:15PM (2 children)

          by RS3 (6367) on Friday March 27 2020, @02:15PM (#976304)

          Sadly, many firmware (motherboard, auxiliary devices, external devices, hard disks, optical drives, etc.) updaters only run on Windows.

          A few come with self-booting images, i.e., you "burn" the image file to an optical or USB drive, boot from that, and it runs the updater.

          In the olden days many updaters created a bootable floppy disk and you booted that and ran the update.

          • (Score: 3, Interesting) by maxwell demon on Friday March 27 2020, @05:01PM (1 child)

            by maxwell demon (1608) on Friday March 27 2020, @05:01PM (#976375) Journal

            Actually, one mainboard I owned had the BIOS updater as part of the BIOS itself: You would boot into BIOS, and then select a special option to read the new BIOS from a USB stick.

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 2) by RS3 on Friday March 27 2020, @08:59PM

              by RS3 (6367) on Friday March 27 2020, @08:59PM (#976454)

              Yes, many do that now, maybe most. And I'm super glad they got away from having to run Windows just to update firmware. :)

    • (Score: 5, Interesting) by RS3 on Friday March 27 2020, @02:36PM (4 children)

      by RS3 (6367) on Friday March 27 2020, @02:36PM (#976315)

      About 10 years ago I did some work for an audio engineer who is moderately famous - you'll see his name in some TV show / movie credits. He was a genius.

      Anyway, he had a Seagate 1 TB drive (huge 10 years ago) full of major paid recording sessions with significant people. Not backed up of course. The drive bricked. I happened to be there one day he was in anguish and told me what had happened.

      I had already heard about these Seagate drives that brick themselves. I found a procedure for patching the firmware.

      You had to remove the drive's control board. No biggie, right? Oh but wait- don't remove that board yet...

      You had to use a serial port (RS232) with 12V to 5V (TTL) converter and connect to a 5V serial port on the hard disk. Special connector needed too. And of course some kind of serial port software, like "hyperterm", minicom, procomm, etc.

      You had to power up the bricked drive, with the serial port connected, connected to a Windows machine.

      Give some very cryptic commands to the drive through the serial port software.

      One of the commands tells the motor to stop spinning.

      When the drive stopped spinning, still powered up, you had to very very carefully REMOVE THE CONTROLLER BOARD.

      Then give some more commands.

      Then reattach the controller board. Yes, STILL POWERED UP. Yikes.

      Then some more commands. Drive spins up.

      Then run the Windows-based Seagate firmware updater.

      It all worked and saved the guy's butt.

      Thankfully I've never had to do such a procedure since then.

      • (Score: 2, Interesting) by nitehawk214 on Friday March 27 2020, @04:19PM (1 child)

        by nitehawk214 (1304) on Friday March 27 2020, @04:19PM (#976357)

        That is a "steely eyed missile man" level fix right there. Bravo.

        Hope the guy that did not do backups paid well for that service.

        --
        "Don't you ever miss the days when you used to be nostalgic?" -Loiosh
        • (Score: 3, Interesting) by RS3 on Saturday March 28 2020, @06:13AM

          by RS3 (6367) on Saturday March 28 2020, @06:13AM (#976570)

          Thank you for the high praises.

          I don't know if he did pay me for that, but he gave me a couple of things worth $, including an awesome cable tester called a "Swizz Army" https://www.ebay.com/itm/Ebtech-Swizz-Army-6-In-1-Cable-Tester-Tone-Tester-w-Phantom-Power-Detection/313034155509 [ebay.com] if anyone ever needs such a thing. Does test tones, detects intermittents (LED latches ON to show intermittent- very useful), etc.

          It was more a matter of me keeping him in business so I could get more work through him. I can't even guess how much all those studio tracks were worth. The drive was full. Close to 1,000 hours of mono track (24 bit samples, 96k/s sample rate), many were paid musicians, etc.
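The "close to 1,000 hours" estimate holds up as a back-of-the-envelope check. This sketch assumes uncompressed mono PCM with no file or filesystem overhead and a decimal 1 TB (10^12 bytes); the variable names are illustrative only.

```python
BYTES_PER_SAMPLE = 3   # 24-bit samples
SAMPLE_RATE = 96_000   # samples per second, one mono channel
DRIVE_BYTES = 10**12   # assumption: a decimal "1 TB" drive

bytes_per_hour = BYTES_PER_SAMPLE * SAMPLE_RATE * 3600
whole_hours_on_drive = DRIVE_BYTES // bytes_per_hour

print(bytes_per_hour)        # 1036800000, about 1.04 GB per track-hour
print(whole_hours_on_drive)  # 964
```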

          A huge percentage of non-IT knowledgeable people have no clue about storage reliability (lack of), doing backups, etc.

      • (Score: 1, Informative) by Anonymous Coward on Friday March 27 2020, @07:39PM (1 child)

        by Anonymous Coward on Friday March 27 2020, @07:39PM (#976424)

        Ah yes. The BSY brick bug. I still have this in my bookmarks for some reason https://sites.google.com/site/seagatefix/ [google.com]

        • (Score: 2) by RS3 on Friday March 27 2020, @09:04PM

          by RS3 (6367) on Friday March 27 2020, @09:04PM (#976456)

          Oh gosh, not sure if I should thank you for bringing back the memory, but thank you. I don't remember all of those details. Maybe I had found a different procedure? I don't remember doing the cardboard insulator.

          Have you done that procedure to a drive?

          My solution: never ever ever buy a Seagate drive. That said, I don't think any drive is reliable anymore.

  • (Score: 5, Insightful) by jimtheowl on Friday March 27 2020, @10:21AM (8 children)

    by jimtheowl (5929) on Friday March 27 2020, @10:21AM (#976243)
    It is not a bug, but a 'feature' imported from their experience selling printers.

    I must have owned at least 3 or 4 HP printers before switching to Brother.

    HP has/had good people, but short term profit just seems too enticing to their management.
    • (Score: 2) by driverless on Friday March 27 2020, @10:43AM

      by driverless (4770) on Friday March 27 2020, @10:43AM (#976247)

      I can see why they'd want firmware updates, 40,000 hours is just within the 5-year warranty period rather than just over it like it should be, so it's a warranty replacement rather than a new sale for HP.

      I assume the new firmware will set the death counter to 45,000 hours, which is safely outside the warranty period, thus guaranteeing a new sale.

    • (Score: 3, Funny) by Nuke on Friday March 27 2020, @11:26AM (6 children)

      by Nuke (3162) on Friday March 27 2020, @11:26AM (#976253)

      You never discovered that taking the button battery out of HP printers defeated the time limit? I'm still using the HP cartridges that Moses handed down.

      • (Score: 0) by Anonymous Coward on Friday March 27 2020, @12:27PM (5 children)

        by Anonymous Coward on Friday March 27 2020, @12:27PM (#976261)

        I like how HP inkjets run a "clean heads" job every time you turn it on, then run out of magenta ink which mostly never gets used in the first place and locks out the entire printer until it's replaced. I trashed the inkjet and bought a Canon laserprinter. I may have accidentally dropped the old HP inkjet on the garage floor... I was finding small parts of it for years.

        • (Score: 4, Informative) by driverless on Friday March 27 2020, @02:04PM (1 child)

          by driverless (4770) on Friday March 27 2020, @02:04PM (#976297)

          Brother inkjets do that too, fortunately you can put some duct tape over the part of the tank that they use to sense the ink levels and "reset" it to full.

          • (Score: 2) by jimtheowl on Friday March 27 2020, @08:16PM

            by jimtheowl (5929) on Friday March 27 2020, @08:16PM (#976434)
            I was implying laser. Inkjet printers are mostly a bad investment in any situation. For some users, the ink will dry out before they use it a second time.

            They are meant to be cheap to buy and expensive to run. No surprise that many resellers used to throw them in with the sale of a desktop computer as an incentive for the ill informed buyer.
        • (Score: 0) by Anonymous Coward on Friday March 27 2020, @02:40PM (1 child)

          by Anonymous Coward on Friday March 27 2020, @02:40PM (#976316)

          It's hard to find fault with that, if you've had the privilege of supporting seldom-used inkjets without such a feature. You'll waste at least as much ink trying to clean the printheads after the ink dries out from a month without printing, and what's worse, you'll do it under time pressure; the whole reason Mom even knows the printheads dried out is because she wants something printed.

          Sure, some of us do print quite regularly, and then these extra cleaning cycles are unnecessary, but there's probably more printers sitting in homes seldom used than in offices, or in the minority of homes that actually print regularly.

          • (Score: 3, Funny) by EEMac on Friday March 27 2020, @03:12PM

            by EEMac (6423) on Friday March 27 2020, @03:12PM (#976335)

            > there's probably more printers sitting in homes seldom used than in offices

            And unfortunately, homes that rarely print are more likely to buy inkjets. It's incredibly frustrating to live near one of these.

        • (Score: 0) by Anonymous Coward on Friday March 27 2020, @07:45PM

          by Anonymous Coward on Friday March 27 2020, @07:45PM (#976427)

          They aren't going to put some sort of battery-backed RTC in there. It has no idea how long it has been off or if maintenance is required, so it just assumes it is and does the clean and self test. Can't really fault it for that. Maybe you should leave your printer on and it will waste less.

  • (Score: 0) by Anonymous Coward on Friday March 27 2020, @10:31AM (5 children)

    by Anonymous Coward on Friday March 27 2020, @10:31AM (#976245)

    Know what businesses want? Certainty. Exactly 40k hours, nothing more, nothing less. And HPE OBVIOUSLY knows bidness.

    HPE Smart.

    • (Score: 2) by Bot on Friday March 27 2020, @11:06AM

      by Bot (3902) on Friday March 27 2020, @11:06AM (#976251) Journal

      I guess some engineer took the programmed obsolescence too literally.

      --
      Account abandoned.
    • (Score: 0) by Anonymous Coward on Friday March 27 2020, @03:12PM

      by Anonymous Coward on Friday March 27 2020, @03:12PM (#976333)

      Know what businesses want? Certainty.
      I certainly didn't work harder than exactly what I was being paid to do too.

    • (Score: 1, Insightful) by Anonymous Coward on Friday March 27 2020, @05:41PM (2 children)

      by Anonymous Coward on Friday March 27 2020, @05:41PM (#976383)

      Good thing HPE doesn't make ventilators.

      • (Score: 2, Funny) by Anonymous Coward on Friday March 27 2020, @05:48PM (1 child)

        by Anonymous Coward on Friday March 27 2020, @05:48PM (#976384)

        Make America Gasp Again.

        • (Score: 0) by Anonymous Coward on Tuesday March 31 2020, @05:39AM

          by Anonymous Coward on Tuesday March 31 2020, @05:39AM (#977523)

          lol

  • (Score: 2, Insightful) by Bot on Friday March 27 2020, @11:04AM (14 children)

    by Bot (3902) on Friday March 27 2020, @11:04AM (#976250) Journal

    How about this, mandatory decoupling between controller in the widest sense and mechanics for every electronic item and mandatory free out of warranty controller replacement. Or free upgrade if you stopped making it. No more phones that refuse to charge even with fresh battery. No more printers that fail just because. No more heaps of garbage that make Greta cry. (well, she should).

    Yes stuff would cost more, but it's the right price. And money is a fraud anyway.

    --
    Account abandoned.
    • (Score: 2, Troll) by Azuma Hazuki on Friday March 27 2020, @12:34PM (4 children)

      by Azuma Hazuki (5086) Subscriber Badge on Friday March 27 2020, @12:34PM (#976264) Journal

      What is your obsession with Thunberg? Does it bother you that she's less than half your age and already accomplished more--and is a better person--than you ever will do or be?

      --
      I am "that girl" your mother warned you about...
      • (Score: 2, Funny) by Bot on Friday March 27 2020, @12:56PM

        by Bot (3902) on Friday March 27 2020, @12:56PM (#976274) Journal

        Let's enter debug mode of the lefty brain:

        BAD TRUMP BAD TRUMP BAD TRUMP HEY DON'T TOUCH GRETA YOU OBSESSED PSYCHOPATH BAD TRUMP BAD TRUMP BAD TRUMP

        wew

        --
        Account abandoned.
      • (Score: 4, Touché) by Nuke on Friday March 27 2020, @01:15PM (2 children)

        by Nuke (3162) on Friday March 27 2020, @01:15PM (#976281)

        I read Bot's post as being on the same side as Grumpy Greta, so I'm not sure what your point is. And how do you know what Bot has or has not accomplished?

        As for me, I wouldn't need Greta to tell me that I'm pissed off when my HP printer or SSD stops working because they put a time limiter in it.

        • (Score: 0, Funny) by Anonymous Coward on Friday March 27 2020, @02:42PM

          by Anonymous Coward on Friday March 27 2020, @02:42PM (#976317)

          And how do you know what Bot has or has not accomplished?

          How does zoomy-zukes know anything? She just asserts it, and lo! it is truth.

        • (Score: 2, Troll) by Bot on Sunday March 29 2020, @02:28AM

          by Bot (3902) on Sunday March 29 2020, @02:28AM (#976807) Journal

          >so I'm not sure what your point is.

          You should download the Azuma to English dictionary.

          For example when Azuma says: "You are the lowest entity in the universe, I would throw you into a black hole only to see the hole spit you back because you're too repulsive for it", it really means: "Hello there, nice weather this afternoon don't you think?"

          --
          Account abandoned.
    • (Score: 5, Informative) by RS3 on Friday March 27 2020, @02:58PM (8 children)

      by RS3 (6367) on Friday March 27 2020, @02:58PM (#976324)

      > mandatory decoupling between controller in the widest sense

      Agree. Occasionally I've been able to recover spinning magnetic hard disk data by replacing the controller board.

      Rather than take these (HPE SSD) kinds of risks with data, I'd be more inclined to use a controller that uses plugin microSD cards. If the controller dies, or even an individual SD card, just replace the thing that died.

      One example, and it will run them in RAID: https://the-gadgeteer.com/2016/03/17/turn-10-micro-sd-cards-into-a-sata-ssd-drive/ [the-gadgeteer.com]

      • (Score: 2) by RS3 on Friday March 27 2020, @03:03PM (1 child)

        by RS3 (6367) on Friday March 27 2020, @03:03PM (#976325)

        That example looks cool, but further searching reveals shortcomings: it probably can only do RAID 0, which is no good. Also, I don't think it does wear-leveling, TRIM, etc., so again, no good. But the concept is great.

        • (Score: 1, Insightful) by Anonymous Coward on Friday March 27 2020, @03:17PM

          by Anonymous Coward on Friday March 27 2020, @03:17PM (#976340)

          It's a terrible concept because you can just buy two normal SSDs of comparable storage capacity for the price of the micro-SD cards you put in this thing, still have change left over, and the normal SSDs will perform better and last longer.

      • (Score: 0) by Anonymous Coward on Friday March 27 2020, @03:11PM (5 children)

        by Anonymous Coward on Friday March 27 2020, @03:11PM (#976332)

        Rather than take these (HPE SSD) kinds of risks with data, I'd be more inclined to use a controller that uses plugin microSD cards. If the controller dies, or even an individual SD card, just replace the thing that died.

        One example, and it will run them in RAID: https://the-gadgeteer.com/2016/03/17/turn-10-micro-sd-cards-into-a-sata-ssd-drive/ [the-gadgeteer.com]

        That seems like an overly expensive way to make a really slow SSD that will not last particularly long.

        • (Score: 4, Touché) by janrinok on Friday March 27 2020, @07:23PM (4 children)

          by janrinok (52) Subscriber Badge on Friday March 27 2020, @07:23PM (#976420) Journal

          That seems like an overly expensive way to make a really slow SSD that will not last particularly long.

          Yeah, but other than that it would be OK.....

          --
          We are always looking for new staff in different areas - please volunteer if you have some spare time and wish to help.
          • (Score: 3, Insightful) by RS3 on Saturday March 28 2020, @06:19AM (3 children)

            by RS3 (6367) on Saturday March 28 2020, @06:19AM (#976571)

            That is funny, but the ACs are missing the point. RTFS: "After the SSD failure occurs, neither the SSD nor the data can be recovered."

            With my idea, your data has a high likelihood of recovery. That is the point I was trying to make.

            Who cares about the cost of stupid hardware, whether the drive is $50 or $500. The data might be priceless. Why lose all your data because 1 controller died? Why lose all your data because 1 FLASH cell died and drags down an address crossbar? Pull the microSD cards, plug them into a new controller and you're back in business.

            • (Score: 2) by hendrikboom on Sunday March 29 2020, @10:17PM (2 children)

              by hendrikboom (1125) Subscriber Badge on Sunday March 29 2020, @10:17PM (#977037) Homepage Journal

              Why not just have a backup?

              • (Score: 2) by RS3 on Monday March 30 2020, @12:01AM (1 child)

                by RS3 (6367) on Monday March 30 2020, @12:01AM (#977063)

                I knew someone had to ruin a perfectly ludicrous argument by injecting reason, ration, and sanity. Harumph.

                Psst: nobody ever does those. We only talk smugly about them, and patronize those who lose data.

                Joking aside, the 1 TB audio engineer's drive was very very expensive 10 or so years ago, and I'm not sure he could afford any way to back up that much data. And, I think he was one of many people who just don't understand that storage is unreliable. It just doesn't occur to them.

                And to be more clear, it's not that they are unreliable, it's the disaster that occurs when they do fail, and you really rarely have any warning. And even if you do get some kind of warning, most people don't know what to do next.

                My first computer's hard disk failed after I had it for only a few weeks. I don't remember what data I lost, but it was good to learn that lesson early on in computing.

                • (Score: 0) by Anonymous Coward on Wednesday April 01 2020, @04:28PM

                  by Anonymous Coward on Wednesday April 01 2020, @04:28PM (#978088)

                  Joking aside, the 1 TB audio engineer's drive was very very expensive 10 or so years ago

                  1TB drives were not expensive 10 or so years ago. According to this chart of drive prices over time [backblaze.com] they cost about $0.11/GB back in Q1 2009, pricing them at about 100 USD which seems about right.

                  Seriously: if you are at all worried about data loss due to drive failures you can just buy two drives and save everything to both of them.

  • (Score: 2, Informative) by VacuumTube on Friday March 27 2020, @11:44AM (19 children)

    by VacuumTube (7693) on Friday March 27 2020, @11:44AM (#976256) Journal

    Since HPE isn't offering any explanations one can only conclude (as others have noted) that this is an error in an algorithm designed for engineered obsolescence. But that must mean if the error hadn't existed the drives would still have failed almost simultaneously at some point outside the warranty period, and for anyone with several of the units the cause would have become obvious. Either HPE looks stupid and greedy or they look even stupider and greedier. To quote a person who might direct a similar business strategy, "Sad. Very sad."

    • (Score: 4, Informative) by takyon on Friday March 27 2020, @12:02PM (18 children)

      by takyon (881) <{takyon} {at} {soylentnews.org}> on Friday March 27 2020, @12:02PM (#976257) Journal

      https://www.theregister.co.uk/2020/03/25/hpe_ssd_death_fix/ [theregister.co.uk]

      If you're getting deja-vu, you're not alone. HPE separately warned [theregister.co.uk] of certain SAS SSDs dying after their 32,768th hour of operation in November last year.
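The earlier figure is suggestive: 32,768 is 2^15, one more than the largest value a signed 16-bit integer can hold. Neither quoted article confirms the cause, so treat the following as a sketch of how such a failure could arise, assuming (hypothetically) that the firmware keeps its hour counter in a signed 16-bit field.

```python
import ctypes


def next_hour(counter):
    """Increment an hour counter stored in a signed 16-bit field."""
    # ctypes.c_int16 keeps only the low 16 bits, mimicking fixed-width storage
    return ctypes.c_int16(counter + 1).value


print(next_hour(32_766))  # 32767, the largest signed 16-bit value
print(next_hour(32_767))  # -32768, the counter wraps to a negative number
```

Code that never expects a negative hour count could misbehave from that point on.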

      I'm calling it stupid, not greedy.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 2) by inertnet on Friday March 27 2020, @12:36PM (3 children)

        by inertnet (4071) Subscriber Badge on Friday March 27 2020, @12:36PM (#976265)

        I'm surprised that these planned obsolescence policies are hardly ever leaked by technicians involved in the design process. Punishment for leaking must be harsh.

        I don't know about the USA but the EU might hand out massive fines again if this gets proven.

        • (Score: 2) by Bot on Friday March 27 2020, @01:07PM (2 children)

          by Bot (3902) on Friday March 27 2020, @01:07PM (#976279) Journal

          Because "the more people are involved in a conspiracy, the higher the risk that someone spills the beans" is a fallacy, empirically speaking. It already happened with dieselgate, it wasn't only VW who cheated. And the fuel efficiency stats are traditionally overestimated. Nobody had interest in losing their job for something the competition probably did too, so... nobody spoke up.

          --
          Account abandoned.
          • (Score: 0) by Anonymous Coward on Friday March 27 2020, @02:49PM (1 child)

            by Anonymous Coward on Friday March 27 2020, @02:49PM (#976322)

            If nobody spoke up, why did we have Dieselgate?

            • (Score: 2) by toddestan on Saturday March 28 2020, @06:21AM

              by toddestan (4982) on Saturday March 28 2020, @06:21AM (#976572)

              Because the EPA eventually figured out that there was some funny business going on with Volkswagens.

      • (Score: 1) by VacuumTube on Friday March 27 2020, @01:07PM (11 children)

        by VacuumTube (7693) on Friday March 27 2020, @01:07PM (#976278) Journal

        "I'm calling it stupid, not greedy."

        Don't you consider engineering products to need replacement before they wear out to be a bit greedy? Why not?

        • (Score: 2) by takyon on Friday March 27 2020, @01:16PM (10 children)

          by takyon (881) <{takyon} {at} {soylentnews.org}> on Friday March 27 2020, @01:16PM (#976282) Journal

          Well, if you take their word for it, it was the result of a bug. So, Hanlon's razor. Also, they made the warning before any bricking actually occurred, so there won't be any need for replacement unless businesses ignore this bulletin.

          --
          [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
          • (Score: 2) by VacuumTube on Friday March 27 2020, @06:41PM

            by VacuumTube (7693) on Friday March 27 2020, @06:41PM (#976406) Journal

            "So, Hanlon's razor. "

            You're right. We don't have enough evidence for a conviction, even if it is against HPE.

          • (Score: 2) by RS3 on Saturday March 28 2020, @06:36AM

            by RS3 (6367) on Saturday March 28 2020, @06:36AM (#976575)

            I'll make the argument that bugs happen due to greed- pushing things out to customers before they're ready to be sold.

          • (Score: 2) by VacuumTube on Saturday March 28 2020, @10:40AM (7 children)

            by VacuumTube (7693) on Saturday March 28 2020, @10:40AM (#976598) Journal

            Since my previous post granting the benefit of the doubt to HPE, I've been bothered by the question of how a bug could conceivably cause an SSD to fail after a precise number of hours. It would have to be one that not only disables further operation, but goes the additional step of thoroughly wiping the data making recovery impossible. With due respect to Hanlon I have to say that it seems vanishingly unlikely that this could happen other than by design. So in accordance with Occam's razor it seems more likely to me that the only bug was in coding the failure to occur a few hours earlier than intended.

            • (Score: 4, Informative) by takyon on Saturday March 28 2020, @11:11AM (6 children)

              by takyon (881) <{takyon} {at} {soylentnews.org}> on Saturday March 28 2020, @11:11AM (#976599) Journal

              I did some more research. SanDisk (Western Digital) is getting fingered. Seems like they provided bugged code that was used by both HPE and Dell:

              HPE Warns of New Bug That Kills SSD Drives After 40,000 Hours [bleepingcomputer.com]

              The company says that this is a comprehensive list of impacted SSDs it makes available. However, the issue is not unique to HPE and may be present in drives from other manufacturers.

              [...] HPE learned about the firmware bug from a SSD manufacturer and warns that if SSDs were installed and put into service at the same time they are likely to fail almost concurrently.

              [...] Last month, Dell EMC released new firmware to correct a bug causing nine SanDisk SSDs in its portfolio to fail "after approximately 40,000 hours of usage."

              [...] The update corrects a check for logging the circular buffer index value. "Assert had a bad check to validate the value of circular buffer's index value. Instead of checking the max value as N, it checked for N-1," Dell's advisory [dell.com] explains.
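
              To make the quoted off-by-one concrete, here is a minimal sketch of how a limit check written against N-1 instead of N can stay silent until a counter first reaches the last slot. The actual firmware is not public; the buffer size `N`, the function names and the one-step-per-tick model are all assumptions made for illustration.

```python
# Hypothetical illustration only: none of these names come from the real firmware.
N = 8  # assumed number of slots in the circular log buffer


def next_index_buggy(index):
    """Advance the index using the off-by-one limit check."""
    # Bug: N-1 is treated as the limit, so the legitimate last slot is rejected.
    if not index < N - 1:
        raise AssertionError("circular buffer index out of range")
    return (index + 1) % N


def next_index_fixed(index):
    """Advance the index using the intended limit check."""
    if not index < N:
        raise AssertionError("circular buffer index out of range")
    return (index + 1) % N


def first_failure(step, ticks):
    """Return the tick at which the check first fails, or None if it never does."""
    index = 0
    for tick in range(ticks):
        try:
            index = step(index)
        except AssertionError:
            return tick
    return None


print(first_failure(next_index_buggy, 100))
print(first_failure(next_index_fixed, 100))
```

              In this toy model the buggy version runs cleanly until the index first lands on the last slot and then fails at the same tick on every run. If a real index advanced at a fixed rate of powered-on time, that would be consistent with drives failing at a precise hour count, though that link is speculation here.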

              HPE releases urgent fix to stop enterprise SSDs conking out at 40K hours [blocksandfiles.com]

              The company said in a bulletin that the “issue is not unique to HPE and potentially affects all customers that purchased these drives.” HPE has not identified the SSD maker and refused to do so, saying: “We’re not confirming manufacturers.”

              [...] It seems likely that the HPE drives are also SanDisks. Blocks & Files asked Western Digital, which acquired SanDisk in 2016, for comment. A company spokesperson said: “Per Western Digital corporate policy, we are unable to provide comments regarding other vendors’ products. As this falls within HPE’s portfolio, all related product questions would best be addressed with HPE directly.”

              --
              [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
              • (Score: 2) by VacuumTube on Saturday March 28 2020, @07:42PM (5 children)

                by VacuumTube (7693) on Saturday March 28 2020, @07:42PM (#976732) Journal

                Thanks for digging out this additional information, Takyon. At this point it does appear that HPE is one of the injured parties, and that they have done what they could to mitigate the problem. The one technical detail they gave out concerning a buffer index value doesn't say much to me, but they're probably constrained by NDAs.

                • (Score: 2) by RS3 on Monday March 30 2020, @02:01AM (4 children)

                  by RS3 (6367) on Monday March 30 2020, @02:01AM (#977083)

                  It's conceivable, to me anyway, that a software bug could have any possible result, including writing to, erasing, scrambling block FLASH cell hash tables (thereby losing all data), etc.

                  • (Score: 2) by VacuumTube on Tuesday March 31 2020, @08:18PM (3 children)

                    by VacuumTube (7693) on Tuesday March 31 2020, @08:18PM (#977744) Journal

                    Really? A single bug that would do all that plus cause the hardware to permanently fail? That's what I find intriguing about the subject, and perhaps it's just that I'm not very familiar with the hardware. But I can't recall ever seeing a software bug that caused such a complex failure in a delivered product.

                    • (Score: 3, Interesting) by RS3 on Tuesday March 31 2020, @08:40PM (2 children)

                      by RS3 (6367) on Tuesday March 31 2020, @08:40PM (#977756)

                      Ahhh, you've never done assembly language?

                      I'll pose a potential scenario: a programming error produces a bad value that causes the PC (program counter) to jump somewhere it doesn't belong, like the routine that writes to the FLASH memory. But the pointers are not correct for this write, and the write_eeprom_now routine trashes the very code that runs the SSD, which then goes even more crazy, also writing to main storage FLASH, trashing the stored data. Once code trashes itself, there's no fix unless you have external computers to do cross-checking, like the multiple redundant computers sometimes used in mission-critical stuff, like Space Shuttle, etc., which obviously nobody does in an SSD.

                      I'm not technically posing a true hardware failure here, and I didn't perceive that from TFS or TFA. However, to correct the bad control program (patch / update) on the SSD, the SSD has to be able to run well enough to receive and execute the controller's FLASH update routine. It's like a motherboard BIOS that goes bad and the MB is bricked.

                      All that said, you could conceivably remove and re-flash the chip that stores the controller's programming... unless, it's internal to the controller's microprocessor, which is likely the case.

                      Some microcontrollers have a pin which, when driven high or low depending on the spec, will tell the uP to load from an external ROM/FLASH chip and ignore the internal programming. Then it would be possible to re-flash the internal bad code, but obviously this would take a bit of hardware and technician's time. And again, if the original code went berserk and trashed the main FLASH data (your stored files) then it's all moot (unless you want to repair the drive for future use...)
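
                      The scenario above, where a write routine called with a bad pointer overwrites the controller's own code, can be modelled as a toy simulation. Everything here is hypothetical: the region sizes, the flat `bytearray` standing in for FLASH, and the function names are assumptions, not anything taken from a real SSD controller.

```python
# Toy model: one flat FLASH image, firmware code first, user data after it.
CODE_SIZE = 16  # assumed size of the firmware region, in bytes
DATA_SIZE = 16  # assumed size of the user-data region, in bytes


def make_flash():
    """Build a fresh image: 'C' bytes for code, 'D' bytes for data."""
    return bytearray(b"C" * CODE_SIZE + b"D" * DATA_SIZE)


def write_flash(flash, offset, payload):
    """Unchecked write, standing in for a routine like write_eeprom_now."""
    flash[offset:offset + len(payload)] = payload


def write_flash_checked(flash, offset, payload):
    """Same write, but refuses anything outside the user-data region."""
    if offset < CODE_SIZE or offset + len(payload) > len(flash):
        raise ValueError("write outside user-data region")
    flash[offset:offset + len(payload)] = payload


def firmware_intact(flash):
    """True while the code region still holds the original firmware bytes."""
    return bytes(flash[:CODE_SIZE]) == b"C" * CODE_SIZE


flash = make_flash()
write_flash(flash, 4, b"\xff" * 8)  # bad pointer lands inside the code region
print(firmware_intact(flash))       # the firmware has overwritten itself
```

                      With the unchecked routine a single stray call is enough to corrupt the code region, after which nothing on the device can be trusted to repair it. The checked variant rejects the same call, which is the kind of guard that testing and code review are meant to catch the absence of.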

                      • (Score: 3, Interesting) by VacuumTube on Tuesday March 31 2020, @10:12PM (1 child)

                        by VacuumTube (7693) on Tuesday March 31 2020, @10:12PM (#977811) Journal

                        Actually I used to love programming in assembler, but that was long before the days of SSDs and I guess I don't think in those terms any more. So thanks for the thought experiment. It brought back fond memories.

                        • (Score: 2) by RS3 on Tuesday March 31 2020, @10:36PM

                          by RS3 (6367) on Tuesday March 31 2020, @10:36PM (#977821)

                          You're quite welcome. Come to think of it, I don't do much assembler these days either and I'm itching to get back into it. Maybe.

                          Yeah, I don't know how you could prevent these kinds of disasters without doing really good testing, code reviews, etc. Unfortunately companies are run by MBAs who see QC as being costly / overhead / loss. And that aside, egos are usually a pretty big moat to cross.

                          BTW, I'm a vacuum tube hacker (too) and your username reminds me of a couple of projects that I could be working on while we wait for the world to hopefully return to normal, or whatever the new normal becomes...

      • (Score: 3, Touché) by Rich on Friday March 27 2020, @01:52PM (1 child)

        by Rich (945) on Friday March 27 2020, @01:52PM (#976292) Journal

        I'm calling it stupid, not greedy.

        32768 is stupid. 40000 is greedy.
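
        The distinction presumably rests on 32768 being 2^15, the point where a signed 16-bit counter overflows, while 40000 is no such boundary. A minimal sketch, assuming an hour counter stored as a C-style int16_t (the helper name `to_int16` is made up for illustration):

```python
def to_int16(value):
    """Wrap an integer into the signed 16-bit range, as a C int16_t would."""
    return (value + 2**15) % 2**16 - 2**15


print(to_int16(32767))  # last value that still fits
print(to_int16(32768))  # one more hour and the counter wraps negative
print(40000 < 2**16)    # 40000 is below the unsigned 16-bit limit too
```

        A counter that wraps at 32768 looks like an honest type-size mistake; a failure at 40000 does not line up with any common integer width, which is the point of the joke.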

        • (Score: 1, Funny) by Anonymous Coward on Saturday March 28 2020, @12:05AM

          by Anonymous Coward on Saturday March 28 2020, @12:05AM (#976514)

          ...and 64,000 is enough for anyone.
          (yes, I know it was originally 640,000)

  • (Score: 3, Insightful) by DannyB on Friday March 27 2020, @04:33PM

    by DannyB (5839) Subscriber Badge on Friday March 27 2020, @04:33PM (#976361) Journal

    I remember when companies like HP, like Boeing, once did respectable engineering work and built quality products.

    --
    Nature abhors a machine that removes dust from the living space.