SoylentNews is people

posted by janrinok on Wednesday January 05 2022, @12:41PM
from the BIG-oops! dept.

This HPE software update accidentally wiped 77TB of data:

We covered this story earlier in "University Loses Valuable Supercomputer Research After Backup Error Wipes 77 Terabytes of Data". I, like some others, suspected finger trouble on the part of those doing the backup, but the company that wrote the software has put its hands up and taken responsibility.

A flawed update sent out by Hewlett Packard Enterprise (HPE) resulted in the loss of 77TB of critical research data at Kyoto University, the company has admitted.

HPE recently issued a software update that broke a program deleting old log files, and instead of just deleting those (which would still have a backup copy stored in a high-capacity storage system), it deleted pretty much everything, including files in the backup system, Tom's Hardware reported.

As a result, some 34 million files, generated by 14 different research groups, from December 14 to December 16, were permanently lost.

In a press release, issued in Japanese, HPE took full responsibility for the disastrous mishap.


  • (Score: 3, Interesting) by MrGuy on Wednesday January 05 2022, @03:07PM (16 children)

    by MrGuy (1007) on Wednesday January 05 2022, @03:07PM (#1210129)

    some 34 million files,

    Wow that’s a lot of files.

    generated by 14 different research groups,

    Boy that’s a significant number of folks

    from December 14 to December 16

    Wait, what? They lost two days worth of files?

    Sure. That’s annoying. It’s definitely a setback. And it absolutely shouldn’t have happened.

    But we’re not talking about years of work lost. Not months. Not even weeks. It’s two days.

    The headlines on this incident would have you believe this is the burning of the Lighthouse of Alexandria or the Sack of Baghdad level cultural loss. They throw up the big numbers to get the big clicks, while ignoring that this is an INCONVENIENCE. Not a tragedy.

  • (Score: 3, Funny) by MrGuy on Wednesday January 05 2022, @03:16PM

    by MrGuy (1007) on Wednesday January 05 2022, @03:16PM (#1210134)

    Sorry. Got my Civ wonders mixed up there. Obviously, meant the Great Library of Alexandria, not the lighthouse.

  • (Score: 3, Insightful) by Runaway1956 on Wednesday January 05 2022, @03:29PM (7 children)

    by Runaway1956 (2926) Subscriber Badge on Wednesday January 05 2022, @03:29PM (#1210138) Journal

    I'm not sure about your math, or your line of reasoning. It sounds like they not only lost all the work done on those particular days, but they lost all the backups of work leading up to those days? The article is not especially clear exactly what was lost, but if all the backups were erased, it's probably a lot more work than just a couple of days.

    So, I guess it all boils down to a rather simple question: How much time, money, and manpower will it take to replace and reproduce all the erased data? If they can "fix" the problem in less than a month, then not a really big deal. If the "fix" will take years, then it's a major problem. Let's remember that some research projects take years of planning, and funding from the university, from private enterprise, as well as government grants.

    • (Score: 2) by krishnoid on Wednesday January 05 2022, @06:14PM

      by krishnoid (1156) on Wednesday January 05 2022, @06:14PM (#1210196)

      You're way ahead of me on this. I'm still trying to puzzle together "backup" and "erased". Like, "whoops, we reversed a flag on rsync" plus "we totally (accidentally) called a separate routine to overwrite old backups, some of which required operator action to remount".

    • (Score: 0) by Anonymous Coward on Wednesday January 05 2022, @09:08PM (5 children)

      by Anonymous Coward on Wednesday January 05 2022, @09:08PM (#1210271)

      whatever, real science is always reproducible, so how can anything of value have been lost here?

      • (Score: 3, Insightful) by Kell on Wednesday January 05 2022, @10:36PM (4 children)

        by Kell (292) on Wednesday January 05 2022, @10:36PM (#1210320)

        Speaking as a research engineer: time. Time is valuable and you never get it back.

        --
        Scientists ask questions. Engineers solve problems.
        • (Score: 1) by Acabatag on Thursday January 06 2022, @02:59AM (3 children)

          by Acabatag (2885) on Thursday January 06 2022, @02:59AM (#1210402)

          What seems odd to me is that scientists let anybody from Eye-Tee with their crappy 'enterprise' systems near this much important scientific data. Weren't there toner cartridges to replace or a secretary's keyboard to blow the crumbs out of?

          This was not ordinary data that the data janitors are charged with maintaining.

          • (Score: 3, Informative) by Kell on Thursday January 06 2022, @06:23AM (2 children)

            by Kell (292) on Thursday January 06 2022, @06:23AM (#1210459)

            This might surprise some, but we often don't have a choice how our data is hosted at our institutions. With various data integrity and handling mandates from funding agencies plus strictly limited grant budgets it's almost impossible to self-manage your own IT unless you're somewhere like CERN. At my institution even getting an Octoprint server that I can connect to from home to manage 3D prints is basically impossible because the IT people are funded independently of the academics who rely on them and simply don't give a shit.

            --
            Scientists ask questions. Engineers solve problems.
            • (Score: 2) by PiMuNu on Thursday January 06 2022, @04:37PM

              by PiMuNu (3823) on Thursday January 06 2022, @04:37PM (#1210553)

              I think even at CERN they have IT guys running the firewall etc. (Although not as long ago as you might imagine, they had a bit of an "incident" when it turned out all of their controls software shared the same password.)

              The folks running the cluster are probably more like research specialists into IT, but they are surely going to use some enterprise solution for managing many storage nodes rather than rolling their own hacky scripts.

            • (Score: 0) by Anonymous Coward on Saturday January 08 2022, @05:29AM

              by Anonymous Coward on Saturday January 08 2022, @05:29AM (#1211018)

              My only catch on this is that the affected storage was for their 3 different supercomputers. At all the places I've worked at and with, the long-term storage is located separately from the cluster storage. Furthermore, there is absolutely the expectation everywhere I have been or worked with to use both for their intended purposes.

  • (Score: 3, Interesting) by tangomargarine on Wednesday January 05 2022, @04:06PM (6 children)

    by tangomargarine (667) on Wednesday January 05 2022, @04:06PM (#1210151)

    some 34 million files,

    Yeah, like how they used to say "computer capable of storing ten million pieces of information" "so it's got a 10 MB hard drive"

    Wait, what? They lost two days worth of files?

    Sure. That’s annoying. It’s definitely a setback. And it absolutely shouldn’t have happened.

    But we’re not talking about years of work lost. Not months. Not even weeks. It’s two days.

    The headlines on this incident would have you believe this is the burning of the Lighthouse of Alexandria or the Sack of Baghdad level cultural loss.

    Depends on the context. My sister was doing experiments with mice for her PhD and she said that if something happened to her mice before the experiment was done, it would cost $5000 a mouse instead of $50 to replace them and start over, because the specific genetic breed she was using was no longer in demand from the breeding facility.

    Even aside from missing an obviously critical event ("we were studying the build-up and actual event of this supernova and we've still got the last 3 months but we lost the initial 2 days of the nova itself"), it's always possible that the data as a whole is worse without a continuous sampling.

    And it's 14 different research groups. So the odds are higher at least 1 of them was significantly impacted.

    The headlines on this incident would have you believe this is the burning of the Lighthouse of Alexandria or the Sack of Baghdad level cultural loss.

    Which of these headlines is implying that?

    This HPE software update accidentally wiped 77TB of data

    University Loses Valuable Supercomputer Research After Backup Error Wipes 77 Terabytes of Data

    "Valuable" is hardly comparing anything to the Library of Alexandria, and "77TB" is just a fact. It sounds like *you're* the one overreacting.

    --
    "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
    • (Score: 2) by maxwell demon on Wednesday January 05 2022, @06:04PM

      by maxwell demon (1608) on Wednesday January 05 2022, @06:04PM (#1210192) Journal

      "computer capable of storing ten million pieces of information" "so it's got a 10 MB hard drive"

      A piece of information could be a bit, in which case it would be more of a 1.44 MB floppy.
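The floppy comparison above can be checked with a quick calculation. This is only a rough sketch: the helper name `pieces_to_megabytes` is made up for illustration, "MB" is taken as 10^6 bytes, and the floppy capacity used is the usual 1440 KiB of a "1.44 MB" disk.

```python
# Sketch: how big is "ten million pieces of information"?
# Assumption: a "piece" is either one bit or one byte (8 bits).

FLOPPY_BYTES = 1440 * 1024  # capacity of a "1.44 MB" floppy in bytes


def pieces_to_megabytes(pieces, bits_per_piece):
    """Convert a count of 'pieces' to decimal megabytes (10**6 bytes)."""
    total_bytes = pieces * bits_per_piece / 8
    return total_bytes / 1_000_000


if __name__ == "__main__":
    as_bits = pieces_to_megabytes(10_000_000, 1)   # one bit per piece
    as_bytes = pieces_to_megabytes(10_000_000, 8)  # one byte per piece
    print(as_bits)   # 1.25
    print(as_bytes)  # 10.0
    # Ten million bits would fit on a single floppy.
    print(as_bits * 1_000_000 <= FLOPPY_BYTES)  # True
```

So if each piece is a bit, the machine holds about 1.25 MB, which is floppy territory; if each piece is a byte, it is the 10 MB drive from the parent comment.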

      --
      The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by krishnoid on Wednesday January 05 2022, @06:29PM (3 children)

      by krishnoid (1156) on Wednesday January 05 2022, @06:29PM (#1210201)

      Average mass of adult mouse: 30g (on the heavier side)
      Cost of one gram of gold: ~$60
      Replacing one mouse: $5e3 / (30g x $60/g) = $5000 / $1800 ~= 2.78 times its mass in gold.

      Sounds like a great investment. In a research lab though, would you buy mouse futures or insurance? It's not like they live that long, so you'd figure they'd either rebreed their own or pay the research breeder a little more to keep a line open for a couple years. I don't know how these "lines" are kept "open" when it comes to rodent husbandry, though.
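The back-of-the-envelope comparison above can be checked in a few lines of Python. This is only a sketch using the rough figures from this thread (a 30 g mouse, about $60 per gram of gold, $5000 versus $50 per mouse); the function and constant names are made up for illustration.

```python
# Sketch: price of a replacement mouse expressed in "mouse-weights of gold".
# All figures are the rough estimates quoted in the thread, not market data.

MOUSE_MASS_G = 30             # adult mouse, on the heavier side
GOLD_USD_PER_G = 60           # approximate price of one gram of gold
REPLACEMENT_COST_USD = 5000   # cost once the line is no longer bred
NORMAL_COST_USD = 50          # cost while the line is in demand


def gold_value_usd(mass_g, usd_per_g):
    """Value in USD of a lump of gold with the given mass."""
    return mass_g * usd_per_g


def cost_in_gold_weights(cost_usd, mass_g, usd_per_g):
    """How many times its own mass in gold an item of this cost is worth."""
    return cost_usd / gold_value_usd(mass_g, usd_per_g)


if __name__ == "__main__":
    print(gold_value_usd(MOUSE_MASS_G, GOLD_USD_PER_G))  # 1800
    ratio = cost_in_gold_weights(REPLACEMENT_COST_USD, MOUSE_MASS_G, GOLD_USD_PER_G)
    print(f"{ratio:.2f}")  # 2.78
    normal = cost_in_gold_weights(NORMAL_COST_USD, MOUSE_MASS_G, GOLD_USD_PER_G)
    print(f"{normal:.3f}")  # 0.028
```

At the normal $50 price a mouse is worth under 3% of its mass in gold, so the jump to $5000 is the full factor of 100 mentioned earlier in the thread.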

      • (Score: 2) by tangomargarine on Wednesday January 05 2022, @07:33PM (2 children)

        by tangomargarine (667) on Wednesday January 05 2022, @07:33PM (#1210217)

        My numbers may not be accurate, but it was at least an order of magnitude.

        I think the issue is, if they have an active breeding facility at the supplier, all they need to do is dip into the cage to get some mice out, and ship them to you. When there isn't enough demand for a genetic line, they freeze some of its DNA for storage...so if you want that same strain again, they have to use petri dishes to "spin up" the breeding population again before it becomes self-sustaining, which will take longer and be more inconvenient for them.

        I think the strain she was working on may have affected the fertility of the mice involved, too.

        --
        "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
        • (Score: 2) by krishnoid on Wednesday January 05 2022, @07:42PM

          by krishnoid (1156) on Wednesday January 05 2022, @07:42PM (#1210224)

          Are you serious? Seems like a great B-story premise for a science-fiction epic, with humans instead of mice.

        • (Score: 1, Informative) by Anonymous Coward on Wednesday January 05 2022, @11:04PM

          by Anonymous Coward on Wednesday January 05 2022, @11:04PM (#1210325)

          That is about right. Getting a specific mouse line repeated can easily cost multiple orders of magnitude more than ordering them in the first place. The suppliers can easily get you a more popular line or the ones they have on hand, but if you go outside of that, oh boy, can it get expensive. If they have to dip into their embryo stock, replicate the growth environment, and do all the other work that goes into creating a mouse line, and they need to do it in a rush so your experiments aren't delayed/corrupted, and they don't have a large number of orders to spread out the cost, it adds up really quickly per mouse.

    • (Score: 0) by Anonymous Coward on Friday January 07 2022, @09:44AM

      by Anonymous Coward on Friday January 07 2022, @09:44AM (#1210801)

      I had this experience when working on a project with a free-electron laser: our team got four nights to use the beam, and boy, it would be a shame to lose data from very expensive research equipment if some bug wiped it out. LHC maintenance at CERN is $3,000,000 a day on average, so two days might be very costly at research institutions.