posted by martyb on Friday December 31 2021, @04:44AM   Printer-friendly
from the I-hope-they-had-backups!-Oh.-Wait... dept.

University Loses Valuable Supercomputer Research After Backup Error Wipes 77 Terabytes of Data:

Kyoto University, a top research institute in Japan, recently lost a whole bunch of research after its supercomputer system accidentally wiped out a whopping 77 terabytes of data during what was supposed to be a routine backup procedure.

That malfunction, which occurred sometime between Dec. 14 and Dec. 16, erased approximately 34 million files belonging to 14 different research groups that had been using the school's supercomputing system. The university operates Hewlett Packard Cray computing systems and a DataDirect ExaScaler storage system—the likes of which can be utilized by research teams for various purposes.

It's unclear specifically what kind of files were deleted or what caused the malfunction, though the school has said that the work of at least four different groups cannot be restored.

Also at BleepingComputer.

Original announcement from the university.


Original Submission

Related Stories

HPE Software Update Accidentally Wiped That 77TB of Data 32 comments

This HPE software update accidentally wiped 77TB of data:

We covered this story here: University Loses Valuable Supercomputer Research After Backup Error Wipes 77 Terabytes of Data. I, like some others, suspected finger trouble on the part of those doing the backup, but the company that wrote the software has put its hands up and taken responsibility.

A flawed update sent out by Hewlett Packard Enterprise (HPE) resulted in the loss of 77TB of critical research data at Kyoto University, the company has admitted.

HPE recently issued a software update that broke a program that deletes old log files; instead of deleting just those (which would still have a backup copy stored in a high-capacity storage system), it deleted pretty much everything, including files in the backup system, Tom's Hardware reported.

As a result, some 34 million files, generated by 14 different research groups, from December 14 to December 16, were permanently lost.

In a press release, issued in Japanese, HPE took full responsibility for the disastrous mishap.
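
According to the reports, the flawed update hit a script meant to purge old log files. As a generic illustration of how that class of cleanup script can wipe far more than logs (made-up paths, not HPE's actual code): if the variable naming the log directory ever ends up empty, the find below walks the whole filesystem instead of one directory.

    #!/bin/bash
    # Hypothetical log-cleanup job; the path is made up for illustration.
    LOGDIR="/work/project/logs"
    # If LOGDIR ever ends up unset or empty (for example after a botched update
    # to the running script), "${LOGDIR}/" would expand to "/" and the find
    # below would sweep the entire filesystem. Guard against that first:
    : "${LOGDIR:?LOGDIR is empty, refusing to delete anything}"
    find "${LOGDIR}/" -type f -mtime +10 -delete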


Original Submission

  • (Score: -1, Troll) by Anonymous Coward on Friday December 31 2021, @05:01AM

    by Anonymous Coward on Friday December 31 2021, @05:01AM (#1208926)

    I love it
    I DON'T CARE
    I love it! 3

  • (Score: 5, Interesting) by Anonymous Coward on Friday December 31 2021, @05:04AM (10 children)

    by Anonymous Coward on Friday December 31 2021, @05:04AM (#1208927)

    I was one of the sysadmins for a university computer system about a decade ago. We had a nightly rsync to back up data from the server to another system. If I'm recalling correctly the server had a single large drive, with the automatic backup to another system. As a graduate student, I didn't have access to some things that were reserved for faculty and staff, and I believe that prevented me from shutting off the rsync. When a replacement drive was installed, one of the other sysadmins brought the server back online and started transferring data from the backup over to the server. The problem is that they didn't turn off the nightly rsync. When that job ran, most of the files from the backup hadn't been transferred back over to the server. The rsync saw that the files didn't exist on the server, so they were purged from the backup. For some reason, probably budgetary, there weren't any offline backups of the data. The result was massive data loss. Unfortunately, this seems very common in university settings, and it seems difficult to get administrations to spend money on backups until it's too late.
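
    A minimal sketch of that failure mode, with made-up paths and a made-up hostname (an illustration of the mechanism, not the actual job): rsync with --delete treats anything missing on the source as something to remove from the destination, so a nightly mirror run against a half-restored server purges the not-yet-restored files from the backup as well. A cheap partial guard is to refuse to sync from an empty source; the real fix is to disable the job while a restore is in progress.

      #!/bin/sh
      # Illustration only: paths and hostname are hypothetical.
      SRC=/srv/data
      if [ -z "$(find "$SRC" -mindepth 1 -print -quit)" ]; then
          echo "$SRC looks empty, refusing to sync" >&2
          exit 1
      fi
      # Mirror the server to the backup host, deleting anything on the backup
      # that no longer exists on the server -- this is the dangerous part.
      rsync -a --delete "$SRC"/ backup-host:/backups/data/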

    • (Score: 5, Insightful) by Michael on Friday December 31 2021, @05:43AM

      by Michael (7157) on Friday December 31 2021, @05:43AM (#1208931)

      Yeah, I'm with HAL.

      "I don’t think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error."

    • (Score: 3, Insightful) by Mojibake Tengu on Friday December 31 2021, @07:38AM (1 child)

      by Mojibake Tengu (8598) on Friday December 31 2021, @07:38AM (#1208945) Journal

      Well, rsync alone is a poor way of doing backups, since it is not an atomic operation. Anything can go wrong mid-process, and you experienced exactly that non-atomicity yourself.

      The proper way is to take an atomic snapshot of the filesystem first (a real filesystem required, not a toy filesystem) and perform the backup operations on the snapshot. The same goes for the backup side: every backup should be pinned with a snapshot as well.

      So an abundance of storage capacity is an absolute necessity, not just in corporate industry and academia but in hobby and home setups too.
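
      A minimal sketch of that snapshot-then-backup flow, assuming ZFS and made-up dataset and host names (not the Kyoto setup):

        #!/bin/sh
        # Snapshot atomically, then back up the snapshot rather than the live filesystem.
        snap="tank/research@nightly-$(date +%Y%m%d)"
        zfs snapshot "$snap"
        # Replicate the snapshot to the backup host; the receiving pool keeps the
        # snapshot too, so the backup side is pinned to the same point in time.
        # (A real nightly job would send incrementals with zfs send -i.)
        zfs send "$snap" | ssh backup-host zfs receive -u pool/backups/research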

      --
      Respect Authorities. Know your social status. Woke responsibly.
      • (Score: 0) by Anonymous Coward on Friday December 31 2021, @05:32PM

        by Anonymous Coward on Friday December 31 2021, @05:32PM (#1209019)

        Well yes but your fancy pants ZFS snapshot would have been a snapshot missing most of its files, creating the same result.

        TRWTF is enabling or forgetting to disable deletion of files on the backup target.

    • (Score: 2, Funny) by Anonymous Coward on Friday December 31 2021, @08:30AM (2 children)

      by Anonymous Coward on Friday December 31 2021, @08:30AM (#1208951)

      > it seems difficult to get administrations to spend money on backups until it's too late

      That's why, when I admin, I carry round a USB drive of everyone's home folder.

      • (Score: 2) by Thexalon on Friday December 31 2021, @02:59PM

        by Thexalon (636) on Friday December 31 2021, @02:59PM (#1208985)

        And as any BOFH should know, that makes it really easy to either locate or manufacture some "interesting" contents of said home folder. Even just information on how to access somebody's corporate expense account can do wonders come employee evaluation and annual raise discussions.

        --
        The only thing that stops a bad guy with a compiler is a good guy with a compiler.
      • (Score: 2) by Dr Spin on Friday December 31 2021, @09:25PM

        by Dr Spin (5239) on Friday December 31 2021, @09:25PM (#1209051)

        I carry round a USB drive of everyone's home folder.

        If the data is worth anything, it is worth keeping 556BPI NRZI tapes of it under your bed! In case of emergency, you can read them by sprinkling rust on them with a salt shaker and using a magnifying glass.

        If it's porn, 1600BPI tapes are acceptable, but the higher-density Phase Encoding is harder to read with the human eye.

        Anyone who can read 6250 GCR tapes using iron filings and a magnifying glass deserves all the porn they can get!

        --
        Warning: Opening your mouth may invalidate your brain!
    • (Score: 4, Interesting) by PiMuNu on Friday December 31 2021, @09:38AM (1 child)

      by PiMuNu (3823) on Friday December 31 2021, @09:38AM (#1208955)

      In defence of the university, the way I use our cluster is as ephemeral storage/cpu. Anything that needs saving I bring back to my office environment (with backups that I control). Code, which holds the real value, is backed up reasonably carefully. That way I am relatively protected not just from cock-ups with storage elements, but also from network outages, etc. 80 TB sounds like a lot of data, but it isn't really all that much. Doing a big batch job for me can easily generate big datasets of ~ a few hundred GB. The last time I did a big data analysis task I probably had about 10 TB of ephemeral data floating around on disk.

      Disclaimer: I occasionally work with colleagues from Kyoto University.

      • (Score: 1, Touché) by Anonymous Coward on Friday December 31 2021, @06:42PM

        by Anonymous Coward on Friday December 31 2021, @06:42PM (#1209026)

        80 TB sounds like a lot of data, but it isn't really all that much.

        I suppose it also depends upon how much supercomputer time went into generating that data.

    • (Score: 0) by Anonymous Coward on Friday December 31 2021, @01:42PM (1 child)

      by Anonymous Coward on Friday December 31 2021, @01:42PM (#1208976)

      Perhaps the solution is for university work to be made public domain and available for all to Torrent and have it distributed freely.

      Copy'right' makes it much harder for society to maintain long term information and to build off of it. The model encourages people to keep things a secret and prevents the free sharing of information.

      Well, much of that can also be attributed to corporations that kept undemocratically expanding and extending protection terms. Perhaps seven years should be the term, then it should enter the public domain.

      Back in the day, products came with technical manuals that gave you enough information to build them: things like video cameras, etc. People could learn from those and eventually use that information to work in industry.

      Nowadays they want to keep the consumer ignorant and unable to repair their own products or start a company to build products and improve on them. Everything is proprietary and trade secret. Intellectual property is partly to blame for this. Patents don't tell you how to do anything useful; they just contain enough information to prevent you from doing something useful without getting sued.

      • (Score: 0) by Anonymous Coward on Friday December 31 2021, @01:48PM

        by Anonymous Coward on Friday December 31 2021, @01:48PM (#1208978)

        And nowadays companies even etch the chip markings off a device so that they can prevent you from replacing the chip if it breaks.

        If patents are so useful, where are the patents that tell me how to build these devices? Where are the patents that disclose the information these companies go out of their way to keep secret?

        Instead, if anything, companies try to keep this information secret partly to prevent patent trolls from simply reading technical manuals and using that information to find lame excuses to sue them for. This hurts society as it keeps everyone more ignorant.

        The system is broken and needs to be fixed.

  • (Score: -1, Troll) by Anonymous Coward on Friday December 31 2021, @05:18AM

    by Anonymous Coward on Friday December 31 2021, @05:18AM (#1208928)

    Woe to those with a turkey head shaped penis! For yours is the gobble gobble.

  • (Score: 2, Touché) by MIRV888 on Friday December 31 2021, @05:40AM

    by MIRV888 (11376) on Friday December 31 2021, @05:40AM (#1208930)

    These computers are so finicky and unreliable.
    I know my system backups routinely erase vast data sets.
    Happens all the time.

  • (Score: 1, Informative) by Anonymous Coward on Friday December 31 2021, @06:29AM (4 children)

    by Anonymous Coward on Friday December 31 2021, @06:29AM (#1208935)

    The data set is so huge that my normal advice, which is to keep immutable, deduplicated backups over the near term, would be impractical.
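
    At more modest scales, a sketch of that kind of setup with restic (repository path and retention numbers are made up): restic deduplicates at the chunk level, and existing snapshots stay intact until they are explicitly forgotten and pruned; true immutability would additionally need an append-only or object-locked back end.

      restic -r /mnt/backup/repo init
      restic -r /mnt/backup/repo backup /srv/data
      restic -r /mnt/backup/repo forget --keep-daily 7 --keep-weekly 4 --prune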

    • (Score: 0) by Anonymous Coward on Friday December 31 2021, @07:27AM (2 children)

      by Anonymous Coward on Friday December 31 2021, @07:27AM (#1208943)

      According to their website, they have 24 petabytes and enough grunt that 77 terabytes would be gone in less than 5 minutes. It also says that LARGE0 and LARGE1 are in a paired configuration. Sounds like someone ran a command and clobbered a part of LARGE0. A day later, something didn't look right, or someone complained about missing data, and the mistake was caught before it had completely propagated to that level of backup.

      • (Score: 0) by Anonymous Coward on Friday December 31 2021, @07:33AM

        by Anonymous Coward on Friday December 31 2021, @07:33AM (#1208944)

        with 24.923P of training data their CS boffins get the 77T back in no time.

      • (Score: 0) by Anonymous Coward on Friday December 31 2021, @08:32AM

        by Anonymous Coward on Friday December 31 2021, @08:32AM (#1208952)

        That's enough for a really high resolution goatse.

    • (Score: 4, Interesting) by Brymouse on Friday December 31 2021, @01:22PM

      by Brymouse (11315) on Friday December 31 2021, @01:22PM (#1208972)

      Not really.

      Let's say you want ZFS raidz3, so you can lose 3 disks before you have data loss. Supermicro's 90-LFF-disk 4RU server filled with 16 TB SAS disks gives you six 15-disk arrays (12 usable disks each). Add in 4 x 4 TB NVMe for L2ARC/cache and special devices in a mirror config, plus 2 mirrored SAS disks in the rear for server boot. This nets you 1152 TB of usable raidz3 space with stupid fast IO ops in 4RU. 20 servers is 2 racks for 24 PB. That's about $90k per server, or $1.8 million. Round up to $2 million for the network gear, racks, and install.

      It's expensive, but not out of reach by any means.
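
      For the ZFS-inclined, roughly what one such server's pool might look like (a sketch with placeholder device names, assuming a by-vdev alias mapping in /etc/zfs/vdev_id.conf and bash brace expansion for brevity):

        # Six 15-wide raidz3 vdevs out of the 90 bays, a mirrored NVMe special
        # vdev for metadata, and two NVMe cache (L2ARC) devices.
        zpool create tank \
          raidz3 /dev/disk/by-vdev/bay{01..15} \
          raidz3 /dev/disk/by-vdev/bay{16..30} \
          raidz3 /dev/disk/by-vdev/bay{31..45} \
          raidz3 /dev/disk/by-vdev/bay{46..60} \
          raidz3 /dev/disk/by-vdev/bay{61..75} \
          raidz3 /dev/disk/by-vdev/bay{76..90} \
          special mirror /dev/nvme0n1 /dev/nvme1n1 \
          cache /dev/nvme2n1 /dev/nvme3n1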

  • (Score: 0) by Anonymous Coward on Friday December 31 2021, @07:53AM (1 child)

    by Anonymous Coward on Friday December 31 2021, @07:53AM (#1208946)

    https://old.reddit.com/r/datahoarder [reddit.com]

    ^ for fuckin real

    • (Score: 2) by MIRV888 on Friday December 31 2021, @08:33AM

      by MIRV888 (11376) on Friday December 31 2021, @08:33AM (#1208953)

      I'm just a lone home hoarder with no commercial interests whatsoever. My pile o'data is at about 13 TB of mostly media (80%) and the rest various software packages.
      I have offline backups of pretty much everything.

      My previous post was not meant as a troll. Even though 77 TB is a massive amount of data, proper offline or independent (not paired) redundancy should be SOP for professional storage operations. If your enterprise has enough money to store petabytes of data, it has enough to properly and securely archive said data offline.
      YMMV

  • (Score: 0) by Anonymous Coward on Friday December 31 2021, @08:23AM (1 child)

    by Anonymous Coward on Friday December 31 2021, @08:23AM (#1208949)

    all the time.

    • (Score: 1, Insightful) by Anonymous Coward on Friday December 31 2021, @04:20PM

      by Anonymous Coward on Friday December 31 2021, @04:20PM (#1208998)

      And everyone knows that 640 GB is enough for anyone.

      (there's a distortion in the universe, as if all the p0rn and video hoarders screamed in outrage at once).

  • (Score: 0) by Anonymous Coward on Sunday January 02 2022, @04:53PM

    by Anonymous Coward on Sunday January 02 2022, @04:53PM (#1209364)

    So, newer reports say it was related to the backup script being modified while it was running. My hypothesis... something like this in the script:

    #Last updated:2020-12-14
    #Last updated: 2020-12-14
    ...
    target=/maindrive
    rsync -a /data/ "$target"/   # rsync me everything to $target
    target=/maindrive/logs
    find "$target" -type f -ctime +10 -exec rm -f {} \; #remove older logs to conserve space... we have it in backups, don't we?

    Now try removing the redundant comment line while the rsync is still running: the shell re-reads the modified script at a shifted offset, so it can end up running the find/rm with the wrong $target. Do not try this with valuable drives!
