
posted by Fnord666 on Friday February 03 2017, @06:39AM
from the 5-backup-strategies-weren't-enough dept.

Ruby Paulson at BlogVault reports

GitLab, the online tech hub, is facing issues as a result of an accidental database deletion that happened in the wee hours of last night. A tired, frustrated system administrator thought that deleting a database would solve the lag-related issues that had cropped up... only to discover too late that he'd executed the command for the wrong database.

[...] It's certainly freaky that all five of the backup solutions GitLab had in place were ineffective, but this incident demonstrates that a number of things can go wrong with backups. The real aim of any backup solution is to be able to restore data with ease... but simple oversights could render backup solutions useless.

Computer Business Review adds

The data loss took place when a system administrator accidentally deleted a directory on the wrong server during a database replication process. A folder containing 300GB of live production data was completely wiped.

[...] The last potentially useful backup was taken six hours before the issue occurred.

However, this was of limited help: snapshots are normally taken only every 24 hours, and the data loss occurred six hours after the previous snapshot, which resulted in six hours of data loss.

David Mytton, founder and CEO [of] Server Density, said: "This unfortunate incident at GitLab highlights the urgent need for businesses to review and refresh their backup and incident handling processes to ensure data loss is recoverable, and teams know how to handle the procedure."

GitLab has been updating a Google Doc with info on the ongoing incident.

Additional coverage at:
TechCrunch
The Register


Original Submission

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Informative) by The Mighty Buzzard on Friday February 03 2017, @11:39AM

    by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Friday February 03 2017, @11:39AM (#462324) Homepage Journal

    If we somehow managed to destroy our production database, we would be in exactly the same position, being as we don't back up any more often than they did. Maybe we should look at changing that for the production server at least. Then again, we're talking comments rather than code. What do you lot think?

    --
    My rights don't end where your fear begins.
  • (Score: 2) by pkrasimirov on Friday February 03 2017, @03:05PM

    by pkrasimirov (3358) Subscriber Badge on Friday February 03 2017, @03:05PM (#462397)
    • (Score: 2) by The Mighty Buzzard on Friday February 03 2017, @03:35PM

      by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Friday February 03 2017, @03:35PM (#462428) Homepage Journal

      Eh, we do already have plans in place as well as daily local and offsite db dumps going back a good ways. I was asking more for specific ideas like the PS1 setting but more for the backups end of things. I mean we got a lot of smart folks here, might as well use them.

      --
      My rights don't end where your fear begins.
      • (Score: 0) by Anonymous Coward on Friday February 03 2017, @04:24PM

        by Anonymous Coward on Friday February 03 2017, @04:24PM (#462458)

        IIRC, you use MySQL. In that case, turn on the binary log (https://dev.mysql.com/doc/refman/5.7/en/binary-log.html [mysql.com]) with a basename that points to a different directory. The logs rotate automatically when they hit a certain size or the daemon is restarted. You can then restore, if necessary, from the backup dump to get you close, and then replay the binary logs in concatenated form. A tip, though: before blindly replaying the transaction log, change it to human-readable form and edit out the unneeded stuff. Also, don't forget to do a FULL vacuum, analyse, and reindex of the databases as well.
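
        A rough sketch of the kind of setup being described, assuming MySQL 5.7; the database name (soylent_db), the file paths, and the config file below are made up for illustration, not the site's actual layout:

            # /etc/mysql/conf.d/binlog.cnf -- enable the binary log with a basename
            # pointing at a separate directory (placeholder paths)
            [mysqld]
            server_id        = 1          # required once log_bin is set
            log_bin          = /var/backups/mysql-binlog/binlog
            max_binlog_size  = 256M       # rotate when a file reaches this size
            expire_logs_days = 7          # keep a week of logs around

        A restore would then go roughly: load the most recent dump to get close, then replay the logs written since it (add --verbose to mysqlbinlog if you use row-based logging and want readable pseudo-SQL to edit first):

            # restore the last full dump
            mysql -u root -p soylent_db < nightly_dump.sql
            # then replay the binary logs, concatenated, on top of it
            mysqlbinlog /var/backups/mysql-binlog/binlog.000123 \
                        /var/backups/mysql-binlog/binlog.000124 | mysql -u root -p soylent_db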

        • (Score: 2) by pkrasimirov on Friday February 03 2017, @04:50PM

          by pkrasimirov (3358) Subscriber Badge on Friday February 03 2017, @04:50PM (#462472)

          Good link, thanks. That "binary log" is essentially a journal, but it should be usable for a quick replay of changes on top of the last backup, otherwise it is not of much value. Restore (as well as backup) should be a fully automated, bullet-proof task, exactly for the reason in the story: sometimes humans err, and stress does not help. Also mind that all of this would happen during an outage, hardly a good time to "change it to human-readable form and edit out the unneeded stuff. [...] do a FULL vacuum, analyse and reindex of the databases".
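
          For the automation angle, a one-shot restore script can be just a few lines with no prompts and no hand-editing in the loop. Everything below (paths, the option file holding credentials, the database name) is hypothetical, and it assumes each dump rotates the binary log so that every log file newer than the dump contains only post-dump changes:

              #!/bin/sh
              # hypothetical non-interactive restore: newest dump, then every binlog written after it
              set -e
              DB=soylent_db                                   # placeholder database name
              DUMP=$(ls -1t /var/backups/dumps/*.sql | head -n 1)
              mysql --defaults-extra-file=/etc/mysql/restore.cnf "$DB" < "$DUMP"
              mysqlbinlog $(find /var/backups/mysql-binlog -name 'binlog.[0-9]*' -newer "$DUMP" | sort) \
                  | mysql --defaults-extra-file=/etc/mysql/restore.cnf "$DB"

          Something like that can be rehearsed regularly, which is the part that tends to fall apart under stress.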

        • (Score: 0) by Anonymous Coward on Friday February 03 2017, @08:34PM

          by Anonymous Coward on Friday February 03 2017, @08:34PM (#462581)

          Forgot to mention: https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_flush-logs [mysql.com] can be helpful to cut down on redundant replays.
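
          For example (placeholder names and paths again, assuming InnoDB tables and a credentials option file so nothing prompts):

              # nightly dump that also rotates the binary log, so the dump plus any binlogs
              # created afterwards form a clean point-in-time restore chain
              mysqldump --defaults-extra-file=/etc/mysql/backup.cnf --single-transaction \
                        --flush-logs --master-data=2 soylent_db > /var/backups/dumps/nightly_$(date +%F).sql

          --master-data=2 writes the binary log file and position into the dump as a comment, so you know exactly where replay should start.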

  • (Score: 0) by Anonymous Coward on Friday February 03 2017, @03:43PM

    by Anonymous Coward on Friday February 03 2017, @03:43PM (#462437)

    I'm not an expert, but wouldn't best practice be to keep a log file with all SQL transactions where delete commands can be manually undone?

  • (Score: 1) by hopp on Friday February 03 2017, @05:38PM

    by hopp (2833) on Friday February 03 2017, @05:38PM (#462493)

    Loss is a part of life. You can't take it with you, data included.

    Take reasonable steps to protect the data knowing that guaranteed complete recovery is improbable or prohibitively expensive.