Ruby Paulson at BlogVault reports
GitLab, the online tech hub, is facing issues as a result of an accidental database deletion that happened in the wee hours of last night. A tired, frustrated system administrator thought that deleting a database would solve the lag-related issues that had cropped up... only to discover too late that he'd executed the command for the wrong database.
[...] It's certainly freaky that all five of the backup solutions GitLab had were ineffective, but this incident demonstrates that a number of things can go wrong with backups. The real aim of any backup solution is to be able to restore data with ease... but simple oversights can render backup solutions useless.
Computer Business Review adds
The data loss took place when a system administrator accidentally deleted a directory on the wrong server during a database replication process. A folder containing 300GB of live production data was completely wiped.
[...] The last potentially useful backup was taken six hours before the issue occurred.
However, this was of limited help: snapshots were normally taken only every 24 hours, and the deletion occurred six hours after the previous snapshot, so roughly six hours of data were lost.
David Mytton, founder and CEO of Server Density, said: "This unfortunate incident at GitLab highlights the urgent need for businesses to review and refresh their backup and incident handling processes to ensure data loss is recoverable, and teams know how to handle the procedure."
GitLab has been updating a Google Doc with info on the ongoing incident.
Additional coverage at:
TechCrunch
The Register
(Score: 4, Informative) by The Mighty Buzzard on Friday February 03 2017, @11:39AM
If we somehow managed to destroy our production database we would be in exactly the same position being as we don't back up any more often than they did. Maybe we should look at changing that for the production server at least. Then again, we're talking comments rather than code. What do you lot think?
My rights don't end where your fear begins.
(Score: 2) by pkrasimirov on Friday February 03 2017, @03:05PM
It's called a Disaster recovery plan [wikipedia.org]. Also, one easy prevention measure is to ... change the terminal PS1 format/colours to make it clear whether you're using production or staging (red for production, yellow for staging). [gitlab.com]
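The PS1 trick could be sketched like this, a minimal ~/.bashrc fragment, assuming a "prod-*" / "staging-*" hostname convention (the hostnames and colour choices here are illustrative, not GitLab's actual setup):

```shell
# Pick an ANSI background colour code based on the hostname, so the prompt
# itself warns you which environment you're in.
env_color() {
  case "$1" in
    prod-*)    echo 41 ;;   # red background: production, tread carefully
    staging-*) echo 43 ;;   # yellow background: staging
    *)         echo 0  ;;   # default colours everywhere else
  esac
}

# Embed the colour in the prompt; \[ and \] keep bash's cursor accounting right.
PS1="\[\e[$(env_color "$(hostname)")m\]\u@\h\[\e[0m\]:\w\$ "
```

The colour survives copy-paste of the prompt line, which is exactly when you want the reminder of where that command actually ran.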
(Score: 2) by The Mighty Buzzard on Friday February 03 2017, @03:35PM
Eh, we do already have plans in place as well as daily local and offsite db dumps going back a good ways. I was asking more for specific ideas like the PS1 setting but more for the backups end of things. I mean we got a lot of smart folks here, might as well use them.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Friday February 03 2017, @04:24PM
IIRC, you use MySQL. In that case, turn on https://dev.mysql.com/doc/refman/5.7/en/binary-log.html [mysql.com] with a basename that points to a different directory. The logs will rotate automatically when they hit a certain size or the daemon is restarted. You can then restore, if necessary, from the backup dump to get you close, and then replay the binary logs in concatenated form. A tip, though: before blindly replaying the transaction log, you should change it to human-readable form and edit out the unneeded stuff. Also, don't forget to do a FULL vacuum, analyse and reindex of the databases as well.
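For reference, enabling the binary log as described might look like this in my.cnf; the paths, sizes, and retention below are placeholder values, not recommendations from the linked manual page:

```ini
# Hypothetical my.cnf fragment: binlogs under a basename on a separate
# directory/volume so they survive mishaps on the data volume.
[mysqld]
server-id        = 1                                # required with log-bin in 5.7
log-bin          = /var/log/mysql-binlog/mysql-bin  # basename in another directory
max_binlog_size  = 100M                             # rotate when a log hits this size
expire_logs_days = 7                                # prune old binlogs automatically
```

After a restore from the dump, the binlogs covering the gap can be converted and replayed with the mysqlbinlog utility, which is also where the "edit out the unneeded stuff" step happens.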
(Score: 2) by pkrasimirov on Friday February 03 2017, @04:50PM
Good link, thanks. That "binary log" is essentially a journal. But it should be usable for quick replay of changes on top of last backup, otherwise it is not of much value. Restore (as well as backup) should be fully automated bullet-proof task exactly for the reason in the story: sometimes humans err and stress does not help. Also mind all that would be during outage, hardly a good time to "change it to human-readable form and edit out the unneeded stuff. [...] do a FULL vacuum, analyse and reindex of the databases".
(Score: 0) by Anonymous Coward on Friday February 03 2017, @08:34PM
Forgot to mention: https://dev.mysql.com/doc/refman/5.7/en/mysqldump.html#option_mysqldump_flush-logs [mysql.com] can be helpful to cut down on redundant replays.
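Concretely, a nightly dump that flushes the logs might be scheduled like this (a crontab sketch; the schedule, paths, and the options other than --flush-logs are assumptions):

```
# Hypothetical crontab entry: nightly 03:00 dump to a backup volume.
0 3 * * * mysqldump --all-databases --single-transaction --flush-logs | gzip > /backup/nightly.sql.gz
```

Because --flush-logs closes the current binary log and starts a new one at dump time, point-in-time replay after restoring the dump can begin with the first binlog created after the dump, instead of replaying transactions the dump already contains.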
(Score: 0) by Anonymous Coward on Friday February 03 2017, @03:43PM
I'm not an expert, but wouldn't best practice be to keep a log file with all SQL transactions where delete commands can be manually undone?
(Score: 2) by pkrasimirov on Friday February 03 2017, @04:22PM
It was rm -Rvf, not DELETE FROM.
(Score: 1) by hopp on Friday February 03 2017, @05:38PM
Loss is a part of life. You can't take it with you, data included.
Take reasonable steps to protect the data knowing that guaranteed complete recovery is improbable or prohibitively expensive.