
SoylentNews is people

Meta
posted by on Sunday August 09 2020, @05:51AM
from the SNAFU dept.

The Mighty Buzzard writes:

Yeah, so, failure to babysit the db node that was scheduled for a reboot on the 5th resulted in a bit of database FUBAR that left us temporarily losing everything from then to now. Fortunately we had a backup less than six hours old, restored from it, and appear to be copacetic now. Except for the missing five hours and change.

I'd usually make some sort of dumb joke here but it was already four hours past my bedtime when I found out about the problem. My brain is no work good anymore. Fill in whatever dad joke or snark about getting a do-over for a change strikes your fancy.

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Informative) by Anonymous Coward on Tuesday August 11 2020, @09:37AM (#1034819)

    Depending on what exactly they are changing, they can literally transfer you from one hypervisor to another without "turning off" the VM. As far as the VM is concerned, you just get a split second of added latency to everything, some lost packets, maybe a page fault or twenty, and your clock is off by a bit until your time daemon fixes it.

    The issue is that, depending on what exactly needs changing, you can't do that sort of thing for everything. Plus, they will usually charge a lot more for that kind of uptime, since it takes a bit more expertise, time, and attention to pull off in the first place. In many situations, they will do an offline transfer of machines like you suggest, but that isn't always quick or easy in itself. Often, it is fastest, easiest, and cheapest to just shut down all the VMs, do what you need to do, power cycle the hypervisor, test the changes, and bring it all back up. It's not like you have a lot of nines in your uptime guarantees.
