Backups Are Ineffective; Focus on Recovery
It’s been said that “If your data doesn’t exist in at least two places, then it doesn’t exist.” The sentiment behind that statement is a good one, but, unfortunately, it doesn’t go far enough. Backing up your data isn’t enough; you also need to know, beyond a shadow of a doubt, that you can recover from those backups.
Just as it’s impossible to know the fate of Schrödinger’s poor cat until you open the box, it’s impossible to know whether your backups will save you until you actually restore them to production. And when it comes to DR, you don’t know that your failover will be successful until you actually push the button. If you test your DR plans regularly, you can have this level of confidence; unfortunately, most organizations don’t test regularly enough to be sure (Figure 4).
Figure 4: How Often DR Plans Are Tested (survey response categories: every 1-3 months; every 3-6 months; once or twice per year; less than once per year). Nearly three-fourths of survey respondents aren’t testing their DR plans monthly.
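A restore drill doesn’t have to be elaborate to be useful. As a minimal sketch, assuming a backup is nothing more than a ZIP archive of a directory (real backup products use their own formats and tooling), the following Python restores an archive into a scratch directory and compares SHA-256 checksums against the source. The function names `file_digests`, `create_backup`, and `verify_restore` are hypothetical and chosen for illustration only.

```python
import hashlib
import tempfile
import zipfile
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def create_backup(source_dir: Path, archive_path: Path) -> None:
    """Write every file under source_dir into a ZIP archive (illustrative only)."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source_dir).as_posix())


def verify_restore(source_dir: Path, archive_path: Path) -> list[str]:
    """Restore the archive to a scratch directory and list paths that differ.

    A path is reported if it is missing from the restore, missing from the
    source, or present in both with different contents.
    """
    expected = file_digests(source_dir)
    with tempfile.TemporaryDirectory() as scratch:
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(scratch)
        restored = file_digests(Path(scratch))
    all_paths = set(expected) | set(restored)
    return sorted(p for p in all_paths if expected.get(p) != restored.get(p))
```

An empty list from `verify_restore` means every file came back byte-for-byte identical; any paths it returns were missing, extra, or changed. In practice the source keeps changing after the backup is taken, so a real drill would more likely compare against checksums recorded at backup time rather than against the live data.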
YOU PERFORMED A SUCCESSFUL FAILOVER? GREAT! NOW CAN YOU FAIL BACK?
Plenty of DR products on the market excel at orchestrating a failover and getting your DR site online quickly. If they’re tested regularly, they may solve your initial problem of bringing the business back online. But failing over is only half the battle. Where many DR solutions fall flat today is in failing back to the primary site.
It turns out that diverting production traffic back to the primary site is a little trickier than you’d hope. Consider some of the ways that failback can go sideways (Figure 5):
- If there was a VM format conversion in order to fail over, you’ll need to convert back to the original format again. In the best case, that process will probably take more time than you’d like. But as anyone with experience doing physical-to-virtual or virtual-to-virtual migrations will tell you, expecting a system to go through two machine format conversions and come back up with no issues is a long shot at best.
- Replicating back the changed data can become an expensive proposition if you’re coming back out of a cloud; egress charges can add up quickly if you aren’t judicious about how you replicate the changed data back to the primary site.
- Some cloud providers have tools that can help with failover, and that’s great. They have very little incentive to help you fail back, however. It could be argued that they don’t even want you to fail back successfully.
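To see how quickly egress charges can add up, it helps to run the arithmetic. The sketch below compares copying an entire dataset back out of the cloud with copying only the data that changed since failover. The per-gigabyte price and the data sizes are made-up placeholders, not any provider’s actual rates, and the function names are hypothetical.

```python
def egress_cost_usd(data_gb: float, price_per_gb_usd: float) -> float:
    """Cost of moving data_gb out of the cloud at a flat per-GB price."""
    if data_gb < 0 or price_per_gb_usd < 0:
        raise ValueError("data size and price must be non-negative")
    return data_gb * price_per_gb_usd


def failback_savings_usd(total_gb: float, changed_gb: float,
                         price_per_gb_usd: float) -> float:
    """Savings from replicating only changed data instead of the full dataset."""
    if changed_gb > total_gb:
        raise ValueError("changed data cannot exceed the total dataset size")
    full_copy = egress_cost_usd(total_gb, price_per_gb_usd)
    changed_only = egress_cost_usd(changed_gb, price_per_gb_usd)
    return full_copy - changed_only


if __name__ == "__main__":
    # Placeholder figures for illustration only (assumptions, not real rates).
    total_gb = 10_000        # size of the protected dataset
    changed_gb = 500         # data changed while running at the DR site
    price_per_gb_usd = 0.09  # hypothetical flat egress price

    full = egress_cost_usd(total_gb, price_per_gb_usd)
    delta = egress_cost_usd(changed_gb, price_per_gb_usd)
    saved = failback_savings_usd(total_gb, changed_gb, price_per_gb_usd)
    print(f"Full copy back:    ${full:,.2f}")
    print(f"Changed data only: ${delta:,.2f}")
    print(f"Difference:        ${saved:,.2f}")
```

Real egress pricing is usually tiered and varies by provider and region, so treat a flat rate as a rough first estimate. The gap between the two numbers is what makes tracking incremental changes during failover worth the effort.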
Figure 5: Top reasons why failback was difficult (survey responses: VMs had been converted to cloud-native formats and it was difficult to convert back to a VMware vSphere format; amount of time required to fail back; it was difficult to understand the incremental changes that had occurred in the system since failover; lack of appropriate automation tools to help with failback). VM format conversion is a top challenge for those who have had to exercise their DR capabilities and then tried to fail back.
When it comes to disaster recovery, focus on the key word: recovery. Backups are a crucial part of the picture, but legacy backup technology often falls short on recovery time. Also look for a modern approach that handles failback as well as failover.