Backup and Recovery
One of the most common questions I get from customers is about backups. Systems are built on data, data lives on storage media (usually hard disk drives), and storage media have a known failure rate. When media fail, bad things happen. It’s important to distinguish system and media failures (individual components that just go bad) from disasters (events that hit your whole business, such as tornadoes or other acts of God, and that are independent of any single component failure). Disaster recovery is a separate and equally important topic, which I will cover at a later date. But now back to system backups…
I learned this the hard way, a long time ago, with an ancient Mac laptop and school work. Disk failures happen, and when they do, you’re screwed unless you have backups.
This leads to the rule: always have two copies of your data, the live data and a recent backup. How recent? That is the question. How much work can you afford to redo by re-entering transactions or manually re-creating entries you’ve already made? If it’s a few dozen, that’s probably not a big deal. If it’s a few thousand, it’s not so easy.
This leads to the first term in backups: Recovery Point Objective (RPO). RPO is the maximum amount of recent work, measured in time, that you can stand to lose and re-create by hand. If your RPO is 4 hours, then you expect to re-create at most 4 hours of transactions after restoring data following a failure, which means your backups can never be more than 4 hours old.
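To make the RPO idea concrete, here is a minimal sketch in Python. The function names (`data_loss_window`, `meets_rpo`) and the sample schedules are made up for illustration, and the sketch assumes the worst case is a failure that hits just before the next backup would have run.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Everything entered after the last backup has to be re-created by hand."""
    if failure_time < last_backup:
        raise ValueError("failure_time is earlier than last_backup")
    return failure_time - last_backup


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, you lose roughly one full backup interval of work,
    so the interval has to be no longer than the RPO."""
    return backup_interval <= rpo


rpo = timedelta(hours=4)
print(meets_rpo(timedelta(hours=24), rpo))  # nightly backups: False
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups: True
```

With a 4 hour RPO, a nightly backup is not good enough: a failure late in the day could cost you most of a day of transactions.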
The other consideration for backups is Recovery Time Objective (RTO), or “how long can it take to get the system back up from the last backup?” After a failure, how long does it take to obtain the backups, obtain replacement media, and do the restoration? Data restoration typically takes hours, and sometimes longer, so RTO is an important consideration. If a short RTO is required, then a system that keeps a duplicate copy of the data is usually warranted. You replace failed storage A with backup storage B, and you’re done. Quick, but expensive. If you can stand a longer recovery time, then you live with a backup drive that takes minutes or hours to restore to other media. While the restoration is in progress, the systems are offline and nothing is happening. After the restoration, the system still has to be brought up to date by re-entering the transactions that were never backed up. Therefore, the true recovery time will be: time to detect the failure + time to restore the backup + time to re-enter the transactions since the backup. That could be hours or days. How long can you be in limbo?
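The true recovery time is simple addition, but it helps to plug in numbers. This is a rough sketch only: the function name `true_recovery_time` and the sample durations are assumptions for illustration, not measurements from any real system.

```python
from datetime import timedelta


def true_recovery_time(detect: timedelta, restore: timedelta, reenter: timedelta) -> timedelta:
    """Time to detect the failure + time to restore the backup
    + time to re-enter the transactions made since the backup."""
    return detect + restore + reenter


# Hypothetical numbers: 30 minutes to notice the failure, 6 hours to
# restore, and 4 hours to re-enter the lost transactions.
total = true_recovery_time(
    detect=timedelta(minutes=30),
    restore=timedelta(hours=6),
    reenter=timedelta(hours=4),
)
print(total)  # 10:30:00
```

Notice that the restore itself is only part of the outage. The re-entry time grows with the age of the backup, so a loose RPO quietly stretches your real recovery time too.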
There’s a serious cost-vs-time tradeoff for backups. Rest assured, 10 out of 10 systems will fail. The question is when. Are you prepared?
Posted: March 7th, 2015 under Backups.