Is your team prepared for when disaster strikes?

Every software system can experience a disaster, and these events can have a massive impact on the business. The key is being prepared so that you can recover quickly. This is where recovery testing comes in.


What is Recovery Testing?

Recovery testing is the practice of deliberately forcing your software to fail in order to verify that recovering from the failure is both possible and performed correctly. The goal is to recover quickly and completely.

Recovery testing involves measuring and determining the following:

  1. How long does it take for your system to resume normal operations?
  2. What percentage of scenarios can your system successfully recover from?
  3. Can the system recover all lost data?
  4. Can users reconnect successfully?
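As a rough illustration of how these four questions turn into measurements, here is a minimal Python sketch. Everything in it is hypothetical: `ToyService` stands in for a real system and persists each write to an append-only log, a "crash" simply throws away in-memory state, and recovery means replaying that log. A real test would kill actual processes or machines, but the shape of the measurement is similar: force the failure, time the restart, compare the data, and try to reconnect.

```python
import os
import tempfile
import time


class ToyService:
    """Hypothetical service that persists every write to an append-only log file."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self.running = False

    def start(self):
        # Recovery step: rebuild in-memory state by replaying the log from disk.
        self.data = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = line.rstrip("\n").split("=", 1)
                    self.data[key] = value
        self.running = True

    def write(self, key, value):
        if not self.running:
            raise ConnectionError("service is down")
        # Persist first (and force it to disk) so the write survives a crash.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{key}={value}\n")
            log.flush()
            os.fsync(log.fileno())
        self.data[key] = value

    def read(self, key):
        if not self.running:
            raise ConnectionError("service is down")
        return self.data.get(key)

    def crash(self):
        # Forced failure: lose all in-memory state, as a killed process would.
        self.data = {}
        self.running = False


def run_recovery_test(service, records):
    """Write records, force a crash, restart, and report recovery metrics."""
    for key, value in records.items():
        service.write(key, value)

    service.crash()

    # Question 1: how long does it take to resume normal operations?
    started = time.perf_counter()
    service.start()
    recovery_seconds = time.perf_counter() - started

    # Question 3: was any data lost?
    lost = [key for key, value in records.items() if service.read(key) != value]

    # Question 4: can users reconnect? Approximated here by a fresh write succeeding.
    try:
        service.write("reconnect-probe", "ok")
        reconnected = True
    except ConnectionError:
        reconnected = False

    return {"recovery_seconds": recovery_seconds, "lost_keys": lost, "reconnected": reconnected}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        svc = ToyService(os.path.join(tmp, "service.log"))
        svc.start()
        report = run_recovery_test(svc, {"order-1": "paid", "order-2": "shipped"})
        print(report)
```

Running the same test many times, against many kinds of failure, is what turns these numbers into trends you can work to improve.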

By answering these questions and continually working to improve the results, you leave your company and product better prepared for when a crisis occurs.

The Importance of Recovery Testing

The ability to recover from a disaster is important for any business. How critical your business or product is determines how rigorous your recovery testing needs to be.

For example, if your product is a communications platform for the airline industry that is used 24/7 and heavily relied upon, the ability to recover within seconds or minutes is paramount. On the other hand, if your product is a learning management system, recovering within hours or even a few days might be acceptable.

Mission-critical products with many dependents and strict service-level agreements (SLAs) must be able to recover quickly, so the businesses behind them must invest the time and money required to conduct thorough recovery testing.
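One way to make "recover quickly" concrete is to write down a recovery-time target for each product and check every test run against it. The sketch below uses hypothetical targets loosely based on the airline and learning management examples (five minutes and 24 hours); real values would come from your own SLAs.

```python
from datetime import timedelta

# Hypothetical recovery-time targets; replace with the values from your own SLAs.
RECOVERY_TARGETS = {
    "airline-communications": timedelta(minutes=5),
    "learning-management": timedelta(hours=24),
}


def meets_recovery_target(product: str, measured: timedelta) -> bool:
    """Return True if a measured recovery time is within the product's target."""
    return measured <= RECOVERY_TARGETS[product]


if __name__ == "__main__":
    measured = timedelta(minutes=12)  # e.g. the time recorded during a recovery test
    for product, target in RECOVERY_TARGETS.items():
        status = "PASS" if meets_recovery_target(product, measured) else "FAIL"
        print(f"{product}: {status} (target {target}, measured {measured})")
```

With these assumed targets, the same 12-minute recovery fails for the airline platform but passes for the learning management system, which is exactly why the level of investment differs between them.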

How is Recovery Testing Conducted?

It’s not wise to test recoverability in your production environment; after all, that would mean taking your entire system offline. So how do you conduct recovery testing when its whole purpose is to see how you recover from a system-wide outage or catastrophe?

Create a test environment that is as close to the production environment as you can make it. Whenever possible, run tests on the hardware you would actually be restoring to. Interfaces, hardware, and code should replicate the live system. The closer your test environment is to your production system, the more trustworthy your test results will be.
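In a test environment, recovery tests are typically run as a set of failure scenarios, which also gives you the percentage of scenarios your system recovers from. The sketch below is an assumption about how such a runner might be structured, not a specific tool: each hypothetical `Scenario` pairs a failure-injection function with a health check, and the runner polls the check until a timeout expires.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Scenario:
    name: str
    inject_failure: Callable[[], None]   # hypothetical: kill a process, detach a disk, etc.
    check_recovered: Callable[[], bool]  # hypothetical: health check against the test environment


def run_scenarios(scenarios, timeout_seconds=30.0, poll_seconds=0.5):
    """Inject each failure, wait for recovery, and return per-scenario results and a success rate."""
    results = {}
    for scenario in scenarios:
        scenario.inject_failure()
        deadline = time.monotonic() + timeout_seconds
        recovered = False
        # Poll the health check until it passes or the timeout expires.
        while time.monotonic() < deadline:
            if scenario.check_recovered():
                recovered = True
                break
            time.sleep(poll_seconds)
        results[scenario.name] = recovered
    success_rate = 100.0 * sum(results.values()) / len(results) if results else 0.0
    return results, success_rate


if __name__ == "__main__":
    # Stand-in for a staging environment: one component self-heals, one does not.
    env = {"db": True, "cache": True}

    def fail_db():
        env["db"] = False
        env["db"] = True  # simulated automatic failover restores it immediately

    def fail_cache():
        env["cache"] = False  # no recovery mechanism in this toy setup

    scenarios = [
        Scenario("database failover", fail_db, lambda: env["db"]),
        Scenario("cache loss", fail_cache, lambda: env["cache"]),
    ]
    results, rate = run_scenarios(scenarios, timeout_seconds=1.0, poll_seconds=0.1)
    print(results, f"{rate:.0f}% of scenarios recovered")
```

Keeping failure injection and health checks as separate functions makes it easier to add new scenarios as you discover new ways the system can break.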

As you can see, recovery testing can be time-consuming and very expensive.

For mission-critical products and services, this expense is necessary. The more often you conduct recovery testing, the better prepared you will be for a disaster. Keep in mind that just as systems can fail, things can also go wrong with the disaster recovery plan itself. This is why it’s important to test recovery frequently: it lets you practice recovering from a disaster before a real one happens.

Conclusion

The repercussions of failure are vast. A business can take a financial hit, its brand can be damaged, or a security breach may occur. These risks are always present, because every system is subject to failure. No company, product, or software is immune to “breaking”. How frequently recoverability is tested directly affects how prepared an organization is for disaster.

If you find yourself leading a project that involves business processes and technology, be sure to review any existing recovery testing procedures. And if no procedures exist, invest the time and effort needed to design and test a proper disaster recovery plan.