Hurricane Dorian ravaged the Bahamas and then the Atlantic provinces of Canada.
On its 1,000-mile trip up the Atlantic coast, it could have swung inland and hit any of the coastal states head-on, causing enormous damage. Instead, it spared most of the US.
Let’s use this near-miss as an opportunity to examine our organizations’ preparedness for disaster recovery.
In our last blog post, we covered the “evaluation phase” of disaster recovery planning. Today, we’ll cover the remaining three phases: Strategies, Development, and Testing/Maintenance.
Phase 2: Recovery Strategies
Once Evaluation is complete, an organization will have identified the assets to recover and their priority for recovery in a variety of scenarios.
Now, the organization needs to create a budgeted recovery plan for each combination of threat and resource. The recovery strategy should include not only the specifics of the plan, but also a rough timetable and budget for each potential disaster.
Sometimes, the solutions are simple. For the ‘spilled-coffee’ vulnerability, the company needs to budget for backup keyboards in the IT department supply cabinet and a sign to ban food and drink from the server rooms.
Other times, recovery can be quite complex.
What if the company wants to replicate the network and servers for a 30-person office destroyed by water damage? This recovery plan must account for employee equipment, phone systems, personal data, corporate data, corporate servers, peripheral equipment (printers, etc.) and the network systems that provided resource access and security.
Usually, the budget for the recovery is determined by estimating the organization’s cost for the duration of the recovery process. For-profit companies measure this cost in dollars, but for non-profits (hospitals, etc.) and government entities, the cost calculations will need to factor in additional variables related to their specific missions (human lives, etc.).
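To make the budgeting idea concrete, here is a minimal sketch for a for-profit company. The class, the function names, and every dollar figure are hypothetical and chosen only for illustration. It treats the cost of a disaster as hourly losses multiplied by recovery time, and shows how much a faster recovery option would save in a single incident.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEstimate:
    """Hypothetical inputs for one disaster scenario."""
    scenario: str
    hourly_revenue_loss: float  # revenue not earned per hour of outage
    hourly_idle_payroll: float  # wages paid to staff who cannot work
    recovery_hours: float       # expected duration of the recovery


def downtime_cost(e: DowntimeEstimate) -> float:
    """Estimated cost of the outage over the whole recovery window."""
    return (e.hourly_revenue_loss + e.hourly_idle_payroll) * e.recovery_hours


def savings_from_faster_recovery(e: DowntimeEstimate, faster_hours: float) -> float:
    """Cost avoided in one incident if recovery took `faster_hours` instead."""
    saved_hours = max(e.recovery_hours - faster_hours, 0.0)
    return (e.hourly_revenue_loss + e.hourly_idle_payroll) * saved_hours


# Illustrative numbers only: a flooded office that takes three days to restore.
flood = DowntimeEstimate("office flood", 2000.0, 900.0, 72.0)
print(f"Estimated downtime cost: ${downtime_cost(flood):,.0f}")
print(f"Saved by an 8-hour recovery: ${savings_from_faster_recovery(flood, 8.0):,.0f}")
```

Non-profits and government entities would need to replace the dollar inputs with measures tied to their own missions, which this sketch does not attempt to model.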
Creating Backup Sites
Many small organizations try to save money with local file backups, or by keeping redundant equipment in the office.
However, in an area-specific disaster, such as a flood, the damage will affect the entire area and any local backup is also likely to be destroyed.
A more secure option is to create a backup site in a separate and secure location. Red Hat covers three common categories of backup sites: Hot, Warm and Cold.
Hot sites are active copies of the organization’s existing infrastructure with all systems and connections configured. These sites only need the most recent backups of data to become fully active.
Warm sites reproduce the equipment in the organization’s data center, but without the full set of configurations and connections established. Cold sites can be as simple as configured spaces in buildings without any equipment.
The costs for the sites are directly related to the speed of recovery, with hot sites being the most expensive and fastest to deploy. However, not all organizations try to fully recover in a disaster.
Some organizations reduce expenses and speed up recovery by using less powerful equipment to only cover the emergency needs of the organization, while the IT department recovers the original environment.
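The trade-off between cost and speed can be sketched as a simple selection: pick the cheapest site type that can be activated within the outage the organization can tolerate. The activation times and annual costs below are placeholder assumptions, not vendor quotes or Red Hat figures, and the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SiteOption:
    name: str
    hours_to_activate: float  # assumed time to bring the site online
    annual_cost: float        # assumed yearly cost of keeping it ready


# Placeholder figures for illustration only.
OPTIONS: List[SiteOption] = [
    SiteOption("hot", 1.0, 120_000.0),
    SiteOption("warm", 24.0, 45_000.0),
    SiteOption("cold", 120.0, 10_000.0),
]


def cheapest_site(max_outage_hours: float,
                  options: List[SiteOption] = OPTIONS) -> Optional[SiteOption]:
    """Return the least expensive site that activates within the allowed outage."""
    fast_enough = [o for o in options if o.hours_to_activate <= max_outage_hours]
    return min(fast_enough, key=lambda o: o.annual_cost, default=None)


choice = cheapest_site(max_outage_hours=48)
print(choice.name if choice else "no option is fast enough")
```

If no option meets the target, the function returns None, which signals that either the budget or the tolerated outage needs to be revisited.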
Large organizations with multiple offices can use other offices (preferably far enough away to avoid localized disasters) as backup locations. If that is not an option, physical backup sites can be sourced from specialized backup recovery services.
However, cloud-hosted environments offer additional options for recovery that do not depend upon a physical location. With a cloud-hosted environment, a system can be set up, configured and tested in advance like it is a hot system.
Then, the system can be hibernated on a storage server as a warm backup that the IT department can launch and restore quickly. The more virtual your environment, the more robustly it can be replicated in the cloud.
When using a vendor such as Ideal Integrations, your team will have both public and private cloud options available, and the expertise to ensure it is done securely.
However, this only accounts for the computing environment.
In order for the organization to function, it needs data. As noted above, local storage is vulnerable, so regular offsite backups are critical for disaster recovery.
While there are companies that specialize in providing backup as a service (Carbonite, etc.), companies can also use private or public cloud environments to back up their data.
An individual employee’s equipment can be even more difficult to recover and restore during a large-scale disaster than servers and data, which are fully controlled by the IT department.
Off-site backups can speed up the recovery process, but the data alone will not recover the various software required by that specific employee to function. Fortunately, there are options for thin-client or remote desktop services that could be used to speed up the recovery process.
For example, an insurance company’s Florida office is hit by a hurricane that floods and destroys the local desktop machines and laptops. Yet, the agents need to go out to evaluate damage and begin processing claims.
The IT department can ship inexpensive laptops or tablets to their affected employees. Using a centralized corporate environment outside of the disaster area, the desktop computers are virtually restored and then accessed remotely by the agents.
The agents would then be able to function in the field through any available connection supported by those machines (telephone, cellular, wifi, etc.).
If recovery is urgent, consider that telecommunication infrastructure may be damaged. Telephone lines typically recover faster than cellular towers, which in turn are easier to repair than fiber optic data lines.
It may take some time before high-speed communication is available. Telephone-wire-based dial-up modems may be important for the most urgent connections.
Cellphone hot-spots can also be used as the cellular service is restored, but often bandwidth will be limited in the initial recovery stages.
Thin-client services, such as virtual desktops, can offer an advantage over a fully restored local machine because they primarily transmit display information to the remote user, leaving the data and software processing to the centralized server.
This reduces the amount of data sent over the internet connection and provides more functional operation in low-bandwidth situations (e.g., a dial-up modem).
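A quick back-of-the-envelope calculation shows why moving whole files to a field laptop is painful on a slow link. The sketch below assumes nominal link speeds and a 25 MB batch of claim files. Both are illustrative assumptions, and real-world throughput is usually lower than the nominal rate.

```python
def transfer_seconds(size_megabytes: float, link_kilobits_per_sec: float) -> float:
    """Time to move a file over a link, ignoring protocol overhead."""
    bits = size_megabytes * 8_000_000  # 1 MB taken as 10^6 bytes
    return bits / (link_kilobits_per_sec * 1000)


# Nominal speeds, assumed for illustration only.
LINKS_KBPS = {
    "dial-up modem": 56,
    "congested cellular hot-spot": 1_000,
    "office broadband": 100_000,
}

for name, kbps in LINKS_KBPS.items():
    minutes = transfer_seconds(25, kbps) / 60
    print(f"{name}: {minutes:.1f} minutes to copy 25 MB")
```

With a virtual desktop, those files never leave the central server, so the slow link only has to carry screen updates and keystrokes.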
Phase 3: Develop the Recovery Plan
Creating the list of potential disasters, and the list of solutions, only gets you halfway through the recovery planning process.
Personnel should be assigned to each recovery stage, and the comprehensive plan of action must be put in motion. Any equipment needed must be purchased, and any service contracts needed to carry out the recovery stages must be executed.
In our trivial example, the danger of spilling coffee on a keyboard can be generalized as a category (“personal equipment failure”) and solved with a simple instruction (“call IT for a replacement”). The development stage then requires the keyboards and other equipment to be purchased and stored.
More complex tasks, such as recreating a data center, will likely require a team of internal employees, the purchase of much more equipment, and hiring external vendors.
Such coordination must be outlined in advance so that everyone can follow protocol. After all, even for a cold-site disaster recovery, an offsite location is less expensive and easier to obtain when rented prior to a disaster.
Keep in mind that disasters stress out and exhaust even the sharpest IT staff members, making it difficult to both think and concentrate.
Checklists and comprehensive step-by-step disaster plans lessen the burden on IT staff members by moving the recovery planning to a time when everyone is calm and focused.
Additionally, planning in advance helps reduce costs by placing the recovery planning within standard business hours, instead of the emergency overtime likely required during a disaster recovery.
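One way to keep such a checklist honest is to store it as data, so that steps without an assigned owner are easy to spot before a disaster strikes. The structure below is a hypothetical sketch using the keyboard example from earlier. The class names, step descriptions, and owners are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    description: str
    owner: Optional[str] = None  # person, team, or vendor responsible
    done: bool = False


@dataclass
class RecoveryPlan:
    scenario: str
    steps: List[Step] = field(default_factory=list)

    def unassigned(self) -> List[Step]:
        """Steps that nobody is responsible for yet."""
        return [s for s in self.steps if not s.owner]

    def next_step(self) -> Optional[Step]:
        """First step in the checklist that is not finished."""
        return next((s for s in self.steps if not s.done), None)


plan = RecoveryPlan(
    scenario="personal equipment failure",
    steps=[
        Step("Call IT for a replacement", owner="affected employee"),
        Step("Issue spare keyboard from the supply cabinet", owner="IT help desk"),
        Step("Reorder a spare to restock the cabinet"),  # no owner assigned yet
    ],
)

for step in plan.unassigned():
    print(f"Needs an owner: {step.description}")
```

During an actual recovery, next_step() gives a tired staff member exactly one thing to look at, in the order the team agreed on when everyone was calm.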
Don’t forget security! Make sure your security expert or vendor is involved in your disaster planning to prevent unwelcome surprises. You don’t want that hot site mirroring your network to sit unprotected in a public cloud environment and be thoroughly explored by hackers.
While this is an extreme scenario, making sure your team takes security into account during the planning will help prevent problems later.
Phase 4: Testing and Maintenance
Developing the recovery plan is a good start, but the process remains incomplete.
In a Spiceworks study, 95% of respondents stated they had a disaster recovery plan, but 23% of the respondent companies stopped there and did not test it.
It is simply easier to consider the disaster recovery as theoretical.
Of the companies that were not testing their data recovery plans, 61% said they did not have enough time and 53% said they did not have sufficient resources.
The disaster recovery plan requires both testing and maintenance. Just because something works, in theory, does not mean it will be easy to do or that something hasn’t been overlooked.
Testing the processes in advance helps eliminate overlooked steps or forgotten equipment that could cripple recovery efforts. Go through drills to practice launching the backup systems, and make sure every stage works.
Recovery plans also age.
If an IT manager leaves or the company opens a new office, huge changes to the recovery plan will need to be made to account for new people with new access requirements or an updated IT environment.
Keep in mind that not all elements of the disaster recovery plan need to be tested at once. Test the solutions by order of importance and likelihood over a period of months, or even years.
If an IT department creates a schedule to test the disaster recovery plan over time, they spread out the workload. They’ll also be provided with opportunities to update the plan as it ages.
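A staggered testing schedule like this can be sketched in a few lines. The example below ranks plan elements by importance multiplied by likelihood, which is only one possible scoring choice, and spaces the tests a fixed number of days apart. The element names, scores, and 30-day interval are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass
class PlanElement:
    name: str
    importance: int  # 1 (minor) to 5 (critical), assigned by the team
    likelihood: int  # 1 (rare) to 5 (frequent), assigned by the team


def build_test_schedule(elements: List[PlanElement], start: date,
                        days_between_tests: int = 30) -> List[Tuple[date, str]]:
    """Schedule the highest-scoring elements first, one test per interval."""
    ranked = sorted(elements, key=lambda e: e.importance * e.likelihood, reverse=True)
    return [(start + timedelta(days=i * days_between_tests), e.name)
            for i, e in enumerate(ranked)]


# Hypothetical plan elements and scores.
elements = [
    PlanElement("replace a failed keyboard", importance=1, likelihood=5),
    PlanElement("restore file server from offsite backup", importance=5, likelihood=4),
    PlanElement("launch cloud-hosted warm site", importance=5, likelihood=2),
]

for when, name in build_test_schedule(elements, start=date(2024, 1, 1)):
    print(when.isoformat(), name)
```

Rerunning the schedule whenever scores change, for example after a new office opens, is a simple way to keep the testing order aligned with the current environment.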
This is an iterative process, and the disaster recovery team must work hard to be current with new company assets and business practices.
Working with outside resources, such as Ideal Integrations, also helps. Internal resources tend to run at capacity – outside support may be needed to realize an effective disaster recovery plan.
We also offer both a fully integrated security solution and security consulting through our Blue Bastion cybersecurity team.
Don’t wait for a hurricane to flood your environment, or a ransomware attack to take down your network.
Contact us today to see how we might help you avoid the worst consequences of a disaster!