Abstract
This tip provides some useful definitions of terms and concepts related to Disaster Recovery.
Contents
The following definitions of common Disaster Recovery terms are taken from an ITSO publication currently available as a Redpiece (draft), entitled "Disaster Recovery Strategies with Tivoli Storage Management". To access the full text, go to http://publib-b.boulder.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg246844.html
Business Continuity
Business continuity describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster. Business Continuity Planning seeks to prevent interruption of mission-critical services, and to reestablish full functioning as swiftly and smoothly as possible.
Business Impact Analysis (BIA)
A business impact analysis is performed to determine the impacts associated with disruptions to specific functions or assets in a firm. These include operating impact, financial impact, and legal or regulatory impact. For example, should the billing, receivables, and collections business functions be crippled by inaccessibility of information, cash flow to the business will suffer. Additional risks are that lost customers will never return, the business’s credit rating may suffer, and significant costs may be incurred for hiring temporary help. The business impact of a disaster could include lost revenues, additional recovery costs, fines and penalties, overtime, application and hardware costs, lost goodwill, and delayed collection of funds.
Risk Analysis
A risk analysis identifies the functions and assets that are critical to a firm’s operations, then establishes the probability of a disruption to those functions and assets. Once the risk is established, objectives and strategies to eliminate avoidable risks and minimize the impacts of unavoidable risks can be set. A list of critical business functions and assets should first be compiled and prioritized. Following this, determine the probability of specific threats to those business functions and assets. For example, a certain type of failure may occur once in 10 years. From a risk analysis, a set of objectives and strategies to prevent, mitigate, and recover from disruptive threats should be developed.
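One common way to turn "once in 10 years" into something that can be prioritized is to convert it to an annual rate and multiply by the cost of one occurrence. The sketch below illustrates that arithmetic only; the `Threat` class, the threat names, and all cost figures are hypothetical and are not taken from the Redpiece.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    name: str
    years_between_events: float  # 10 means "once in 10 years"
    impact_cost: float           # hypothetical cost of one occurrence

    @property
    def annual_rate(self) -> float:
        # Expected number of occurrences per year.
        return 1.0 / self.years_between_events

    @property
    def expected_annual_loss(self) -> float:
        # Annual rate multiplied by the cost of a single occurrence.
        return self.annual_rate * self.impact_cost


def prioritize(threats):
    """Return threats ordered from highest to lowest expected annual loss."""
    return sorted(threats, key=lambda t: t.expected_annual_loss, reverse=True)


# Illustrative values only.
threats = [
    Threat("disk array failure", 10, 500_000),
    Threat("regional power outage", 4, 120_000),
    Threat("site-destroying fire", 50, 5_000_000),
]

for t in prioritize(threats):
    print(f"{t.name}: {t.expected_annual_loss:,.0f} per year")
```

Ranking by expected annual loss is only one possible weighting. A rare but catastrophic threat may deserve attention beyond what the average suggests.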
Disaster Recovery Plan (DRP)
The DRP is an IT-focused plan designed to restore operability of the target systems, applications, or computer facility at an alternate site after an emergency. A DRP addresses major site disruptions that require site relocation. The DRP applies to major, usually catastrophic, events that deny access to the normal facility for an extended period. Typically, Disaster Recovery Planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention.
Disaster Tolerance
Disaster tolerance defines an environment’s ability to withstand major disruptions to systems and related business processes. Disaster tolerance at various levels should be built into an environment and can take the form of hardware redundancy, high availability/clustering solutions, multiple data centers, elimination of single points of failure, and distance solutions.
DR Hotsite
A data center facility with sufficient hardware, communications interfaces and environmentally controlled space capable of providing relatively immediate backup data processing support.
DR Warmsite
A data center or office facility which is partially equipped with hardware, communications interfaces, electricity and environmental conditioning capable of providing backup operating support.
DR Coldsite
One or more data center or office space facilities equipped with sufficient pre-qualified environmental conditioning, electrical connectivity, communications access, configurable space and access to accommodate the installation and operation of equipment by critical staff required to resume business operations.
Bare Metal Recovery
A bare metal recovery describes the process of restoring a complete system, including system and boot partitions, system settings, applications, and data to their original state at some point prior to a disaster.
High Availability
High availability describes a system’s ability to continue processing and functioning for a very high percentage of the time, for example 99.999%. High availability can be implemented in your IT infrastructure by reducing single points of failure (SPOF) through redundant components. Similarly, clustering and coupling applications between two or more systems can provide a highly available computing environment.
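To make a figure such as 99.999% concrete, it helps to convert it into permitted downtime per year; 99.999% works out to roughly 5.26 minutes. The following sketch shows that conversion, along with the usual formula for the combined availability of redundant components. The function names are illustrative, and the redundancy formula assumes that components fail independently, which real systems do not always satisfy.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def redundant_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes failures are independent (an idealization).
    """
    return 1 - (1 - component_availability) ** copies


for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.2f} minutes/year")

# Two redundant components, each available 99% of the time.
print(f"{redundant_availability(0.99, 2):.4f}")
```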
Recovery Time Objective (RTO)
The Recovery Time Objective is the time needed to recover from a disaster or, put another way, how long you can afford to be without your systems.
Recovery Point Objective (RPO)
Recovery Point Objective describes the age of the data you want the ability to restore in the event of a disaster. For example, if your RPO is 6 hours, you want to be able to restore systems to the state they were in no more than 6 hours before the disaster. To achieve this, you need to make backups or other data copies at least every 6 hours. Any data created or modified within the RPO window will either be lost or have to be recreated during a recovery. If your RPO is that no data is lost, synchronous remote copy solutions are your only choice.
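The relationship between backup frequency and RPO can be checked with simple date arithmetic. The sketch below is a minimal illustration under assumed names and timestamps: given the completion times of past backups and a failure time, it computes how much data would be lost and compares that with the RPO.

```python
from datetime import datetime, timedelta


def data_loss_window(backup_times, failure_time):
    """Age of the newest backup completed at or before the failure.

    Returns None if no usable backup exists.
    """
    usable = [t for t in backup_times if t <= failure_time]
    if not usable:
        return None
    return failure_time - max(usable)


def meets_rpo(backup_times, failure_time, rpo):
    """True if the data lost at failure_time stays within the RPO."""
    window = data_loss_window(backup_times, failure_time)
    return window is not None and window <= rpo


# Hypothetical schedules: backups every 6 hours versus every 8 hours.
day = datetime(2024, 1, 1)
every_6h = [day + timedelta(hours=h) for h in (0, 6, 12, 18)]
every_8h = [day + timedelta(hours=h) for h in (0, 8, 16)]
failure = day + timedelta(hours=15)
rpo = timedelta(hours=6)

print(meets_rpo(every_6h, failure, rpo))  # newest backup is 3 hours old
print(meets_rpo(every_8h, failure, rpo))  # newest backup is 7 hours old
```

With a 6-hour RPO, the 6-hour schedule passes and the 8-hour schedule fails for this failure time, because the newest usable backup is 7 hours old.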
Network Recovery Objective (NRO)
Network Recovery Objective indicates the time required to recover or fail over network operations. Keep in mind that system-level recovery is not fully complete if customers cannot access the application services via network connections. Hence, the NRO includes the time required to bring alternate communication links online, reconfigure routers and name servers (DNS), and alter client system parameters for alternative TCP/IP addresses. Comprehensive network failover planning is of equal importance to data recovery in a Disaster Recovery scenario.
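Because customers regain access only when both the systems and the network are back, the slower of the two efforts determines when service actually resumes. The sketch below illustrates this with hypothetical step durations (in minutes); it assumes the network steps run one after another and that system and network recovery proceed in parallel, which may not match a given environment.

```python
# Hypothetical durations in minutes; none of these figures come from the source.
network_steps = {
    "bring alternate communication links online": 45,
    "reconfigure routers": 30,
    "update name servers (DNS)": 20,
    "alter client TCP/IP parameters": 60,
}


def network_recovery_minutes(steps, sequential=True):
    """Sum the steps if they run one after another, else take the longest."""
    durations = list(steps.values())
    return sum(durations) if sequential else max(durations)


def service_restored_minutes(system_recovery, network_recovery):
    """Time until customers can reach the application again.

    Assumes system and network recovery run in parallel, so the
    slower of the two determines the result.
    """
    return max(system_recovery, network_recovery)


network = network_recovery_minutes(network_steps)
print(network)                                 # 155
print(service_restored_minutes(120, network))  # 155
```

In this example the systems are back after 120 minutes, but service is not restored until the network work finishes at 155 minutes.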
Special Notices
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.
