Case Study for Content Manager OnDemand Backup, Recovery, and High Availability #1: Global Voice and Data Communications Company

Published 01 February 2005

Authors: Wei-Dong Zhu


This document is the first of two documents in a series that present case studies for IBM DB2 Content Manager OnDemand solutions. This case study presents a global voice and data communications company. It presents background information, the backup procedures, the high availability configuration, and a disaster recovery plan. Through this case study, you can glimpse how a real-world solution is set up and maintained. You can take what you learn from this scenario and apply it to your next OnDemand solution implementation and maintenance project.


A global voice and data communications company needs to make electronic statements available online to its customers. It also wants to give its customer service representatives access to the most recent statements that customers have received, an archive repository of statements, and reports from past years.

The company implemented OnDemand for Multiplatforms to meet this need. Since the implementation, the company’s database has grown to approximately 550 GB. At any given time, approximately 4 TB of data objects are stored in the OnDemand managed cache file systems. Most customer statements are stored in Advanced Function Presentation (AFP) format, and most internal reports are stored as line data. Image and Portable Document Format (PDF) data are also stored, but to a much lesser extent. Approximately 12 TB of data files are stored on Tivoli Storage Manager (TSM) managed media, and an additional 12 TB reside in TSM copy storage pools. TSM provides an immediate “backup” of OnDemand data loads, because data is loaded to TSM at the same time that it is loaded to the database and the OnDemand managed cache.

The company’s average load volume is approximately 240 GB per month. At peak, the system supports approximately 3000 concurrent users, which corresponds to roughly 50 000 to 70 000 logged-on users at any given time. As many as 60 percent of users access the OnDemand system through a Web application; the remaining 40 percent are internal Windows client users. The company processes just under 10 million retrievals per month, with peaks of 500 000 per day.
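To put the quoted volumes in perspective, the short calculation below derives a few daily and monthly rates from them. It is a sketch under stated assumptions, not a sizing method from the case study: it assumes a 30-day month, and it assumes that, because the TSM copy storage pools (12 TB) match the primary pools (12 TB), each gigabyte loaded consumes roughly two gigabytes of TSM media.

```python
# Figures quoted in the case study (approximate).
MONTHLY_LOAD_GB = 240
RETRIEVALS_PER_MONTH = 10_000_000   # "just under 10 million"
PEAK_RETRIEVALS_PER_DAY = 500_000
DAYS_PER_MONTH = 30                 # assumption: a 30-day month


def average_daily(total_per_month: float, days: int = DAYS_PER_MONTH) -> float:
    """Spread a monthly total evenly across the days of a month."""
    return total_per_month / days


avg_retrievals = average_daily(RETRIEVALS_PER_MONTH)
peak_ratio = PEAK_RETRIEVALS_PER_DAY / avg_retrievals

# Assumption: copy storage pools mirror the primary pools, so TSM media
# grows by about twice the monthly load volume.
tsm_growth_gb_per_month = 2 * MONTHLY_LOAD_GB

print(f"average retrievals/day: {avg_retrievals:,.0f}")
print(f"peak-to-average ratio:  {peak_ratio:.1f}")
print(f"average load/day:       {average_daily(MONTHLY_LOAD_GB):.0f} GB")
print(f"TSM media growth/month: {tsm_growth_gb_per_month} GB")
```

Under these assumptions, peak days run at about 1.5 times the average daily retrieval rate, which is the kind of headroom the servers must be sized for.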

Backup, recovery, and high availability approach
The company has the following backup procedure, high availability configuration, and disaster recovery plan.

Backup procedure
A full offline backup of the OnDemand database is performed every weekend on the Library Server and stored in TSM. Because a full offline backup captures the database at a single point in time, archive logs are not required to restore it. The TSM database is also backed up in its entirety while the OnDemand application is offline. Because application groups are configured to store data both to cache and to TSM-defined nodes at load time, every loaded object already has a copy in TSM, so the cache file systems defined in the ars.cache file do not need to be backed up.
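Because the database is restored from an offline full backup without rolling forward through archive logs, the restored database reflects the state at the last weekend backup. The sketch below computes that backup point and the elapsed time since it for a given failure time. The case study says only "every weekend", so the Saturday 02:00 schedule here (`BACKUP_WEEKDAY`, `BACKUP_HOUR`) is a hypothetical assumption, and the function names are illustrative.

```python
from datetime import datetime, timedelta

# Hypothetical schedule: the case study says only "every weekend".
BACKUP_WEEKDAY = 5  # Monday=0 ... Saturday=5
BACKUP_HOUR = 2


def last_full_backup(failure: datetime) -> datetime:
    """Return the most recent scheduled offline backup at or before `failure`."""
    days_back = (failure.weekday() - BACKUP_WEEKDAY) % 7
    candidate = (failure - timedelta(days=days_back)).replace(
        hour=BACKUP_HOUR, minute=0, second=0, microsecond=0)
    if candidate > failure:  # right weekday, but before the backup hour
        candidate -= timedelta(days=7)
    return candidate


def restore_window(failure: datetime) -> timedelta:
    """Time between the restorable database image and the failure.

    With offline full backups and no log roll-forward, the restored
    database reflects the state at the backup point.
    """
    return failure - last_full_backup(failure)


# A failure on Wednesday 2 February 2005 at 14:00 restores to the
# image taken on Saturday 29 January 2005 at 02:00.
print(restore_window(datetime(2005, 2, 2, 14, 0)))  # 4 days, 12:00:00
```

In the worst case, just before the next scheduled backup, this window approaches a full week.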

High availability configuration
The company runs OnDemand in a distributed configuration, with the Library Server and the Object Server on separate nodes, TSM on the Object Server, and a standby node for high availability.

The software installed on each node includes:

OnDemand Library Server: AIX 5.2, HACMP 4.5, DB2 EEE 7.2, Content Manager OnDemand 7.1.1
OnDemand Object Server: AIX 5.2, HACMP 4.5, TSM Server and Client API 5.2, Content Manager OnDemand 7.1.1
Standby node: AIX 5.2, HACMP 4.5, DB2 EEE 7.2, TSM Server and Client API 5.2, Content Manager OnDemand 7.1.1
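For the standby node to take over either role, it must carry the software of both the Library Server and the Object Server. The check below encodes the node software lists above as Python sets and reports anything the standby would be missing. It is an illustrative sketch only; the node keys and the `missing_for_takeover` helper are hypothetical names, not part of OnDemand or HACMP.

```python
# Software stack per node, as listed in the case study.
NODES = {
    "library_server": {"AIX 5.2", "HACMP 4.5", "DB2 EEE 7.2",
                       "Content Manager OnDemand 7.1.1"},
    "object_server": {"AIX 5.2", "HACMP 4.5", "TSM Server and Client API 5.2",
                      "Content Manager OnDemand 7.1.1"},
    "standby": {"AIX 5.2", "HACMP 4.5", "DB2 EEE 7.2",
                "TSM Server and Client API 5.2",
                "Content Manager OnDemand 7.1.1"},
}


def missing_for_takeover(standby: str, primaries: list[str]) -> set[str]:
    """Return software that `standby` lacks to run every node in `primaries`."""
    required = set().union(*(NODES[p] for p in primaries))
    return required - NODES[standby]


# The standby's stack is the union of both servers' stacks, so nothing is missing.
print(missing_for_takeover("standby", ["library_server", "object_server"]))  # set()
```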

The Library Server, the Object Server, and the standby node all have a physical attachment to the IBM ESS subsystem (shared disk), and the nodes are also linked by RS232 serial connections. During normal operation, the standby node is idle: it waits for either the Library Server or the Object Server to fail and leave the cluster, and then takes over the role of the failed node.

The following diagram illustrates the company’s high availability plan.

High availability plan

When all nodes in the cluster are healthy, the Library Server owns resource group A, which consists of the OnDemand database and the database log files. The primary Object Server owns resource group B, which consists of the OnDemand managed cache file systems, the TSM database, and the TSM logs. If the Library Server node fails and leaves the cluster, the standby node takes control of resource group A and functions as the Library Server. If the Object Server node fails and leaves the cluster, the standby node takes control of resource group B and functions as the Object Server.

The advantage of this configuration is that it minimizes redundant hardware: a single standby node acts as the failover node for both the Library Server and the Object Server. The standby node has three network interfaces and separate physical connections to each server node’s external disk, so it can, if necessary, take over for both servers concurrently. In that situation, however, the cluster’s performance would most likely degrade, because one node would be running both workloads.
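The ownership rules described above can be modeled as a small state machine: each resource group has an owner, and when an owner fails, its groups move to the standby node, including the case where the standby ends up owning both groups. This is a simplified sketch for illustration, not HACMP's event processing; it ignores fallback, network failures, and takeover timing, and the `Cluster` class and node names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # Normal ownership described in the case study.
    owner: dict[str, str] = field(default_factory=lambda: {
        "A": "library_server",  # OnDemand database and database log files
        "B": "object_server",   # cache file systems, TSM database and logs
    })
    standby: str = "standby"
    failed: set[str] = field(default_factory=set)

    def node_fails(self, node: str) -> None:
        """Mark `node` as failed and move its resource groups to the standby."""
        self.failed.add(node)
        for group, current in self.owner.items():
            if current == node:
                if self.standby in self.failed:
                    raise RuntimeError(f"no surviving node can take group {group}")
                self.owner[group] = self.standby


c = Cluster()
c.node_fails("library_server")
print(c.owner)  # {'A': 'standby', 'B': 'object_server'}
c.node_fails("object_server")
print(c.owner)  # {'A': 'standby', 'B': 'standby'}
```

The second failure shows the concurrent-takeover case, where one node hosts both resource groups and performance is expected to degrade.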

Disaster recovery plan
The disaster recovery plan is to store backups at an offsite location. There is no mirrored site. Instead, if a catastrophic event occurs at the data center, an identical hardware configuration would be built, and the backup and storage volumes would be restored to it.

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.
