IBM SmartCloud Control Desk: High Availability and Disaster Recovery Configurations

IBM Redbooks Solution Guide

Abstract

In today’s global environment, more organizations must reduce their downtime to the minimum possible and achieve continuous availability of their systems. Products that are based on the IBM® Tivoli® Process Automation Engine, such as IBM Maximo® Asset Management, IBM Maximo Industry Solutions, and IBM SmartCloud® Control Desk, often play a role in such environments, and thus also have continuous availability requirements. In this context, it is important to understand the high availability (HA) and disaster recovery (DR) capabilities of IBM SmartCloud Control Desk and IBM Maximo products, and to ensure that all the components of an HA/DR solution are configured and tested to handle outages. In this IBM Redbooks® Solution Guide, we outline some of the topologies that we tested and the documentation that we created, to demonstrate how robust the IBM SmartCloud Control Desk and IBM Maximo infrastructure can be.

Contents

High availability (HA) and disaster recovery (DR) are important for organizations that are running mission-critical applications and that must maintain high levels of access to system content. By implementing a highly available environment for your solution, you can minimize the effects that a component or overall system failure can have on daily operations. By implementing a multisite disaster recovery environment for your solution, you can minimize the effects of a complete solution failure on a site because of a natural or man-made disaster.

Figure 1 shows a local high availability topology example.


Figure 1. Local high availability topology example

This IBM Redbooks Solution Guide provides an overview of the configuration possibilities for a high availability and disaster recovery solution with IBM Maximo and SmartCloud Control Desk. More information can be found in High Availability and Disaster Recovery Configurations for IBM SmartCloud Control Desk and IBM Maximo Products, SG24-8109.

Did you know?
The availability of any application is measured by its overall uptime. If users experience errors or timeouts because of system load, or if the application cannot connect to the database, the application is not considered highly available. Network outages, hardware failures, operating system or other software errors, and power interruptions are examples of failures that can make an application unavailable to its users. When such failures occur, a highly available solution must be able to perform the following tasks:

  • Shield the application from the failure without appreciable performance degradation.
  • Fail over to another server on the cluster.
  • Recover from the failure to return the application to normal operations.

In addition, in a highly available application, the impact of maintenance activities on the availability must be minimized.
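Because availability is measured by uptime, it is usually expressed as the fraction of observed time during which the application was usable. The following Python sketch is illustrative only and is not taken from the product documentation: it computes that fraction and converts an availability target into the downtime it allows per year. The function names are hypothetical, and the yearly figure assumes a 365-day year.

```python
# Illustrative sketch: availability as the fraction of time a system is up.
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year (no leap day)


def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return uptime / (uptime + downtime) as a value between 0 and 1."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total


def allowed_downtime_minutes_per_year(target: float) -> float:
    """Return the yearly downtime budget for a target such as 0.999 (99.9%)."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be between 0 and 1")
    return MINUTES_PER_YEAR * (1.0 - target)


if __name__ == "__main__":
    # Hypothetical observation: 8,750 hours up and 10 hours down in one year.
    print(f"measured availability: {availability(8750, 10):.4%}")
    for target in (0.99, 0.999, 0.9999):
        minutes = allowed_downtime_minutes_per_year(target)
        print(f"{target:.2%} target -> {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" in the target cuts the yearly downtime budget by a factor of ten, which is why the tasks listed above (shielding, failover, recovery) must happen quickly and, ideally, automatically.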

Business value
The most common business drivers for increased availability of a particular solution are the cost of downtime, service-level agreements (SLAs), and user satisfaction, although other drivers might exist:
  • Cost of system outage

    Critical applications and processes can be impacted during system downtime, which can lead to loss of revenue while operations are at a standstill. The cost of building system redundancy is often lower than the financial impact of an outage. Maintaining a high availability and disaster recovery solution can be compared with having a good insurance policy.
  • Service-level agreement

    SmartCloud Control Desk is used to manage enterprise assets, IT environments, and availability of systems. These tasks are commonly referenced in SLAs. Therefore, contractual obligations can mandate a certain level of system availability to meet SLAs.
  • User satisfaction

    Frequent and unexpected outages during system usage can directly impact user satisfaction. Users who rely on Maximo and SmartCloud Control Desk for daily operations might lose confidence in the solution if their productivity is affected.

Solution overview
The solution relies on the high availability of the underlying components, such as web server, application server, database, LDAP server, and the Tivoli Process Automation Engine. A cluster manager can be used to monitor and automate system failover. You can configure the components of your solution environment for high availability in various ways. Different high availability configurations handle failover differently. For this reason, it is important to choose the correct configuration to suit the needs of your organization.
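One way to see why every underlying component matters is to treat the components as links in a chain: if each must be up for a user request to succeed, and if failures are assumed to be independent, the end-to-end availability is the product of the individual availabilities. The Python sketch below illustrates this under those assumptions; the per-component figures are invented for illustration and are not measurements of Maximo or SmartCloud Control Desk.

```python
from math import prod


def serial_availability(component_availabilities) -> float:
    """Availability of a chain in which every component must be up.

    Assumes independent failures; each value must be between 0 and 1.
    """
    values = list(component_availabilities)
    for a in values:
        if not 0.0 <= a <= 1.0:
            raise ValueError("availability must be between 0 and 1")
    return prod(values)


# Hypothetical figures for a single, non-redundant instance of each tier.
EXAMPLE_TIERS = {
    "web server": 0.999,
    "application server": 0.998,
    "database": 0.999,
    "LDAP server": 0.999,
}

if __name__ == "__main__":
    total = serial_availability(EXAMPLE_TIERS.values())
    print(f"end-to-end availability: {total:.4%}")
```

The result is always lower than the availability of the weakest single component, which is why each tier needs its own high availability configuration rather than relying on one well-protected tier.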

High Availability and Disaster Recovery Configurations for IBM SmartCloud Control Desk and IBM Maximo Products, SG24-8109 explains how to design and configure three potential topologies:
  • Local high availability topology

    Local high availability implies redundancy and failover of all the essential components that are contained within a single site. Although local high availability does not provide the disaster protection of a multisite topology, it does provide protection from system, process, and hardware failures. By defining and eliminating all single points of failure, you can strengthen your IBM SmartCloud Control Desk or Maximo infrastructure and decrease system downtime.

    Local high availability is also a great place to start when you try to achieve a full disaster recovery plan. For example, when multiple sites are introduced, having local high availability on these sites can prevent the need to execute a full disaster recovery procedure when there is just a component failure on the overall solution.
  • Active-passive disaster recovery

    Active-passive disaster recovery implies complete site replication to an alternative location so that services can be restored when the primary site goes down. The second site is not processing user transactions and is sitting idle until a failover is required. This topology can be thought of as a type of insurance policy for the IBM SmartCloud Control Desk and Maximo infrastructure.

    An active-passive site configuration can provide a company with a contingency plan when an unexpected failure occurs. File system, database, and backup/restore procedures can be implemented to keep the passive site synchronized with the primary. The technologies that are used depend on the distance, budget, and synchronization state that is required by the organization. Having a reliable, high-speed network infrastructure and link between the sites is one of the most important elements in the plan.
  • Hybrid-active disaster recovery

    For some organizations, a completely passive site means resources that sit unused unless a disaster recovery is needed, so there is a desire to use this infrastructure as much as possible. In a multisite disaster recovery topology, it is possible to bring certain services online at the second site and have them process user requests. You should consider many factors before you choose this type of topology.

    IBM SmartCloud Control Desk and Maximo do not support a completely active-active environment in which a primary database is located at each site. For this reason, the applications at both sites must point to the same database server. This active database can be replicated to Site B by using IBM DB2® HADR or other database replication technologies. If Site A fails, the applications at Site B must be reconfigured to point to the local database.

    Figure 2 shows a hybrid-active topology with a remote database connection.


    Figure 2. Example hybrid-active topology showing a remote database connection
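The repointing step for the hybrid-active topology can be sketched as follows. This Python example only decides which database URL the Site B application servers should use; it does not perform the database takeover itself (for example, a DB2 HADR takeover on the standby), which is a separate step. The host names, port, and database name are hypothetical placeholders, the URL follows the DB2 JDBC form `jdbc:db2://host:port/database`, and in Maximo this URL is typically held in the `mxe.db.url` property.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Db2Endpoint:
    """A DB2 database reachable over JDBC (all values below are hypothetical)."""
    host: str
    port: int
    database: str

    def jdbc_url(self) -> str:
        # DB2 JDBC URL form: jdbc:db2://host:port/database
        return f"jdbc:db2://{self.host}:{self.port}/{self.database}"


# Hypothetical endpoints: the active (primary) database in Site A and the
# replicated standby in Site B.
SITE_A_PRIMARY = Db2Endpoint("db-site-a.example.com", 50000, "MAXDB")
SITE_B_STANDBY = Db2Endpoint("db-site-b.example.com", 50000, "MAXDB")


def database_url_for_site_b(site_a_available: bool) -> str:
    """Return the database URL that Site B application servers should use.

    While Site A is up, every application server uses the single active
    database in Site A, because one primary database per site is not
    supported. After Site A fails and the standby has taken over, Site B
    is repointed to its local database.
    """
    endpoint = SITE_A_PRIMARY if site_a_available else SITE_B_STANDBY
    return endpoint.jdbc_url()


if __name__ == "__main__":
    print("mxe.db.url=" + database_url_for_site_b(site_a_available=True))
    print("mxe.db.url=" + database_url_for_site_b(site_a_available=False))
```

Keeping the decision in one place makes it clear that, during normal operation, Site B traffic crosses the inter-site link to reach the database, which is why link latency is a key factor when choosing this topology.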

Solution architecture
As Maximo and SmartCloud Control Desk are applications that are deployed to middleware, the overall availability of the system depends on the configuration of these middleware components. This document briefly describes some of the features that are available from the middleware. More specific configuration details can be found in High Availability and Disaster Recovery Configurations for IBM SmartCloud Control Desk and IBM Maximo Products, SG24-8109.

Here are the middleware components that are involved in the infrastructure:
  • Database

    The database stores all the data that is used by the application and is a critical component for the application. Database technologies that can be implemented include DB2 HA Feature with Shared Disk, DB2 HADR log shipping, and DB2 pureScale® active clustering. With DB2 HADR, the standby database can be opened in read-only mode and used as a reporting replica, as shown in Figure 3. Oracle Real Application Clusters (RAC) and Oracle Active Data Guard are also supported.


    Figure 3. DB2 HADR configuration with a read-only replica
  • Application server

    The application server hosts the application and connects to the database to deliver the UI and all its features to users. The most common method for HA and DR for the application server is using IBM WebSphere® Application Server clustering across multiple nodes. With clustering, you can have multiple Java virtual machines (JVMs) on each node, but it is important to spread the JVMs across multiple nodes to eliminate single points of failure. Clustering your application server gives you the benefit of high availability combined with increased performance and scalability.
  • Web server

    The web server can be used as a single access point for users and can also distribute users across many application server JVMs. It is important to have at least one other web server node that can take over if the primary fails. Often, organizations eliminate the web server component and replace it with a load balancer (or use load balancers together with web servers). If so, it is important that the load balancer itself is not a single point of failure.
  • User directory (LDAP)

    If your organization uses LDAP security for Maximo or SmartCloud Control Desk, the LDAP component must also be made highly available. Maximo and SmartCloud Control Desk support Tivoli Directory Server and Microsoft Active Directory as LDAP servers. Each directory product has its own user replication techniques, which should be implemented as a prerequisite to the topology configuration.
  • Cluster manager

    A cluster manager is an optional component, but it can be introduced into the topology to monitor applications, processes, IP addresses, and services, detect their failure, and automatically fail them over to other nodes. Cluster managers can greatly decrease the impact of a failure because they take action automatically when a failure occurs. The preferred cluster manager solution in this configuration is IBM Tivoli System Automation for Multiplatforms (SA MP).
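The web server and application server points above can be illustrated with a simplified model of how requests are spread across cluster members. The Python sketch below is a hypothetical round-robin balancer that skips JVMs marked as unhealthy; it is not how the WebSphere web server plug-in or any particular load balancer is implemented, and it ignores session affinity and health-check timing. The JVM and node names are invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Jvm:
    """One application server JVM (cluster member) running on a node."""
    name: str
    node: str
    healthy: bool = True


class RoundRobinBalancer:
    """Spreads requests across JVMs in turn, skipping JVMs marked unhealthy."""

    def __init__(self, jvms: List[Jvm]) -> None:
        if not jvms:
            raise ValueError("at least one JVM is required")
        self._jvms = list(jvms)
        self._next = 0

    def route(self) -> Jvm:
        # Try each JVM at most once, starting after the last one used.
        for _ in range(len(self._jvms)):
            jvm = self._jvms[self._next]
            self._next = (self._next + 1) % len(self._jvms)
            if jvm.healthy:
                return jvm
        raise RuntimeError("no healthy JVM is available")


def mark_node_down(jvms: List[Jvm], node: str) -> None:
    """Simulate the loss of a whole node: every JVM on it stops responding."""
    for jvm in jvms:
        if jvm.node == node:
            jvm.healthy = False


if __name__ == "__main__":
    # Hypothetical cluster: two JVMs on each of two nodes.
    cluster = [Jvm("ui1", "node1"), Jvm("ui2", "node1"),
               Jvm("ui3", "node2"), Jvm("ui4", "node2")]
    balancer = RoundRobinBalancer(cluster)
    print([balancer.route().name for _ in range(4)])
    mark_node_down(cluster, "node1")
    print([balancer.route().name for _ in range(4)])
```

After node1 is lost, requests continue to flow to the JVMs on node2. Had all four JVMs been placed on node1, the same failure would leave nothing to route to, which is the single point of failure that spreading JVMs across nodes avoids.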

Eliminating all single points of failure by enabling high availability and disaster recovery for each of these components helps you meet your organization's need for a continuously available application.
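The effect of removing single points of failure can be estimated with a standard reliability approximation: a tier with n redundant replicas is down only when all replicas are down at once, so, assuming independent failures and instantaneous failover, its availability is 1 - (1 - a)^n. The Python sketch below applies this to each tier and multiplies the tiers together. The figures are hypothetical, and real failures are often correlated (shared power, network, or site), which is why redundancy within one site does not replace multisite disaster recovery.

```python
from math import prod


def redundant_availability(replica_availability: float, replicas: int) -> float:
    """Availability of a tier that stays up while at least one replica is up.

    Assumes replicas fail independently and failover is instantaneous.
    """
    if replicas < 1:
        raise ValueError("a tier needs at least one replica")
    if not 0.0 <= replica_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # The tier is down only when every replica is down at the same time.
    return 1.0 - (1.0 - replica_availability) ** replicas


def solution_availability(tiers: dict) -> float:
    """Tiers are in series; each maps to (per-replica availability, replica count)."""
    return prod(redundant_availability(a, n) for a, n in tiers.values())


if __name__ == "__main__":
    # Hypothetical figures: every tier has a per-instance availability of 99%.
    single = {"web": (0.99, 1), "app": (0.99, 1), "db": (0.99, 1), "ldap": (0.99, 1)}
    paired = {name: (a, 2) for name, (a, _) in single.items()}
    print(f"single instances: {solution_availability(single):.4%}")
    print(f"two per tier:     {solution_availability(paired):.4%}")
```

With these figures, four single-instance tiers give about 96.06% end to end, while two replicas per tier raise that to about 99.96%.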

Usage scenarios
The high availability and disaster recovery solutions are often implemented when the costs of system downtime exceed the cost of adding redundancy to the infrastructure. Here are some example scenarios:
  • Scenario 1 – Local high availability

    An organization determines that they require redundancy and failover capabilities to make their SmartCloud Control Desk application available to meet specific service operation levels. This particular organization does not have a second site or the capital to introduce one. Implementing a local high availability topology can allow for system and process failure without the costs that are associated with a second site.
  • Scenario 2 – Multisite disaster recovery

    An organization has been running a local high availability solution for some time now and has determined that they require a second site for protection from complete site failure or a disaster. This organization is in a region that is particularly vulnerable to severe weather conditions. By adding another site to the topology, the application can be restored if a site becomes inactive because of a disaster. It is recommended that the second site be far enough away that the same severe weather conditions do not affect both sites.
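The decision rule behind these scenarios, adding redundancy when the cost of downtime exceeds the cost of the redundancy, can be expressed as a simple expected-value comparison. The Python sketch below is a rough model under stated assumptions: downtime cost is proportional to hours of downtime, the availability levels before and after are estimates, and the redundancy cost is an annualized figure. All names and numbers are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def expected_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability level."""
    return (1.0 - availability) * HOURS_PER_YEAR * cost_per_hour


def redundancy_is_justified(current: float, improved: float,
                            cost_per_hour: float,
                            annual_redundancy_cost: float) -> bool:
    """True when the downtime cost avoided each year exceeds the redundancy cost."""
    avoided = (expected_downtime_cost(current, cost_per_hour)
               - expected_downtime_cost(improved, cost_per_hour))
    return avoided > annual_redundancy_cost


if __name__ == "__main__":
    # Hypothetical figures: moving from 99.5% to 99.95% availability, with
    # downtime costing 10,000 per hour and redundancy costing 200,000 per year.
    print(redundancy_is_justified(0.995, 0.9995, 10_000, 200_000))
```

An averaged model like this understates rare but severe events such as the loss of a whole site, so for Scenario 2 it is best used alongside a separate assessment of the regional risks that motivate a second site.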

Integration

Maximo and SmartCloud Control Desk both can integrate with external systems through the Integration Framework. Using the WebSphere Application Server built-in Service Integration Bus (SIB), a system can process inbound and outbound transactions through message queues. This SIB configuration can be duplicated from one site to another and combined with database and file system replication to allow transactions to fail over from one site to the other during a disaster recovery. Without the ability to fail over transactions, any messages that were in the queues are stuck until the system is brought back online. This configuration is shown in Figure 4.


Figure 4. Service Integration Bus Active/Passive configuration
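The value of replicating the message store can be shown with a deliberately simplified model. The Python sketch below is not how the Service Integration Bus is implemented; it assumes that the queue's message store (which, for a SIB messaging engine, is a file store or a data store in a database) is replicated synchronously to the passive site together with the database and file systems. Under that assumption, messages that were queued but not yet processed when Site A failed can still be picked up at Site B. The class and message names are hypothetical.

```python
from collections import deque
from typing import List


class ReplicatedMessageStore:
    """Hypothetical model of a queue whose message store is copied to a passive site.

    Replication is treated as synchronous: a message is written to both copies
    before send() returns, and removed from both once it has been processed.
    """

    def __init__(self) -> None:
        self._primary: deque = deque()
        self._replica: deque = deque()

    def send(self, message: str) -> None:
        self._primary.append(message)
        self._replica.append(message)

    def consume(self) -> str:
        # Processing at the active site removes the message from both copies.
        message = self._primary.popleft()
        self._replica.popleft()
        return message

    def fail_over(self) -> List[str]:
        """Messages the passive site must still process after the active site is lost."""
        return list(self._replica)


if __name__ == "__main__":
    store = ReplicatedMessageStore()
    for txn in ("outbound-1", "outbound-2", "outbound-3"):
        store.send(txn)
    print(store.consume())    # processed at Site A before the failure
    print(store.fail_over())  # still pending, recovered at Site B
```

Without the replica, the two pending transactions would remain on the failed site until it was brought back online, which is the situation the active/passive SIB configuration is designed to avoid.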

High Availability and Disaster Recovery Configurations for IBM SmartCloud Control Desk and IBM Maximo Products, SG24-8109 goes into further detail about how to set up this configuration. It also briefly compares using IBM WebSphere MQ with using the SIB.

Supported platforms
The supported middleware versions and platforms for this solution depend on the version of Maximo or SmartCloud Control Desk that is being implemented. Information about supported platforms for these products can be found in the Product Configuration Matrix at http://www.ibm.com/support/docview.wss?uid=swg27014419.

Ordering information
This product is available only through IBM Passport Advantage®. It is not available as a shrink-wrapped product. Detailed ordering information is available at the IBM Offering Information website (see the "Related information" section).

Related information
For more information, see the following documents:

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Profile

Publish Date
06 May 2013



IBM Form Number
TIPS1013