Using Linux-HA to Create a Highly Available J2EE Environment Running on Linux on System z

Abstract

This IBM Technote describes the configuration that was set up to create a highly available Java 2 Platform, Enterprise Edition (J2EE) application serving environment, using Debian on an IBM S/390 system, Tomcat, and Linux-HA, to demonstrate Linux high availability (Linux-HA) solutions during a customer workshop.

As Linux® on System z® becomes more prevalent and mainstream in the industry, the need for it to deliver higher levels of availability is increasing. IBM® supports the High Availability Linux (Linux-HA) project, which provides high availability functions to the open source community. The goal of this IBM Technote is to describe what was put in place to demonstrate Linux high availability solutions. This Technote is not meant to replace official Linux-HA documentation, which you can find at the following address:
http://www.linux-ha.org/

Environment
The environment consists of two guests running Debian on an IBM S/390® system (basic installation, no graphical environment) under z/VM 5.4 RSU2. Figure 1 illustrates this environment.
Figure 1. Our environment


The two guests are connected to the external world through a virtual network interface card coupled to a z/VM virtual switch (VSWITCH). A second interface is connected to a private HiperSockets guest local area network (LAN) that is used for the clustering traffic. Both guests use the same interface names, eth0 being the external connection and hsi0 the private network connection:

debian-ca:~# cat /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback

# The primary network interface
auto eth0 hsi0
iface eth0 inet static
address 10.XX.XXX.XXX
netmask 255.255.255.0
network 10.XX.XXX.XXX
broadcast 10.XX.XXX.255
gateway 10.XX.XXX.254
# dns-* options are implemented by the resolvconf package, if installed
dns-nameservers A.B.C.D W.X.Y.Z
dns-search mop.fr.ibm.com

iface hsi0 inet static
address 192.168.66.1
netmask 255.255.255.0
network 192.168.66.0
broadcast 192.168.66.255
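
Before configuring Heartbeat, it is worth verifying that the two guests can actually reach each other over the private HiperSockets LAN. The following check is a sketch only; it assumes that debian-ca2 uses 192.168.66.2 on its hsi0 interface, an address that is not shown elsewhere in this Technote:

debian-ca:~# ping -c 3 -I hsi0 192.168.66.2

Three replies with no packet loss indicate that the private network carrying the heartbeat traffic is usable.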


Installation and configuration of packages

Both nodes must run the Tomcat J2EE application server coupled with the Apache2 Web server through the apache2 mod_jk module (a mod_jk configuration sketch follows the installation command below). Both of these applications are placed under Heartbeat control. The packages are installed with the apt-get command, which pulls in the requested packages and their dependencies:

debian-ca:~# apt-get install tomcat5.5 apache2 libapache2-mod-jk heartbeat-2
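
For reference, the following is a minimal sketch of the kind of mod_jk configuration that couples Apache2 to Tomcat over AJP. The worker name ajp13_worker, the port 8009, the /myapp context, and the file locations are assumptions to adapt to the actual installation; they are not taken from our environment:

# /etc/libapache2-mod-jk/workers.properties (assumed location)
worker.list=ajp13_worker
worker.ajp13_worker.type=ajp13
worker.ajp13_worker.host=localhost
worker.ajp13_worker.port=8009

# Apache2 configuration excerpt, for example in /etc/apache2/conf.d/jk.conf (assumed)
JkWorkersFile /etc/libapache2-mod-jk/workers.properties
JkLogFile     /var/log/apache2/mod_jk.log
JkMount       /myapp/* ajp13_worker

The AJP port must match the AJP/1.3 connector defined in the corresponding Tomcat server.xml file.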

Because the Debian Tomcat 5.5 application server initialization script does not work properly on S/390 systems (see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=517832 for details), we removed it from the Debian boot sequence as shown in the following example. Moreover, Tomcat 5.5 is placed under Heartbeat control, so there is no need to start it during the initialization process anyway.

debian-ca:~# update-rc.d -f tomcat5.5 remove
Removing any startup links for /etc/init.d/tomcat5.5

We do the same with the Apache2 Web server, because it is also placed under Heartbeat control:

debian-ca:~# update-rc.d -f apache2 remove
Removing any startup links for /etc/init.d/apache2

Heartbeat configuration

The Heartbeat main configuration file is /etc/ha.d/ha.cf. Our setup is quite simple and is shown as follows:

logfacility daemon
keepalive 1
deadtime 10
warntime 5
initdead 120
udpport 694
bcast hsi0
auto_failback on
node debian-ca
node debian-ca2
respawn hacluster /usr/lib/heartbeat/ipfail
use_logd yes

The two nodes being clustered are debian-ca and debian-ca2. They monitor each other by exchanging heartbeat packets broadcast over the hsi0 interface (bcast hsi0). This way, no monitoring traffic disturbs the production traffic that flows through eth0. A node is considered dead if no heartbeat has been received from it for 10 seconds (deadtime), and a first warning is issued after 5 seconds (warntime).

Heartbeat resources are defined in the /etc/ha.d/haresources file, as follows:

debian-ca 10.XX.XXX.XXX apache2 tomcat

From the /etc/ha.d/haresources resource file entry, we see that the master node is named debian-ca, the virtual IP address to use is 10.XX.XXX.XXX, and the scripts used for resource control are apache2 and tomcat. By default, these scripts are looked up in /etc/init.d and /etc/ha.d/resource.d. apache2 is the standard Debian startup script (/etc/init.d/apache2), while tomcat is our home-made Tomcat 5.5 startup script (/etc/ha.d/resource.d/tomcat) that we created from the /etc/init.d/skeleton sample script:

#!/bin/sh -e
### BEGIN INIT INFO
# Provides: Tomcat applications
# Required-Start:
# Required-Stop:
# Default-Start: S
# Default-Stop: 0 6
# Short-Description: Start Tomcat applications
### END INIT INFO

PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin"

. /lib/lsb/init-functions

case "$1" in
start)
log_action_begin_msg "Starting Tomcat applications"
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-1.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-2.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-3.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-4.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-5.xml
/usr/share/tomcat5.5/bin/startup.sh -config /etc/tomcat5.5/server-6.xml
/etc/init.d/ps-watcher start

log_action_end_msg $?
;;

stop)
log_action_begin_msg "Stopping Tomcat applications"
/etc/init.d/ps-watcher stop
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-2.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-3.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-4.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-5.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-6.xml
/usr/share/tomcat5.5/bin/shutdown.sh -config /etc/tomcat5.5/server-1.xml

log_action_end_msg $?
;;

*)
echo "Usage: /etc/init.d/tomcat {start|stop}"
exit 1
;;
esac

exit 0

Logs are written to the /var/log/daemon.log file, as specified by the logfacility statement; the daemon log facility is handled by the rules in the /etc/rsyslog.conf file. The last part of the configuration is the authkeys file, /etc/ha.d/authkeys, shown as follows:

debian-ca:/etc/ha.d# cat authkeys
auth 1
1 sha1 4d94499ce29839079340b1367ff8daab0f3a3acb

The authkeys file is used to authenticate traffic between cluster members by using key 1 (of type sha1) to sign outgoing packets. The configuration must be identical on both nodes. Therefore, copy the ha.cf, haresources, and authkeys files from the debian-ca node to the debian-ca2 node, and restart heartbeat on both nodes.
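
The 40-character key is simply a random shared secret. One possible way to generate such a value, and to protect the file so that Heartbeat accepts it (Heartbeat refuses to start if authkeys is readable by group or other users), is sketched below; this is not the exact procedure used in our setup:

debian-ca:~# dd if=/dev/urandom bs=512 count=1 2>/dev/null | sha1sum | awk '{print $1}'
debian-ca:~# chmod 600 /etc/ha.d/authkeys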

Heartbeat configuration validation
To validate your Heartbeat configuration after the restart, check the network configuration on the debian-ca node:

debian-ca:~# ifconfig -a
eth0 Link encap:Ethernet HWaddr 02:00:00:00:00:0f
inet addr:10.XX.XXX.XXX Bcast:10.XX.XXX.255 Mask:255.255.255.0
inet6 addr: fe80::200:0:100:f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1492 Metric:1
RX packets:1774 errors:0 dropped:0 overruns:0 frame:0
TX packets:1128 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:143887 (140.5 KiB) TX bytes:197458 (192.8 KiB)

eth0:0 Link encap:Ethernet HWaddr 02:00:00:00:00:0f
inet addr:10.XX.XXX.XXX Bcast:10.XX.XXX.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1492 Metric:1

hsi0 Link encap:Ethernet HWaddr 02:00:00:00:00:0d
inet addr:192.168.66.1 Bcast:192.168.66.255 Mask:255.255.255.0
inet6 addr: fe80::ff:fe00:d/64 Scope:Link
UP BROADCAST RUNNING NOARP MULTICAST MTU:8192 Metric:1
RX packets:110217 errors:0 dropped:0 overruns:0 frame:0
TX packets:110218 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:26693319 (25.4 MiB) TX bytes:28204179 (26.8 MiB)

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:42 errors:0 dropped:0 overruns:0 frame:0
TX packets:42 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3136 (3.0 KiB) TX bytes:3136 (3.0 KiB)

Here, eth0:0 carries the virtual IP address that is controlled by Heartbeat and that is moved to the backup node if something goes wrong with the first one. This virtual IP address is the one to use when configuring Apache2, Tomcat, and the application. You can also add it to the /etc/hosts file and refer to the associated host name in the configuration.
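
For example, a hypothetical /etc/hosts entry for the virtual IP address might look like the following, where ha-service is a placeholder host name and the address remains masked as in the rest of this Technote:

debian-ca:~# cat /etc/hosts
127.0.0.1       localhost
10.XX.XXX.XXX   ha-service.mop.fr.ibm.com   ha-service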

The output of ps shows that the seven JVMs (the Tomcat application server instances) have also started; the count of 8 includes the grep command itself:

debian-ca:~# ps faux|grep java|wc -l
8
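
Heartbeat itself can also be queried directly. The cl_status utility that ships with Heartbeat reports whether the local daemon is running and how it currently sees the cluster members. The following commands are a sketch of this kind of check (output not reproduced here):

debian-ca:~# cl_status hbstatus
debian-ca:~# cl_status listnodes
debian-ca:~# cl_status nodestatus debian-ca2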

Heartbeat testing
To test Heartbeat, shut down the master node of the Heartbeat configuration uncleanly by forcing its guest off from an authorized z/VM user ID, as sketched below. The /var/log/daemon.log file of the second node then contains the details of the failover sequence, shown in the excerpts that follow.
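
The following is a minimal sketch of such a test, assuming that the debian-ca node runs in a z/VM guest named LNXDCA (a hypothetical user ID) and that the FORCE command is issued from a user ID with the required privilege class:

FORCE LNXDCA

This logs the guest off abruptly, without giving Linux a chance to shut down cleanly, which is exactly the kind of failure we want Heartbeat to detect.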

debian-ca2:~# tail -f /var/log/daemon.log
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: Current arena value: 0
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: MSG stats: 0/0 ms age 172504626 [pid1371/HBWRITE]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: cl_malloc stats: 334/218034 41336/19761 [pid1371/HBWRITE]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: RealMalloc stats: 49652 total malloc bytes. pid [1371/HBWRITE]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: Current arena value: 0
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: MSG stats: 0/0 ms age 172504626 [pid1373/HBREAD]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: cl_malloc stats: 334/822171 41368/19785 [pid1373/HBREAD]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: RealMalloc stats: 42228 total malloc bytes. pid [1373/HBREAD]
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: Current arena value: 0
Mar 25 10:41:47 debian-ca2 heartbeat: [1337]: info: These are nothing to worry about.

Up to this point, the master node (debian-ca) is still there and everything works as expected. It is now logged off by force, as indicated by the message "WARN: node debian-ca: is dead":

Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: WARN: node debian-ca: is dead
Mar 25 13:52:48 debian-ca2 ipfail: [1400]: info: Status update: Node debian-ca now has status dead

Heartbeat on the second node (debian-ca2) has detected the death of the first node (debian-ca) and warns that no STONITH device is configured:

Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: WARN: No STONITH device configured.
Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: WARN: Shared disks are not protected.

Because no shared data is generated by the nodes in this cluster, no shared-disk protection (and therefore no STONITH device) needs to be implemented, so these warnings can be ignored here.

The process of resource acquisition by the second node (debian-ca2) now begins:

Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: info: Resources being acquired from debian-ca.

The second node has acquired the virtual IP:

Mar 25 13:52:48 debian-ca2 heartbeat: [8800]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: info: Link debian-ca:hsi0 dead.
Mar 25 13:52:48 debian-ca2 harc[8800]: [8814]: info: Running /etc/ha.d/rc.d/status status
Mar 25 13:52:48 debian-ca2 heartbeat: [8801]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys debian-ca2] to acquire.
Mar 25 13:52:48 debian-ca2 heartbeat: [1337]: debug: StartNextRemoteRscReq(): child count 1
Mar 25 13:52:48 debian-ca2 mach_down[8826]: [8847]: info: Taking over resource group 10.XX.XXX.XXX
Mar 25 13:52:48 debian-ca2 ipfail: [1400]: info: NS: We are dead. :<
Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [8859]: info: Acquiring resource group: debian-ca 10.XX.XXX.XXX apache2 tomcat
Mar 25 13:52:48 debian-ca2 IPaddr[8871]: [8902]: INFO: Resource is stopped
Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [8918]: info: Running /etc/ha.d/resource.d/IPaddr 10.XX.XXX.XXX start
Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [8919]: debug: Starting /etc/ha.d/resource.d/IPaddr 10.XX.XXX.XXX start
Mar 25 13:52:48 debian-ca2 IPaddr[8937]: [8967]: INFO: Using calculated nic for 10.XX.XXX.XXX: eth0
Mar 25 13:52:48 debian-ca2 IPaddr[8937]: [8972]: INFO: Using calculated netmask for 10.XX.XXX.XXX: 255.255.255.0
Mar 25 13:52:48 debian-ca2 IPaddr[8937]: [8977]: DEBUG: Using calculated broadcast for 10.XX.XXX.XXX: 10.XX.XXX.255
Mar 25 13:52:48 debian-ca2 IPaddr[8937]: [8994]: INFO: eval ifconfig eth0:0 10.XX.XXX.XXX netmask 255.255.255.0 broadcast 10.XX.XXX.255
Mar 25 13:52:48 debian-ca2 IPaddr[8937]: [8999]: DEBUG: Sending Gratuitous Arp for 10.XX.XXX.XXX on eth0:0 [eth0]
Mar 25 13:52:48 debian-ca2 IPaddr[8920]: [9013]: INFO: Success
Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [9014]: debug: /etc/ha.d/resource.d/IPaddr 10.XX.XXX.XXX start done. RC=0

The Apache2 Web server has started on the second node:

Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [9034]: info: Running /etc/init.d/apache2 start
Mar 25 13:52:48 debian-ca2 ResourceManager[8848]: [9035]: debug: Starting /etc/init.d/apache2 start
Mar 25 13:52:49 debian-ca2 ResourceManager[8848]: [9046]: debug: /etc/init.d/apache2 start done. RC=0

Tomcat instances have started on the second node:

Mar 25 13:52:49 debian-ca2 ResourceManager[8848]: [9118]: info: Running /etc/ha.d/resource.d/tomcat start
Mar 25 13:52:49 debian-ca2 ResourceManager[8848]: [9119]: debug: Starting /etc/ha.d/resource.d/tomcat start
Mar 25 13:52:49 debian-ca2 ipfail: [1400]: info: Link Status update: Link debian-ca/hsi0 now has status dead
Mar 25 13:52:50 debian-ca2 ipfail: [1400]: info: We are dead. :<
Mar 25 13:52:50 debian-ca2 ipfail: [1400]: info: Asking other side for ping node count.
Mar 25 13:52:50 debian-ca2 ipfail: [1400]: debug: Message [num_ping] sent.
Mar 25 13:53:02 debian-ca2 ResourceManager[8848]: [9319]: debug: /etc/ha.d/resource.d/tomcat start done. RC=0

All resources have been re-acquired by the backup node, and the applications are back in service. When the first node reappears, the resources are switched back to it, thanks to the auto_failback on directive in the /etc/ha.d/ha.cf file; a sketch of how to verify the failback follows the log excerpt below:

Mar 25 13:53:02 debian-ca2 mach_down[8826]: [9320]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
Mar 25 13:53:02 debian-ca2 mach_down[8826]: [9324]: info: mach_down takeover complete for node debian-ca.
Mar 25 13:53:02 debian-ca2 heartbeat: [1337]: info: mach_down takeover complete.
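
Once the debian-ca guest has been logged back on, Linux re-IPLed, and Heartbeat restarted on it, a quick way to confirm that the resources have failed back to the master is to check that the eth0:0 alias and the Tomcat JVMs are present there again. This is a sketch of such a check, not output captured from our environment:

debian-ca:~# ifconfig eth0:0
debian-ca:~# ps faux | grep java | wc -l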


For more information about the Linux-HA configuration and use cases, see the Linux-HA Web site at the following address:
http://www.linux-ha.org

See also the IBM Redbooks publication Achieving High Availability on Linux for System z with Linux-HA Release 2, SG24-7711.

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Profile

Publish Date
06 May 2009

Last Update
06 May 2009



IBM Form Number
TIPS0707