-->

IBM WebSphere Application Server Network Deployment V5.0 - Tuning Plug-in Workload Management Failover


Abstract

This tip covers the ConnectTimeout and RetryInterval settings, which can be added to the plugin-cfg.xml file to tune plug-in workload management failover.

Contents

ConnectTimeout setting

When a cluster member exists on a machine that is removed from the network (because its network cable is unplugged or it has been powered off, for example), the plug-in, by default, cannot determine the cluster member's status until the operating system TCP/IP time-out expires. Only then will the plug-in be able to forward the request to another available cluster member.

Lowering the operating system time-out value might seem like an easy way to make the plug-in fail over quickly, but this value cannot be changed without side effects.

On some operating systems, the time-out value applies not only to outgoing traffic (from Web server to application server) but also to incoming traffic. Any change to this value therefore also changes the time that clients are given to connect to your Web server. If clients are using dial-up or slow connections and you lower this value, they may not be able to connect.

To overcome this problem, WebSphere Application Server V5.0 offers an option within the plug-in configuration file that allows you to bypass the operating system time-out.

You can add an attribute called ConnectTimeout to the Server element, which makes the plug-in use a non-blocking connect. Set this attribute to an integer value greater than zero to specify how many seconds the plug-in should wait for a response when attempting to connect to a server. For example, a setting of 10 means that the plug-in waits up to ten seconds before timing out. Setting ConnectTimeout to 0 is equivalent to not specifying the attribute at all: the plug-in performs a blocking connect and waits until the operating system times out.
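
To make the difference between a blocking and a non-blocking connect concrete, the following is a minimal Python sketch of the general technique on a POSIX system. It is not the plug-in's own code; the function name connect_with_timeout and its parameters are hypothetical and exist only for illustration. The sequence (switch the socket to non-blocking, start the connect, wait with select, switch back to blocking) is similar in spirit to what the plug-in trace later in this tip reports.

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout_seconds):
    """Return a connected socket, or None if the connect fails or times out.

    Hypothetical helper: a timeout of 0 falls back to a plain blocking
    connect, which waits for the operating system time-out.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

    if timeout_seconds <= 0:
        # Blocking connect: the operating system decides when to give up.
        try:
            sock.connect((host, port))
            return sock
        except OSError:
            sock.close()
            return None

    # Non-blocking connect: start the attempt and return immediately.
    sock.setblocking(False)
    err = sock.connect_ex((host, port))
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        return None

    if err != 0:
        # Wait until the socket becomes writable or the timeout expires.
        _, writable, _ = select.select([], [sock], [], timeout_seconds)
        if not writable:
            sock.close()  # timed out: the caller would treat the server as down
            return None
        # Writable does not guarantee success; check the pending socket error.
        if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) != 0:
            sock.close()
            return None

    # Reset the socket to blocking mode for normal request traffic.
    sock.setblocking(True)
    return sock
```

A call such as connect_with_timeout("app2.itso.ibm.com", 9084, 20) would give up after about 20 seconds instead of waiting for the operating system time-out.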

To determine what setting to use, you need to take into consideration how fast your network and servers are. Complete some testing to see how fast your network is, and take into account peak network traffic and peak server usage. If the server cannot respond before the ConnectTimeout expires, the plug-in will mark it as down.
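
One way to approach this testing is to time a number of TCP connects from the Web server machine to each application server, ideally during peak load. The Python sketch below does this. Both function names are hypothetical, and the rule of taking the slowest observed connect times a safety factor of 3 is an assumption made for illustration, not an IBM recommendation.

```python
import math
import socket
import time


def measure_connect_times(host, port, samples=5, per_attempt_timeout=30.0):
    """Time several TCP connects to host:port and return the durations in seconds.

    Raises OSError if an attempt fails or exceeds per_attempt_timeout.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=per_attempt_timeout):
            times.append(time.perf_counter() - start)
    return times


def suggest_connect_timeout(times, safety_factor=3.0):
    """Turn measured connect times into a whole number of seconds.

    Assumption for illustration only: slowest observation times a safety
    factor, rounded up, and never less than 1 second.
    """
    return max(1, math.ceil(max(times) * safety_factor))
```

For example, if the slowest of the measured connects took 0.5 seconds, suggest_connect_timeout would return 2 with the default safety factor. Treat the result as a starting point and adjust it after observing real failover behavior.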

Because this setting is specified on the Server tag, you can set it for each individual cluster member. For instance, say you have a system with three cluster members, two of which are on remote nodes. One of the remote nodes is on another subnet, and network traffic sometimes takes longer to reach it. You might want to set up your cluster as shown in the following example.

<ServerCluster Name="PluginCluster" RetryInterval="120">
  <Server CloneID="u87ul5ed" LoadBalanceWeight="2" Name="PluginMember1">
    <Transport Hostname="app1.itso.ibm.com" Port="9084" Protocol="http"/>
  </Server>
  <Server CloneID="u7ijgocl" LoadBalanceWeight="6" Name="PluginMember2" ConnectTimeout="20">
    <Transport Hostname="app2.itso.ibm.com" Port="9084" Protocol="http"/>
  </Server>
  <Server CloneID="u8cjs53o" LoadBalanceWeight="2" Name="PluginMember3" ConnectTimeout="10">
    <Transport Hostname="app3.itso.ibm.com" Port="9085" Protocol="http"/>
  </Server>
</ServerCluster>

PluginMember1 is on the same machine as the Web server. This means that there is no need to use a non-blocking connect. PluginMember2 is on a remote server in a slower part of the network, so it has a higher ConnectTimeout setting to compensate for this. Finally, PluginMember3 is on a faster part of the network, so it is safer to set the ConnectTimeout to a lower value.
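
When a cluster has many members, it can help to check which ones actually carry a ConnectTimeout. The following Python sketch parses the ServerCluster fragment from the example above with the standard library and lists the effective setting per member. The function name connect_timeouts is hypothetical, and treating a missing attribute as 0 (blocking connect) follows the description of ConnectTimeout given earlier.

```python
import xml.etree.ElementTree as ET

# The ServerCluster element from the example above.
PLUGIN_CFG_FRAGMENT = """
<ServerCluster Name="PluginCluster" RetryInterval="120">
  <Server CloneID="u87ul5ed" LoadBalanceWeight="2" Name="PluginMember1">
    <Transport Hostname="app1.itso.ibm.com" Port="9084" Protocol="http"/>
  </Server>
  <Server CloneID="u7ijgocl" LoadBalanceWeight="6" Name="PluginMember2" ConnectTimeout="20">
    <Transport Hostname="app2.itso.ibm.com" Port="9084" Protocol="http"/>
  </Server>
  <Server CloneID="u8cjs53o" LoadBalanceWeight="2" Name="PluginMember3" ConnectTimeout="10">
    <Transport Hostname="app3.itso.ibm.com" Port="9085" Protocol="http"/>
  </Server>
</ServerCluster>
"""


def connect_timeouts(cluster_xml):
    """Map each Server name to its ConnectTimeout in seconds (0 if absent)."""
    cluster = ET.fromstring(cluster_xml)
    return {
        server.get("Name"): int(server.get("ConnectTimeout", "0"))
        for server in cluster.findall("Server")
    }


for name, timeout in connect_timeouts(PLUGIN_CFG_FRAGMENT).items():
    if timeout == 0:
        mode = "blocking connect (operating system time-out)"
    else:
        mode = f"non-blocking connect, {timeout} second time-out"
    print(f"{name}: {mode}")
```

To check a complete plugin-cfg.xml file rather than a fragment, the same lookup could be applied to each ServerCluster element found in the parsed file.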

If a non-blocking connect is used, you will see slightly different trace output. The following example shows what appears in the plug-in trace when a non-blocking connect is successful.

...
TRACE: ws_common: websphereGetStream: Have a connect timeout of 10; Setting socket to not block for the connect
TRACE: errno 55
TRACE: RET 1
TRACE: READ SET 0
TRACE: WRITE SET 32
TRACE: EXCEPT SET 0
TRACE: ws_common: websphereGetStream: Reseting socket to block
...

RetryInterval setting

The plug-in configuration file has a setting that specifies how long the plug-in waits before retrying a server that is marked as down. This avoids unnecessary connection attempts when you know that the server is unavailable. The default is 60 seconds.

This setting is specified in the ServerCluster element using the RetryInterval attribute. An example of this in the plugin-cfg.xml file is as follows:

<ServerCluster Name="PluginCluster" RetryInterval="120">

This would mean that if a cluster member were marked as down, the plug-in would not retry it for 120 seconds.
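
The effect of the retry interval can be sketched as simple bookkeeping: record when a member was marked down, and skip it until the interval has elapsed. The Python class below is a hypothetical illustration of that idea, not the plug-in's implementation. The class and method names are invented, and whether the plug-in retries exactly at the interval or just after it is not stated in this tip, so the comparison used here is an assumption.

```python
import time


class MarkedDownTracker:
    """Hypothetical bookkeeping for servers that are marked as down."""

    def __init__(self, retry_interval_seconds=60, clock=time.monotonic):
        self.retry_interval = retry_interval_seconds
        self.clock = clock  # injectable so the behavior can be tested
        self._marked_down_at = {}

    def mark_down(self, server_name):
        """Record the moment a connect to this server failed."""
        self._marked_down_at[server_name] = self.clock()

    def mark_up(self, server_name):
        """Clear the down state after a successful connect."""
        self._marked_down_at.pop(server_name, None)

    def may_try(self, server_name):
        """Return True if the server is up or its retry interval has elapsed."""
        marked_at = self._marked_down_at.get(server_name)
        if marked_at is None:
            return True
        # Assumption: the server becomes eligible once the full interval has passed.
        return self.clock() - marked_at >= self.retry_interval
```

With retry_interval_seconds=120, a member marked down at time 0 is skipped at 119 seconds and becomes eligible again at 120 seconds; requests in between go to the remaining cluster members.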

There is no way to recommend one specific value; the value chosen depends on your environment. For example, if you have numerous cluster members, and one cluster member being unavailable does not affect the performance of your application, then you can safely set the value to a very high number.

Alternatively, if your optimum load has been calculated on the assumption that all cluster members are available, or if you have only a few cluster members, you will want them to be retried more often to maintain the load.

Also, take into consideration the time it takes to restart your server. If a server takes a long time to boot up and load applications, then you will need a longer retry interval.

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Profile

Publish Date
22 July 2003





IBM Form Number
TIPS0240