Implementation Considerations for Pure Versus Sampled Events in IBM Tivoli Monitoring 6.1

Abstract

When implementing IBM Tivoli Monitoring 6.1, it is important to understand the difference between pure and sampled events. This Technote discusses this important implementation consideration.

Contents


Pure versus sampled events with impact to the IBM Tivoli Monitoring 6.1 console user

The concept of a pure event can be understood as a stateless event. A pure event carries no recognition of the state of a resource. An easily understood example is an event read from a Windows log file. In the case of the log file monitor, the agent merely matches lines from the log that it was configured to forward to the TEMS. The agent does not keep track of anything to compare events against, and there is no evaluation other than matching.
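The stateless matching described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how the agent is implemented: the `PATTERN` value and the `pure_events` function are hypothetical names, and a real agent takes its matching rules from its own configuration.

```python
import re
from typing import Iterable, Iterator

# Hypothetical pattern for illustration only.
PATTERN = re.compile(r"\bERROR\b")

def pure_events(log_lines: Iterable[str]) -> Iterator[str]:
    """Yield one event per matching line; nothing is remembered between lines."""
    for line in log_lines:
        if PATTERN.search(line):
            yield line

lines = [
    "10:00 INFO service started",
    "10:01 ERROR disk write failed",
    "10:02 ERROR disk write failed",
]
events = list(pure_events(lines))
```

Because no state is kept, the two similar error lines produce two separate events; nothing compares one event with another.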

There is also no concept of an interval when building a situation to detect pure events. Most agents that detect pure events do allow some configuration of a time interval, but that interval applies to the operation of the agent as a whole and not on a situation-by-situation basis, as with sampled events.

The sampled event, in contrast, has a state. The current state of the resource at sample time has a value and a state against which it is being measured. If instead of reading the log for an event we evaluated the current status of the storage application process for up/down, this would be a sampled event.

We evaluate the status (what is it?) and compare it against some criteria (up/down). When the monitor determines that the criteria are met, the sampled situation becomes true and thus appears on the event console. When it is resolved (or no longer true), it becomes false.
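As a rough model of this behavior, the sketch below keeps a true/false state that is re-evaluated at every sample. The class name `SampledSituation` and the status strings are assumptions made for illustration; they do not come from the product.

```python
from typing import Callable, List

class SampledSituation:
    """Hypothetical model: a situation whose state is re-evaluated at each sample."""

    def __init__(self, criterion: Callable[[str], bool]) -> None:
        self.criterion = criterion
        self.is_true = False  # the situation starts out false

    def sample(self, status: str) -> bool:
        """Evaluate the current status and return the new state of the situation."""
        self.is_true = self.criterion(status)
        return self.is_true

# The situation is true whenever the monitored process is down.
process_down = SampledSituation(lambda status: status == "down")

history: List[bool] = [process_down.sample(s) for s in ["up", "down", "down", "up"]]
```

Here `history` ends up as `[False, True, True, False]`: the situation becomes true when the process is found down and returns to false only when a later sample finds it up again.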

When a pure event comes to the console, it is there until acted on by a human operator (if you are managing the events from the console) unless you include an UNTIL setting to expire them at a later time. In contrast, when a situation reports a sampled event and it comes to the console, it is there until the conditions of the situation change that would result in it going back to false. Sampled events cannot and should not be closed by the console, by IBM Tivoli Enterprise Console, or by the UNTIL tab in the situation editor.
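The lifetime of a pure event on the console can be pictured with a small predicate. This is a simplified sketch: the function `pure_event_visible` and its parameters are hypothetical, and the UNTIL setting is modeled here as nothing more than an expiry time in seconds.

```python
from typing import Optional

def pure_event_visible(age_s: float, closed_by_operator: bool,
                       until_s: Optional[float]) -> bool:
    """Return True while a pure event would still be shown on the console."""
    if closed_by_operator:
        return False  # an operator acted on the event
    if until_s is not None and age_s >= until_s:
        return False  # the modeled UNTIL expiry has passed
    return True       # otherwise the event stays on the console

stays_forever = pure_event_visible(age_s=86400, closed_by_operator=False, until_s=None)
expired = pure_event_visible(age_s=900, closed_by_operator=False, until_s=600)
```

Without an expiry and without operator action, the event remains visible however old it is. A sampled event has no equivalent predicate, because its visibility follows the state of the situation itself.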

A small business support team (or any organization without an enterprise view tool) using the event console needs a custom workspace for its console operators. One view in that workspace displays only pure events, which are the events that operators are required to act on. A second view might display sampled events, to alert operators that they cannot close these events but might want to investigate the situations currently visible in that view. Figure-1 illustrates such a console.

Figure-1 A custom workspace with separate pure and sampled event views

Pure versus sampled events and customer impact

If you are an enterprise client with a central event console such as IBM Tivoli Enterprise Console, pure events are presented and treated in a way that is most closely related to the events that you received from the Tivoli Enterprise Console Logfile Adapter, the Windows event log adapter, the SNMP adapter, and other related adapters. There is no schedule for the events. They arrive according to the logic of the agent that is sending them. The sampling, in that sense, occurs at the agent’s discretion and may vary from agent to agent.

What to do with these events is easily recognized and fits within our traditional notion of enterprise monitoring. Integration with IBM Tivoli Enterprise Console can result in pure events creating IBM Tivoli Enterprise Console events.

Important: It is permissible for pure events to be closed by rules from IBM Tivoli Enterprise Console. It is not advisable to close sampled events, although nothing in our evaluated version of the product currently prevents it. As is mentioned later, this causes problems because the event is closed while the situation remains true, and the event does not recover until the situation becomes false and then true again.
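The problem with closing a sampled event can be illustrated with a toy model. The sketch assumes that an event is opened only when the situation changes from false to true; the `Console` class and its methods are hypothetical and are not part of any IBM Tivoli API.

```python
from typing import List

class Console:
    """Hypothetical model of one sampled situation and its console event."""

    def __init__(self) -> None:
        self.situation_true = False
        self.event_open = False

    def on_sample(self, condition_met: bool) -> None:
        if condition_met and not self.situation_true:
            self.event_open = True    # false -> true transition opens the event
        elif not condition_met:
            self.event_open = False   # a false situation clears the event
        self.situation_true = condition_met

    def close_event(self) -> None:
        # Closing the event does not change the state of the situation.
        self.event_open = False

console = Console()
trace: List[bool] = []

console.on_sample(True)
trace.append(console.event_open)   # event opened
console.close_event()              # closed by an operator or a rule
console.on_sample(True)
trace.append(console.event_open)   # problem persists, but no event is shown
console.on_sample(False)
trace.append(console.event_open)
console.on_sample(True)
trace.append(console.event_open)   # reopened only after false, then true
```

In this model `trace` is `[True, False, False, True]`. The second entry is the troublesome case: the condition is still met, yet no event is visible because the situation never left the true state.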

While one strategy might be to not generate an IBM Tivoli Enterprise Console event for a sampled situation (not forwarding the events to IBM Tivoli Enterprise Console), this is hardly a maintainable strategy in an enterprise environment, especially because sampled events are probably as common as pure events (or more so).

For sampled events, what to do is less clear for an enterprise customer (with IBM Tivoli Enterprise Console or another enterprise console product), especially if the intent is to use the IBM Tivoli Monitoring 6.1 event console as an adjunct to IBM Tivoli Enterprise Console. Questions arise: What is our policy for exploring these sampled situations? When do we decide that a sampled event becomes an IBM Tivoli Enterprise Console event or trouble ticket?

Note: If the decision is made to not use the IBM Tivoli Monitoring 6.1 event console as some adjunct tool to an existing Tivoli Enterprise Console implementation (that is, not installing the IBM Tivoli Enterprise Console event synchronization), the point is moot, and all sampled situations should send IBM Tivoli Enterprise Console events and then be set to expire in the UNTIL tab (for pure). You must install the BAROC file for IBM Tivoli Monitoring 6.1 at the IBM Tivoli Enterprise Console server. Set up the TEMS to use the IBM Tivoli Enterprise Console event integration facility so that the events are sent to IBM Tivoli Enterprise Console.

If no one looks at the IBM Tivoli Monitoring 6.1 console, the current status of the situation is not an issue. This puts you at process parity with IBM Tivoli Monitoring 5.x and Distributed Monitoring 3.7. At that point, two-way communication between IBM Tivoli Enterprise Console and IBM Tivoli Monitoring 6.1 is not required or desired. The one-way communication delivers the events.

For customers who choose to use the IBM Tivoli Monitoring 6.1 console in an enterprise environment, we can offer the best practices from some OMEGAMON XE customers. Many who used the OMEGAMON XE product chose to implement some logic for sampled events where possible. This logic is a suggestion about how you could choose to deal with these sampled events.

Your organization should explore the concept of these events with the administrators of the applications and systems involved in the sampled events to decide whether this is the appropriate course of action. It is possible that by the time the sampled event indicates an issue and the person arrives on the scene, the sampled event has become false again. Administrators need to understand how the product functions in order to reconcile the current state with the fact that they were paged, received a trouble ticket, or were otherwise notified.

The following outlines the suggested best practice for dealing with sampled events:

When the sampled situation becomes true, attempt to resolve the issue via automation. You do not want operators to try to resolve these issues (as indicated before) until you are sure that the sampled situation will not become false within your tolerance limits.

Via policy, wait for the next sample to determine whether the automation resolved the situation. Even if the event that the situation addresses was resolved, it will not make the situation false until it samples again. To make this most efficient, the time between samples should allow for the automation to have a chance to resolve the issue. In other words, if the automation requires stopping and restarting a process and it takes x seconds for the process to recycle, the sampling interval should not be less than x seconds.

If the situation is not false after the next iteration, take action (such as generate a trouble ticket or open an IBM Tivoli Enterprise Console event).

If automation is not possible, you want the policy to notify someone immediately so that a resolution process starts as soon as possible (according to the response times negotiated in your Service Level Agreements).
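The steps above can be sketched as a small decision routine. This is one possible reading of the suggested practice, not product code: the function names and callback parameters are hypothetical, and in a real deployment the logic would live in a policy rather than in a script.

```python
from typing import Callable, List, Optional

def interval_allows_recovery(sampling_interval_s: float, recovery_time_s: float) -> bool:
    """The sampling interval should not be shorter than the time automation needs."""
    return sampling_interval_s >= recovery_time_s

def handle_sampled_situation(
    automation: Optional[Callable[[], None]],
    wait_for_next_sample: Callable[[], bool],
    escalate: Callable[[str], None],
) -> str:
    """Run when a sampled situation becomes true; return the outcome."""
    if automation is None:
        # No automation is possible: notify someone immediately.
        escalate("no automation available")
        return "escalated"
    automation()
    # The situation cannot become false until it samples again.
    still_true = wait_for_next_sample()
    if still_true:
        escalate("automation did not clear the situation")
        return "escalated"
    return "resolved"

notifications: List[str] = []
result = handle_sampled_situation(
    automation=lambda: None,            # stand-in for restarting a process
    wait_for_next_sample=lambda: True,  # pretend the next sample is still true
    escalate=notifications.append,      # stand-in for a ticket or console event
)
```

In this run the automation does not clear the situation, so `result` is `"escalated"` and one notification is recorded. `interval_allows_recovery` expresses the sizing rule from the second step: with a process that takes 90 seconds to recycle, a 60-second sampling interval would be too short.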

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Profile

Publish Date
17 May 2006


IBM Form Number
TIPS0616