IBM i 6.1 Problem Management Tip
Published 21 May 2008
Authors: Jim Cook
IBM i 6.1 makes it easier to identify the problem reporting status of problems detected by the system. Previously, it was confusing to determine whether hardware problems had been reported to IBM because the responsibility for problem reporting was spread over HMC, service partitions, and the operating system itself.
With IBM i 6.1 all the problem reporting information is listed in the Display Problem History panel. Details provided include identification of which system is responsible for reporting the problem, and whether the problem has been sent to IBM. This tip shows how to use the new information.
This IBM i 6.1 tip has been provided by Edgar Apolo Alvarez and Jesus David Salas.
If you are confused about hardware-problem reporting, you are most likely not alone. The issue has become more confusing in recent operating system releases. In IBM™ i5/OS V5R3 (now re-branded as IBM i), the introduction of POWER5™ technology-based servers also brought the service processor and the Hardware Management Console (HMC). With that architecture change, the responsibility for problem reporting was spread over the HMC, service partitions, and the operating system itself.
Over time it became clear that listing problems in the problem log with a SENT designation is not sufficient, because the designation gives no way to determine whether the problem has been sent to the HMC. This causes confusion about which system (a reporting partition or the HMC) should be responsible for reporting the problems.
IBM has answered this issue with IBM i 6.1's extended capability to manage hardware problem reporting. Now, all the problem reporting information is listed in the Display Problem History panel, with the Display Problem Details panel showing whether or not the problem has been sent to IBM. There is also a new indicator in the log to show which system is responsible for reporting the problem. This tip explains how to use the new problem information.
Recent operating system releases have three possible reporters of problems: the HMC, a service partition, or the current i5/OS partition or system.
With the HMC, problems are reported using the HMC interface. To report a problem, an HMC must be attached to and configured for the partition, and the firmware update policy must be set to *HMC.
With a service partition, the problem bypasses the HMC and is sent directly by the service partition. This requires the firmware policy to be set to *OPSYS.
With i5/OS in charge of the reporting, the current partition reports the problem. In this scenario, the current partition is a non-service partition.
Display Problem History panel
All three reporting options are now managed by the Display Problem History panel (function key 6 (F6) on the Display Problem Details panel), which displays a list of events logged for the selected problem. An event is logged each time a problem log entry is created or changed, or elements of it are deleted. Events are listed in the order of their occurrence, and once an event is entered, it cannot be changed or deleted. The main events for a problem are:
- Problem entry opened. This event is logged when a problem is created on the system.
- Problem analyzed. This event is logged when a problem has the Open status and then is analyzed with option 8 and option 1 on the Work With Problems panel.
- Reported by service partition. This event is logged when the problem occurred in a non-service partition and the problem is a platform hardware error.
- Reported by HMC. This event is logged only when a problem occurs in a service partition, the firmware is set to *OPSYS, and the problem is a platform hardware error.
- Reported by current i5/OS partition. This event is logged when a problem occurs in a non-service partition or the problem is not a platform hardware error and the firmware is set to *HMC.
- Alert created. This event is logged when an alert (which is similar to a problem) occurs on the system.
- Prepared to report. This event is logged when the problem will be reported to a service provider by a user profile or job.
- Service request sent. This event is logged when the problem was sent to a service provider and indicates that information needed to correct the problem will be returned.
- Problem answered. This event is logged when a problem was sent to a service provider and indicates that information needed to correct the problem has been returned.
When the Display Problem History panel shows “Reported by HMC,” it does not mean that the problem has been sent to IBM using the HMC interface; it means that the problem should be reported by the HMC interface. The Display Problem History events only notify you of how a problem should be reported. When you want to report a problem to IBM, select option 8=Work with problem on the Work With Problems (WRKPRB) panel; then select option 2=Report Problem.
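The distinction between "who should report" and "what has actually been sent" can be illustrated with a small model. The following Python sketch is not an IBM API: the class and method names are hypothetical, only the event names come from the list above, and treating the most recent "Reported by" event as the current designation is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Event names are taken from the event list above; the mapping is illustrative.
RESPONSIBILITY_EVENTS = {
    "Reported by service partition": "service partition",
    "Reported by HMC": "HMC",
    "Reported by current i5/OS partition": "current i5/OS partition",
}
SENT_EVENT = "Service request sent"


@dataclass(frozen=True)  # frozen: an entered event cannot be changed
class HistoryEvent:
    sequence: int
    name: str


class ProblemHistory:
    """Hypothetical append-only event list, kept in order of occurrence."""

    def __init__(self) -> None:
        self._events: List[HistoryEvent] = []

    def log(self, name: str) -> HistoryEvent:
        event = HistoryEvent(sequence=len(self._events) + 1, name=name)
        self._events.append(event)
        return event

    def events(self) -> Tuple[HistoryEvent, ...]:
        # A tuple is returned so callers cannot edit or delete entries.
        return tuple(self._events)

    def responsible_reporter(self) -> Optional[str]:
        """Who SHOULD report (assumption: the latest designation wins)."""
        for event in reversed(self._events):
            if event.name in RESPONSIBILITY_EVENTS:
                return RESPONSIBILITY_EVENTS[event.name]
        return None

    def request_sent(self) -> bool:
        """True only if a 'Service request sent' event was logged."""
        return any(e.name == SENT_EVENT for e in self._events)


history = ProblemHistory()
history.log("Problem entry opened")
history.log("Reported by HMC")
print(history.responsible_reporter())  # HMC
print(history.request_sent())          # False
```

In this model a "Reported by HMC" entry changes only the answer to `responsible_reporter()`; `request_sent()` stays false until a "Service request sent" event is logged.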
Report Problem means to enter the information needed to send a service request and, optionally, to send the request. The status of the problem is updated to PREPARED or SENT. Even though a problem has a SENT status, it can still be reported to IBM. If a problem is designated to be reported by the HMC but the HMC interface is not attached or is not working as expected, the process will time out. Then a new event will be logged as “Reported by Service Partition” and the status of the problem will be changed to READY.
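The time-out behavior described above can be sketched as a small function. This is a simplified model under stated assumptions, not the system's actual logic: the function name, the `hmc_responded` callback, and the starting status used in the example are hypothetical, and the sketch leaves the status unchanged when the HMC does respond because the tip does not describe that case.

```python
from typing import Callable, List, Tuple

HMC_EVENT = "Reported by HMC"
FALLBACK_EVENT = "Reported by Service Partition"


def resolve_reporter(
    events: List[str],
    status: str,
    hmc_responded: Callable[[], bool],
) -> Tuple[List[str], str]:
    """Return (events, status) after the HMC hand-off succeeds or times out.

    hmc_responded is a hypothetical stand-in for "the HMC interface is
    attached and working"; returning False models the time-out.
    """
    if HMC_EVENT in events and not hmc_responded():
        # Time-out: a new event is appended (earlier events are untouched)
        # and the problem status becomes READY.
        return events + [FALLBACK_EVENT], "READY"
    # No fallback needed; the history and status are returned unchanged.
    return list(events), status


# The starting status "PREPARED" is chosen only for illustration.
events, status = resolve_reporter(
    ["Problem entry opened", "Reported by HMC"],
    "PREPARED",
    hmc_responded=lambda: False,
)
print(events[-1])  # Reported by Service Partition
print(status)      # READY
```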
Having events in the Display Problem History panel makes it easier for you to determine which partition had the problem and can even help you determine what kind of problem it is.
In-band and out-of-band configurations
To interpret the events listed in the Display Problem History panel correctly, you must determine how the system has been configured to get firmware updates, in-band or out-of-band, because this determines how platform problems should be reported.
To identify whether your system is running in out-of-band or in-band configuration, use the Display Firmware Status (DSPFMWSTS) command. The resulting panel displays information for the current server firmware, including a Firmware Update Policy field with one of two possible values: *OPSYS or *HMC.
- In-band configuration (*OPSYS): The server firmware is currently being managed by the operating system using PTFs for the specified server firmware product ID/release.
- Out-of-band configuration (*HMC): The server firmware is currently being managed by HMC. The operating system isn’t allowed to make changes to the server firmware. The status of the PTFs for the server firmware product is not applicable to the active server firmware.
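The mapping between the Firmware Update Policy value and the configuration can be written down as a lookup. The Python sketch below is illustrative only: it does not run DSPFMWSTS or call any IBM i interface, and it assumes the policy value has been read from the panel and passed in as a string. The function and dictionary names are hypothetical.

```python
from typing import Dict

# Policy values and their meanings as described in the two bullets above.
CONFIGURATION_BY_POLICY: Dict[str, Dict[str, str]] = {
    "*OPSYS": {
        "configuration": "in-band",
        "firmware_managed_by": "operating system (PTFs)",
    },
    "*HMC": {
        "configuration": "out-of-band",
        "firmware_managed_by": "HMC",
    },
}


def classify_firmware_policy(policy: str) -> Dict[str, str]:
    """Translate a Firmware Update Policy value into its configuration."""
    key = policy.strip().upper()
    if key not in CONFIGURATION_BY_POLICY:
        raise ValueError(f"Unexpected firmware update policy: {policy!r}")
    return CONFIGURATION_BY_POLICY[key]


print(classify_firmware_policy("*HMC")["configuration"])    # out-of-band
print(classify_firmware_policy("*OPSYS")["configuration"])  # in-band
```

Rejecting any other value with an error, rather than guessing, reflects that the panel is described as showing only these two values.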
The following figure shows the structure of the POWER5 and POWER6 system firmware supporting the operating systems. In this structure, if a problem occurs, the IBM i 6.1 Work With Problem technology makes it easier to determine which area is responsible for reporting the problem to IBM.
Firmware notes for the figure:
- The Flexible Service Processor (FSP) firmware eases the detection and correction of diagnostic, initialization, and configuration errors.
- The Power Hypervisor (PHYP) firmware is based on the System i heritage Hypervisor facility of virtual LAN, virtual I/O, and subprocessor partitioning support.
- Partition Firmware (PFW) supports the System p heritage Power Architecture Platform Requirements (PAPR) interface.
- The HMC firmware provides the convergence between configurations, management, and service.
- The System Power Control Network (SPCN) firmware interface is used to monitor and control power.
- The Bulk Power Control (BPC) firmware controls each power unit in the system unit (also typically referred to as the Central Electronic Complex (CEC)) and the I/O enclosures.
This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.