IBM HPC Cluster Health Check

Published on 14 January 2014, updated 17 December 2014

Share this page:

IBM Form #: TIPS1078

Authors: Dino Quintero, Ross Aiken, Shivendra Ashish, Manmohan Brahma, Murali Dhandapani, Rico Franke, Jie Gong, Markus Hilger, Herbert Mehlhose, Justin I Morosi, Thorsten Nitsch and Fernando Pizzano

Abstract

IBM® has built High Performance Computing (HPC) clusters for sometime, and this experience can help a customer with configuration choices or noncompliant designs that are made during cluster deployments. Large clusters can become difficult to correct as the system scales in terms of nodes, applications, and users. This IBM Redbooks® Solution Guide describes a toolset that can aid system administrators with the initial stages of installing their cluster.

This Solution Guide addresses topics to provide infrastructure health checks, for example, checking the configuration, and verifying the functions of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, and problem areas).

This Solution Guide is targeted at technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) that are responsible for delivering cost effective Technical Computing and HPC solutions to optimize business results, product development, and scientific discoveries.

Others who read this also read

Special Notices

The material included in this document is in DRAFT form and is provided 'as is' without warranty of any kind. IBM is not responsible for the accuracy or completeness of the material, and may update the document at any time. The final, published document may not include any, or all, of the material included herein. Client assumes all risks associated with Client's use of this document.