
IBM High Performance Computing Cluster Health Check

An IBM Redbooks publication


Published on 21 February 2014, updated 03 April 2014




ISBN-10: 073843924X
ISBN-13: 9780738439242
IBM Form #: SG24-8168-00

Authors: Dino Quintero, Ross Aiken, Shivendra Ashish, Manmohan Brahma, Murali Dhandapani, Rico Franke, Jie Gong, Markus Hilger, Herbert Mehlhose, Justin I. Morosi, Thorsten Nitsch and Fernando Pizzano



    This IBM® Redbooks® publication describes how to perform infrastructure health checks, such as checking the configuration and verifying the functionality of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, problem areas, and so on).

    This IBM Redbooks publication documents how to monitor the overall health of the cluster infrastructure so that technical computing clients receive cost-effective, highly scalable, and robust solutions.

    This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective Technical Computing and IBM High Performance Computing (HPC) solutions to optimize business results, product development, and scientific discoveries. This book provides a broad understanding of a new architecture.

    Table of Contents

    Chapter 1. Introduction

    Chapter 2. Key concepts and interdependencies

    Chapter 3. The health lifecycle methodology

    Chapter 4. Cluster components reference model

    Chapter 5. Toolkits for verifying health (individual diagnostics)

    Appendix A. Commonly used tools

    Appendix B. Tools and commands outside of the toolkit

