IBM High-Performance Computing (HPC) Insights with IBM AC922 Clustered Solution

A draft IBM Redbooks publication

Updated 04 February 2019

ISBN-10: 0738457450
ISBN-13: 9780738457451
IBM Form #: SG24-8422-00

Authors: Dino Quintero, Miguel Gomez Gonzalez, Ahmad Y Hussein, Jan-Frode Myklebust

Abstract

This IBM Redbooks publication presents the concepts needed to set up a complete infrastructure environment and to tune applications for the IBM POWER9™ hardware architecture with the technical computing software stack.

This publication is driven by the CORAL project solution. It explores, tests, and documents how to implement an IBM High-Performance Computing (HPC) solution on POWER9 by using IBM technical innovations to help solve challenging scientific, technical, and business problems.

This book documents the HPC clustering solution with InfiniBand on IBM Power Advanced Computing (AC) AC922 8335-GTH and 8335-GTX servers with NVIDIA Tesla V100 SXM2 GPUs with NVLink, the software components, and the IBM Spectrum Scale parallel file system.

This solution includes recommendations on the components used to provide a cohesive clustering environment that includes job scheduling, parallel application tools, scalable file systems, administration tools, and a high-speed interconnect.

This book is divided into three parts: Part 1 is aimed at planners of the solution, Part 2 at administrators, and Part 3 at developers.

This book targets technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights in clients’ data so that they can act to optimize business results, product development, and scientific discoveries.

Table of contents

Part 1. Planning
Chapter 1. Introduction to IBM high-performance computing
Chapter 2. IBM Power System AC922 for high-performance computing overview
Chapter 3. Software stack
Chapter 4. Reference architecture
Part 2. Deployment
Chapter 5. Nodes and software deployment
Chapter 6. Cluster Administration and Storage Tools (CAST)
Part 3. Application development
Chapter 7. Compilation, execution, and application development
Chapter 8. Running parallel software, performance enhancement, and scalability testing
Chapter 9. Measuring and tuning applications
Appendix A. Additional material

These pages are Web versions of IBM Redbooks- and Redpapers-in-progress. They are published here for those who need the information now and may contain spelling, layout and grammatical errors. This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. Your feedback is welcomed to improve the usefulness of the material to others.
