IBM High-Performance Computing Insights with IBM Power System AC922 Clustered Solution

An IBM Redbooks publication

Published 02 May 2019

ISBN-10: 0738457450
ISBN-13: 9780738457451
IBM Form #: SG24-8422-00
(352 pages)

Authors: Dino Quintero, Miguel Gomez Gonzalez, Ahmad Y Hussein, Jan-Frode Myklebust

Abstract

This IBM® Redbooks® publication describes how to set up a complete infrastructure environment and tune applications to use the IBM POWER9™ hardware architecture with the technical computing software stack.

This publication is driven by a CORAL project solution. It explores, tests, and documents how to implement an IBM High-Performance Computing (HPC) solution on a POWER9 processor-based system by using IBM technical innovations to help solve challenging scientific, technical, and business problems.

This book documents the HPC clustering solution with InfiniBand on IBM Power Systems™ AC922 8335-GTH and 8335-GTX servers equipped with NVIDIA Tesla V100 SXM2 graphics processing units (GPUs) connected through NVLink, together with the software components and the IBM Spectrum™ Scale parallel file system.

This solution includes recommendations about the components that are used to provide a cohesive clustering environment that includes job scheduling, parallel application tools, scalable file systems, administration tools, and a high-speed interconnect.

This book is divided into three parts: Part 1 is for planners of the solution, Part 2 is for administrators, and Part 3 is for developers.

This book targets technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective HPC solutions that help uncover insights among clients’ data so that they can act to optimize business results, product development, and scientific discoveries.

Table of contents

Part 1. Planning
Chapter 1. Introduction to IBM high-performance computing
Chapter 2. IBM Power System AC922 server for HPC overview
Chapter 3. Software stack
Chapter 4. Reference architecture
Part 2. Deployment
Chapter 5. Nodes and software deployment
Chapter 6. Cluster Administration and Storage Tools
Part 3. Application development
Chapter 7. Compilation, execution, and application development
Chapter 8. Running parallel software, performance enhancement, and scalability testing
Chapter 9. Measuring and tuning applications
Appendix A. Additional material