IBM Platform HPC for System x

IBM Redbooks Solution Guide

Abstract

IBM® Platform HPC, which is easy-to-use and comprehensive HPC management software, includes a rich set of features that empowers high-performance technical computing users. It helps reduce the complexity of a computing environment and improves the time to solution.

For related information about this topic, refer to the following IBM Redbooks publication:
IBM Platform Computing Solutions Reference Architectures and Best Practices, SG24-8169-00

Contents


IBM® Platform HPC Version 4.1.1 is technical computing cluster management software. IBM Platform HPC, which is easy-to-use and comprehensive HPC management software, includes a rich set of features that empowers high-performance technical computing users. It helps reduce the complexity of a computing environment and improves the time to solution.

Platform HPC simplifies the application integration process so that users can focus on their work instead of managing a cluster. The high availability feature for the management node reduces cluster downtime and minimizes operational and productivity loss. For applications requiring MPI, the robust commercial MPI library accelerates and scales HPC applications for a shorter time to solution. The integrated xCAT technology delivers enhanced cluster provisioning.

Its robust cluster and workload management capabilities are accessible by using the latest design in web-based interfaces, making it powerful and simple to use. Figure 1 shows the IBM Platform HPC architecture.

Figure 1. IBM Platform HPC architecture


Did you know?

Platform HPC is a single product with a unified set of management capabilities that make it easy to harness the power and scalability of a technical computing cluster, which shortens the time to system readiness and user productivity and delivers optimal throughput. Backed by the industry's best customer support, Platform HPC incorporates nearly two decades of product and technology leadership.

Platform HPC runs on the latest generation of IBM System x® iDataPlex®, IBM Flex System™ Servers, IBM rack-based servers, and most generic x86-based hardware.


Business value

Platform HPC simplifies the application integration process so that users can focus on their work instead of managing a cluster. The high availability feature for the management node reduces cluster downtime and minimizes operational and productivity loss. For applications requiring MPI, the robust commercial MPI library accelerates and scales HPC applications for a shorter time to solution.

Other HPC cluster products merely combine multiple tools and interfaces, which are not integrated, certified, or tested together. Platform HPC is a single product with a unified set of management capabilities that make it easy to harness the power and scalability of a technical computing cluster, which shortens the time to system readiness and user productivity and delivers optimal throughput. Backed by the industry's best customer support, Platform HPC incorporates nearly two decades of product and technology leadership.

Platform HPC delivers the following key benefits:
  • Faster time to cluster readiness
  • Reduced infrastructure and management costs
  • Optimal resource usage
  • Improved user and administrator productivity
  • High availability of the cluster to minimize downtime and reduce operational and productivity loss
  • Improved cluster provisioning through embedded xCAT technology
  • Shorter time to results

Here are the features and benefits of Platform HPC:
  • A comprehensive and easy-to-use management product: Takes the complexity out of setting up, managing, and monitoring a heterogeneous cluster environment, ensuring that technical computing environments are quickly up and running.
  • A next-generation, easy-to-use interface: A powerful web-based user interface with an intuitive design that is easy for both users and administrators to use.
  • Integrated application support: Simplifies application integration and enables users to use the cluster transparently to accelerate their application workload.
  • Topology-aware workload management: Based on Platform LSF®, a powerful, policy-driven workload manager that improves productivity, usage, and throughput with minimal administrative effort.
  • Robust workload and system monitoring and reporting: Enables you to make timely decisions and proactively manage compute resources against business demand, maximizing uptime, optimizing application performance, and improving user productivity.
  • High availability for the management node: Reduces cluster downtime and minimizes operational and productivity loss.
  • Accelerator support, including GPU and Intel Xeon Phi™ coprocessor scheduling, management, and monitoring: Makes it easy to take immediate advantage of the exceptional HPC performance that is delivered by the accelerators.
  • Robust commercial MPI library (Platform MPI): Accelerates and scales HPC applications for a shorter time to solution.

Platform HPC runs on various hardware and operating environments. By prequalifying and certifying Platform HPC on these systems, IBM helps you take the risk out of deploying mission-critical, high-performance technical computing deployments.


Solution overview

IBM Platform HPC includes a rich set of features that empowers high-performance technical computing users. It helps reduce the complexity of their computing environment and improves the time to solution. Platform HPC V4.1.1 offers the following capabilities for technical computing users:
  • Uses xCAT technology for cluster provisioning, which enables broader Linux operating system support
  • IBM GPFS™ monitoring that correlates workload with file system performance for enhanced overall application performance
  • Integrated monitoring for network switches and chassis
  • Enhanced workload reports to help with chargeback accounting
  • Refreshed workload management that is based on Platform LSF 9.1.1 with an easy upgrade to IBM LSF standard, and certified on the latest operating system
  • Platform MPI 9.1, which supports the latest hardware and operating system
  • Support for new versions of the Linux operating system

Platform HPC for System x offers the following features and benefits:
  • Reduces time to cluster readiness with easy-to-use cluster management
  • Increases user and administrator productivity with an intuitive web-based interface, including simplified application integration
  • Improves resource usage and reduces infrastructure cost with infrastructure-aware job scheduling
  • Reduces cluster downtime with high availability of the cluster
  • Accelerates time to results with robust workload management capabilities and advanced MPI libraries


Solution architecture

The solution architecture has the following components:
  • Application support: Platform HPC makes it easy to fully harness the power of a technical computing cluster to optimize application performance. High-performing, HPC-optimized MPI libraries are integrated with the product, making it easy to run parallel applications. Scripting guidelines and job submission templates for commonly used commercial applications enable easy application integration and ensure that users are immediately productive, which simplifies job submission, reduces job setup time, and minimizes operational errors. After applications are running, Platform HPC improves application performance by intelligently scheduling resources based on workload characteristics.
  • Web-based interface: Easily manage a cluster and submit, monitor, and manage jobs. A single, unified web-based interface with advanced capabilities makes it easy to manage all aspects of a technical computing cluster, which includes cluster provisioning, monitoring, application integration, and workload management and reporting. The intuitive interface includes a robust set of management dashboards and reports that make it easy to monitor resource performance and alerts to ensure that the technical computing infrastructure is optimally used.
  • Cluster management: A better way to deploy and manage technical computing clusters. Platform HPC enables users to quickly provision and manage clusters with unprecedented ease. It ensures maximum uptime and can transparently synchronize files to cluster nodes without any downtime or reinstallation. Platform HPC uses xCAT technology for enhanced cluster provisioning capability, which enables broader operating system support.
  • Workload management: Increase application performance and resource usage. Platform HPC includes a robust workload scheduling engine that is based on IBM Platform LSF, a proven, powerful, policy-driven workload management product for engineering and scientific distributed computing environments. By scheduling workloads intelligently according to policy, Platform HPC improves user productivity with minimal system administrative effort. In addition, it enables user teams to easily access and share all computing resources, while reducing time between simulation iterations.
  • Workload and system monitoring and reporting: Easy troubleshooting and capacity planning. Platform HPC includes an operational dashboard that generates comprehensive administrative reports. It enables administrators to make timely decisions and proactively manage compute resources against business demand. By correlating workload information with system load, they can easily identify and troubleshoot issues. When it is time for capacity planning, the unified web portal can be used to run detailed reports and analyses that quantify user needs and remove the guesswork from capacity expansion.
  • Accelerator support: Take advantage of high-performing accelerators such as GPGPUs and the Intel Xeon Phi coprocessor. With Platform HPC, you can schedule accelerator-enabled applications specifically to certain resources, which gives you a distinct advantage in heterogeneous hardware and application environments. Administrators can configure Platform HPC so that only jobs that can benefit from running on the accelerators are allocated to those resources, which frees processor-based resources to run other jobs. Using the unified management interface, administrators can also monitor usage, temperature, and status, and detect ECC error accumulation for the accelerators.
  • MPI library: Accelerates and scales HPC applications for a shorter time to solution. To make it easier to get parallel applications running, Platform HPC includes a robust and high-performing MPI implementation, IBM Platform MPI. Platform MPI gives you consistent performance at application run time and for application scaling, resulting in top performance results across a range of third-party benchmarks.
  • Application support: Easy application integration with minimal job submission errors. Platform HPC comes with job submission templates for common commercial ISV applications and configurable templates for homegrown applications. By configuring these templates based on the application settings in your environment, users can run jobs without writing scripts. These self-documenting interfaces help minimize job submission errors. Platform HPC also includes the Intel Cluster Ready Cluster Checker to ensure that the cluster delivers the best performance for MPI applications.
  • Third-party software integration: Easily customized based on your organization's unique needs. Platform HPC enables you to customize the metrics that you monitor, including integrating with third-party management software such as Intel Cluster Checker and QLogic Fabric Manager, which enables administrators to monitor and alert you about abnormal statuses for nonserver devices in the HPC environment. Custom actions and buttons can be added to the web-based interface, which can trigger a link to a URL or a command that is launched on the head node. These customizations enable administrators to create an environment that is tailored specifically to their organization's unique requirements.
  • Fully certified and supported by IBM Platform Computing: Unlike other cluster toolkits that are often collections of open source software, Platform HPC is a single product with a single installer and a unified web-based management interface. It is based on the Platform LSF workload manager. Platform HPC unlocks HPC management to deliver easy-to-use, comprehensive, and robust HPC management capabilities while reducing overall cluster cost with full support and certification from Platform Computing, which is the world leader in HPC management products.


Usage scenarios

Platform HPC includes a Platform LSF workload management component. IBM Platform LSF is enterprise-class software that distributes work across existing heterogeneous IT resources, which creates a shared, scalable, and fault-tolerant infrastructure and delivers faster, more reliable workload performance. IBM Platform LSF provides the following capabilities:
  • Balances load and allocates resources while providing access to those resources.
  • Provides a resource management framework that takes your job requirements, finds the best resources to run the job, and monitors its progress.
  • Ensures that jobs always run according to host load and site policies.
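
The matching behavior in the list above can be illustrated with a small sketch. It is not the Platform LSF API or its scheduling algorithm: the Host and Job classes, their fields, and the pick_host function are hypothetical names that are used only for illustration. The sketch assumes that a job states how many slots and how much memory it needs, and that the "best" resource is the least-loaded host that meets those requirements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    """Hypothetical view of a compute host (not an LSF data structure)."""
    name: str
    free_slots: int    # idle job slots on the host
    free_mem_mb: int   # available memory in MB
    load: float        # current load average; lower is better


@dataclass
class Job:
    """Hypothetical job with simple resource requirements."""
    name: str
    slots: int
    mem_mb: int


def pick_host(job: Job, hosts: List[Host]) -> Optional[Host]:
    """Return the least-loaded host that satisfies the job's requirements.

    Returns None when no host qualifies, which in this sketch means the
    job would stay pending.
    """
    eligible = [
        h for h in hosts
        if h.free_slots >= job.slots and h.free_mem_mb >= job.mem_mb
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda h: h.load)


# Illustrative data only; host names and numbers are made up.
hosts = [
    Host("node01", free_slots=8, free_mem_mb=16000, load=3.5),
    Host("node02", free_slots=4, free_mem_mb=32000, load=0.7),
    Host("node03", free_slots=2, free_mem_mb=64000, load=0.1),
]
job = Job("sim", slots=4, mem_mb=8000)

chosen = pick_host(job, hosts)
print(chosen.name if chosen else "pending")
```

In this example, node03 has the lowest load but too few free slots, so the job is placed on node02. A real workload manager also applies site policies such as queues, priorities, and fair share, which this sketch leaves out.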

Figure 2 shows an overview of IBM Platform LSF.

Figure 2. IBM Platform LSF overview


Integration

Platform HPC is part of the Systems Software product portfolio and is targeted at technical computing clients. Platform HPC may be integrated with various HPC product portfolios and services that are available from IBM, including the following ones:
  • The Platform LSF family of products is available as add-ons for Platform HPC, further extending its robust capabilities. These add-ons include Platform Application Center - Standard Edition, Platform RTM, and Platform License Scheduler. Other LSF family add-on products require Platform LSF Standard for Platform HPC. Upgrading to Platform LSF Standard for Platform HPC enables you to manage workloads across multiple Platform HPC clusters through a single job submission and management interface.
  • For clients with diverse workload scheduling requirements, Platform HPC can be used with IBM Platform Symphony® and its associated add-on products. Both Platform HPC and Platform Symphony share a common resource management layer and can share resources on the same physical cluster.
  • Depending on the nature of your requirements, Platform HPC deployments often involve software development and integration services. With its breadth of services capabilities, IBM is uniquely positioned to help you integrate applications and get running quickly so that you obtain the maximum value from your technical computing investment.

Platform HPC is hardware-independent and runs on generic x86 servers. This edition is for IBM sellers, IBM Business Partners, and OEM partners to deliver a solution with both IBM and non-IBM hardware. It can also be sold as stand-alone software for managing generic x86 clusters.

While remaining hardware-independent, Platform HPC is also a part of the IBM Intelligent Cluster™ portfolio and is certified to run on the latest generation of the System x servers. In addition, IBM GPFS can be deployed with Platform HPC to deliver improved file system performance for technical computing applications running on distributed parallel file systems.

Platform HPC delivers the following unique advantages:
  • Platform HPC is a complete product that is installed through a single installer for simplified cluster configuration, which ensures that all the HPC cluster components are precertified to work together, eliminating integration configuration and complexity.
  • The workload management function that is included in Platform HPC is based on the Platform LSF workload scheduler, which is deployed in many large data centers. You can therefore access the full performance and scalability of your HPC cluster as though you were running a Tier 1 HPC cluster.
  • Platform HPC includes a browser-based portal for users to submit jobs and access job-related data. The portal includes application scripts for technical applications, which shorten the learning curve for job submission.
  • The industry's highest-performing MPI library, Platform MPI, is a part of Platform HPC, ensuring high performance and robustness for MPI-based applications.
  • High availability for the management node helps minimize cluster downtime.
  • Platform HPC includes an easy-to-use administration portal for administration of the cluster. Commercial support from IBM Platform Computing comes with the product license, providing one-stop support from technical computing experts.


Supported platforms

Here are the key prerequisites for Platform HPC:
  • A cluster computing environment that is composed of two or more servers.
  • A head node with at least one network interface.
  • A supported operating system that is installed on the head node.
  • Access to the operating system media or image file that is used for the installation of the operating system on all nodes in the cluster.
  • Non-head nodes that can be set to PXE boot.

Platform HPC for System x requires the following hardware:
  • Minimum requirements for the management node:
    • 2.5 GB of physical memory (RAM) for the management node
    • 80 GB of available disk space
    • At least one Ethernet interface
    • DVD drive
  • Minimum requirements for compute node for stateful package-based installation:
    • 1 GB of physical memory (RAM) for the compute node
    • 40 GB of available disk space
    • One Ethernet interface
  • Minimum requirements for stateless installation:
    • 4 GB of physical memory (RAM)
    • One Ethernet interface
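
As a rough aid for planning, the minimums above can be encoded as data and checked against a node inventory. The following is only a sketch: the role names, the NodeSpec class, and the shortfalls function are hypothetical and are not part of Platform HPC. The numbers are copied from the list above, and the stateless case is given a disk minimum of 0 because the list states none.

```python
from dataclasses import dataclass
from typing import List

# Minimums copied from the hardware list above. The role keys are
# hypothetical labels chosen for this sketch.
MINIMUMS = {
    "management":        {"ram_gb": 2.5, "disk_gb": 80, "ethernet": 1, "dvd": True},
    "compute_stateful":  {"ram_gb": 1,   "disk_gb": 40, "ethernet": 1, "dvd": False},
    "compute_stateless": {"ram_gb": 4,   "disk_gb": 0,  "ethernet": 1, "dvd": False},
}


@dataclass
class NodeSpec:
    """Hypothetical description of one server in the planned cluster."""
    role: str
    ram_gb: float
    disk_gb: float
    ethernet_ports: int
    has_dvd: bool = False


def shortfalls(node: NodeSpec) -> List[str]:
    """List every minimum that the node fails to meet (empty list = OK)."""
    req = MINIMUMS[node.role]
    problems = []
    if node.ram_gb < req["ram_gb"]:
        problems.append(f"RAM {node.ram_gb} GB < {req['ram_gb']} GB")
    if node.disk_gb < req["disk_gb"]:
        problems.append(f"disk {node.disk_gb} GB < {req['disk_gb']} GB")
    if node.ethernet_ports < req["ethernet"]:
        problems.append("no Ethernet interface")
    if req["dvd"] and not node.has_dvd:
        problems.append("no DVD drive")
    return problems


# Illustrative inventory; the values are made up.
mgmt = NodeSpec("management", ram_gb=8, disk_gb=100, ethernet_ports=2, has_dvd=True)
small = NodeSpec("compute_stateful", ram_gb=0.5, disk_gb=40, ethernet_ports=1)

print(shortfalls(mgmt))
print(shortfalls(small))
```

Keeping the minimums in one table makes it easy to adjust them if a later release changes the requirements. The check does not cover the operating system or PXE boot prerequisites.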

One of the following operating systems is required:
  • Red Hat Enterprise Linux 6.4 x86 64-bit
  • Red Hat Enterprise Linux 5.9 x86 64-bit (non-head node)
  • CentOS 6.4 x86 64-bit (non-head node)


Ordering information

Ordering information is shown in Table 1.

Table 1. Ordering part numbers and feature codes
Program number | VRM | Feature description | OTC billing feature number | SEO number
5641-HP6 | 4.1.1 | Platform HPC for System x V4.x, Per Managed Server with 1-Year SW S&S | 0724 | 00FE948
5641-HP7 | 4.1.1 | IBM Platform HPC for System x V4.x, Per Managed Server with 3-Year SW S&S | 0725 | 00FE949
5641-HP8 | 4.1.1 | Platform HPC for System x V4.x, Per Managed Server with 5-Year SW S&S | 0726 | 00FE950
5641-PL1 | 9.1.1 | Platform LSF Standard for Platform HPC, V9.x, Per RVU with 1-Year SW S&S | 0719 | 00FE954
5641-PL1 | 9.1.1 | Platform LSF Standard for Platform HPC, V9.x, Per 250 RVU with 1-Year SW S&S | 0718 | 00FE955
5641-PL3 | 9.1.1 | Platform LSF Standard for Platform HPC, V9.x, Per RVU with 3-Year SW S&S | 0720 | 00FE956
5641-PL3 | 9.1.1 | IBM Platform LSF Standard for Platform HPC, V9.x, Per 250 RVU with 3-Year SW S&S | 0721 | 00FE957
5641-PL5 | 9.1.1 | IBM Platform LSF Standard for Platform HPC, V9.x, Per RVU with 5-Year SW S&S | 0722 | 00FE958
5641-PL5 | 9.1.1 | IBM Platform LSF Standard for Platform HPC, V9.x, Per 250 RVU with 5-Year SW S&S | 0723 | 00FE959


Related information

For more information, see the following documents:

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.

Profile

Publish Date
13 December 2013


IBM Form Number
TIPS1098