Implementing an IBM High-Performance Computing Solution on IBM Power System S822LC

Book Description

This IBM® Redbooks® publication demonstrates and documents that IBM Power Systems™ high-performance computing and technical computing solutions deliver faster time to value. Configurable into highly scalable Linux clusters, Power Systems offer extreme performance for demanding workloads such as genomics, finance, computational chemistry, oil and gas exploration, and high-performance data analytics.

This book documents a high-performance computing solution implemented on the IBM Power System S822LC. The solution delivers high application performance and throughput based on its built-for-big-data architecture, which incorporates IBM POWER8® processors, tightly coupled Field Programmable Gate Arrays (FPGAs) and accelerators, and faster I/O by using the Coherent Accelerator Processor Interface (CAPI). This solution is ideal for clients who need more processing power while simultaneously increasing workload density and reducing data center floor space requirements. The Power S822LC offers a modular design that scales from a single rack to hundreds, simplicity of ordering, and a strong innovation roadmap for graphics processing units (GPUs).

This publication is targeted toward technical professionals (consultants, technical support staff, IT architects, and IT specialists) who are responsible for delivering cost-effective high-performance computing (HPC) solutions that help uncover insights from their data so they can optimize business results, product development, and scientific discoveries.

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. IBM Redbooks promotions
  4. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  5. Chapter 1. Introduction to the IBM Power System S822LC for high performance computing workloads
    1. 1.1 IBM POWER8 technology
    2. 1.2 OpenPOWER
    3. 1.3 IBM Power System S822LC
      1. 1.3.1 Differences between 8335-GCA and 8335-GTA models
  6. Chapter 2. Reference architecture
    1. 2.1 Hardware components of an HPC system
      1. 2.1.1 Login nodes
      2. 2.1.2 Management nodes
      3. 2.1.3 Compute nodes
      4. 2.1.4 High performance interconnect
      5. 2.1.5 Management, service, and site (public) networks
      6. 2.1.6 Parallel file system
    2. 2.2 Software components of an HPC system
      1. 2.2.1 System software
      2. 2.2.2 Application development software
      3. 2.2.3 Application software
    3. 2.3 HPC system solution
      1. 2.3.1 Compute nodes
      2. 2.3.2 Management node
      3. 2.3.3 Login node
      4. 2.3.4 Combining the management and the login node
      5. 2.3.5 Parallel file system
      6. 2.3.6 High performance interconnect switch
  7. Chapter 3. Hardware components
    1. 3.1 IBM Power System S822LC
      1. 3.1.1 IBM POWER8 processor
      2. 3.1.2 Memory subsystem
      3. 3.1.3 Input and output
      4. 3.1.4 NVIDIA GPU
      5. 3.1.5 BMC
    2. 3.2 Mellanox InfiniBand
    3. 3.3 IBM System Storage
      1. 3.3.1 IBM Storwize family
      2. 3.3.2 IBM FlashSystem family
      3. 3.3.3 IBM XIV Storage System
  8. Chapter 4. Software stack
    1. 4.1 System management
    2. 4.2 OPAL firmware
    3. 4.3 xCAT
    4. 4.4 RHEL server
    5. 4.5 NVIDIA CUDA Toolkit
    6. 4.6 Mellanox OFED for Linux
    7. 4.7 IBM XL compilers, GCC, and Advance Toolchain
      1. 4.7.1 XL compilers
      2. 4.7.2 GCC and Advance Toolchain
    8. 4.8 IBM Parallel Environment
      1. 4.8.1 IBM PE Runtime Edition
      2. 4.8.2 IBM PE Developer Edition
    9. 4.9 IBM Engineering and Scientific Subroutine Library and Parallel ESSL
    10. 4.10 IBM Spectrum Scale (formerly IBM GPFS)
    11. 4.11 IBM Spectrum LSF (formerly IBM Platform LSF)
  9. Chapter 5. Software deployment
    1. 5.1 Software stack
    2. 5.2 System management
      1. 5.2.1 Build instructions for IPMItool
      2. 5.2.2 Frequently used commands with the IPMItool
      3. 5.2.3 Boot order configuration
      4. 5.2.4 System firmware upgrade
    3. 5.3 xCAT overview
      1. 5.3.1 xCAT cluster: Nodes and networks
      2. 5.3.2 xCAT database: Objects and tables
      3. 5.3.3 xCAT node booting
      4. 5.3.4 xCAT node discovery
      5. 5.3.5 xCAT BMC discovery
      6. 5.3.6 xCAT operating system installation types: Disks and state
      7. 5.3.7 xCAT network interfaces: Primary and additional
      8. 5.3.8 xCAT software kits
      9. 5.3.9 xCAT version
      10. 5.3.10 xCAT scenario
    4. 5.4 xCAT Management Node
      1. 5.4.1 RHEL server
      2. 5.4.2 xCAT packages
      3. 5.4.3 Static IP network configuration
      4. 5.4.4 Hostname and aliases
      5. 5.4.5 xCAT networks
      6. 5.4.6 DNS server
      7. 5.4.7 DHCP server
      8. 5.4.8 IPMI authentication credentials
    5. 5.5 xCAT Node Discovery
      1. 5.5.1 Verification of network boot configuration and Genesis image files
      2. 5.5.2 Configuration of the DHCP dynamic range
      3. 5.5.3 Configuration of BMCs to DHCP mode
      4. 5.5.4 Definition of temporary BMC objects
      5. 5.5.5 Definition of node objects
      6. 5.5.6 Configuration of host table, DNS, and DHCP servers
      7. 5.5.7 Boot into Node discovery
    6. 5.6 xCAT Compute Nodes
      1. 5.6.1 Network interfaces
      2. 5.6.2 RHEL Server
      3. 5.6.3 CUDA Toolkit
      4. 5.6.4 Mellanox OFED for Linux
      5. 5.6.5 XL C/C++ Compiler
      6. 5.6.6 XL Fortran Compiler
      7. 5.6.7 Advance Toolchain
      8. 5.6.8 PE RTE
      9. 5.6.9 PE DE
      10. 5.6.10 ESSL
      11. 5.6.11 PESSL
      12. 5.6.12 Spectrum Scale (formerly GPFS)
      13. 5.6.13 IBM Spectrum LSF
      14. 5.6.14 Node provisioning
      15. 5.6.15 Post-installation verification
    7. 5.7 xCAT Login Nodes
  10. Chapter 6. Application development and tuning
    1. 6.1 Compiler options
      1. 6.1.1 XL compiler options
      2. 6.1.2 GCC compiler options
    2. 6.2 Engineering and Scientific Subroutine Library
      1. 6.2.1 Compilation and run
      2. 6.2.2 Run different SMT modes
      3. 6.2.3 ESSL SMP CUDA library options
    3. 6.3 Parallel ESSL
      1. 6.3.1 Program development
      2. 6.3.2 Using GPUs with Parallel ESSL
      3. 6.3.3 Compilation
    4. 6.4 Using POWER8 vectorization
      1. 6.4.1 Implementation with GNU GCC
      2. 6.4.2 Implementation with IBM XL
    5. 6.5 Development models
      1. 6.5.1 MPI programs with IBM Parallel Environment
      2. 6.5.2 CUDA C programs with the NVIDIA CUDA Toolkit
      3. 6.5.3 Hybrid MPI and CUDA programs with IBM Parallel Environment
      4. 6.5.4 OpenMP programs with the IBM Parallel Environment
      5. 6.5.5 OpenSHMEM programs with the IBM Parallel Environment
      6. 6.5.6 Parallel Active Messaging Interface programs
    6. 6.6 GPU tuning
      1. 6.6.1 Power Cap Limit
      2. 6.6.2 CUDA Multi-Process Service
    7. 6.7 Tools for development and tuning of applications
      1. 6.7.1 The Parallel Environment Developer Edition
      2. 6.7.2 IBM PE Parallel Debugger
      3. 6.7.3 Eclipse for Parallel Application Developers
      4. 6.7.4 NVIDIA Nsight Eclipse Edition for CUDA C/C++
      5. 6.7.5 Command-line tools for CUDA C/C++
  11. Chapter 7. Running applications
    1. 7.1 Controlling the execution of multithreaded applications
      1. 7.1.1 Running OpenMP applications
      2. 7.1.2 Setting and retrieving process affinity at run time
      3. 7.1.3 Controlling NUMA policy for processes and shared memory
    2. 7.2 Using the IBM Parallel Environment runtime
      1. 7.2.1 Running applications
      2. 7.2.2 Managing applications
      3. 7.2.3 Running OpenSHMEM programs
    3. 7.3 Using the IBM Spectrum LSF
      1. 7.3.1 Submit jobs
      2. 7.3.2 Manage jobs
  12. Chapter 8. Cluster monitoring
    1. 8.1 IBM Spectrum LSF tools for monitoring
      1. 8.1.1 General information about clusters
      2. 8.1.2 Getting information about hosts
      3. 8.1.3 Getting information about jobs and queues
      4. 8.1.4 Administering the cluster
    2. 8.2 nvidia-smi tool for monitoring GPU
      1. 8.2.1 Information about jobs on GPU
      2. 8.2.2 All GPU details
      3. 8.2.3 Compute modes
      4. 8.2.4 Persistence mode
  13. Appendix A. Applications and performance
    1. Application software
    2. Effects of basic performance tuning techniques
    3. General methodology of performance benchmarking
    4. Sample code for the construction of thread affinity strings
    5. ESSL performance results
  14. Related publications
    1. IBM Redbooks
    2. Online resources
    3. Help from IBM
  15. Back cover