Performance Optimization and Tuning Techniques for IBM Power Systems Processors Including IBM POWER8

Book Description

This IBM® Redbooks® publication focuses on gathering the correct technical information and laying out simple guidance for optimizing code performance on IBM POWER8® processor-based systems that run the IBM AIX®, IBM i, or Linux operating systems. Some performance optimization is straightforward: it can be performed with minimal effort and without extensive previous experience or in-depth knowledge.

The POWER8 processor contains many new and important performance features, such as support for eight hardware threads in each core and support for transactional memory. The POWER8 processor is a strict superset of the IBM POWER7+™ processor, and so all of the performance features of the POWER7+ processor, such as multiple page sizes, also appear in the POWER8 processor. Much of the technical information and guidance for optimizing performance on POWER8 processors that is presented in this guide also applies to POWER7+ and earlier processors, except where the guide explicitly indicates that a feature is new in the POWER8 processor.

This guide strives to focus on optimizations that tend to be beneficial across a broad set of IBM POWER® processor chips and systems. Specific guidance is given for the POWER8 processor; however, the general guidance also applies to the IBM POWER7+, IBM POWER7®, IBM POWER6®, IBM POWER5, and even earlier processors.

This guide is directed at personnel who are responsible for performing migration and implementation activities on POWER8 processor-based systems. This includes system administrators, system architects, network administrators, information architects, and database administrators (DBAs).

Table of Contents

  1. Front cover
  2. Notices
    1. Trademarks
  3. IBM Redbooks promotions
  4. Preface
    1. Authors
    2. Now you can become a published author, too!
    3. Comments welcome
    4. Stay connected to IBM Redbooks
  5. Summary of changes
    1. August 2015, Second Edition
  6. Chapter 1. Optimization and tuning on IBM POWER8 processor-based systems
    1. 1.1 Introduction
    2. 1.2 Outline of this guide
    3. 1.3 Conventions that are used in this guide
    4. 1.4 Background
    5. 1.5 Optimizing performance on POWER8 processor-based systems
      1. 1.5.1 Lightweight tuning and optimization guidelines
      2. 1.5.2 Deployment guidelines
      3. 1.5.3 Deep performance optimization guidelines
  7. Chapter 2. The IBM POWER8 processor
    1. 2.1 Introduction to the POWER8 processor
    2. 2.2 Using POWER8 features
      1. 2.2.1 Multi-core and multi-thread
      2. 2.2.2 Multipage size support (page sizes (4 KB, 64 KB, 16 MB, and 16 GB))
      3. 2.2.3 Efficient use of cache and memory
      4. 2.2.4 Transactional memory
      5. 2.2.5 Vector Scalar eXtension
      6. 2.2.6 Decimal floating point
      7. 2.2.7 In-core cryptography and integrity enhancements
      8. 2.2.8 On-chip accelerators
      9. 2.2.9 Storage synchronization (sync, lwsync, lwarx, stwcx., and eieio)
      10. 2.2.10 Fixed-point load and store quadword instructions
      11. 2.2.11 Instruction fusion
      12. 2.2.12 Event-based branches (or user-level fast interrupts)
      13. 2.2.13 Power management and system performance
      14. 2.2.14 Coherent Accelerator Processor Interface
    3. 2.3 I/O adapter affinity
    4. 2.4 Related publications
  8. Chapter 3. The IBM POWER Hypervisor
    1. 3.1 Introduction to PowerVM
    2. 3.2 Power Systems virtualization with PowerVM
      1. 3.2.1 Virtual processors
      2. 3.2.2 Page table sizes for LPARs
      3. 3.2.3 Placing LPAR resources to attain higher memory affinity
      4. 3.2.4 Active memory expansion
      5. 3.2.5 Optimizing resource placement: Dynamic Platform Optimizer
      6. 3.2.6 Partition compatibility mode
    3. 3.3 Introduction to KVM Virtualization
    4. 3.4 Related publications
  9. Chapter 4. IBM AIX
    1. 4.1 Introduction
    2. 4.2 Using Power Architecture features with AIX
      1. 4.2.1 Multi-core and multi-thread
      2. 4.2.2 Multipage size support on AIX
      3. 4.2.3 Efficient use of cache
      4. 4.2.4 Transactional memory
      5. 4.2.5 Vector Scalar eXtension
      6. 4.2.6 Decimal floating point
      7. 4.2.7 On-chip encryption accelerator
    3. 4.3 AIX operating system-specific optimizations
      1. 4.3.1 Malloc
      2. 4.3.2 Pthread tunables
      3. 4.3.3 pollset
      4. 4.3.4 File system performance benefits
      5. 4.3.5 Direct I/O
      6. 4.3.6 Concurrent I/O
      7. 4.3.7 Asynchronous I/O
      8. 4.3.8 I/O completion ports
      9. 4.3.9 shmat versus mmap
      10. 4.3.10 Large segment tunable aliasing (LSA)
      11. 4.3.11 64-bit versus 32-bit ABIs
      12. 4.3.12 Sleep and wake-up primitives (thread_wait and thread_post)
      13. 4.3.13 Shared versus private loads
      14. 4.3.14 Workload partition shared licensed program installations
    4. 4.4 AIX preferred practices
      1. 4.4.1 AIX preferred practices that are applicable to all Power Systems generations
      2. 4.4.2 AIX preferred practices that are applicable to POWER7 and POWER8 processor-based systems
    5. 4.5 Related publications
  10. Chapter 5. IBM i
    1. 5.1 Introduction
    2. 5.2 Using Power features with IBM i
      1. 5.2.1 Multi-core and multi-thread
      2. 5.2.2 Multipage size support on IBM i
      3. 5.2.3 Vector Scalar eXtension
      4. 5.2.4 Decimal floating point
    3. 5.3 IBM i operating system-specific optimizations
      1. 5.3.1 IBM i advanced optimization techniques
      2. 5.3.2 Performance management on IBM i
    4. 5.4 Related publications
  11. Chapter 6. Linux
    1. 6.1 Introduction
    2. 6.2 Using Power features with Linux
      1. 6.2.1 Multi-core and multi-thread
      2. 6.2.2 Multipage size support on Linux
      3. 6.2.3 Efficient use of cache
      4. 6.2.4 Transactional memory
      5. 6.2.5 Vector Scalar eXtension
      6. 6.2.6 Decimal floating point
      7. 6.2.7 Event-based branches
    3. 6.3 Linux operating system-specific optimizations
      1. 6.3.1 GCC, toolchain, and IBM Advance Toolchain
      2. 6.3.2 Tuning and optimizing malloc
      3. 6.3.3 Large TOC -mcmodel=medium optimization
      4. 6.3.4 POWER7 based distro considerations
      5. 6.3.5 Microthreading considerations
    4. 6.4 Little Endian
      1. 6.4.1 Application binary interface
    5. 6.5 Related publications
  12. Chapter 7. Compilers and optimization tools for C, C++, and Fortran
    1. 7.1 Compiler versions and optimization levels
    2. 7.2 Advanced compiler optimization techniques
      1. 7.2.1 Common prerequisites
      2. 7.2.2 XL compiler family
      3. 7.2.3 GCC compiler family
    3. 7.3 Capitalizing on POWER8 features with the XL and GCC compilers
      1. 7.3.1 In-core cryptography
      2. 7.3.2 Compiler support for Vector Scalar eXtension
      3. 7.3.3 Built-in functions for storage synchronization
      4. 7.3.4 Data Streams Control Register controls
      5. 7.3.5 Transactional memory
    4. 7.4 IBM Feedback Directed Program Restructuring
      1. 7.4.1 Introduction
      2. 7.4.2 Feedback Directed Program Restructuring supported environments
      3. 7.4.3 Acceptable input formats
      4. 7.4.4 General operation
      5. 7.4.5 Instrumentation and profiling
      6. 7.4.6 Optimization
    5. 7.5 Using the Advance Toolchain with IBM XLC and XLF
    6. 7.6 Using GPU accelerators with C/C++
    7. 7.7 Related publications
  13. Chapter 8. Java
    1. 8.1 Java levels
    2. 8.2 32-bit versus 64-bit Java
      1. 8.2.1 Little Endian support
    3. 8.3 Memory and page size considerations
      1. 8.3.1 Medium and large pages for Java heap and code cache
      2. 8.3.2 Configuring large pages for Java heap and code cache
      3. 8.3.3 Prefetching
      4. 8.3.4 Compressed references
      5. 8.3.5 JIT code cache
      6. 8.3.6 Shared classes
    4. 8.4 Capitalizing on POWER8 features with IBM Java
      1. 8.4.1 In-core Advanced Encryption Standard and Secure Hash Algorithm acceleration and instructions
      2. 8.4.2 Transactional memory
      3. 8.4.3 Runtime instrumentation
    5. 8.5 Java garbage collection tuning
      1. 8.5.1 GC strategy: Optthruput
      2. 8.5.2 GC strategy: Optavgpause
      3. 8.5.3 GC strategy: Gencon
      4. 8.5.4 GC strategy: Balanced
      5. 8.5.5 Optimal heap size
    6. 8.6 Application scaling
      1. 8.6.1 Choosing the correct simultaneous multithreading mode
      2. 8.6.2 Using resource sets
      3. 8.6.3 Java lock reservation
      4. 8.6.4 Java GC threads
      5. 8.6.5 Java concurrent marking
    7. 8.7 Using GPU accelerators with IBM Java
      1. 8.7.1 Automatic GPU compilation
      2. 8.7.2 Accessing the GPU through the CUDA4J application programming interface
      3. 8.7.3 The com.ibm.gpu application programming interface
      4. 8.7.4 NVIDIA Compute Unified Device Architecture: Java Native interface
    8. 8.8 Related publications
  14. Chapter 9. IBM DB2
    1. 9.1 DB2 and the POWER processor
    2. 9.2 Taking advantage of the POWER processor
      1. 9.2.1 Affinitization
      2. 9.2.2 Page sizes
      3. 9.2.3 Decimal arithmetic
      4. 9.2.4 Using simultaneous multithreading priorities for internal lock implementation
      5. 9.2.5 Single Instruction Multiple Data
    3. 9.3 Capitalizing on the compilers and optimization tools for POWER
      1. 9.3.1 Whole-program analysis and profile-based optimizations
      2. 9.3.2 IBM Feedback Directed Program Restructuring
    4. 9.4 Capitalizing on POWER virtualization
      1. 9.4.1 DB2 virtualization
      2. 9.4.2 DB2 in an AIX workload partition
    5. 9.5 Capitalizing on the AIX system libraries
      1. 9.5.1 Using the thread_post_many API
      2. 9.5.2 File systems
    6. 9.6 Capitalizing on performance tools
      1. 9.6.1 High-level investigation
      2. 9.6.2 Low-level investigation
    7. 9.7 Conclusion
    8. 9.8 Related publications
  15. Chapter 10. IBM WebSphere Application Server
    1. 10.1 IBM WebSphere
      1. 10.1.1 Installation
      2. 10.1.2 Deployment
      3. 10.1.3 Performance
      4. 10.1.4 Performance analysis, problem determination, and diagnostic tests
  16. Appendix A. Analyzing malloc usage under IBM AIX
    1. Introduction
    2. How to collect malloc usage information
  17. Appendix B. Performance tools and empirical performance analysis
    1. Introduction
    2. Performance advisors
    3. IBM Power Virtualization Performance
    4. AIX
    5. Linux
    6. Java (either AIX or Linux)
  18. Back cover