Book description
High Performance Parallelism Pearls shows how to leverage parallelism on processors and coprocessors with the same programming approach, illustrating effective ways to tap the computational potential of systems with Intel Xeon Phi coprocessors and Intel Xeon processors or other multicore processors. The book includes examples of successful programming efforts drawn from across industries and domains such as chemistry, engineering, and environmental science. Each chapter in this edited work explains the programming techniques used in detail and shows high-performance results on both Intel Xeon Phi coprocessors and multicore processors. Dozens of new examples and case studies present success stories that demonstrate not just the features of these powerful systems, but also how to leverage parallelism across such heterogeneous systems.
- Promotes consistent standards-based programming, showing in detail how to code for high performance on multicore processors and Intel® Xeon Phi™
- Examples from multiple vertical domains illustrating parallel optimizations to modernize real-world codes
- Source code available for download to facilitate further exploration
Table of contents
- Cover image
- Title page
- Table of Contents
- Copyright
- Contributors
- Acknowledgments
- Foreword
- Preface
- Chapter 1: Introduction
  - Abstract
  - Learning from successful experiences
  - Code modernization
  - Modernize with concurrent algorithms
  - Modernize with vectorization and data locality
  - Understanding power usage
  - ISPC and OpenCL anyone?
  - Intel Xeon Phi coprocessor specific
  - Many-core, neo-heterogeneous
  - No “Xeon Phi” in the title, neo-heterogeneous programming
  - The future of many-core
  - Downloads
- Chapter 2: From “Correct” to “Correct & Efficient”: A Hydro2D Case Study with Godunov’s Scheme
- Chapter 3: Better Concurrency and SIMD on HBM
  - Abstract
  - The application: HIROMB-BOOS-Model
  - Key usage: DMI
  - HBM execution profile
  - Overview for the optimization of HBM
  - Data structures: Locality done right
  - Thread parallelism in HBM
  - Data parallelism: SIMD vectorization
  - Results
  - Profiling details
  - Scaling on processor vs. coprocessor
  - Contiguous attribute
  - Summary
- Chapter 4: Optimizing for Reacting Navier-Stokes Equations
- Chapter 5: Plesiochronous Phasing Barriers
  - Abstract
  - What can be done to improve the code?
  - What more can be done to improve the code?
  - Hyper-Thread Phalanx
  - What is nonoptimal about this strategy?
  - Coding the Hyper-Thread Phalanx
  - Back to work
  - Data alignment
  - The plesiochronous phasing barrier
  - Let us do something to recover this wasted time
  - A few “left to the reader” possibilities
  - Xeon host performance improvements similar to Xeon Phi
  - Summary
- Chapter 6: Parallel Evaluation of Fault Tree Expressions
- Chapter 7: Deep-Learning Numerical Optimization
- Chapter 8: Optimizing Gather/Scatter Patterns
- Chapter 9: A Many-Core Implementation of the Direct N-Body Problem
- Chapter 10: N-Body Methods
- Chapter 11: Dynamic Load Balancing Using OpenMP 4.0
- Chapter 12: Concurrent Kernel Offloading
- Chapter 13: Heterogeneous Computing with MPI
- Chapter 14: Power Analysis on the Intel® Xeon Phi™ Coprocessor
- Chapter 15: Integrating Intel Xeon Phi Coprocessors into a Cluster Environment
- Chapter 16: Supporting Cluster File Systems on Intel® Xeon Phi™ Coprocessors
- Chapter 17: NWChem: Quantum Chemistry Simulations at Scale
- Chapter 18: Efficient Nested Parallelism on Large-Scale Systems
- Chapter 19: Performance Optimization of Black-Scholes Pricing
- Chapter 20: Data Transfer Using the Intel COI Library
- Chapter 21: High-Performance Ray Tracing
- Chapter 22: Portable Performance with OpenCL
- Chapter 23: Characterization and Optimization Methodology Applied to Stencil Computations
- Chapter 24: Profiling-Guided Optimization
- Chapter 25: Heterogeneous MPI Application Optimization with ITAC
- Chapter 26: Scalable Out-of-Core Solvers on a Cluster
- Chapter 27: Sparse Matrix-Vector Multiplication: Parallelization and Vectorization
- Chapter 28: Morton Order Improves Performance
- Author Index
- Subject Index
Product information
- Title: High Performance Parallelism Pearls Volume One
- Author(s):
- Release date: November 2014
- Publisher(s): Morgan Kaufmann
- ISBN: 9780128021996