Intel® Xeon Phi™ Coprocessor Architecture and Tools: The Guide for Application Developers

Book Description

Intel® Xeon Phi™ Coprocessor Architecture and Tools: The Guide for Application Developers gives developers a comprehensive introduction to, and an in-depth look at, the Intel Xeon Phi coprocessor architecture and the corresponding parallel data structure tools and algorithms used in the technical computing applications for which it is suitable. It also examines the source code-level optimizations that can be performed to exploit the powerful features of the processor.

Xeon Phi is at the heart of the world's fastest commercial supercomputer, which, thanks to the massively parallel computing capabilities of Intel Xeon processors coupled with Xeon Phi coprocessors, attained 33.86 petaflops of benchmark performance in 2013. Extracting such stellar performance in real-world applications requires a sophisticated understanding of the complex interaction among hardware components, Xeon Phi cores, and the applications running on them.

In this book, Rezaur Rahman, an Intel leader in the development of the Xeon Phi coprocessor and the optimization of its applications, details the features of the Xeon Phi core design that matter to application developers, such as its vector units, hardware multithreading, cache hierarchy, and host-to-coprocessor communication channels. Building on this foundation, he shows developers how to solve real-world technical computing problems by selecting, deploying, and optimizing the algorithms and data structures that match Xeon Phi's hardware characteristics. From Rahman's practical descriptions and extensive code examples, the reader will gain a working knowledge of the Xeon Phi vector instruction set and of the Xeon Phi microarchitecture, in which cores execute 512-bit vector instructions in parallel.

What you'll learn

  • How to calculate theoretical gigaflops and bandwidth numbers for the hardware and measure them with code segments

  • How to estimate latencies in fetching data from the different levels of the cache hierarchy and from the memory subsystem

  • How to measure PCIe bus bandwidth between the host and coprocessor

  • How to exploit power management and reliability features built into the hardware

  • How to select and manipulate the best tools to tune particular Xeon Phi applications

  • Algorithms and data structures for optimizing Xeon Phi performance

  • Case studies of real-world Xeon Phi technical computing applications in molecular dynamics and financial simulations
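
As an illustration of the first item in the list above, theoretical peak numbers are usually a product of a few hardware parameters. The sketch below is not taken from the book. The function names are hypothetical, and the core count, clock rate, and memory parameters are placeholder inputs chosen for the example. It assumes 512-bit vector units operating on 64-bit elements and counts a fused multiply-add as two floating-point operations per lane per cycle.

```python
def peak_gflops(cores, clock_ghz, vector_bits=512, element_bits=64, flops_per_lane=2):
    """Theoretical peak GFLOPS: cores x clock x vector lanes x flops per lane.

    flops_per_lane=2 assumes one fused multiply-add per lane per cycle.
    """
    lanes = vector_bits // element_bits  # 512 / 64 = 8 double-precision lanes
    return cores * clock_ghz * lanes * flops_per_lane


def peak_bandwidth_gbs(transfer_rate_gts, channels, bytes_per_transfer):
    """Theoretical peak memory bandwidth in GB/s.

    transfer rate (GT/s) x number of channels x bytes moved per transfer.
    """
    return transfer_rate_gts * channels * bytes_per_transfer


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not a specific product's datasheet.
    print(peak_gflops(cores=60, clock_ghz=1.0))  # 960.0
    print(peak_bandwidth_gbs(transfer_rate_gts=5.0, channels=16, bytes_per_transfer=4))  # 320.0
```

Measured numbers normally fall below these ceilings, which is why the theoretical values are compared against benchmark code rather than used on their own.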

Who this book is for

This book is for developers who want to design and develop technical computing applications that achieve the highest performance available from the Intel Xeon Phi coprocessor hardware. It provides a solid grounding in the coprocessor architecture, as well as algorithm and data structure case studies for the Xeon Phi coprocessor. The book may also be of interest to students and practitioners in computer engineering as a case study of the massively parallel core microarchitecture of modern-day processors.

Table of Contents

  1. Title Page
  2. About ApressOpen
  3. Dedication
  4. Contents at a Glance
  5. Contents
  6. About the Author
  7. About the Technical Reviewer
  8. Acknowledgments
  9. Introduction
  10. PART 1: Hardware Foundation: Intel Xeon Phi Architecture
    1. CHAPTER 1: Introduction to Xeon Phi Architecture
      1. History of Intel Xeon Phi Development
      2. Intel Xeon Phi Coprocessor Chip Architecture
      3. Applicability of the Intel Xeon Phi Coprocessor
      4. Summary
    2. CHAPTER 2: Programming Xeon Phi
      1. Intel Xeon Phi Execution Models
      2. Development Tools for Intel Xeon Phi Architecture
      3. Setting Up an Intel Xeon Phi System
      4. Code Generation for Intel Xeon Phi Architecture
      5. Language Extensions to Support Offload Computation on Intel Xeon Phi
      6. Summary
    3. CHAPTER 3: Xeon Phi Vector Architecture and Instruction Set
      1. Xeon Phi Vector Microarchitecture
      2. Xeon Phi Vector Instruction Set Architecture
      3. Summary
    4. CHAPTER 4: Xeon Phi Core Microarchitecture
      1. Intel Xeon Phi Cores
      2. Core Pipeline Stages
      3. Cache and TLB Structure
      4. L2 Cache Structure
      5. Multithreading
      6. Summary
    5. CHAPTER 5: Xeon Phi Cache and Memory Subsystem
      1. The Interconnect Topologies for Manycore Processors
      2. The Ring Interconnect Architecture in Intel Xeon Phi
      3. L2 Cache
      4. Memory Transactions Flow
      5. Probing the Memory Subsystem
      6. Summary
    6. CHAPTER 6: Xeon Phi PCIe Bus Data Transfer and Power Management
      1. DMA Engine
      2. Reading Data from the Coprocessor
      3. Low-Level Data Transfer APIs for Intel Xeon Phi
      4. Placement of PCIe Cards for Optimal Data Transfer BW
      5. Power Management and Reliability
      6. Summary
  11. PART 2: Software Foundation: Intel Xeon Phi System Software and Tools
    1. CHAPTER 7: Xeon Phi System Software
      1. System Software Component
      2. Ring 0 Driver Layer Components of the MPSS
      3. Summary
    2. CHAPTER 8: Xeon Phi Application Development Tools
      1. The Application Development Tools
      2. Keywords
      3. Macros
      4. Intrinsics
      5. Application Programming Interfaces
      6. Intel Fortran Composer XE
      7. Environment Variables, Compiler Options, and Creating Static Libraries
      8. Optimization Tool: Intel VTune Amplifier XE
      9. Libraries
      10. Intel Cluster Tools
      11. Summary
  12. PART 3: Applications: Technical Computing Software Development on Intel Xeon Phi
    1. CHAPTER 9: Xeon Phi Application Design and Implementation Considerations
      1. Workload-Related Considerations
      2. Effect of Grid Shape on Performance
      3. Implementation Considerations
      4. Summary
    2. CHAPTER 10: Application Performance Tuning on Xeon Phi
      1. Getting Baseline Data
      2. Timing Applications
      3. Detecting Application Execution Bottlenecks
      4. Setting Target Performance
      5. Optimizing Code
      6. Using the Math Kernel Library
      7. Cluster-Level Tuning
      8. Summary
    3. CHAPTER 11: Algorithm and Data Structures for Xeon Phi
      1. Algorithm and Data Structure Design Rules for Xeon Phi
      2. General Matrix-Matrix Multiply Algorithm (GEMM)
      3. Molecular Dynamics
      4. Stencil Operation
      5. European Option Pricing Using Monte Carlo Simulation in Financial Applications
      6. Summary
    4. CHAPTER 12: Xeon Phi Application Development on Windows OS
      1. MPSS
      2. Development Tools
      3. Debugging Offload Execution
      4. Using VTune Amplifier XE to Profile Offload Code on Windows
      5. Building and Running Xeon Phi Native Applications from the Windows Host
      6. Summary
    5. APPENDIX A: OpenCL on Xeon Phi
      1. Installation
      2. Building and Running OpenCL Application
      3. Performance Optimization
    6. APPENDIX B: Virtual Shared Memory Programming on Xeon Phi
      1. Placing Data on the Virtual Shared Memory Region
      2. Shared Functions
      3. Synchronizing Between the Host and the Coprocessors
  13. Index