CUDA Fortran for Scientists and Engineers

Book Description

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can leverage the power of GPUs using Fortran, the familiar language of scientific computing and supercomputer performance benchmarking. The authors presume no prior parallel computing experience, and cover the basics along with best practices for efficient GPU computing using CUDA Fortran.

To help you add CUDA Fortran to existing Fortran codes, the book explains how to understand the target GPU architecture, identify computationally intensive parts of the code, and modify the code to manage data and parallelism and to optimize performance. All of this is done in Fortran, without having to rewrite in another language. Each concept is illustrated with actual examples so you can immediately evaluate and compare the performance of your code.



  • Leverage the power of GPU computing with PGI’s CUDA Fortran compiler
  • Gain insights from members of the CUDA Fortran language development team
  • Explore multi-GPU programming in CUDA Fortran, covering both peer-to-peer and Message Passing Interface (MPI) approaches
  • Work through full source code for all the examples and several case studies
  • Download source code and slides from the book's companion website
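To give a flavor of the programming model the book teaches, here is a minimal CUDA Fortran sketch (not taken from the book): a SAXPY kernel marked `attributes(global)`, device arrays declared with the `device` attribute, host-device transfers expressed as simple assignments, and a chevron `<<<blocks, threads>>>` kernel launch.

```fortran
module saxpy_mod
contains
  ! Kernel: each thread computes one element of y = a*x + y
  attributes(global) subroutine saxpy(n, a, x, y)
    integer, value :: n
    real, value :: a
    real :: x(n), y(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    if (i <= n) y(i) = a * x(i) + y(i)
  end subroutine saxpy
end module saxpy_mod

program main
  use cudafor
  use saxpy_mod
  integer, parameter :: n = 1024
  real :: x(n), y(n)
  real, device :: x_d(n), y_d(n)  ! arrays resident in GPU memory

  x = 1.0; y = 2.0
  x_d = x                         ! host-to-device copy via assignment
  y_d = y
  call saxpy<<<(n + 255) / 256, 256>>>(n, 2.0, x_d, y_d)
  y = y_d                         ! device-to-host copy
  print *, maxval(abs(y - 4.0))   ! expect 0.0 for every element
end program main
```

Compiled with the PGI (now NVIDIA HPC SDK) Fortran compiler, e.g. `pgfortran saxpy.cuf`, this runs entirely from Fortran with no CUDA C required, which is the workflow the book develops in depth.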

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Acknowledgments
  7. Preface
    1. Companion Site
  8. Part I: CUDA Fortran Programming
    1. Chapter 1. Introduction
      1. Abstract
      2. 1.1 A brief history of GPU computing
      3. 1.2 Parallel computation
      4. 1.3 Basic concepts
      5. 1.4 Determining CUDA hardware features and limits
      6. 1.5 Error handling
      7. 1.6 Compiling CUDA Fortran code
    2. Chapter 2. Performance Measurement and Metrics
      1. Abstract
      2. 2.1 Measuring kernel execution time
      3. 2.2 Instruction, bandwidth, and latency bound kernels
      4. 2.3 Memory bandwidth
    3. Chapter 3. Optimization
      1. Abstract
      2. 3.1 Transfers between host and device
      3. 3.2 Device memory
      4. 3.3 On-chip memory
      5. 3.4 Memory optimization example: matrix transpose
      6. 3.5 Execution configuration
      7. 3.6 Instruction optimization
      8. 3.7 Kernel loop directives
    4. Chapter 4. Multi-GPU Programming
      1. Abstract
      2. 4.1 CUDA multi-GPU features
      3. 4.2 Multi-GPU programming with MPI
  9. Part II: Case Studies
    1. Chapter 5. Monte Carlo Method
      1. Abstract
      2. 5.1 CURAND
      3. 5.2 Computing π with CUF kernels
      4. 5.3 Computing π with reduction kernels
      5. 5.4 Accuracy of summation
      6. 5.5 Option pricing
    2. Chapter 6. Finite Difference Method
      1. Abstract
      2. 6.1 Nine-point 1D finite difference stencil
      3. 6.2 2D Laplace equation
    3. Chapter 7. Applications of Fast Fourier Transform
      1. Abstract
      2. 7.1 CUFFT
      3. 7.2 Spectral derivatives
      4. 7.3 Convolution
      5. 7.4 Poisson solver
  10. Part III: Appendices
    1. Appendix A. Tesla Specifications
    2. Appendix B. System and Environment Management
      1. B.1 Environment variables
      2. B.2 nvidia-smi System Management Interface
    3. Appendix C. Calling CUDA C from CUDA Fortran
      1. C.1 Calling CUDA C libraries
      2. C.2 Calling user-written CUDA C code
    4. Appendix D. Source Code
      1. D.1 Texture memory
      2. D.2 Matrix transpose
      3. D.3 Thread- and instruction-level parallelism
      4. D.4 Multi-GPU programming
      5. D.5 Finite difference code
      6. D.6 Spectral Poisson solver
  11. References
  12. Index