Professional CUDA C Programming

Book Description

Break into the powerful world of parallel GPU programming with this down-to-earth, practical guide

Designed for professionals across multiple industrial sectors, Professional CUDA C Programming presents the fundamentals of CUDA, a parallel computing platform and programming model designed to ease GPU programming, in an easy-to-follow format, and teaches readers how to think in parallel and implement parallel algorithms on GPUs. Each chapter covers a specific topic and includes working examples that demonstrate the development process, allowing readers to explore both the "hard" and "soft" aspects of GPU programming.

Computing architectures are experiencing a fundamental shift toward scalable parallel computing, motivated by application requirements in industry and science. This book illustrates the challenges of using compute resources efficiently at peak performance and presents modern techniques for tackling them, in a form accessible to professionals who are not necessarily parallel programming experts. The CUDA programming model and tools empower developers to write high-performance applications on a scalable, parallel computing platform: the GPU. However, CUDA itself can be difficult to learn without extensive programming experience. Recognized CUDA authorities John Cheng, Max Grossman, and Ty McKercher guide readers through essential GPU programming skills and best practices in Professional CUDA C Programming, including:

  • CUDA Programming Model

  • GPU Execution Model

  • GPU Memory Model

  • Streams, Events, and Concurrency

  • Multi-GPU Programming

  • CUDA Domain-Specific Libraries

  • Profiling and Performance Tuning

The book makes complex CUDA concepts easy to understand for anyone with basic software development knowledge, using exercises designed to be both readable and high-performance. For the professional seeking entry into parallel computing and the high-performance computing community, Professional CUDA C Programming is an invaluable resource, with the most current information available on the market.

    Table of Contents

    1. Chapter 1: Heterogeneous Parallel Computing with CUDA
      1. Parallel Computing
      2. Heterogeneous Computing
      3. Hello World from GPU
      4. Is CUDA C Programming Difficult?
      5. Summary
      6. Chapter 1 Exercises
    2. Chapter 2: CUDA Programming Model
      1. Introducing the CUDA Programming Model
      2. Timing Your Kernel
      3. Organizing Parallel Threads
      4. Managing Devices
      5. Summary
      6. Chapter 2 Exercises
    3. Chapter 3: CUDA Execution Model
      1. Introducing the CUDA Execution Model
      2. Understanding the Nature of Warp Execution
      3. Exposing Parallelism
      4. Avoiding Branch Divergence
      5. Unrolling Loops
      6. Dynamic Parallelism
      7. Summary
      8. Chapter 3 Exercises
    4. Chapter 4: Global Memory
      1. Introducing the CUDA Memory Model
      2. Memory Management
      3. Memory Access Patterns
      4. What Bandwidth Can a Kernel Achieve?
      5. Matrix Addition with Unified Memory
      6. Summary
      7. Chapter 4 Exercises
    5. Chapter 5: Shared Memory and Constant Memory
      1. Introducing CUDA Shared Memory
      2. Checking the Data Layout of Shared Memory
      3. Reducing Global Memory Access
      4. Coalescing Global Memory Accesses
      5. Constant Memory
      6. The Warp Shuffle Instruction
      7. Summary
      8. Chapter 5 Exercises
    6. Chapter 6: Streams and Concurrency
      1. Introducing Streams and Events
      2. Concurrent Kernel Execution
      3. Overlapping Kernel Execution and Data Transfer
      4. Overlapping GPU and CPU Execution
      5. Stream Callbacks
      6. Summary
      7. Chapter 6 Exercises
    7. Chapter 7: Tuning Instruction-Level Primitives
      1. Introducing CUDA Instructions
      2. Optimizing Instructions for Your Application
      3. Summary
      4. Chapter 7 Exercises
    8. Chapter 8: GPU-Accelerated CUDA Libraries and OpenACC
      1. Introducing the CUDA Libraries
      2. The cuSPARSE Library
      3. The cuBLAS Library
      4. The cuFFT Library
      5. The cuRAND Library
      6. CUDA Library Features Introduced in CUDA 6
      7. A Survey of CUDA Library Performance
      8. Using OpenACC
      9. Summary
      10. Chapter 8 Exercises
    9. Chapter 9: Multi-GPU Programming
      1. Moving to Multiple GPUs
      2. Subdividing Computation across Multiple GPUs
      3. Peer-to-Peer Communication on Multiple GPUs
      4. Finite Difference on Multi-GPU
      5. Scaling Applications across GPU Clusters
      6. Summary
      7. Chapter 9 Exercises
    10. Chapter 10: Implementation Considerations
      1. The CUDA C Development Process
      2. Profile-Driven Optimization
      3. CUDA Debugging
      4. A Case Study in Porting C Programs to CUDA C
      5. Summary
      6. Chapter 10 Exercises
    11. Appendix: Suggested Readings
    12. Introduction
      1. Who This Book Is For
      2. What This Book Covers
      3. How This Book Is Structured
      4. What You Need to Use This Book
      5. CUDA Toolkit Download
      6. Conventions
      7. Source Code
      8. Errata
      10. Useful Links
    13. Advertisement
    14. End User License Agreement