CUDA Programming

Book Description

If you need to learn CUDA but don't have experience with parallel computing, CUDA Programming: A Developer's Introduction offers a detailed guide to CUDA with a grounding in parallel fundamentals. It starts by introducing CUDA and bringing you up to speed on GPU parallelism and hardware, then delves into CUDA installation. Chapters on core concepts, including threads, blocks, grids, and memory, focus on both parallel and CUDA-specific issues. Later, the book demonstrates CUDA in practice for optimizing applications, adjusting to new hardware, and solving common problems.



  • Comprehensive introduction to parallel programming with CUDA, for readers new to both
  • Detailed instructions help readers get the most out of the CUDA software development kit
  • Practical techniques illustrate working with memory, threads, algorithms, resources, and more
  • Covers CUDA on multiple platforms (Mac, Linux, and Windows) with several NVIDIA chipsets
  • Each chapter includes exercises to test reader knowledge

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Preface
  6. Chapter 1. A Short History of Supercomputing
    1. Introduction
    2. Von Neumann Architecture
    3. Cray
    4. Connection Machine
    5. Cell Processor
    6. Multinode Computing
    7. The Early Days of GPGPU Coding
    8. The Death of the Single-Core Solution
    9. NVIDIA and CUDA
    10. GPU Hardware
    11. Alternatives to CUDA
    12. Conclusion
  7. Chapter 2. Understanding Parallelism with GPUs
    1. Introduction
    2. Traditional Serial Code
    3. Serial/Parallel Problems
    4. Concurrency
    5. Types of Parallelism
    6. Flynn’s Taxonomy
    7. Some Common Parallel Patterns
    8. Conclusion
  8. Chapter 3. CUDA Hardware Overview
    1. PC Architecture
    2. GPU Hardware
    3. CPUs and GPUs
    4. Compute Levels
  9. Chapter 4. Setting Up CUDA
    1. Introduction
    2. Installing the SDK Under Windows
    3. Visual Studio
    4. Linux
    5. Mac
    6. Installing a Debugger
    7. Compilation Model
    8. Error Handling
    9. Conclusion
  10. Chapter 5. Grids, Blocks, and Threads
    1. What It All Means
    2. Threads
    3. Blocks
    4. Grids
    5. Warps
    6. Block Scheduling
    7. A Practical Example—Histograms
    8. Conclusion
  11. Chapter 6. Memory Handling with CUDA
    1. Introduction
    2. Caches
    3. Register Usage
    4. Shared Memory
    5. Constant Memory
    6. Global Memory
    7. Texture Memory
    8. Conclusion
  12. Chapter 7. Using CUDA in Practice
    1. Introduction
    2. Serial and Parallel Code
    3. Processing Datasets
    4. Profiling
    5. An Example Using AES
    6. Conclusion
    7. References
  13. Chapter 8. Multi-CPU and Multi-GPU Solutions
    1. Introduction
    2. Locality
    3. Multi-CPU Systems
    4. Multi-GPU Systems
    5. Algorithms on Multiple GPUs
    6. Which GPU?
    7. Single-Node Systems
    8. Streams
    9. Multiple-Node Systems
    10. Conclusion
  14. Chapter 9. Optimizing Your Application
    1. Strategy 1: Parallel/Serial GPU/CPU Problem Breakdown
    2. Strategy 2: Memory Considerations
    3. Strategy 3: Transfers
    4. Strategy 4: Thread Usage, Calculations, and Divergence
    5. Strategy 5: Algorithms
    6. Strategy 6: Resource Contentions
    7. Strategy 7: Self-Tuning Applications
    8. Conclusion
  15. Chapter 10. Libraries and SDK
    1. Introduction
    2. Libraries
    3. CUDA Computing SDK
    4. Directive-Based Programming
    5. Writing Your Own Kernels
    6. Conclusion
  16. Chapter 11. Designing GPU-Based Systems
    1. Introduction
    2. CPU Processor
    3. GPU Device
    4. PCI-E Bus
    5. GeForce Cards
    6. CPU Memory
    7. Air Cooling
    8. Liquid Cooling
    9. Desktop Cases and Motherboards
    10. Mass Storage
    11. Power Considerations
    12. Operating Systems
    13. Conclusion
  17. Chapter 12. Common Problems, Causes, and Solutions
    1. Introduction
    2. Errors with CUDA Directives
    3. Parallel Programming Issues
    4. Algorithmic Issues
    5. Finding and Avoiding Errors
    6. Developing for Future GPUs
    7. Further Resources
    8. Conclusion
    9. References
  18. Index