Chapter 10 Implementation Considerations

What's in this chapter?

  • Understanding the CUDA development process
  • Discovering optimization opportunities using profiling tools
  • Using the right metrics/events to determine most likely performance limiters
  • Integrating NVTX to mark a critical section of code for profiling
  • Using CUDA debugging tools to diagnose kernel and memory errors
  • Porting a real-world application from legacy C to CUDA C

Modern heterogeneous and parallel systems are no longer confined to high-performance computing; they are also found in embedded and mobile devices, tablets, notebooks, PCs, and workstations. As access to these systems becomes more common, this ubiquity is driving a paradigm shift in general-purpose software development toward heterogeneous parallel programming. Parallel programming has never been more convenient or beneficial, so understanding how to implement parallel and heterogeneous software efficiently and correctly has never been more important.

This chapter covers the following aspects of CUDA C project development:

  • The CUDA C development process
  • Profile-driven optimization
  • CUDA development tools

A case study is provided at the end of this chapter to demonstrate porting a legacy C application ...
