Summary

We began this chapter by seeing how printf can be used within a CUDA kernel to output data from individual threads, which is particularly useful for debugging. We then filled in some gaps in our knowledge of CUDA-C so that we can write full test programs and compile them into standalone executable binaries; doing so exposes a lot of overhead that was previously hidden from us and that we must handle meticulously. Next, we saw how to create and compile a project in the Nsight IDE and how to use it for debugging. We saw how to stop at any breakpoint set in a CUDA kernel and switch between individual threads to inspect their local variables. We also used the Nsight debugger to learn about the warp ...
