Questions

  1. In the launch parameters for the kernel in the first example, our kernels were each launched over 64 threads. If we increase the number of threads to and beyond the number of cores in our GPU, how does this affect the performance of both the original and the stream versions? (A minimal streamed-launch sketch follows this list for reference.)
  2. Consider the CUDA C example given at the very beginning of this chapter, which illustrated the use of cudaDeviceSynchronize. Do you think it is possible to get some level of concurrency among multiple kernels using only cudaDeviceSynchronize, without streams?
  3. If you are a Linux user, modify the last example that was given to operate over processes rather than threads.
  4. Consider the multi-kernel_events.py program; we said it is good that there ...
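For question 1, the sketch below shows the general shape of a streamed multi-kernel launch in PyCUDA, with the threads-per-block launch parameter pulled out as a variable you can vary and time. This is a minimal sketch of the technique, not the book's exact code: the kernel body, array sizes, and timing approach here are illustrative assumptions.

```python
# Sketch (assumed names and sizes): launch one toy kernel per stream and time the batch.
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule
from time import time

# Toy kernel: each thread repeatedly scales its slice of the array in place.
ker = SourceModule("""
__global__ void mult_ker(float *array, int array_len)
{
    int thd = blockIdx.x * blockDim.x + threadIdx.x;
    int num_iters = array_len / (blockDim.x * gridDim.x);
    for (int j = 0; j < num_iters; j++)
    {
        int i = j * blockDim.x * gridDim.x + thd;
        for (int k = 0; k < 50; k++)
            array[i] *= 2.0f;
        for (int k = 0; k < 50; k++)
            array[i] /= 2.0f;
    }
}
""")
mult_ker = ker.get_function("mult_ker")

num_arrays = 10
array_len = 1024 ** 2
threads_per_block = 64   # try 64, 256, 1024, ... and compare timings

data = [np.random.randn(array_len).astype(np.float32) for _ in range(num_arrays)]
streams = [drv.Stream() for _ in range(num_arrays)]

t_start = time()
# Asynchronous copies and launches, one stream per array.
data_gpu = [gpuarray.to_gpu_async(a, stream=s) for a, s in zip(data, streams)]
for d, s in zip(data_gpu, streams):
    mult_ker(d, np.int32(array_len),
             block=(threads_per_block, 1, 1), grid=(1, 1), stream=s)
results = [d.get_async(stream=s) for d, s in zip(data_gpu, streams)]
drv.Context.synchronize()
print('Streamed version took %.4f seconds.' % (time() - t_start))
```

To compare against the original (non-streamed) version, drop the `stream=` arguments and use blocking `gpuarray.to_gpu` / `.get` calls, then rerun with different values of `threads_per_block` to see how the gap between the two versions changes.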
