GPU Computing Gems Jade Edition

Book Description

GPU Computing Gems, Jade Edition describes successful application experiences in GPU computing and the techniques that contributed to that success. Divided into five sections, the book explains how efficient GPU execution is achieved through algorithm implementation techniques and data structure layout. More specifically, it considers three general requirements: a high level of parallelism, coherent memory access by threads within warps, and coherent control flow within warps.
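To make the second and third requirements concrete, the following minimal CUDA sketch (not taken from the book; kernel and variable names are illustrative) contrasts a coalesced, divergence-free kernel with a strided variant that breaks memory coherence within a warp:

    // Minimal illustrative CUDA sketch (not from the book).
    // Coalesced: thread i reads element i, so the 32 threads of a warp
    // touch one contiguous segment of global memory per load, and the
    // bounds check branches uniformly across the warp (no divergence).
    __global__ void scale_coalesced(const float* in, float* out, int n, float a)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a * in[i];
    }

    // Strided: neighboring threads read addresses 'stride' elements apart,
    // so each warp-wide load is split into many separate memory transactions.
    __global__ void scale_strided(const float* in, float* out, int n, int stride, float a)
    {
        int i = (blockIdx.x * blockDim.x + threadIdx.x) * stride;
        if (i < n)
            out[i] = a * in[i];
    }

Launched as, for example, scale_coalesced<<<(n + 255) / 256, 256>>>(d_in, d_out, n, 2.0f), the coalesced version typically sustains a large fraction of peak memory bandwidth, while the strided version does not; the individual chapters develop this theme for far more sophisticated algorithms.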
The book begins with an overview of parallel algorithms and data structures. The first few chapters focus on accelerating database searches, leveraging the Fermi GPU architecture to further accelerate prefix operations, and building hash tables on the GPU. The reader is then systematically walked through the fundamental optimization steps for a bandwidth-limited algorithm, GPU-based libraries of numerical algorithms, software products for numerical analysis with dedicated GPU support, and the adoption of GPU computing techniques in production engineering simulation codes. Later chapters discuss the state of GPU computing in interactive physics and artificial intelligence, programming tools and techniques for GPU computing, and edge versus node parallelism for computing graph centrality metrics, including an alternative approach that balances computation regardless of variance in node degree.
This book will be useful to application developers in a wide range of application areas.

  • This second volume of GPU Computing Gems offers 100% new material of interest across industry, including finance, medicine, imaging, engineering, gaming, environmental science, green computing, and more
  • Covers new tools and frameworks for productive GPU computing application development and offers immediate benefit to researchers developing improved programming environments for GPUs
  • Even more hands-on, proven techniques demonstrating how general purpose GPU computing is changing scientific research
  • Distills the best practices of the community of CUDA programmers; each chapter provides insights and ideas as well as 'hands on' skills applicable to a variety of fields

Table of Contents

  1. Cover image
  2. Table of Contents
  3. Front Matter
  4. Copyright
  5. Editors, Reviewers, and Authors
  6. Introduction
  7. Introduction
  8. Chapter 1. Large-Scale GPU Search
  9. 1.1. Introduction
  10. 1.2. Memory Performance
  11. 1.3. Searching Large Data Sets
  12. 1.4. Experimental Evaluation
  13. 1.5. Conclusion
  14. Chapter 2. Edge v. Node Parallelism for Graph Centrality Metrics
  15. 2.1. Introduction
  16. 2.2. Background
  17. 2.3. Node v. Edge Parallelism
  18. 2.4. Data Structure
  19. 2.5. Implementation
  20. 2.6. Analysis
  21. 2.7. Results
  22. 2.8. Conclusions
  23. Chapter 3. Optimizing Parallel Prefix Operations for the Fermi Architecture
  24. 3.1. Introduction to Parallel Prefix Operations
  25. 3.2. Efficient Binary Prefix Operations on Fermi
  26. 3.3. Conclusion
  27. Chapter 4. Building an Efficient Hash Table on the GPU
  28. 4.1. Introduction
  29. 4.2. Overview
  30. 4.3. Building and Querying a Basic Hash Table
  31. 4.4. Specializing the Hash Table
  32. 4.5. Analysis
  33. 4.6. Conclusion
  34. Chapter 5. Efficient CUDA Algorithms for the Maximum Network Flow Problem
  35. 5.1. Introduction, Problem Statement, and Context
  36. 5.2. Core Method
  37. 5.3. Algorithms, Implementations, and Evaluations
  38. 5.4. Final Evaluation
  39. 5.5. Future Directions
  40. Chapter 6. Optimizing Memory Access Patterns for Cellular Automata on GPUs
  41. 6.1. Introduction, Problem Statement, and Context
  42. 6.2. Core Methods
  43. 6.3. Algorithms, Implementations, and Evaluations
  44. 6.4. Final Results
  45. 6.5. Future Directions
  46. Chapter 7. Fast Minimum Spanning Tree Computation
  47. 7.1. Introduction, Problem Statement, and Context
  48. 7.2. The MST Algorithm: Overview
  49. 7.3. CUDA Implementation of MST
  50. 7.4. Evaluation
  51. 7.5. Conclusions
  52. Chapter 8. Comparison-Based In-Place Sorting with CUDA
  53. 8.1. Introduction
  54. 8.2. Bitonic Sort
  55. 8.3. Implementation
  56. 8.4. Evaluation
  57. 8.5. Conclusion
  58. Introduction
  59. Chapter 9. Interval Arithmetic in CUDA
  60. 9.1. Interval Arithmetic
  61. 9.2. Importance of Rounding Modes
  62. 9.3. Interval Operators in CUDA
  63. 9.4. Some Evaluations: Synthetic Benchmark
  64. 9.5. Application-Level Benchmark
  65. 9.6. Conclusion
  66. Chapter 10. Approximating the erfinv Function
  67. 10.1. Introduction
  68. 10.2. New erfinv Approximations
  69. 10.3. Performance and Accuracy
  70. 10.4. Conclusions
  71. Chapter 11. A Hybrid Method for Solving Tridiagonal Systems on the GPU
  72. 11.1. Introduction
  73. 11.3. Algorithms
  74. 11.4. Implementation
  75. 11.5. Results and Evaluation
  76. 11.6. Future Directions
  77. Chapter 12. Accelerating CULA Linear Algebra Routines with Hybrid GPU and Multicore Computing
  78. 12.1. Introduction, Problem Statement, and Context
  79. 12.2. Core Methods
  80. 12.3. Algorithms, Implementations, and Evaluations
  81. 12.4. Final Evaluation and Validation of Results, Total Benefits, and Limitations
  82. 12.5. Future Directions
  83. Chapter 13. GPU Accelerated Derivative-Free Mesh Optimization
  84. 13.1. Introduction, Problem Statement, and Context
  85. 13.2. Core Method
  86. 13.3. Algorithms, Implementations, and Evaluations
  87. 13.4. Final Evaluation
  88. 13.5. Future Direction
  89. Introduction
  90. Chapter 14. Large-Scale Gas Turbine Simulations on GPU Clusters
  91. 14.1. Introduction, Problem Statement, and Context
  92. 14.2. Core Method
  93. 14.3. Algorithms, Implementations, and Evaluations
  94. 14.4. Final Evaluation
  95. 14.5. Test Case and Parallel Performance
  96. 14.6. Future Directions
  97. Chapter 15. GPU Acceleration of Rarefied Gas Dynamic Simulations
  98. 15.1. Introduction, Problem Statement, and Context
  99. 15.2. Core Methods
  100. 15.3. Algorithms, Implementations, and Evaluations
  101. 15.4. Final Evaluation
  102. 15.5. Future Directions
  103. Chapter 16. Application of Assembly of Finite Element Methods on Graphics Processors for Real-Time Elastodynamics
  104. 16.1. Introduction, Problem Statement, and Context
  105. 16.2. Core Method
  106. 16.3. Algorithms, Implementations, and Evaluations
  107. 16.4. Evaluation and Validation of Results, Total Benefits, Limitations
  108. 16.5. Future Directions
  109. Chapter 17. CUDA Implementation of Vertex-Centered, Finite Volume CFD Methods on Unstructured Grids with Flow Control Applications
  110. 17.1. Introduction, Problem Statement, and Context
  111. 17.2. Core (CFD and Optimization) Methods
  112. 17.3. Implementations and Evaluation
  113. 17.4. Applications to Flow Control — Optimization
  114. Chapter 18. Solving Wave Equations on Unstructured Geometries
  115. 18.1. Introduction, Problem Statement, and Context
  116. 18.2. Core Method
  117. 18.3. Algorithms, Implementations, and Evaluations
  118. 18.4. Final Evaluation
  119. 18.5. Future Directions
  120. Chapter 19. Fast Electromagnetic Integral Equation Solvers on Graphics Processing Units
  121. 19.1. Problem Statement and Background
  122. 19.2. Algorithms Introduction
  123. 19.3. Algorithm Description
  124. 19.4. GPU Implementations
  125. 19.5. Results
  126. 19.6. Integrating the GPU NGIM Algorithms with Iterative IE Solvers
  127. 19.7. Future Directions
  128. Introduction
  129. Chapter 20. Solving Large Multibody Dynamics Problems on the GPU
  130. 20.1. Introduction, Problem Statement, and Context
  131. 20.2. Core Method
  132. 20.3. The Time-Stepping Scheme
  133. 20.4. Algorithms, Implementations, and Evaluations
  134. 20.5. Final Evaluation
  135. 20.6. Future Directions
  136. Chapter 21. Implicit FEM Solver on GPU for Interactive Deformation Simulation
  137. 21.1. Problem Statement and Context
  138. 21.2. Core Method
  139. 21.3. Algorithms and Implementations
  140. 21.4. Results and Evaluation
  141. 21.5. Future Directions
  142. Chapter 22. Real-Time Adaptive GPU Multiagent Path Planning
  143. 22.1. Introduction
  144. 22.2. Core Method
  145. 22.3. Implementation
  146. 22.4. Results
  147. Introduction
  148. Chapter 23. Pricing Financial Derivatives with High Performance Finite Difference Solvers on GPUs
  149. 23.1. Introduction, Problem Statement, and Context
  150. 23.2. Core Method
  151. 23.3. Algorithms, Implementations, and Evaluations
  152. 23.4. Final Evaluation
  153. 23.5. Future Directions
  154. Chapter 24. Large-Scale Credit Risk Loss Simulation
  155. 24.1. Introduction, Problem Statement, and Context
  156. 24.2. Core Methods
  157. 24.3. Algorithms, Implementations, Evaluations
  158. 24.4. Results and Conclusions
  159. 24.5. Future Developments
  160. Chapter 25. Monte Carlo–Based Financial Market Value-at-Risk Estimation on GPUs
  161. 25.1. Introduction, Problem Statement, and Context
  162. 25.2. Core Methods
  163. 25.3. Algorithms, Implementations, and Evaluations
  164. 25.4. Final Results
  165. 25.5. Conclusion
  166. Introduction
  167. Chapter 26. Thrust
  168. 26.1. Motivation
  169. 26.2. Diving In
  170. 26.3. Generic Programming
  171. 26.4. Benefits of Abstraction
  172. 26.5. Best Practices
  173. Chapter 27. GPU Scripting and Code Generation with PyCUDA
  174. 27.1. Introduction, Problem Statement, and Context
  175. 27.2. Core Method
  176. 27.3. Algorithms, Implementations, and Evaluations
  177. 27.4. Evaluation
  178. 27.5. Availability
  179. 27.6. Future Directions
  180. Chapter 28. Jacket
  181. 28.1. Introduction
  182. 28.2. Jacket
  183. 28.3. Benchmarking Procedures
  184. 28.4. Experimental Results
  185. 28.5. Future Directions
  186. Chapter 29. Accelerating Development and Execution Speed with Just-in-Time GPU Code Generation
  187. 29.1. Introduction, Problem Statement, and Context
  188. 29.2. Core Methods
  189. 29.3. Algorithms, Implementations, and Evaluations
  190. 29.4. Final Evaluation
  191. 29.5. Future Directions
  192. Chapter 30. GPU Application Development, Debugging, and Performance Tuning with GPU Ocelot
  193. 30.1. Introduction
  194. 30.2. Core Technology
  195. 30.3. Algorithm, Implementation, and Benefits
  196. 30.4. Future Directions
  197. Chapter 31. Abstraction for AoS and SoA Layout in C++
  198. 31.1. Introduction, Problem Statement, and Context
  199. 31.2. Core Method
  200. 31.3. Implementation
  201. 31.4. ASA in Practice
  202. 31.5. Final Evaluation
  203. Chapter 32. Processing Device Arrays with C++ Metaprogramming
  204. 32.1. Introduction, Problem Statement, and Context
  205. 32.2. Core Method
  206. 32.3. Implementation
  207. 32.4. Evaluation
  208. 32.5. Future Directions
  209. Chapter 33. GPU Metaprogramming
  210. 33.1. Introduction, Problem Statement, and Context
  211. 33.2. Core Method
  212. 33.3. Algorithms, Implementations, and Evaluations
  213. 33.4. Final Evaluation
  214. 33.5. Future Directions
  215. Chapter 34. A Hybridization Methodology for High-Performance Linear Algebra Software for GPUs
  216. 34.1. Introduction, Problem Statement, and Context
  217. 34.2. Core Method
  218. 34.3. Algorithms, Implementations, and Evaluations
  219. 34.4. Final Evaluation
  220. 34.5. Future Directions
  221. Chapter 35. Dynamic Load Balancing Using Work-Stealing
  222. 35.1. Introduction
  223. 35.2. Core Method
  224. 35.3. Algorithms and Implementations
  225. 35.4. Case Studies and Evaluation
  226. 35.5. Future Directions
  227. Chapter 36. Applying Software-Managed Caching and CPU/GPU Task Scheduling for Accelerating Dynamic Workloads
  228. 36.1. Introduction, Problem Statement, and Context
  229. 36.2. Core Method
  230. 36.3. Algorithms, Implementations, and Evaluations
  231. 36.4. Final Evaluation
  232. Index