O'Reilly logo

GPU Computing Gems Emerald Edition by Wen-mei W. Hwu

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 6. An Efficient CUDA Implementation of the Tree-Based Barnes Hut n -Body Algorithm
Martin Burtscher and Keshav Pingali
This chapter describes the first CUDA implementation of the classical Barnes Hut n -body algorithm that runs entirely on the GPU. Unlike most other CUDA programs, our code builds an irregular tree-based data structure and performs complex traversals on it. It consists of six GPU kernels. The kernels are optimized to minimize memory accesses and thread divergence and are fully parallelized within and across blocks. Our CUDA code takes 5.2 seconds to simulate one time step with 5,000,000 bodies on a 1.3 GHz Quadro FX 5800 GPU with 240 cores, which is 74 times faster than an optimized serial implementation running on a ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required