Accelerators such as a typical GPU have a relatively small programmable cache that can be accessed far more quickly than the global accelerator memory used for an array or array_view. How much more quickly? On the order of a hundred times faster! Of course, there’s more to your algorithm than accessing memory, but taking advantage of that fast memory can produce a significant gain that can have a big impact on your application’s overall performance. It’s a bit com...
- 4. Tiling
- from C++ AMP: Accelerated Massive Parallelism with Microsoft® Visual C++®
- Publisher: Microsoft Press
- Released: September 2012
should use fast memory
http://www.safaribooksonline.com/a/c-amp-accelerated/21120/