Chapter 24

Profiling-Guided Optimization

Andrey Vladimirov, Colfax International, USA

Abstract

This chapter focuses on matrix transposition, a small, self-contained workload of great practical value. The optimization process relies exclusively on programming in a high-level language and on the OpenMP framework. The result is portable code that runs on both CPU (processor) and MIC (coprocessor) architectures and can be recompiled for future generations of Intel architectures. The chapter emphasizes the use of Intel® VTune™ Amplifier XE reports to understand where to apply optimization. Through VTune, the performance monitoring functionality of Intel Xeon Phi coprocessors is showcased not only ...
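For orientation, the kind of workload described here can be sketched as a minimal, unoptimized starting point: an in-place transposition of a square, row-major matrix written in plain C with a single OpenMP directive. This is an illustrative assumption, not the chapter's code; the function name `transpose_inplace`, the `float` element type, and the `schedule(dynamic)` clause are all hypothetical choices made for this sketch.

```c
/* Illustrative baseline only; names and types are assumptions for this sketch. */

/* Transpose an n x n row-major matrix in place. */
void transpose_inplace(float *a, long n)
{
    /* Each pair (i, j) with j < i is swapped exactly once, so the outer
       iterations are independent and can be distributed across threads.
       Row i performs i swaps, so the work per iteration is uneven; a
       dynamic schedule is one simple way to balance it. */
    #pragma omp parallel for schedule(dynamic)
    for (long i = 1; i < n; i++) {
        for (long j = 0; j < i; j++) {
            float tmp = a[i * n + j];
            a[i * n + j] = a[j * n + i];
            a[j * n + i] = tmp;
        }
    }
}
```

Compiled without OpenMP support, the pragma is ignored and the same source runs serially, which is part of what makes this style of code portable across architectures. A profiler report on a loop like this would be the natural place to look for where the time goes before attempting any tuning.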

Get High Performance Parallelism Pearls Volume One now with the O’Reilly learning platform.