Chapter 9

Tuning Parallel Applications

What's in This Chapter?

Using Amplifier XE to profile a parallel program

The five tuning steps

Using the Intel Software Autotuning Tool

Chapters 6–8 described the first three steps to make your code parallel — analyze, implement, and debug. This chapter discusses the final challenge — tuning your parallel application so that it is load-balanced and runs efficiently.

The chapter begins by describing how to use Amplifier XE to check the concurrency of your parallel program, and then shows how to detect and tune any synchronization problems. The chapter concludes by describing the experimental Intel Software Autotuning Tool (ISAT).

Note that all the screenshots and instructions in this chapter are based on Windows; however, you can also run the hands-on activities on Linux.

Introduction

Amplifier XE provides two predefined analysis types to help tune your parallel application:

  • Concurrency analysis — Use this to find out which logical CPUs are being used, to discover where parallelism is incurring synchronization overhead, and to identify potential candidates for further parallelization.
  • Locks and Waits analysis — Use this to identify where your application is waiting on synchronization objects or I/O operations, and to discover how these waits affect your program performance.

In this chapter, you use the Concurrency analysis as the main vehicle for parallel tuning. If your program has a lot of synchronization events, you may find the ...
