Power and Performance

Book Description

Power and Performance: Software Analysis and Optimization is a guide to solving performance problems on modern Linux systems. Power-efficient chips are no help if the software running on them is inefficient. Starting with the necessary architectural background, the book demonstrates how to use performance analysis tools to pinpoint the cause of performance problems and presents best practices for handling the common issues those tools identify.

  • Provides expert perspective from a key member of Intel’s optimization team on how processors and memory systems influence performance
  • Presents ideas to improve architectures running mobile, desktop, or enterprise platforms
  • Demonstrates best practices for designing experiments and benchmarking throughout the software lifecycle
  • Explains the importance of profiling and measurement to determine the source of performance issues

Table of Contents

  1. Cover image
  2. Title page
  3. Table of Contents
  4. Copyright
  5. Dedication
  6. Introduction
    1. Performance Apologetic
    2. A Word on Premature Optimization
    3. The Roadmap
  7. Part 1: Background Knowledge
    1. Chapter 1: Early Intel® Architecture
      1. Abstract
      2. 1.1 Intel® 8086
      3. 1.2 Intel® 8087
      4. 1.3 Intel® 80286 and 80287
      5. 1.4 Intel® 80386 and 80387
    2. Chapter 2: Intel® Pentium® Processors
      1. Abstract
      2. 2.1 Intel® Pentium®
      3. 2.2 Intel® Pentium® Pro
      4. 2.3 Intel® Pentium® 4
    3. Chapter 3: Intel® Core™ Processors
      1. Abstract
      2. 3.1 Intel® Pentium® M
      3. 3.2 Second Generation Intel® Core™ Processor Family
    4. Chapter 4: Performance Workflow
      1. Abstract
      2. 4.1 Step 0: Defining the Problem
      3. 4.2 Step 1: Determine the Source of the Problem
      4. 4.3 Step 2: Determine Whether the Bottleneck Can Be Avoided
      5. 4.4 Step 3: Design a Reproducible Experiment
      6. 4.5 Step 4: Check Upstream
      7. 4.6 Step 5: Algorithmic Improvement
      8. 4.7 Step 6: Architectural Tuning
      9. 4.8 Step 7: Testing
      10. 4.9 Step 8: Performance Regression Testing
    5. Chapter 5: Designing Experiments
      1. Abstract
      2. 5.1 Choosing a Metric
      3. 5.2 Dealing with External Variables
      4. 5.3 Timing
      5. 5.4 Phoronix Test Suite
  8. Part 2: Monitors
    1. Chapter 6: Introduction to Profiling
      1. Abstract
      2. 6.1 PMU
      3. 6.2 Top-Down Hierarchical Analysis
    2. Chapter 7: Intel® VTune™ Amplifier XE
      1. Abstract
      2. 7.1 Installation and Configuration
      3. 7.2 Data Collection and Reporting
    3. Chapter 8: Perf
      1. Abstract
      2. 8.1 Event Infrastructure
      3. 8.2 Perf Tool
    4. Chapter 9: Ftrace
      1. Abstract
      2. 9.1 DebugFS
      3. 9.2 Kernel Shark
    5. Chapter 10: GPU Profiling Tools
      1. Abstract
      2. 10.1 Traditional Graphics Stack
      3. 10.2 buGLe
      4. 10.3 Apitrace
    6. Chapter 11: Other Helpful Tools
      1. Abstract
      2. 11.1 GNU Profiler
      3. 11.2 Gcov
      4. 11.3 PowerTOP
      5. 11.4 LatencyTOP
      6. 11.5 Sysprof
  9. Part 3: Optimization Techniques
    1. Chapter 12: Toolchain Primer
      1. Abstract
      2. 12.1 Compiler Flags
      3. 12.2 ELF and the x86/x86_64 ABIs
      4. 12.3 CPU Dispatch
      5. 12.4 Coding Style
      6. 12.5 x86 Unleashed
    2. Chapter 13: Branching
      1. Abstract
      2. 13.1 Avoiding Branches
      3. 13.2 Improving Prediction
    3. Chapter 14: Optimizing Cache Usage
      1. Abstract
      2. 14.1 Processor Cache Organization
      3. 14.2 Querying Cache Topology
      4. 14.3 Prefetch
      5. 14.4 Improving Locality
    4. Chapter 15: Exploiting Parallelism
      1. Abstract
      2. 15.1 SIMD
    5. Chapter 16: Special Instructions
      1. Abstract
      2. 16.1 Intel® Advanced Encryption Standard New Instructions (AES-NI)
      2. 16.2 PCLMUL - Packed Carry-Less Multiplication
      4. 16.3 CRC32
      5. 16.4 SSE4.2 String Functions
  10. Index