Live Online Training

Performance Monitoring and Diagnostics for Linux Applications

Sasha Goldshtein

Learn how to approach performance monitoring and diagnostics methodically, and quickly zoom in on the faulty resource. After identifying it, you’ll learn how to obtain accurate, low-overhead performance information across all areas of the system: CPU usage, disk and file I/O, networking, blocked time, database access, and more. Being able to deploy safe, low-overhead tools to get performance insights into your system is a critical skill for any Linux developer and administrator.

What you'll learn, and how you can apply it

  • Which components in a Linux system need to be monitored for utilization and saturation issues
  • How stack sampling works and how samples are aggregated to analyze performance
  • The difference between sampling and tracing for performance analysis
  • How perf collects trace statements and statistics from running processes
  • The various trace events embedded into high-level language runtimes such as Java and Python
  • Why BPF offers a superior approach for tracing and performance investigation
  • Which BPF-based tools to use for identifying bottlenecks

And you’ll be able to:

  • Apply a methodical approach for identifying overloaded resources
  • Generate stack reports for system events such as CPU samples, disk accesses, blocked threads, database queries, and more
  • Visualize and explore stack traces using flame graphs
  • Trace interesting or suspicious system activity (such as syscalls, TCP events, disk accesses) using low-overhead, dynamic tools
  • Obtain performance information and events from high-level language runtimes, such as Java, Python, and Node.js
  • Develop ad-hoc tools for performance investigations using perf and BPF

This training course is for you because...

  • You are a Linux application developer and you need to identify bottlenecks in your application (CPU, disk, network, database, memory, GC, and so on).
  • You are a Linux system administrator (SRE, production engineer) and you need to monitor key performance and load metrics and quickly zoom in on the overloaded resource to find the root cause.

Prerequisites


  • Experience with Linux system administration: the shell, pipes, very basic scripting
  • Understanding of operating systems concepts: processes, threads, scheduling, memory
  • Familiarity with the programming language used by your Linux application (Java, C++, Python, Ruby, etc.)

During the live training, you’ll receive a link that gives you access to the virtual lab environment for the day. Only a web browser is required.

Suggested reading:

Read The USE Method by Brendan Gregg for background on the methodology we will discuss and use in class.

Read CPU Flame Graphs by Brendan Gregg for an overview of flame graphs and how they can be used to drill into a CPU performance problem. (In class, we will use flame graphs for additional tasks and runtimes.)

Read Introducing gobpf by Michael Schubert for a short overview of BPF and how Kinvolk is building a Go binding to BPF.

About your instructor

  • Sasha Goldshtein is the CTO of Sela Group, a Microsoft MVP, Pluralsight author, and international consultant and trainer. Sasha is the author of two books and multiple online courses, and a prolific blogger. He is also an active open source contributor to projects focused on system diagnostics, performance monitoring, and tracing -- across multiple operating systems and runtimes. Sasha has authored and delivered training courses on Linux performance optimization, event tracing, production debugging, mobile application development, and modern C++. Between his consulting engagements, Sasha speaks at international conferences.


The timeframes are only estimates and may vary according to how the class is progressing.

Day 1


Introduction and course overview (5 min)

  • Instructor introduction
  • Goals and non-goals
  • Course modules
  • Prerequisites
  • Overview of the lab environment

The USE method (15 min)

  • Methods for performance investigation: under the street light, USE
  • The USE method for hardware resources
  • The USE method for software resources
  • USE tools checklist for Linux systems
  • Demo: applying USE to find the saturated resource
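
As a rough illustration of the "utilization" part of USE for the CPU resource, the sketch below derives a utilization figure from two snapshots of the aggregate cpu line in /proc/stat. The snapshot values are made up for the example, the function names are hypothetical, and counting iowait as non-busy time is an assumption of this sketch rather than a rule of the method.

```python
def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into (busy, total) ticks."""
    fields = stat_line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Counters in order: user nice system idle iowait irq softirq steal.
    # Guest time is already included in user/nice, so it is not added again.
    values = [int(v) for v in fields[1:9]]
    idle = values[3] + values[4]  # assumption: iowait counts as not busy
    total = sum(values)
    return total - idle, total


def cpu_utilization(before, after):
    """Fraction of elapsed CPU time spent busy between two snapshots."""
    busy0, total0 = cpu_times(before)
    busy1, total1 = cpu_times(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        return 0.0
    return (busy1 - busy0) / elapsed


# Hypothetical snapshots taken a short interval apart.
before = "cpu  1000 0 500 8000 500 0 0 0 0 0"
after = "cpu  1600 0 700 8150 550 0 0 0 0 0"
print(f"CPU utilization: {cpu_utilization(before, after):.0%}")
```

On a live system the two strings would come from reading the first line of /proc/stat twice. Saturation and errors need other sources (run queue length, for example) and are not covered by this sketch.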

CPU sampling with perf (10 min)

  • Introduction to perf, how to install it, supported Linux versions
  • Major perf features (in addition to CPU sampling)
  • Recording CPU stacks with perf record -ag -F
  • Demo: finding the CPU bottleneck in a C++ application

Flame graphs and missing symbols (20 min)

  • Resolving stack symbols for JIT-compiled or interpreted runtimes using perf maps
  • Java perf-map-agent, Node.js --perf_basic_prof
  • Demo: finding the CPU bottleneck in a Java application
  • Visualizing stack traces with flame graphs
  • Demo: generating and reading a flame graph
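
To make the aggregation step behind a flame graph concrete, here is a minimal sketch that collapses stack samples into the "folded" text format (semicolon-separated frames followed by a sample count) that flame graph scripts take as input. The sample stacks and function names are invented for illustration; in class the samples come from perf.

```python
from collections import Counter


def fold_stacks(samples):
    """Count identical stacks; each sample lists frames from root to leaf."""
    counts = Counter(";".join(frames) for frames in samples)
    # One line per unique stack: "root;...;leaf count", sorted for stable output.
    return [f"{stack} {count}" for stack, count in sorted(counts.items())]


# Hypothetical CPU samples, each a call stack captured at one sampling tick.
samples = [
    ["main", "parse_request", "read_line"],
    ["main", "handle_request", "compress"],
    ["main", "handle_request", "compress"],
    ["main", "handle_request", "write_response"],
]

for line in fold_stacks(samples):
    print(line)
```

The width of each box in the resulting flame graph is proportional to these counts, which is why a frequently sampled stack such as main;handle_request;compress stands out.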

Lab: CPU investigation with perf and flame graphs (20 min)

  • Record stack traces of a C application with perf record
  • Visualize with a flame graph
  • Use jps and perf-map-agent to record stack traces of a JVM application
  • Bonus: try to record CPU stacks from your own system, or from a runtime we haven’t discussed (e.g. Python)

Break (10 min)

Lab discussion, review, Q&A (10 min)

Kernel tracepoints, kprobes, and perf-tools (20 min)

  • Kernel tracepoints and perf support for recording them (perf record -e)
  • Interesting kernel tracepoints: scheduler, disk, network device
  • Demo: recording fork events with perf record -e (poor man’s execsnoop)
  • Brendan Gregg’s perf-tools repository -- an ftrace + perf_events frontend
  • Demo: characterizing load generated by a process using syscount -c -p PID
  • Dynamic instrumentation with kprobes and uprobes (for when tracepoints aren’t available)
  • Discovering functions to probe with /proc/kallsyms and objdump -T
  • Demo: probe-based ad-hoc investigations (failed malloc(), long pthread_mutex_lock(), MySQL queries)

Lab: opensnoop (10 min)

  • Diagnose an application that fails to start because it keeps looking for a configuration file that doesn’t exist

Lab: disk access stack investigation (10 min)

  • First, identify heavy disk writes performed by the application, and their latency
  • Then, collect call stacks with perf record -g -p PID -e … for the block_rq_insert tracepoint and optionally create a flame graph to pinpoint where the writes are coming from

Break (10 min)

Lab discussion, review, Q&A (10 min)

eBPF (20 min)

  • The challenges with perf and similar tools that BPF is trying to address, chiefly overhead (all events are transferred to user space through a file for analysis)
  • BPF history -- from the Berkeley Packet Filter to tomorrow’s tracing infrastructure
  • BPF support in various kernel versions
  • Structure and architecture of a BPF tracing program
  • Example: contrasting a perf-based slow file I/O investigation with a BPF-based one, which aggregates only the interesting events and reports summaries
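
The difference between shipping every event and aggregating can be sketched without any BPF at all. The plain-Python model below keeps only a power-of-two latency histogram instead of the raw event list, which is the kind of summary a BPF program would maintain in a map. The class name and latency values are hypothetical, and this is a simulation of the idea, not BPF code.

```python
from collections import defaultdict


def log2_bucket(value):
    """Index of the power-of-two bucket that holds a non-negative integer."""
    return max(int(value), 1).bit_length() - 1


class LatencyHistogram:
    """Keeps one counter per bucket rather than one record per event."""

    def __init__(self):
        self.buckets = defaultdict(int)
        self.events = 0

    def record(self, latency_us):
        self.buckets[log2_bucket(latency_us)] += 1
        self.events += 1

    def summary(self):
        rows = []
        for index, count in sorted(self.buckets.items()):
            low = 0 if index == 0 else 1 << index
            high = (1 << (index + 1)) - 1
            rows.append((low, high, count))
        return rows


# Hypothetical I/O latencies in microseconds.
latencies_us = [3, 5, 6, 120, 130, 4000]
hist = LatencyHistogram()
for latency in latencies_us:
    hist.record(latency)

for low, high, count in hist.summary():
    print(f"{low:>6} -> {high:<6} : {count}")
```

However many events arrive, the state that has to be copied out for reporting stays bounded by the number of buckets, which is where the overhead saving comes from.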

Closing discussion and Q&A (10 min)

Day 2


Opening and quick recap of Day 1 (5 min)

BCC performance checklist (15 min)

  • The BCC library and toolkit
  • Linux performance checklist based on BCC tools
  • Demo: biolatency from BCC for block I/O latency histograms
  • BCC’s profile tool based on perf_events support in kernel 4.9 (contrast with perf record -ag -F-based profiling)

Lab: offcputime (10 min)

  • No considerable CPU load, but high latency for certain operations in an application
  • Blocked time investigation points to inefficient locking and even sleep() calls, visualized through a flame graph using BCC’s offcputime

Lab: fileslower (5 min)

  • Occasional writes performed by an application are slower than usual -- use BCC’s fileslower to figure out where they are coming from and why they are slower

Lab: memleak (15 min)

  • C++ application leaks memory at an alarming rate when processing text files
  • BCC’s specialized memleak tool aggregates allocating and freeing stacks and points to memory that was allocated but not freed (boils down to a std::shared_ptr cycle)
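
The bookkeeping described in this lab can be modeled in a few lines: remember every allocation by address, forget it when it is freed, and group whatever is left by allocating stack. The sketch below is a simplified plain-Python model of that idea with invented addresses and stack frames; it is not BCC's implementation.

```python
from collections import defaultdict


class OutstandingAllocations:
    """Tracks allocations that have not been freed yet, keyed by address."""

    def __init__(self):
        self.live = {}  # address -> (size, allocating stack)

    def on_alloc(self, address, size, stack):
        self.live[address] = (size, tuple(stack))

    def on_free(self, address):
        # Frees of unknown addresses (allocated before tracing began) are ignored.
        self.live.pop(address, None)

    def report(self):
        """Return (bytes, count, stack) rows, largest outstanding total first."""
        totals = defaultdict(lambda: [0, 0])
        for size, stack in self.live.values():
            totals[stack][0] += size
            totals[stack][1] += 1
        rows = [(b, c, stack) for stack, (b, c) in totals.items()]
        return sorted(rows, reverse=True)


# Hypothetical event stream: two nodes are never freed, one buffer is.
tracker = OutstandingAllocations()
tracker.on_alloc(0x1000, 64, ["main", "load_file", "make_node"])
tracker.on_alloc(0x2000, 64, ["main", "load_file", "make_node"])
tracker.on_alloc(0x3000, 256, ["main", "read_buffer"])
tracker.on_free(0x3000)

for total_bytes, count, stack in tracker.report():
    print(f"{total_bytes} bytes in {count} allocations from {';'.join(stack)}")
```

Outstanding memory is not necessarily leaked, so the report is a starting point: stacks whose totals keep growing between reports are the ones worth inspecting.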

Break (10 min)

Lab discussion, review, Q&A (10 min)

Ad-hoc investigations with BCC tools (30 min)

  • Sources of tracing with BCC tools: tracepoints, kprobes, uprobes, USDT, perf_events
  • Examples of USDT tracepoints in high-level language runtimes: Java GCs, Node.js server requests, Python method calls, Ruby allocs, and so on
  • BCC’s trace syntax and examples
  • Demo: tracing Node.js requests with trace
  • Demo: tracing Java GCs with ugc (or trace)
  • BCC’s argdist syntax and examples, collecting argument histograms and frequency counts
  • Demo: aggregating failed file opens with argdist
  • Demo: displaying slow MySQL queries with argdist

Lab: trace and argdist one-liners (20 min)

  • Display all PostgreSQL queries in real-time with trace
  • Display all system login attempts (including ssh) with trace
  • Identify “hot” files and/or network sockets with argdist
  • Display a latency histogram of a specific PostgreSQL query with argdist

Break (10 min)

Lab discussion, review, Q&A (10 min)

Developing BCC tools (20 min)

  • Tool structure and design
  • Example: network send summary tool
  • Attaching BPF programs to probes using the BPF Python module from BCC
  • Data structures: arrays, maps, histograms, stack maps
  • Demo: writing dbslower for identifying slow SQL queries

Course wrap-up, objectives review, final Q&A (20 min)