C++ High Performance

Book Description

Write code that scales across CPU registers, multi-core, and machine clusters

About This Book

  • Explore concurrent programming in C++
  • Identify memory management problems
  • Use SIMD and STL containers for performance improvement

Who This Book Is For

If you're a C++ developer looking to improve the speed of your code or simply wanting to take your skills up to the next level, then this book is perfect for you.

What You Will Learn

  • Find out how to use exciting new tools that will help you improve your code
  • Identify bottlenecks to optimize your code
  • Develop applications that utilize GPU computation
  • Reap the benefits of concurrent programming
  • Write code that can protect against application errors using error handling
  • Use STL containers and algorithms efficiently
  • Extend your toolbox with Boost containers
  • Achieve efficient memory management by using custom memory allocators

In Detail

C++ is a highly portable language and can be used to write complex applications and performance-critical code. It has evolved over the last few years to become a modern and expressive language. This book will guide you through optimizing the performance of your C++ apps so that they run faster and consume fewer resources on the device they're running on. The book begins by helping you identify bottlenecks in C++ code. It then moves on to measuring performance, and you'll see how this affects the way you write code. Next, you'll see the importance of data structure optimization and how data structures can be used efficiently. After that, you'll see which algorithms should be used to achieve faster execution, followed by how to use STL containers. Moving on, you'll learn how to improve memory management in C++. You'll get hands-on experience making use of multiple cores to enable more efficient and faster execution. The book ends with a brief overview of utilizing the capabilities of your GPU by using Boost Compute and OpenCL.

Style and approach

This easy-to-follow guide is full of examples and self-contained code snippets that help you with high-performance programming in C++. You'll get your hands dirty with this all-inclusive guide that uncovers hidden performance improvement areas for any C++ code.

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files emailed directly to you.

Table of Contents

  1. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  2. A Brief Introduction to C++
    1. Why C++?
      1. Zero-cost abstractions
        1. Programming languages and machine code abstractions
        2. Abstractions in other languages
      2. Portability
      3. Robustness
      4. C++ of today
    2. The aim of this book
      1. Expected knowledge of the reader
    3. C++ compared with other languages
      1. Competing languages and performance
      2. Non-performance-related C++ language features
        1. Value semantics
        2. Const correctness
      3. Object ownership and garbage collection in C++
      4. Avoiding null objects using C++ references
      5. Drawbacks of C++
    4. Class interfaces and exceptions
      1. Strict class interfaces
      2. Error handling and resource acquisition
        1. Preserving the valid state
        2. Resource acquisition
        3. Exceptions versus error codes
    5. Libraries used in this book
    6. Summary
  3. Modern C++ Concepts
    1. Automatic type deduction with the auto keyword
      1. Using auto in function signatures
      2. Using auto for variables
        1. Const reference
        2. Mutable reference
        3. Forwarding reference
        4. Conclusion
    2. The lambda function
      1. Basic syntax of a C++ lambda function
      2. The capture block
        1. Capture by reference versus capture by value
        2. Similarities between a lambda and a class
          1. Initializing variables in capture
        3. Mutating lambda member variables
          1. Mutating member variables from the compiler's perspective
        4. Capture all
        5. Assigning C function pointers to lambdas
      3. Lambdas and std::function
        1. Assigning lambdas to std::functions
        2. Implementing a simple Button class with std::function
        3. Performance consideration of std::function
          1. An std::function cannot be inlined
          2. An std::function heap allocates and captures variables
          3. Invoking an std::function requires a few more operations than a lambda
      4. The polymorphic lambda
        1. Creating reusable polymorphic lambdas
    3. Const propagation for pointers
    4. Move semantics explained
      1. Copy-construction, swap, and move
        1. Copy-constructing an object
        2. Swapping two objects
        3. Move-constructing an object
      2. Resource acquisition and the rule of three
        1. Implementing the rule of three
          1. Constructor
        2. Limitations of the rule of three
        3. Avoiding copies without move semantics
      3. Introducing move semantics
      4. Named variables and r-values
        1. Accept arguments by move when applicable
      5. Default move semantics and the rule of zero
        1. Rule of zero in a real code base
          1. A note on empty destructors
        2. A common pitfall – moving non-resources
        3. Applying the && modifier to class member functions
    5. Representing optional values with std::optional
      1. Optional return values
      2. Optional member variables
      3. Sorting and comparing std::optional
    6. Representing dynamic values with std::any
      1. Performance of std::any
    7. Summary
  4. Measuring Performance
    1. Asymptotic complexity and big O notation
      1. Growth rates
      2. Amortized time complexity
    2. What to measure?
      1. Performance properties
      2. Performance testing – best practices
    3. Knowing your code and hot spots
      1. Profilers
        1. Instrumentation profilers
        2. Sampling profilers
    4. Summary
  5. Data Structures
    1. Properties of computer memory
    2. STL containers
      1. Sequence containers
        1. Vector and array
        2. Deque
        3. List and forward_list
        4. The basic_string
      2. Associative containers
        1. Ordered sets and maps
        2. Unordered sets and maps
          1. Hash and equals
          2. Hash policy
      3. Container adaptors
        1. Priority queues
    3. Parallel arrays
    4. Summary
  6. A Deeper Look at Iterators
    1. The iterator concept
      1. Iterator categories
      2. Pointer-mimicking syntax
      3. Iterators as generators
      4. Iterator traits
        1. Implementing a function using iterator categories
        2. Extending the IntIterator to bidirectional
      5. Practical example – iterating floating point values within a range
        1. Illustrated usage examples
        2. Utility functions
        3. How to construct a linear range iterator
          1. Iterator usage example
        4. Generalizing the iterator pair to a range
          1. The make_linear_range convenience function
        5. Linear range usage examples
    2. Summary
  7. STL Algorithms and Beyond
    1. Using STL algorithms as building blocks
      1. STL algorithm concepts
        1. Algorithms operate on iterators
        2. Implementing a generic algorithm that can be used with any container
        3. Iterators for a range point to the first element and the element after the last
        4. Algorithms do not change the size of the container
        5. Algorithms with output require allocated data
        6. Algorithms use operator== and operator< by default
          1. Custom comparator function
          2. General-purpose predicates
        7. Algorithms require move operators not to throw
        8. Algorithms have complexity guarantees
        9. Algorithms perform just as well as C library function equivalents
      2. STL algorithms versus handcrafted for-loops
        1. Readability and future-proofing
          1. Real-world code base example
        2. Usage examples of STL algorithms versus handcrafted for-loops
          1. Example 1 – Unfortunate exceptions and performance problems
          2. Example 2 – STL has subtle optimizations even in simple algorithms
        3. Sorting only for the data you need to retrieve
          1. Use cases
          2. Performance evaluation
    2. The future of STL and the ranges library
      1. Limitations of the iterators in STL
      2. Introduction to the ranges library
        1. Composability and pipeability
      3. Actions, views, and algorithms
        1. Actions
        2. Views
        3. Algorithms
    3. Summary
  8. Memory Management
    1. Computer memory
      1. The virtual address space
      2. Memory pages
      3. Thrashing
    2. Process memory
      1. Stack memory
      2. Heap memory
    3. Objects in memory
      1. Creating and deleting objects
        1. Placement new
        2. The new and delete operators
      2. Memory alignment
      3. Padding
    4. Memory ownership
      1. Handling resources implicitly
      2. Containers
      3. Smart pointers
        1. Unique pointer
        2. Shared pointer
        3. Weak pointer
    5. Small size optimization
    6. Custom memory management
      1. Building an arena
      2. A custom memory allocator
    7. Summary
  9. Metaprogramming and Compile-Time Evaluation
    1. Introduction to template metaprogramming
      1. Using integers as template parameters
      2. How the compiler handles a template function
      3. Using static_assert to trigger errors at compile time
    2. Type traits
      1. Type trait categories
      2. Using type traits
      3. Receiving the type of a variable with decltype
      4. Conditionally enable functions based on types with std::enable_if_t
      5. Introspecting class members with std::is_detected
        1. Usage example of is_detected and enable_if_t combined
    3. The constexpr keyword
      1. Constexpr functions in a runtime context
      2. Verify compile-time computation using std::integral_constant
      3. The if constexpr statement
        1. Comparison with runtime polymorphism
        2. Example of generic modulus function using if constexpr
    4. Heterogeneous containers
      1. Static-sized heterogeneous containers
        1. The std::tuple container
          1. Accessing the members of a tuple
          2. Iterating std::tuple
          3. Unrolling the tuple
          4. Implementing other algorithms for tuples
        2. Accessing tuple elements
          1. Structured bindings
        3. The variadic template parameter pack
          1. An example of a function with a variadic number of arguments
          2. How to construct a variadic parameter pack
      2. Dynamic-sized heterogeneous containers
        1. Using std::any as the base for a dynamic-size heterogeneous container
    5. The std::variant
      1. Visiting variants
      2. Heterogeneous container of variants
      3. Accessing the values in our variant container
      4. Global function std::get
    6. Real-world examples of metaprogramming
      1. Example 1 – Reflection
        1. Making a class reflect its members
        2. C++ libraries that simplify reflection
        3. Using the reflection
          1. Evaluating the assembler output of the reflection
      2. Conditionally overloading global functions
        1. Testing reflection capabilities
      3. Example 2 – Creating a generic safe cast function
      4. Example 3 – Hash strings at compile time
        1. The advantages of compile-time hash sum calculation
        2. Implement and verify a compile-time hash function
        3. Constructing a PrehashedString class
        4. Forcing PrehashedString to only accept compile-time string literals
        5. Evaluating PrehashedString
        6. Evaluating get_bitmap_resource() with PrehashedString
    7. Summary
  10. Proxy Objects and Lazy Evaluation
    1. An introduction to lazy evaluation and proxy objects
      1. Lazy versus eager evaluation
    2. Proxy objects
      1. Comparing concatenated strings using a proxy
      2. Implementing the proxy
        1. Performance evaluation
      3. The r-value modifier
      4. Assigning a concatenated proxy
    3. Postponing an sqrt computation when comparing distances
      1. A simple two-dimensional point class
      2. The underlying mathematics
      3. Implementing the DistProxy object
      4. Expanding DistProxy to something more useful
      5. Comparing distances with DistProxy
      6. Calculating distances with DistProxy
        1. Preventing the misuse of DistProxy
      7. Performance evaluation
    4. Creative operator overloading and proxy objects
      1. The pipe operator as an extension method
        1. The pipe operator
      2. The infix operator
      3. Further reading
    5. Summary
  11. Concurrency
    1. Understanding the basics of concurrency
    2. What makes concurrent programming hard?
    3. Concurrency and parallelism
      1. Time slicing
      2. Shared memory
      3. Data races
      4. Mutex
      5. Deadlock
      6. Synchronous and asynchronous tasks
    4. Concurrent programming in C++
      1. The thread support library
        1. Threads
        2. Thread states
        3. Protecting critical sections
        4. Avoiding deadlocks
        5. Condition variables
        6. Returning data and handling errors
        7. Tasks
      2. Atomic support in C++
        1. Using shared_ptr in a multithreaded environment
      3. C++ memory model
        1. Instruction reordering
        2. Atomics and memory orders
    5. Lock-free programming
      1. Lock-free queue example
    6. Performance guidelines
      1. Avoid contention
      2. Avoid blocking operations
      3. Number of threads/CPU cores
      4. Thread priorities
      5. Thread affinity
      6. False sharing
    7. Summary
  12. Parallel STL
    1. Importance of parallelism
    2. Parallel algorithms
      1. Implementing parallel std::transform()
        1. Naive implementation
          1. Performance evaluation
        2. Shortcomings of the naive implementation
        3. Divide and conquer
          1. Implementation
          2. Performance evaluation
      2. Implementing parallel std::count_if
      3. Implementing parallel std::copy_if
        1. Approach one – Use a synchronized write position
          1. Inner function
          2. Outer function
        2. Approach two – Split algorithm into two parts
          1. Part one – Copy elements in parallel into the destination range
          2. Part two – Move the sparse range sequentially into a continuous range
        3. Performance evaluation
    3. Parallel STL
      1. Execution policies
        1. Sequenced policy
        2. Parallel policy
        3. Parallel unsequenced policy
      2. Parallel modifications of algorithms
        1. std::accumulate and std::reduce
          1. std::transform_reduce
        2. std::for_each
      3. Parallelizing an index-based for-loop
        1. Combining std::for_each with linear range
        2. Simplifying construction via a wrapper
    4. Executing STL algorithms on the GPU
      1. GPU APIs and parallel operations
        1. Programmable GPUs
        2. Shader programs
      2. STL algorithms and the GPU
    5. Boost Compute
      1. Basic concepts of Boost Compute
      2. OpenCL
      3. Initializing Boost Compute
      4. Transfer a simple transform-reduce algorithm to Boost Compute
        1. The algorithm in standard C++
        2. Transforming the algorithm to Boost Compute
          1. Adapting the circle struct for use with Boost Compute
          2. Converting circle_area_cpu to Boost Compute
          3. The BOOST_COMPUTE_FUNCTION macro
          4. Implementing the transform-reduction algorithm on the GPU
      5. Using predicates with Boost Compute
      6. Using a custom kernel in Boost Compute
        1. Box filter
        2. Implementing the kernel
        3. Parallelizing for two dimensions
        4. Verify GPU computation on the CPU
    6. Summary
  13. Other Books You May Enjoy
    1. Leave a review – let other readers know what you think