High Performance Python

Book description

Your Python code may run correctly, but you need it to run faster. By exploring the fundamental theory behind design choices, this practical guide helps you gain a deeper understanding of Python’s implementation. You’ll learn how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs. This book includes war stories from companies that use high performance Python for social media analytics, productionized machine learning, and other situations.

Table of Contents

  1. Preface
    1. Who This Book is For
    2. Who This Book is Not For
    3. What You’ll Learn
    4. We focus on Python 2.7 on 64-bit *nix Systems
    5. Moving to Python 3
    6. License
    7. How to Make an Attribution
    8. Errata and Feedback
    9. Resources
    10. Conventions Used in This Book
    11. Using Code Examples
    12. Safari® Books Online
    13. How to Contact Us
    14. Acknowledgements
  2. 1. Understanding Performant Python
    1. Fundamental Computer System
      1. Computing Units
      2. Memory Units
      3. Communications Layers
    2. Putting the Fundamental Elements Together
      1. Idealized computing vs Python VM
        1. Idealized Computing
        2. Python’s Virtualized Machine
    3. So Why Use Python?
  3. 2. Profiling to find bottlenecks
    1. Profiling efficiently
    2. Introducing the Julia Set
    3. Calculating the full Julia Set
    4. Simple approaches to timing - print and a decorator
    5. Simple timing using the Unix time command
    6. Using the cProfile module
    7. runsnakerun to visualise cProfile output
    8. line_profiler for line-by-line measurements
    9. memory_profiler for diagnosing memory usage
    10. Inspecting objects on the heap with heapy
    11. Dowser for live graphing of instantiated variables
    12. The dis module to examine CPython bytecode
      1. Different approaches, different complexity
    13. Unit testing during optimization to maintain correctness
      1. No-op @profile decorator
    14. Strategies to profile your code successfully
  4. 3. Lists and Tuples
    1. A more efficient search
    2. Lists vs Tuples
      1. Lists as dynamic arrays
      2. Tuples as static arrays
    3. Wrap Up
  5. 4. Dictionaries and Sets
    1. How do dictionaries and sets work?
      1. Inserting and Retrieving
      2. Deletion
      3. Resizing
      4. Hash functions and Entropy
    2. Dictionaries and Namespaces
    3. Wrap Up
  6. 5. Iterators and Generators
    1. Iterators for Infinite Series
    2. Lazy Generator Evaluation
    3. Wrap Up
  7. 6. Matrix and Vector Computation
    1. Introduction to the Problem
    2. Aren’t Python lists good enough?
      1. Problems with allocating too much
    3. Memory Fragmentation
      1. Understanding
      2. Making decisions with
    4. Enter numpy
      1. Memory Allocations and In-place Operations
      2. Selective optimizations: finding what needs to be fixed
    5. numexpr: making inplace operations faster and easier
    6. A Cautionary Tale: Verify “optimizations” (scipy)
    7. Wrap Up
  8. 7. Compiling to C
    1. What sort of speed gains are possible?
    2. JITs vs Compilers
    3. Why does type information help the code run faster?
    4. Using a C compiler
    5. Reviewing the Julia Set example
    6. Cython
      1. Compiling a pure-Python version using Cython
      2. Cython annotations to analyse a block of code
      3. Adding some type annotations
    7. Shed Skin
      1. Building an extension module
      2. The cost of the memory copies
    8. Cython and numpy
      1. Parallelizing the solution with OpenMP on One Machine
    9. Numba
    10. Pythran
    11. PyPy
      1. Garbage Collection differences
      2. Running PyPy and installing modules
    12. When to use each technology
      1. Other upcoming projects
      2. A note on Graphics Processing Units (GPUs)
      3. A wish for a future compiler project
    13. Foreign function interfaces
      1. ctypes
      2. CFFI
      3. f2py
      4. CPython module
    14. Wrap Up
  9. 8. Concurrency
    1. Introduction to Async
    2. Serial Crawler
    3. Gevent
    4. Tornado
    5. AsyncIO
    6. Database Example
    7. Wrap Up
  10. 9. The multiprocessing module
    1. An overview of the multiprocessing module
    2. Estimating Pi using the Monte Carlo method
    3. Estimating Pi using Processes and Threads
      1. Using Python objects
      2. Random Numbers in Parallel Systems
      3. Using numpy
    4. Finding Prime Numbers
      1. Queues of work
        1. Asynchronously adding jobs to the Queue
    5. Verifying Primes using Inter Process Communication
      1. Serial verification is inefficient
      2. Naive Pool solution
      3. A Less Naive Pool solution
      4. Using Manager.Value as a flag
      5. Using Redis as a flag
      6. Using RawValue as a flag
      7. Using mmap as a flag
      8. Using mmap as a flag redux
    6. Sharing numpy data with multiprocessing
    7. Synchronizing File and Variable Access
      1. File locking
      2. Locking a Value
    8. Summary
  11. 10. Clusters and Job Queues
    1. Benefits of clustering
    2. Clusters can introduce more pain than you might expect
      1. $462 Million Wall Street loss through poor cluster upgrade strategy
      2. Skype’s 24 hour global outage
    3. Common cluster designs
    4. How to start a clustered solution
    5. Ways to avoid pain when using clusters
    6. Three clustering solutions
    7. Using the ParallelPython module for simple local clusters
    8. Using IPython Parallel to support research
    9. NSQ for robust production clustering
      1. Queues
      2. Pub/Sub
      3. Distributed Prime Calculation
    10. Other clustering tools to look at
  12. 11. Using Less RAM
    1. Objects for primitives are
      1. The
    2. Understanding the RAM used in a collection
    3. Bytes vs Unicode
    4. Efficiently storing lots of text in RAM
      1. Trying these approaches on 8 million tokens
      2. list
      3. set
      4. More Efficient Tree Structures
      5. Directed Acyclic Word Graph (DAWG)
      6. Marisa trie
      7. datrie
      8. HAT Trie
      9. Using Tries in production systems
    5. Tips for using less RAM
    6. Probabilistic data structures
      1. Very approximate counting with a 1 byte Morris Counter
      2. K-Min Values
      3. Bloom Filter
      4. LogLog Counter
      5. Real World Example
  13. 12. Lessons from the Field
    1. Adaptive Lab for Social Media Analytics (SoMA)
      1. Python at Adaptive Lab
      2. SoMA’s Design
      3. Our Development Methodology
      4. Maintaining SoMA
      5. Advice for Fellow Engineers
    2. Making deep learning fly with RadimRehurek.com
      1. The Sweet Spot
      2. Lessons in Optimizing
      3. Wrap up
    3. Large scale productionized machine learning at Lyst.com
      1. Python’s place at Lyst
      2. Cluster design
      3. Code evolution in a fast moving start-up
      4. Building the recommendation engine
      5. Reporting and Monitoring
      6. Some advice
    4. Large Scale Social Media Analysis at Sme.sh
      1. Python’s role at Smesh
      2. The Platform
      3. High performance real-time string matching
      4. Reporting, monitoring, debugging and deployment
    5. PyPy for successful web and data processing systems
      1. Introduction
      2. Prerequisites
      3. Database
      4. Web Application
      5. OCR and Translation
      6. Task Distribution and Workers
      7. Conclusion
    6. Task queues at Lanyrd.com
      1. Python’s role at Lanyrd
      2. Making the task queue performant
      3. Reporting, monitoring, debugging and deployment
      4. Advice to a fellow developer
  14. About the Authors
  15. Copyright