High Performance Python

Book description

Your Python code may run correctly, but you need it to run faster. By exploring the fundamental theory behind design choices, this practical guide helps you gain a deeper understanding of Python’s implementation. You’ll learn how to locate performance bottlenecks and significantly speed up your code in high-data-volume programs.

How can you take advantage of multi-core architectures or clusters? Or build a system that can scale up and down without losing reliability? Experienced Python programmers will learn concrete solutions to these and other issues, along with war stories from companies that use high performance Python for social media analytics, productionized machine learning, and other applications.

  • Get a better grasp of numpy, Cython, and profilers
  • Learn how Python abstracts the underlying computer architecture
  • Use profiling to find bottlenecks in CPU time and memory usage
  • Write efficient programs by choosing appropriate data structures
  • Speed up matrix and vector computations
  • Use tools to compile Python down to machine code
  • Manage multiple I/O and computational operations concurrently
  • Convert multiprocessing code to run on a local or remote cluster
  • Solve large problems while using less RAM

Table of contents

  1. Preface
    1. Who This Book Is For
    2. Who This Book Is Not For
    3. What You’ll Learn
    4. Python 2.7
    5. Moving to Python 3
    6. License
    7. How to Make an Attribution
    8. Errata and Feedback
    9. Conventions Used in This Book
    10. Using Code Examples
    11. Safari® Books Online
    12. How to Contact Us
    13. Acknowledgments
  2. 1. Understanding Performant Python
    1. The Fundamental Computer System
      1. Computing Units
      2. Memory Units
      3. Communications Layers
    2. Putting the Fundamental Elements Together
      1. Idealized Computing Versus the Python Virtual Machine
        1. Idealized computing
        2. Python’s virtual machine
    3. So Why Use Python?
  3. 2. Profiling to Find Bottlenecks
    1. Profiling Efficiently
    2. Introducing the Julia Set
    3. Calculating the Full Julia Set
    4. Simple Approaches to Timing—print and a Decorator
    5. Simple Timing Using the Unix time Command
    6. Using the cProfile Module
    7. Using runsnakerun to Visualize cProfile Output
    8. Using line_profiler for Line-by-Line Measurements
    9. Using memory_profiler to Diagnose Memory Usage
    10. Inspecting Objects on the Heap with heapy
    11. Using dowser for Live Graphing of Instantiated Variables
    12. Using the dis Module to Examine CPython Bytecode
      1. Different Approaches, Different Complexity
    13. Unit Testing During Optimization to Maintain Correctness
      1. No-op @profile Decorator
    14. Strategies to Profile Your Code Successfully
    15. Wrap-Up
  4. 3. Lists and Tuples
    1. A More Efficient Search
    2. Lists Versus Tuples
      1. Lists as Dynamic Arrays
      2. Tuples as Static Arrays
    3. Wrap-Up
  5. 4. Dictionaries and Sets
    1. How Do Dictionaries and Sets Work?
      1. Inserting and Retrieving
      2. Deletion
      3. Resizing
      4. Hash Functions and Entropy
    2. Dictionaries and Namespaces
    3. Wrap-Up
  6. 5. Iterators and Generators
    1. Iterators for Infinite Series
    2. Lazy Generator Evaluation
    3. Wrap-Up
  7. 6. Matrix and Vector Computation
    1. Introduction to the Problem
    2. Aren’t Python Lists Good Enough?
      1. Problems with Allocating Too Much
    3. Memory Fragmentation
      1. Understanding perf
      2. Making Decisions with perf’s Output
      3. Enter numpy
    4. Applying numpy to the Diffusion Problem
      1. Memory Allocations and In-Place Operations
      2. Selective Optimizations: Finding What Needs to Be Fixed
    5. numexpr: Making In-Place Operations Faster and Easier
    6. A Cautionary Tale: Verify “Optimizations” (scipy)
    7. Wrap-Up
  8. 7. Compiling to C
    1. What Sort of Speed Gains Are Possible?
    2. JIT Versus AOT Compilers
    3. Why Does Type Information Help the Code Run Faster?
    4. Using a C Compiler
    5. Reviewing the Julia Set Example
    6. Cython
      1. Compiling a Pure-Python Version Using Cython
      2. Cython Annotations to Analyze a Block of Code
      3. Adding Some Type Annotations
    7. Shed Skin
      1. Building an Extension Module
      2. The Cost of the Memory Copies
    8. Cython and numpy
      1. Parallelizing the Solution with OpenMP on One Machine
    9. Numba
    10. Pythran
    11. PyPy
      1. Garbage Collection Differences
      2. Running PyPy and Installing Modules
    12. When to Use Each Technology
      1. Other Upcoming Projects
      2. A Note on Graphics Processing Units (GPUs)
      3. A Wish for a Future Compiler Project
    13. Foreign Function Interfaces
      1. ctypes
      2. cffi
      3. f2py
      4. CPython Module
    14. Wrap-Up
  9. 8. Concurrency
    1. Introduction to Asynchronous Programming
    2. Serial Crawler
    3. gevent
    4. tornado
    5. AsyncIO
    6. Database Example
    7. Wrap-Up
  10. 9. The multiprocessing Module
    1. An Overview of the Multiprocessing Module
    2. Estimating Pi Using the Monte Carlo Method
    3. Estimating Pi Using Processes and Threads
      1. Using Python Objects
      2. Random Numbers in Parallel Systems
      3. Using numpy
    4. Finding Prime Numbers
      1. Queues of Work
        1. Asynchronously adding jobs to the Queue
    5. Verifying Primes Using Interprocess Communication
      1. Serial Solution
      2. Naive Pool Solution
      3. A Less Naive Pool Solution
      4. Using Manager.Value as a Flag
      5. Using Redis as a Flag
      6. Using RawValue as a Flag
      7. Using mmap as a Flag
      8. Using mmap as a Flag Redux
    6. Sharing numpy Data with multiprocessing
    7. Synchronizing File and Variable Access
      1. File Locking
      2. Locking a Value
    8. Wrap-Up
  11. 10. Clusters and Job Queues
    1. Benefits of Clustering
    2. Drawbacks of Clustering
      1. $462 Million Wall Street Loss Through Poor Cluster Upgrade Strategy
      2. Skype’s 24-Hour Global Outage
    3. Common Cluster Designs
    4. How to Start a Clustered Solution
    5. Ways to Avoid Pain When Using Clusters
    6. Three Clustering Solutions
      1. Using the Parallel Python Module for Simple Local Clusters
      2. Using IPython Parallel to Support Research
    7. NSQ for Robust Production Clustering
      1. Queues
      2. Pub/sub
      3. Distributed Prime Calculation
    8. Other Clustering Tools to Look At
    9. Wrap-Up
  12. 11. Using Less RAM
    1. Objects for Primitives Are Expensive
      1. The Array Module Stores Many Primitive Objects Cheaply
    2. Understanding the RAM Used in a Collection
    3. Bytes Versus Unicode
    4. Efficiently Storing Lots of Text in RAM
      1. Trying These Approaches on 8 Million Tokens
        1. list
        2. set
        3. More efficient tree structures
        4. Directed acyclic word graph (DAWG)
        5. Marisa trie
        6. Datrie
        7. HAT trie
        8. Using tries (and DAWGs) in production systems
    5. Tips for Using Less RAM
    6. Probabilistic Data Structures
      1. Very Approximate Counting with a 1-byte Morris Counter
      2. K-Minimum Values
      3. Bloom Filters
      4. LogLog Counter
      5. Real-World Example
  13. 12. Lessons from the Field
    1. Adaptive Lab’s Social Media Analytics (SoMA)
      1. Python at Adaptive Lab
      2. SoMA’s Design
      3. Our Development Methodology
      4. Maintaining SoMA
      5. Advice for Fellow Engineers
    2. Making Deep Learning Fly with RadimRehurek.com
      1. The Sweet Spot
      2. Lessons in Optimizing
      3. Wrap-Up
    3. Large-Scale Productionized Machine Learning at Lyst.com
      1. Python’s Place at Lyst
      2. Cluster Design
      3. Code Evolution in a Fast-Moving Start-Up
      4. Building the Recommendation Engine
      5. Reporting and Monitoring
      6. Some Advice
    4. Large-Scale Social Media Analysis at Smesh
      1. Python’s Role at Smesh
      2. The Platform
      3. High Performance Real-Time String Matching
      4. Reporting, Monitoring, Debugging, and Deployment
    5. PyPy for Successful Web and Data Processing Systems
      1. Prerequisites
      2. The Database
      3. The Web Application
      4. OCR and Translation
      5. Task Distribution and Workers
      6. Conclusion
    6. Task Queues at Lanyrd.com
      1. Python’s Role at Lanyrd
      2. Making the Task Queue Performant
      3. Reporting, Monitoring, Debugging, and Deployment
      4. Advice to a Fellow Developer
  14. Index
  15. Colophon
  16. Copyright

Product information

  • Title: High Performance Python
  • Author(s): Micha Gorelick, Ian Ozsvald
  • Release date: September 2014
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781449361594