You are previewing PostgreSQL 9.0 High Performance.
O'Reilly logo
PostgreSQL 9.0 High Performance

Book Description

If you’re an intermediate to advanced database administrator, this book is the shortcut to optimizing and troubleshooting your PostgreSQL database. With a balanced mix of theory and practice, it will quickly hone your expertise.

  • Learn the right techniques to obtain optimal PostgreSQL database performance, from initial design to routine maintenance

  • Discover the techniques used to scale successful database installations

  • Avoid the common pitfalls that can slow your system down

  • Filled with advice about what you should be doing; how to build experimental databases to explore performance topics, and then move what you've learned into a production database environment

  • Covers versions 8.1 through 9.0

  • In Detail

    PostgreSQL database servers have a common set of problems they encounter as their usage gets heavier and requirements more demanding. You could spend years discovering solutions to them all, step by step as you encounter them. Or you can just look in here.

    All successful database applications are destined to eventually run into issues scaling up their performance. Peek into the future of your PostgreSQL database's problems today. Know the warning signs to look for, and how to avoid the most common issues before they even happen.

    Surprisingly, most PostgreSQL database applications evolve in the same way: Choose the right hardware. Tune the operating system and server memory use. Optimize queries against the database, with the right indexes. Monitor every layer, from hardware to queries, using some tools that are inside PostgreSQL and others that are external.

    Using monitoring insight, continuously rework the design and configuration. On reaching the limits of a single server, break things up; connection pooling, caching, partitioning, and replication can all help handle increasing database workloads.

    The path to a high performance database system isn't always easy. But it doesn't have to be mysterious with the right guide.

    A clear, step-by-step guide to optimizing and scaling up PostgreSQL database servers

    Table of Contents

    1. PostgreSQL 9.0 High Performance
      1. PostgreSQL 9.0 High Performance
      2. Credits
      3. About the Author
      4. About the Reviewers
      5. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Errata
          2. Piracy
          3. Questions
      6. 1. PostgreSQL Versions
        1. Performance of historical PostgreSQL releases
          1. Choosing a version to deploy
          2. Upgrading to a newer major version
            1. Upgrades to PostgreSQL 8.3+ from earlier ones
            2. Minor version upgrades
        2. PostgreSQL or another database?
        3. PostgreSQL tools
          1. PostgreSQL contrib
            1. Finding contrib modules on your system
            2. Installing a contrib module from source
            3. Using a contrib module
          2. pgFoundry
          3. Additional PostgreSQL-related software
        4. PostgreSQL application scaling lifecycle
        5. Performance tuning as a practice
        6. Summary
      7. 2. Database Hardware
        1. Balancing hardware spending
          1. CPUs
          2. Memory
          3. Disks
            1. RAID
            2. Drive error handling
            3. Hard drive reliability studies
            4. Drive firmware and RAID
            5. SSDs
          4. Disk controllers
            1. Hardware and Software RAID
            2. Recommended disk controllers
            3. Attached storage—SAN and NAS
        2. Reliable controller and disk setup
          1. Write-back caches
            1. Sources of write-back caching
            2. Disk controller monitoring
            3. Disabling drive write caches
          2. Performance impact of write-through caching
        3. Summary
      8. 3. Database Hardware Benchmarking
        1. CPU and memory benchmarking
          1. memtest86+
          2. STREAM memory testing
            1. STREAM and Intel vs. AMD
          3. CPU benchmarking
          4. Sources of slow memory and processors
        2. Physical disk performance
          1. Random access and I/Os Per Second
          2. Sequential access and ZCAV
            1. Short stroking
          3. Commit rate
            1. PostgreSQL test_fsync
            2. INSERT rate
            3. Windows commit rate
        3. Disk benchmarking tools
          1. hdtune
            1. Short stroking tests
            2. IOPS
            3. Unpredictable performance and Windows
          2. dd
          3. bonnie++
            1. bonnie++ 2.0
            2. bonnie++ ZCAV
          4. sysbench
            1. Seek rate
            2. fsync commit rate
          5. Complicated disk benchmarks
        4. Sample disk results
          1. Disk performance expectations
            1. Sources of slow disk and array performance
        5. Summary
      9. 4. Disk Setup
        1. Maximum filesystem sizes
        2. Filesystem crash recovery
          1. Journaling filesystems
        3. Linux filesystems
          1. ext2
          2. ext3
          3. ext4
          4. XFS
          5. Other Linux filesystems
          6. Write barriers
            1. Drive support for barriers
            2. Filesystem support for barriers
          7. General Linux filesystem tuning
            1. Read-ahead
            2. File access times
            3. Read caching and swapping
            4. Write cache sizing
            5. I/O scheduler elevator
        4. Solaris and FreeBSD filesystems
          1. Solaris UFS
          2. FreeBSD UFS2
          3. ZFS
        5. Windows filesystems
          1. FAT32
          2. NTFS
            1. Adjusting mounting behaviour
        6. Disk layout for PostgreSQL
          1. Symbolic links
          2. Tablespaces
          3. Database directory tree
            1. Temporary files
          4. Disk arrays, RAID, and disk layout
            1. Disk layout guidelines
        7. Summary
      10. 5. Memory for Database Caching
          1. Memory units in the postgresql.conf
          2. Increasing UNIX shared memory parameters for larger buffer sizes
            1. Kernel semaphores
            2. Estimating shared memory allocation
        1. Inspecting the database cache
          1. Installing pg_buffercache into a database
          2. Database disk layout
          3. Creating a new block in a database
          4. Writing dirty blocks to disk
        2. Crash recovery and the buffer cache
          1. Checkpoint processing basics
          2. Write-ahead log and recovery processing
          3. Checkpoint timing
            1. Checkpoint spikes
            2. Spread checkpoints
          4. Database block lifecycle
            1. Dirty block write paths
        3. Database buffer cache versus operating system cache
          1. Doubly cached data
            1. Inspecting the OS cache
          2. Checkpoint overhead
          3. Starting size guidelines
            1. Platform, version, and workload limitations
        4. Analyzing buffer cache contents
          1. Inspection of the buffer cache queries
            1. Top relations in the cache
            2. Summary by usage count
            3. Buffer contents summary, with percentages
            4. Buffer usage count distribution
          2. Using buffer cache inspection for sizing feedback
        5. Summary
      11. 6. Server Configuration Tuning
        1. Interacting with the live configuration
          1. Defaults and reset values
          2. Allowed change context
          3. Reloading the configuration file
            1. Commented out settings
        2. Server-wide settings
          1. Database connections
            1. listen_addresses
            2. max_connections
          2. Shared memory
            1. shared_buffers
            2. Free space map (FSM) settings
          3. Logging
            1. log_line_prefix
            2. log_statement
            3. log_min_duration_statement
          4. Vacuuming and statistics
            1. autovacuum
            2. Enabling autovacuum on older versions
            3. maintainance_work_mem
            4. default_statistics_target
          5. Checkpoints
            1. checkpoint_segments
            2. checkpoint_timeout
            3. checkpoint_completion_target
          6. WAL settings
            1. wal_buffers
            2. wal_sync_method
          7. PITR and WAL Replication
        3. Per-client settings
            1. effective_cache_size
            2. synchronous_commit
            3. work_mem
            4. random_page_cost
            5. constraint_exclusion
          1. Tunables to avoid
            1. fsync
            2. full_page_writes
            3. commit_delay and commit_siblings
            4. max_prepared_transactions
            5. Query enable parameters
        4. New server tuning
        5. Dedicated server guidelines
        6. Shared server guidelines
        7. pgtune
        8. Summary
      12. 7. Routine Maintenance
        1. Transaction visibility with multiversion concurrency control
          1. Visibility computation internals
          2. Updates
          3. Row lock conflicts
            1. Serialization
          4. Deletions
          5. Advantages of MVCC
          6. Disadvantages of MVCC
          7. Transaction ID wraparound
        2. Vacuum
          1. Vacuum Implementation
            1. Regular vacuum
            2. Returning free disk space
            3. Full vacuum
            4. HOT
          2. Cost-based vacuuming
          3. autovacuum
            1. autovacuum logging
            2. autovacuum monitoring
            3. autovacuum triggering
            4. Per-table adjustments
          4. Common vacuum and autovacuum problems
            1. autovacuum is running even though it was turned off
            2. autovacuum is constantly running
            3. Out of memory errors
            4. Not keeping up on a busy server
            5. autovacuum is too disruptive
            6. Long running transactions
            7. Free Space Map exhaustion
            8. Recovering from major problems
        3. Autoanalyze
        4. Index bloat
          1. Measuring index bloat
        5. Detailed data and index page monitoring
        6. Monitoring query logs
          1. Basic PostgreSQL log setup
            1. Log collection
            2. log_line_prefix
            3. Multi-line queries
            4. Using syslog for log messages
            5. CSV logging
          2. Logging difficult queries
            1. auto_explain
          3. Log file analysis
            1. Normalized query fingerprints
            2. pg_stat_statements
            3. pgFouine
            4. PQA
            5. EPQA
            6. pgsi
            7. mk-query-digest
        7. Summary
      13. 8. Database Benchmarking
        1. pgbench default tests
          1. Table definition
          2. Scale detection
          3. Query script definition
          4. Configuring the database server for pgbench
            1. Sample server configuration
        2. Running pgbench manually
        3. Graphing results with pgbench-tools
          1. Configuring pgbench-tools
            1. Customizing for 8.3
        4. Sample pgbench test results
          1. SELECT-only test
          2. TPC-B-like test
          3. Latency analysis
        5. Sources for bad results and variation
            1. Developer PostgreSQL builds
            2. Worker threads and pgbench program limitations
        6. pgbench custom tests
          1. Insert speed test
        7. Transaction Processing Performance Council benchmarks
        8. Summary
      14. 9. Database Indexing
        1. Indexing example walkthrough
          1. Measuring query disk and index block statistics
          2. Running the example
          3. Sample data setup
          4. Simple index lookups
          5. Full table scans
          6. Index creation
          7. Lookup with an inefficient index
          8. Combining indexes
          9. Switching from indexed to sequential scans
            1. Planning for plan changes
          10. Clustering against an index
          11. Explain with buffer counts
        2. Index creation and maintenance
          1. Unique indexes
          2. Concurrent index creation
          3. Clustering an index
            1. Fill factor
          4. Reindexing
        3. Index types
          1. B-tree
            1. Text operator classes
          2. Hash
          3. GIN
          4. GiST
        4. Advanced index use
          1. Multicolumn indexes
          2. Indexes for sorting
          3. Partial indexes
          4. Expression-based indexes
          5. Indexing for full-text search
        5. Summary
      15. 10. Query Optimization
        1. Sample data sets
          1. Pagila
          2. Dell Store 2
        2. EXPLAIN basics
          1. Timing overhead
          2. Hot and cold cache behavior
            1. Clearing the cache
        3. Query plan node structure
          1. Basic cost computation
            1. Estimated costs and real world costs
        4. Explain analysis tools
          1. Visual explain
          2. Verbose output
          3. Machine readable explain output
          4. Plan analysis tools
        5. Assembling row sets
          1. Tuple id
            1. Object id
          2. Sequential scan
          3. Index scan
          4. Bitmap heap and index scans
        6. Processing nodes
          1. Sort
          2. Limit
            1. Offsets
          3. Aggregate
          4. HashAggregate
          5. Unique
            1. WindowAgg
          6. Result
          7. Append
          8. Group
          9. Subquery Scan and Subplan
            1. Subquery conversion and IN lists
          10. Set operations
          11. Materialize
          12. CTE Scan
        7. Joins
          1. Nested loop
            1. Nested loop with inner Index Scan
          2. Merge Join
            1. Nested loop and Merge Join materialization
          3. Hash Joins
            1. Hash semi and anti joins
            2. Join ordering
            3. Forcing join order
            4. Join removal
            5. Genetic query optimizer
        8. Statistics
          1. Viewing and estimating with statistics
          2. Statistics targets
            1. Adjusting a column target
            2. Distinct values
          3. Difficult areas to estimate
        9. Other query planning parameters
          1. effective_cache_size
          2. work_mem
          3. constraint_exclusion
          4. cursor_tuple_fraction
        10. Executing other statement types
        11. Improving queries
          1. Optimizing for fully cached data sets
          2. Testing for query equivalence
          3. Disabling optimizer features
          4. Working around optimizer bugs
          5. Avoiding plan restructuring with OFFSET
          6. External trouble spots
        12. SQL Limitations
          1. Numbering rows in SQL
          2. Using Window functions for numbering
          3. Using Window functions for cumulatives
        13. Summary
      16. 11. Database Activity and Statistics
        1. Statistics views
        2. Cumulative and live views
        3. Table statistics
          1. Table I/O
        4. Index statistics
          1. Index I/O
        5. Database wide totals
        6. Connections and activity
        7. Locks
          1. Virtual transactions
          2. Decoding lock information
          3. Transaction lock waits
          4. Table lock waits
          5. Logging lock information
            1. Deadlocks
        8. Disk usage
        9. Buffer, background writer, and checkpoint activity
          1. Saving pg_stat_bgwriter snapshots
          2. Tuning using background writer statistics
        10. Summary
      17. 12. Monitoring and Trending
        1. UNIX monitoring tools
          1. Sample setup
          2. vmstat
          3. iostat
            1. iotop for Linux
            2. Examples of good performance
            3. Overloaded system samples
          4. top
            1. Solaris top replacements
            2. htop for Linux
          5. sysstat and sar
            1. Enabling sysstat and its optional features
            2. Graphing with kSar
        2. Windows monitoring tools
          1. Task Manager
            1. Sysinternals tools
          2. Windows System Monitor
            1. Saving Windows System Monitor data
        3. Trending software
          1. Types of monitoring and trending software
            1. Storing historical trend data
          2. Nagios
            1. Nagios and PostgreSQL
            2. Nagios and Windows
          3. Cacti
            1. Cacti and PostgreSQL
            2. Cacti and Windows
          4. Munin
          5. Other trending packages
            1. pgstatspack
            2. Zenoss
            3. Hyperic HQ
            4. Reconnoiter
            5. Staplr
            6. SNMP tools
        4. Summary
      18. 13. Pooling and Caching
        1. Connection pooling
          1. Pooling connection counts
          2. pgpool-II
            1. pgpool-II load balancing for replication scaling
          3. pgBouncer
            1. Application server pooling
        2. Database caching
          1. memcached
          2. pgmemcache
        3. Summary
      19. 14. Scaling with Replication
        1. Hot Standby
          1. Terminology
          2. Setting up WAL shipping
          3. Streaming Replication
          4. Tuning Hot Standby
        2. Replication queue managers
          1. Slony
          2. Londiste
          3. Read scaling with replication queue software
        3. Special application requirements
          1. Bucardo
          2. pgpool-II
        4. Other interesting replication projects
        5. Summary
      20. 15. Partitioning Data
        1. Table range partitioning
          1. Determining a key field to partition over
          2. Sizing the partitions
            1. List partitioning
          3. Creating the partitions
          4. Redirecting INSERT statements to the partitions
            1. Dynamic trigger functions
            2. Partition rules
          5. Empty partition query plans
          6. Date change update trigger
          7. Live migration of a partitioned table
          8. Partitioned queries
          9. Creating new partitions
            1. Scheduled creation
            2. Dynamic creation
          10. Partitioning advantages
          11. Common partitioning mistakes
        2. Horizontal partitioning with PL/Proxy
          1. Hash generation
          2. Scaling with PL/Proxy
            1. Sharding
          3. Scaling with GridSQL
        3. Summary
      21. 16. Avoiding Common Problems
        1. Bulk loading
          1. Loading methods
            1. External loading programs
          2. Tuning for bulk loads
          3. Skipping WAL acceleration
          4. Recreating indexes and adding constraints
          5. Parallel restore
          6. Post load cleanup
        2. Common performance issues
          1. Counting rows
          2. Unexplained writes
          3. Slow function and prepared statement execution
          4. PL/pgSQL benchmarking
          5. High foreign key overhead
          6. Trigger memory use
          7. Heavy statistics collector overhead
            1. Targeted statistics resets
          8. Materialized views
        3. Profiling the database
          1. gprof
          2. OProfile
          3. Visual Studio
          4. DTrace
            1. DTrace on FreeBSD
            2. Linux SystemTap emulation of DTrace
        4. Performance related features by version
          1. Aggressive PostgreSQL version upgrades
          2. 8.1
          3. 8.2
          4. 8.3
          5. 8.4
          6. 9.0
            1. Replication
            2. Queries and EXPLAIN
            3. Database development
            4. Configuration and monitoring
            5. Tools
            6. Internals
        5. Summary