O'Reilly logo

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

PostgreSQL 10 High Performance

Book Description

Leverage the power of PostgreSQL 10 to design, administer and maintain a high-performance database solution

About This Book
  • Obtain optimal PostgreSQL 10 database performance, ranging from initial design to routine maintenance
  • Fine tune the performance of your queries and avoid the common pitfalls that can slow your system down
  • Contains tips and tricks on scaling successful database installations, and ensuring a highly available PostgreSQL solution
Who This Book Is For

This book is designed for database administrators and PostgreSQL architects who already use or plan to exploit the features of PostgreSQL 10 to design and maintain a high-performance PostgreSQL database. A working knowledge of SQL, and some experience with PostgreSQL will be helpful in getting the most out of this book.

What You Will Learn
  • Learn best practices for scaling PostgreSQL 10 installations
  • Discover the best hardware for developing high-performance PostgreSQL applications
  • Benchmark your whole system – from hardware to application
  • Learn by real examples how server parameters impact performance
  • Discover PostgreSQL 10 features for partitioning and parallel query
  • Monitor your server, both inside and outside the database
  • Design and implement a good replication system on PostgreSQL 10
In Detail

PostgreSQL database servers have a common set of problems that they encounter as their usage gets heavier and requirements get more demanding. Peek into the future of your PostgreSQL 10 database's problems today. Know the warning signs to look for and how to avoid the most common issues before they even happen.

Surprisingly, most PostgreSQL database applications evolve in the same way—choose the right hardware, tune the operating system and server memory use, optimize queries against the database and CPUs with the right indexes, and monitor every layer, from hardware to queries, using tools from inside and outside PostgreSQL. Also, using monitoring insight, PostgreSQL database applications continuously rework the design and configuration. On reaching the limits of a single server, they break things up; connection pooling, caching, partitioning, replication, and parallel queries can all help handle increasing database workloads.

By the end of this book, you will have all the knowledge you need to design, run, and manage your PostgreSQL solution while ensuring high performance and high availability

Style and approach

This book has been organized in such a manner that will help you understand basic PostgreSQL 10 performance tuning to an advanced-level configuration.

Downloading the example code for this book You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

Table of Contents

  1. Title Page
  2. Copyright and Credits
    1. PostgreSQL 10 High Performance
  3. Dedication
  4. Packt Upsell
    1. Why subscribe?
    2. PacktPub.com
  5. Contributors
    1. About the authors
    2. About the reviewer
    3. Packt is searching for authors like you
  6. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Download the color images
      3. Conventions used
    4. Get in touch
      1. Reviews
  7. PostgreSQL Versions
    1. Performance of historical PostgreSQL releases
    2. Choosing a version to deploy
    3. Upgrading to a newer major version
    4. Upgrades to PostgreSQL 8.3+ from earlier ones
      1. Minor version upgrades
      2. Migrating from PostgreSQL 9.x to 10.x – a new way to work
    5. PostgreSQL or another database?
      1. PostgreSQL 10.x and NoSQL
      2. PostgreSQL as HUB
    6. PostgreSQL tools
    7. PostgreSQL contrib
    8. Finding contrib modules on your system
    9. Installing a contrib module from source
    10. Using a contrib module
    11. pgFoundry
    12. PGXN
    13. Additional PostgreSQL-related software
    14. PostgreSQL application scaling life cycle
    15. Performance tuning as a practice
    16. Summary
  8. Database Hardware
    1. Balancing hardware spending
      1. CPUs
      2. Memory
      3. Disks
        1. RAID
        2. Drive error handling
        3. Hard drive reliability studies
        4. Drive firmware and RAID
        5. SSDs
      4. Disk controllers
        1. Hardware and software RAID
        2. Recommended disk controllers
        3. Attached storage – SAN and NAS
    2. Reliable controller and disk setup
      1. Write-back caches
        1. Sources of write-back caching
        2. Disk controller monitoring
        3. Disabling drive write caches
      2. Performance impact of write-through caching
    3. Summary
  9. Database Hardware Benchmarking
    1. CPU and memory benchmarking
      1. Memtest86+
      2. STREAM memory testing
        1. STREAM and Intel versus AMD
      3. CPU benchmarking
      4. Sources of slow memory and processors
    2. Physical disk performance
      1. Random access and input/outputs per second
      2. Sequential access and ZCAV
        1. Short stroking
      3. Commit rate
        1. PostgreSQL test_fsync
      4. INSERT rate
      5. Windows commit rate
    3. Disk benchmarking tools
      1. HD Tune
        1. Short stroking tests
        2. IOPS
        3. Unpredictable performance and Windows
      2. Disk throughput in case of sequential read and write
      3. Bonnie++
        1. Bonnie++ 2.0
        2. Bonnie++ ZCAV
      4. sysbench
      5. pgbench
      6. Seek rate
        1. Removing test files
        2. fsync commit rate
      7. Complicated disk benchmarks
    4. Sample disk results
      1. Disk performance expectations
      2. Sources of slow disk and array performance
    5. Summary
  10. Disk Setup
    1. Maximum filesystem sizes
    2. Filesystem crash recovery
      1. Journaling filesystems
    3. Linux filesystems
      1. ext2
      2. ext3
      3. ext4
      4. XFS
      5. Benchmarks
      6. Other Linux filesystems
      7. Write barriers
        1. Drive support for barriers
        2. Filesystem support for barriers
      8. General Linux filesystem tuning
        1. Read-ahead
        2. File access times
        3. Read caching and swapping
        4. Write cache sizing
        5. I/O scheduler elevator
    4. Solaris and FreeBSD filesystems
      1. Solaris UFS
      2. FreeBSD UFS2
      3. ZFS
      4. Windows filesystems
      5. FAT32
      6. NTFS
        1. Adjusting mounting behavior
    5. Disk layout for PostgreSQL
      1. Symbolic links
      2. Tablespaces
      3. Database directory tree
        1. Temporary files
      4. Disk arrays, RAID, and disk layout
        1. Disk layout guidelines
    6. Summary
  11. Memory for Database Caching
    1. Memory units in postgresql.conf
    2. Increasing Unix shared memory parameters for larger buffer sizes
      1. Kernel semaphores
      2. Estimating shared memory allocation
      3. Inspecting the database cache
      4. Installing pg_buffercache into a database
      5. Database disk layout
      6. Creating a new block in a database
      7. Writing dirty blocks to disk
    3. Crash recovery and the buffer cache
      1. Checkpoint processing basics
      2. Write-ahead log and recovery processing
      3. Checkpoint timing
        1. Checkpoint spikes
        2. Spread checkpoints
      4. Database block life cycle
        1. Dirty block write paths
    4. Database buffer cache versus operating system cache
      1. Doubly cached data
        1. Inspecting the OS cache
      2. Checkpoint overhead
      3. Starting size guidelines
        1. Platform, version, and workload limitations
    5. Analyzing buffer cache contents
      1. Inspection of the buffer cache queries
        1. Top relations in the cache
        2. Summary by usage count
        3. Buffer content summary with percentages
        4. Buffer usage count distribution
      2. Using buffer cache inspection for sizing feedback
    6. Summary
  12. Server Configuration Tuning
    1. Interacting with the live configuration
      1. Defaults and reset values
      2. Allowed change context
      3. Reloading the configuration file
        1. Commented out settings
        2. Server-wide settings
      4. Database connections
        1. listen_addresses
        2. max_connections
      5. Shared memory
        1. shared_buffers
        2. Free Space Map settings
      6. Logging
        1. log_line_prefix
        2. log_statement
        3. log_min_duration_statement
      7. News on PostgreSQL 10
      8. Vacuuming and statistics
        1. autovacuum
        2. Enabling autovacuum on older versions
        3. maintainance_work_mem
        4. default_statistics_target
      9. Checkpoints
        1. checkpoint_segments – max_wal_size
        2. checkpoint_timeout
        3. checkpoint_completion_target
      10. WAL settings
        1. wal_buffers
        2. wal_sync_method
      11. PITR and WAL replication
        1. Per-client settings
        2. effective_cache_size
        3. synchronous_commit
        4. work_mem
        5. random_page_cost
        6. constraint_exclusion
      12. Tunables to avoid
        1. fsync
        2. full_page_writes
        3. commit_delay and commit_siblings
        4. max_prepared_transactions
        5. Querying enable parameters
          1. New server tuning
          2. Dedicated server guidelines
          3. Shared server guidelines
          4. PgTune
    2. Summary
  13. Routine Maintenance
    1. Transaction visibility with multiversion concurrency control
      1. Visibility computation internals
      2. Updates
      3. Row lock conflicts
        1. Serialization
      4. Deletions
      5. Advantages of MVCC
      6. Disadvantages of MVCC
      7. Transaction ID wraparound
    2. Vacuum
      1. Vacuum implementation
        1. Regular vacuum
        2. Returning free disk space
        3. Full vacuum
        4. HOT
      2. Cost-based vacuuming
      3. autovacuum
        1. autovacuum logging
        2. autovacuum monitoring
        3. Autovacuum triggering
        4. Per-table adjustments
      4. Common vacuum and autovacuum problems
        1. autovacuum is running even though it was turned off
        2. autovacuum is constantly running
        3. Out of memory errors
        4. Not keeping up with a busy server
        5. autovacuum is too disruptive
        6. Long running transactions
        7. FSM exhaustion
        8. Recovering from major problems
      5. Autoanalyze
    3. Index bloat
      1. Measuring index bloat
    4. Detailed data and index page monitoring
    5. Monitoring query logs
      1. Basic PostgreSQL log setup
        1. Log collection
        2. log_line_prefix
        3. Multiline queries
        4. Using syslog for log messages
        5. CSV logging
      2. Logging difficult queries
        1. auto_explain
      3. Log file analysis
        1. Normalized query fingerprints
        2. pg_stat_statements
        3. pgBadger
    6. Summary
  14. Database Benchmarking
    1. pgbench default tests
      1. Table definition
      2. Scale detection
      3. Query script definition
      4. Configuring the database server for pgbench
        1. Sample server configuration
          1. Running pgbench manually
    2. Graphing results with pgbench-tools
      1. Configuring pgbench-tools
        1. Customizing for 8.3
    3. Sample pgbench test results
      1. Select-only test
      2. TPC-B-like test
      3. Latency analysis
    4. Sources of bad results and variation
      1. Developer PostgreSQL builds
      2. Worker threads and pgbench program limitations
    5. pgbench custom tests
      1. Insert speed test
    6. Transaction Processing Performance Council benchmarks
    7. Summary
  15. Database Indexing
    1. Indexing example walkthrough
      1. Measuring query disk and index block statistics
      2. Running the example
      3. Sample data setup
      4. Simple index lookups
      5. Full table scans
      6. Index creation
      7. Lookup with an inefficient index
      8. Combining indexes
      9. Switching from indexed to sequential scans
        1. Planning for plan changes
      10. Clustering against an index
      11. Explain with buffer counts
    2. Index creation and maintenance
      1. Unique indexes
      2. Concurrent index creation
      3. Clustering an index
        1. Fill factor
      4. Reindexing
    3. Index types
      1. B-tree
        1. Text operator classes
      2. Hash
      3. GIN
      4. B-tree GIN versus bitmap indexes
      5. GiST
    4. Advanced index use
      1. Multicolumn indexes
      2. Indexes for sorting
      3. Partial indexes
      4. Expression-based indexes
      5. Indexing for full-text search
      6. Indexing like or like queries with pg_trgm contrib
      7. Indexing JSON datatype
    5. Summary
  16. Query Optimization
    1. Sample data sets
      1. Pagila
      2. dellstore2
    2. EXPLAIN basics
      1. Timing overhead
      2. Hot and cold cache behavior
        1. Clearing the cache
    3. Query plan node structure
      1. Basic cost computation
        1. Estimated costs and real-world costs
    4. Explain analysis tools
      1. Visual explain
      2. Verbose output
      3. Machine-readable EXPLAIN output
      4. Plan analysis tools
    5. Assembling row sets
      1. Tuple ID
        1. Object ID
      2. Sequential scan
      3. Index Scan
      4. Bitmap heap and index scans
    6. Processing nodes
      1. Sort
      2. Limit
        1. Offsets
      3. Aggregate
      4. HashAggregate
      5. Unique
        1. WindowAgg
      6. Result
      7. Append
      8. Group
      9. Subquery scan and subplan
        1. Subquery conversion and IN lists
      10. Set operations
      11. Materialize
      12. CTE scan
    7. Joins
      1. Nested loop
        1. Nested loop with inner index scan
      2. Merge Join
        1. Nested Loop and Merge Join materialization
      3. Hash joins
        1. Hash semi and anti-joins
        2. Join ordering
        3. Forcing join order
        4. Join removal
        5. Genetic query optimizer
    8. Statistics
      1. Viewing and estimating with statistics
      2. Statistics targets
        1. Adjusting a column target
        2. Distinct values
      3. Difficult areas to estimate
    9. Other query-planning parameters
      1. effective_cache_size
      2. work_mem
      3. constraint_exclusion
      4. cursor_tuple_fraction
    10. Executing other statement types
    11. Improving queries
      1. Optimizing for fully cached data sets
      2. Testing for query equivalence
      3. Disabling optimizer features
      4. Working around optimizer bugs
      5. Avoiding plan restructuring with OFFSET
      6. External trouble spots
      7. Parallel queries
    12. SQL limitations
      1. Numbering rows in SQL
      2. Using Window functions for numbering
      3. Using Window functions for cumulatives
    13. Summary
  17. Database Activity and Statistics
    1. Statistics views
    2. Cumulative and live views
    3. Table statistics
      1. Table I/O
    4. Index statistics
      1. Index I/O
    5. Database-wide totals
    6. Connections and activity
    7. Locks
      1. Virtual transactions
      2. Decoding lock information
      3. Transaction lock waits
      4. Table lock waits
      5. Logging lock information
        1. Deadlocks
    8. Disk usage
    9. Buffer, background writer, and checkpoint activity
      1. Saving pg_stat_bgwriter snapshots
      2. Tuning using background writer statistics
    10. Summary
  18. Monitoring and Trending
    1. UNIX monitoring tools
      1. Sample setup
      2. vmstat
      3. iostat
        1. iotop for Linux
        2. Examples of good performance
        3. Overloaded system samples
      4. top
        1. Solaris top replacements
        2. htop for Linux
      5. sysstat and sar
        1. Enabling sysstat and its optional features
        2. Graphing with kSar
    2. Windows monitoring tools
      1. Task Manager
        1. Sysinternals tools
      2. Windows system monitor
        1. Saving Windows system monitor data
    3. Trending software
      1. Types of monitoring and trending software
        1. Storing historical trend data
      2. Nagios
        1. Nagios and PostgreSQL
        2. Nagios and Windows
      3. Cacti
        1. Cacti and PostgreSQL
        2. Cacti and Windows
      4. Munin
      5. Other trending packages
        1. pgstatspack
        2. Zenoss
        3. Hyperic HQ
        4. Reconnoiter
        5. Staplr
        6. SNMP tools
    4. Summary
  19. Pooling and Caching
    1. Connection pooling
      1. Pooling connection counts
      2. pgpool-II
        1. pgpool-II load balancing for replication scaling
      3. pgBouncer
        1. Application server pooling
        2. Database caching
      4. memcached
      5. pgmemcache
    2. Summary
  20. Scaling with Replication
    1. Hot Standby
      1. Terminology
      2. Setting up WAL shipping
      3. Streaming replication
      4. Tuning Hot Standby
    2. Replication queue managers
    3. Synchronous replication
    4. Logical replication
      1. Slony
      2. Londiste
      3. Read scaling with replication queue software
    5. Special application requirements
      1. Bucardo
      2. pgpool-II
    6. Other interesting replication projects
    7. Replication solution comparison
    8. Summary
  21. Partitioning Data
    1. Table range partitioning
      1. Determining a key field to partition over
        1. Sizing the partitions
      2. List partitioning
        1. Creating the partitions
        2. Redirecting INSERT statements to the partitions
        3. Dynamic trigger functions
        4. Partition rules
      3. Empty partition query plans
      4. Date change update trigger
      5. Live migration of a partitioned table
    2. PostgreSQL 10 – declarative partitioning – the built-in partitioning 
      1. Range partitioning
      2. Partition maintenance
      3. Caveats
      4. Partitioned queries
      5. Creating new partitions
        1. Scheduled creation
        2. Dynamic creation
      6. Partitioning advantages
      7. Common partitioning mistakes
    3. Horizontal partitioning with PL/Proxy
      1. Hash generation
      2. Scaling with PL/Proxy
        1. Sharding
      3. Scaling with GridSQL
    4. Summary
  22. Avoiding Common Problems
    1. Bulk loading
      1. Loading methods
        1. External loading programs
      2. Tuning for bulk loads
      3. Skipping WAL acceleration
      4. Recreating indexes and adding constraints
      5. Parallel restore
      6. Post-load cleanup
    2. Backup
      1. pg_dump
      2. Continuous archiving and point in time recovery
    3. Common performance issues
      1. Counting rows
      2. Unexplained writes
      3. Slow function and prepared statement execution
      4. PL/pgSQL benchmarking
      5. High foreign key overhead
      6. Trigger memory use
        1. Transition tables for trigger
      7. Heavy statistics collector overhead
        1. Targeted statistics resets
      8. Extended statistics
      9. Materialized views
    4. Foreign data wrapper
    5. The amcheck module
    6. pgAdmin
      1. gprof
      2. OProfile
      3. Visual Studio
      4. DTrace
        1. DTrace on FreeBSD
        2. Linux SystemTap emulation of DTrace
    7. Performance-related features by version
      1. Aggressive PostgreSQL version upgrades
      2. Version 9.0
        1. Replication
        2. Queries and EXPLAIN
        3. Database development
        4. Configuration and monitoring
        5. Tools
        6. Internals
    8. Summary
  23. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think