You are previewing PostgreSQL Replication - Second Edition.
O'Reilly logo
PostgreSQL Replication - Second Edition

Book Description

Leverage the power of PostgreSQL replication to make your databases more robust, secure, scalable, and fast

In Detail

PostgreSQL offers a comprehensive set of replication - related features. Unleashing the power of PostgreSQL provides you with countless opportunities and a competitive advantage over other database systems.

This book will guide you through the most important concepts of PostgreSQL replication. It contains all the information you need to design and operate replicated setups.

Beginning by giving you an understanding of replication concepts, the PostgreSQL transaction log, and Point-in-time Recovery, we gradually move on to setting up asynchronous and synchronous replication. Next up, you will learn to monitor a PostgreSQL cluster setup, deal with monitoring tools, and then move on to understanding Linux High Availability. Further, we explore widely-used tools such as Slony, SkyTools, Postgres-XC, and walbouncer, and set up PL/Proxy.

Finally, you'll get acquainted with the new technology of BDR, which allows bidirectional replication in PostgreSQL.

What You Will Learn

  • Use Point-in-time Recovery to perform data recovery as well as replication

  • Set up synchronous as well as asynchronous streaming replication

  • Get familiarized with the transaction log, the core component of most replication setups and its purpose

  • Improve speed and reliability with an understanding of pgpool and PgBouncer

  • Increase your data security and geographically distribute data

  • Make your systems more available and secure with Linux High Availability

  • Scale out with PL/Proxy and Postgres-XC

  • Detect, investigate, and solve replication-related problems

  • Downloading the example code for this book. You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.

    Table of Contents

    1. PostgreSQL Replication Second Edition
      1. Table of Contents
      2. PostgreSQL Replication Second Edition
      3. Credits
      4. About the Author
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers, and more
          1. Why subscribe?
          2. Free access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Errata
          2. Piracy
          3. Questions
      8. 1. Understanding the Concepts of Replication
        1. The CAP theorem and physical limitations of replication
          1. Understanding the CAP theorem
          2. Understanding the limits of physics
        2. Different types of replication
          1. Synchronous versus asynchronous replication
            1. Considering performance issues
            2. Understanding replication and data loss
          2. Single-master versus multimaster replication
          3. Logical versus physical replication
            1. When to use physical replication
            2. When to use logical replication
        3. Using sharding and data distribution
          1. Understanding the purpose of sharding
            1. Designing a sharded system
            2. Querying different fields
          2. Pros and cons of sharding
          3. Choosing between sharding and redundancy
          4. Increasing and decreasing the size of a cluster
          5. Combining sharding and replication
          6. Various sharding solutions
            1. PostgreSQL-based sharding
        4. Summary
      9. 2. Understanding the PostgreSQL Transaction Log
        1. How PostgreSQL writes data
          1. The PostgreSQL disk layout
            1. Looking into the data directory
            2. PG_VERSION – the PostgreSQL version number
            3. base – the actual data directory
              1. Growing data files
              2. Performing I/O in chunks
              3. Relation forks
            4. global – the global data
              1. Dealing with standalone data files
            5. pg_clog – the commit log
            6. pg_dynshmem – shared memory
            7. pg_hba.conf – host-based network configuration
            8. pg_ident.conf – ident authentication
            9. pg_logical – logical decoding
            10. pg_multixact – multitransaction status data
            11. pg_notify – LISTEN/NOTIFY data
            12. pg_replslot – replication slots
            13. pg_serial – information about committed serializable transactions
            14. pg_snapshot – exported snapshots
            15. pg_stat – permanent statistics
            16. pg_stat_tmp – temporary statistics data
            17. pg_subtrans – subtransaction data
            18. pg_tblspc – symbolic links to tablespaces
            19. pg_twophase – information about prepared statements
            20. pg_xlog – the PostgreSQL transaction log (WAL)
            21. postgresql.conf – the central PostgreSQL configuration file
          2. Writing one row of data
            1. A simple INSERT statement
              1. Crashing during WAL writing
              2. Crashing after WAL writing
          3. Read consistency
            1. The purpose of the shared buffer
            2. Mixed reads and writes
          4. The format of the XLOG
        2. The XLOG and replication
        3. Understanding consistency and data loss
          1. All the way to the disk
            1. From memory to memory
            2. From the memory to the disk
            3. A word about batteries
            4. Beyond the fsync function
          2. PostgreSQL consistency levels
        4. Tuning checkpoints and the XLOG
          1. Understanding the checkpoints
          2. Configuring the checkpoints
            1. Segments and timeouts
            2. To write or not to write?
              1. Scenario 1 – storing stock market data
              2. Scenario 2 – bulk loading
              3. Scenario 3 – I/O spikes and throughput considerations
              4. Conclusion
          3. Tweaking WAL buffers
        5. Experiencing the XLOG in action
          1. Understanding the XLOG records
            1. Making the XLOG deterministic
            2. Making the XLOG reliable
          2. LSNs and shared buffer interaction
            1. Debugging the XLOG and putting it all together
        6. Making use of replication slots
          1. Physical replication slots
          2. Logical replication slots
            1. Configuring replication identities
        7. Summary
      10. 3. Understanding Point-in-time Recovery
        1. Understanding the purpose of PITR
          1. Moving to the bigger picture
        2. Archiving the transaction log
        3. Taking base backups
          1. Using pg_basebackup
            1. Modifying pg_hba.conf
            2. Signaling the master server
            3. pg_basebackup – basic features
              1. Backup throttling
              2. pg_basebackup – self-sufficient backups
          2. Making use of traditional methods to create base backups
          3. Tablespace issues
          4. Keeping an eye on the network bandwidth
        4. Replaying the transaction log
          1. Performing a basic recovery
          2. More sophisticated positioning in the XLOG
          3. Cleaning up the XLOG on the way
          4. Switching the XLOG files
        5. Summary
      11. 4. Setting Up Asynchronous Replication
        1. Setting up streaming replication
          1. Tweaking the config files on the master
          2. Handling pg_basebackup and recovery.conf
          3. Making the slave readable
          4. The underlying protocol
        2. Configuring a cascaded replication
        3. Turning slaves into masters
        4. Mixing streaming-based and file-based recovery
          1. The master configuration
          2. The slave configuration
          3. Error scenarios
            1. Network connection between the master and slave is dead
            2. Rebooting the slave
            3. Rebooting the master
            4. Corrupted XLOG in the archive
        5. Making streaming-only replication more robust
          1. Using wal_keep_segments
          2. Utilizing replication slots
        6. Efficient cleanup and the end of recovery
          1. Gaining control over the restart points
          2. Tweaking the end of your recovery
        7. Conflict management
        8. Dealing with timelines
        9. Delayed replicas
          1. Handling crashes
        10. Summary
      12. 5. Setting Up Synchronous Replication
        1. Synchronous replication setup
          1. Understanding the downside to synchronous replication
          2. Understanding the application_name parameter
          3. Making synchronous replication work
          4. Checking the replication
          5. Understanding performance issues
          6. Setting synchronous_commit to on
            1. Setting synchronous_commit to remote_write
            2. Setting synchronous_commit to off
            3. Setting synchronous_commit to local
          7. Changing durability settings on the fly
        2. Understanding the practical implications and performance
        3. Redundancy and stopping replication
        4. Summary
      13. 6. Monitoring Your Setup
        1. Checking your archive
          1. Checking archive_command
          2. Monitoring the transaction log archive
        2. Checking pg_stat_replication
          1. Relevant fields in pg_stat_replication
        3. Checking for operating system processes
        4. Checking for replication slots
        5. Dealing with monitoring tools
          1. Installing check_postgres
          2. Deciding on a monitoring strategy
        6. Summary
      14. 7. Understanding Linux High Availability
        1. Understanding the purpose of High Availability
        2. Measuring availability
        3. Durability and availability
        4. Detecting failures
        5. The split-brain syndrome
        6. Understanding Linux-HA
          1. Corosync
          2. Pacemaker
          3. Resource agents / fence agents
          4. PCS
          5. The PostgreSQL resource agent
        7. Setting up a simple HA cluster
          1. Preparing the servers
          2. Installing the necessary software
          3. Configuring the clustering software
          4. Preparing for the PostgreSQL installation
          5. Syncing the standby
          6. Configuring the cluster
          7. Configuring cluster resources
          8. Configuring the constraints
          9. Setting up fencing
          10. Verifying the setup
        8. Common maintenance tasks
          1. Performing maintenance on a single node
          2. Forcing a failover
          3. Recovering from failed PostgreSQL starts
          4. Performing cluster-wide maintenance
          5. Resynchronizing after master failure
        9. Summary
      15. 8. Working with PgBouncer
        1. Understanding the fundamental PgBouncer concepts
        2. Installing PgBouncer
        3. Configuring your first PgBouncer setup
          1. Writing a simple config file and starting PgBouncer up
            1. Dispatching requests
            2. More basic settings
          2. Handling pool sizes
            1. max_client_conn
            2. default_pool_size
            3. min_pool_size
            4. reserve_pool_size
            5. pool_size
            6. Authentication
          3. Connecting to PgBouncer
            1. Java issues
          4. Pool modes
          5. Cleanup issues
        4. Improving performance
          1. A simple benchmark
        5. Maintaining PgBouncer
          1. Configuring the admin interface
          2. Using the management database
          3. Extracting runtime information
          4. Suspending and resuming operations
        6. Summary
      16. 9. Working with pgpool
        1. Installing pgpool
          1. Installing additional modules
        2. Understanding the features of pgpool
        3. Understanding the pgpool architecture
        4. Setting up replication and load balancing
          1. Password authentication
          2. Firing up pgpool and testing the setup
          3. Attaching hosts
        5. Checking the replication
        6. Running pgpool with streaming replication
          1. Optimizing the pgpool configuration for master/slave mode
        7. Dealing with failovers and High Availability
          1. Using PostgreSQL streaming and Linux HA
          2. pgpool mechanisms for High Availability and failover
        8. Summary
      17. 10. Configuring Slony
        1. Installing Slony
        2. Understanding how Slony works
          1. Dealing with logical replication
          2. The slon daemon
        3. Replicating your first database
        4. Deploying DDLs
        5. Adding tables to replication and managing problems
        6. Performing failovers
          1. Planned failovers
          2. Unplanned failovers
        7. Summary
      18. 11. Using SkyTools
        1. Installing SkyTools
        2. Dissecting SkyTools
        3. Managing pgq queues
          1. Running pgq
            1. Creating queues and adding data
            2. Adding consumers
            3. Configuring the ticker
            4. Consuming messages
            5. Dropping queues
            6. Using pgq for large projects
        4. Using Londiste to replicate data
          1. Replicating our first table
        5. A word about walmgr
        6. Summary
      19. 12. Working with Postgres-XC
        1. Understanding the Postgres-XC architecture
          1. Data nodes
          2. GTM
          3. Coordinators
          4. GTM proxy
        2. Installing Postgres-XC
        3. Configuring a simple cluster
          1. Creating the GTM
        4. Optimizing for performance
          1. Dispatching the tables
          2. Optimizing joins
          3. Optimizing for warehousing
          4. Creating a GTM proxy
        5. Creating tables and issuing queries
        6. Adding nodes
          1. Rebalancing data
        7. Handling failovers and dropping nodes
          1. Handling node failovers
          2. Replacing the nodes
          3. Running a GTM standby
        8. Summary
      20. 13. Scaling with PL/Proxy
        1. Understanding the basic concepts
          1. Dealing with the bigger picture
          2. Partitioning the data
        2. Setting up PL/Proxy
          1. A basic example
          2. Partitioned reads and writes
        3. Extending and handling clusters in a clever way
          1. Adding and moving partitions
          2. Increasing the availability
          3. Managing foreign keys
          4. Upgrading the PL/Proxy nodes
        4. Summary
      21. 14. Scaling with BDR
        1. Understanding BDR replication concepts
          1. Understanding eventual consistency
            1. Handling conflicts
            2. Distributing sequences
            3. Handling DDLs
          2. Use cases for BDR
            1. Good use cases for BDR
            2. Bad use cases for BDR
          3. Logical decoding does the trick
        2. Installing BDR
          1. Installing binary packages
        3. Setting up a simple cluster
          1. Arranging storage
          2. Creating database instances
          3. Loading modules and firing up the cluster
          4. Checking your setup
          5. Handling conflicts
        4. Understanding sets
          1. Unidirectional replication
          2. Handling data tables
        5. Controlling replication
        6. Summary
      22. 15. Working with Walbouncer
        1. The concepts of walbouncer
          1. Filtering XLOG
        2. Installing walbouncer
          1. Configuring walbouncer
          2. Creating a base backup
          3. Firing up walbouncer
        3. Using additional configuration options
        4. Adjusting filtering rules
          1. Removing and filtering objects
          2. Adding objects to slaves
        5. Summary
      23. Index