You are previewing PostgreSQL Replication.
O'Reilly logo
PostgreSQL Replication

Book Description

Replication is an advanced subject but this tutorial makes it so simple to assimilate thanks to a carefully structured approach based on examples and screenshots. As a PostrgreSQL administrator it will expand your horizons.

  • Explains the new replication features introduced in PostgreSQL 9

  • Contains easy to understand explanations and lots of screenshots that simplify an advanced topic like replication

  • Teaches PostgreSQL administrators how to maintain consistency between redundant resources and to improve reliability, fault-tolerance, and accessibility

  • In Detail

    PostgreSQL offers a comprehensive set of replication related features, which can be used to make your database servers more robust and way more scalable. Unleashing the power of PostgreSQL provides the user with countless opportunities and a competitive advantage over other database systems. To make things more powerful, PostgreSQL can be used in conjunction with a handful of sophisticated tools serving various different needs such as queuing, logical replication, or simplified transaction log handling.

    "PostgreSQL Replication" is a practical, hands-on guide to PostgreSQL replication. It will provide you with the theoretical background as well as simple examples showing you how to make replication work on your system. A broad toolchain will be presented along with mature PostgreSQL-core technology.

    "PostgreSQL Replication" starts with an introduction to replication concepts as well as the physical limitations of different replication solutions. You will be guided through various techniques such as Point-In-Time-Recovery, transaction-log-based replication and you will be introduced to a set of replication-related tools. In the final chapter you will learn to scale PostgreSQL to many different servers using PL/Proxy.

    You will learn how to reset PostgreSQL to a certain point in time and figure out how to replicate data in many ways. You will deal with both synchronous as well as asynchronous replication. In addition to that, the book covers important topics, such as Slony, and upgrades with virtually no downtime. We will also cover important performance-related topics to make sure your database setups will provide you with high speed AND high availability.

    "PostgreSQL Replication"contains all the information you need to design and operate replicated setups. You will learn everything you need to know for your daily work and a lot more.

    Table of Contents

    1. PostgreSQL Replication
      1. Table of Contents
      2. PostgreSQL Replication
      3. Credits
      4. About the Authors
      5. About the Reviewers
      6. www.PacktPub.com
        1. Support files, eBooks, discount offers and more
          1. Why Subscribe?
          2. Free Access for Packt account holders
      7. Preface
        1. What this book covers
        2. What you need for this book
        3. Who this book is for
        4. Conventions
        5. Reader feedback
        6. Customer support
          1. Downloading the example code
          2. Errata
          3. Piracy
          4. Questions
      8. 1. Understanding Replication Concepts
        1. The CAP theory and physical limitations
          1. Understanding the CAP theory
          2. Why the speed of light matters
          3. Long distance transmission
          4. Why latency matters
        2. Different types of replication
          1. Synchronous versus asynchronous replication
            1. Understanding replication and data loss
            2. Considering the performance issues
          2. Single-master versus multi-master replication
          3. Logical versus physical replication
            1. When to use physical replication
            2. When to use logical replication
        3. Using sharding and data distribution
          1. Understanding the purpose of sharding
            1. An example of designing a sharded system
            2. An example of querying different fields
          2. Pros and cons of sharding
          3. Choosing between sharding and redundancy
          4. Increasing and decreasing the size of a cluster
          5. Combining sharding and replication
          6. Various sharding solutions
            1. PostgreSQL-based sharding
            2. External frameworks/middleware
        4. Summary
      9. 2. Understanding the PostgreSQL Transaction Log
        1. How PostgreSQL writes data
          1. The PostgreSQL disk layout
            1. Looking into the data directory
            2. PG_VERSION – PostgreSQL version number
            3. base – the actual data directory
              1. Growing data files
              2. Performing I/O in chunks
              3. Relation forks
            4. global – the global data
              1. Dealing with standalone data files
            5. pg_clog – the commit log
            6. pg_hba.conf – host-based network configuration
            7. pg_ident.conf – ident authentication
            8. pg_multixact – multi-transaction status data
            9. pg_notify – LISTEN/NOTIFY data
            10. pg_serial – information about committed serializable transactions
            11. pg_snapshot – exported snapshots
            12. pg_stat_tmp – temporary statistics data
            13. pg_subtrans – subtransaction data
            14. pg_tblspc – symbolic links to tablespaces
            15. pg_twophase – information about prepared statements
            16. pg_XLOG – the PostgreSQL transaction log (WAL)
            17. postgresql.conf – the central PostgreSQL configuration file
          2. Writing one row of data
            1. A simple INSERT statement
              1. Crashing during WAL-writing
              2. Crashing after WAL-writing
          3. Read consistency
            1. The purpose of the shared buffer
            2. Mixed reads and writes
        2. The XLOG and replication
        3. Understanding consistency and data loss
          1. All the way to the disk
            1. From memory to memory
            2. From memory to the disk
            3. One word about batteries
            4. Beyond fsync()
          2. PostgreSQL consistency levels
        4. Tuning checkpoints and the XLOG
          1. Understanding the checkpoints
          2. Configuring checkpoints
            1. About segments and timeouts
            2. To write or not to write?
              1. Scenario 1 – Storing stock-market data
              2. Scenario 2 – Bulk loading
              3. Scenario 3 – I/O spikes and throughput considerations
              4. Conclusion
          3. Tweaking WAL buffers
        5. The internal structure of the XLOG
          1. Understanding the XLOG records
            1. Making the XLOG deterministic
            2. Making the XLOG reliable
          2. LSNs and shared buffer interaction
            1. Debugging the XLOG and putting it all together
        6. Summary
      10. 3. Understanding Point-In-Time-Recovery
        1. Understanding the purpose of PITR
          1. Moving to the bigger picture
        2. Archiving the transaction log
        3. Taking base backups
          1. Using pg_basebackup
            1. Modifying pg_hba.conf
          2. Signaling the master server
          3. pg_basebackup – basic features
            1. pg_basebackup – self-sufficient backups
        4. Making use of traditional methods to create base backups
        5. Tablespace issues
        6. Keeping an eye on network bandwidth
        7. Replaying the transaction log
          1. Performing a basic recovery
          2. More sophisticated positioning in the XLOG
          3. Cleaning up the XLOG on the way
          4. Switching the XLOG files
        8. Summary
      11. 4. Setting up Asynchronous Replication
        1. Setting up streaming replication
          1. Tweaking the config files on the master
          2. Handling pg_basebackup and recovery.conf
          3. Making the slave readable
          4. The underlying protocol
        2. Configuring a cascaded replication
        3. Turning slaves to masters
        4. Mixing streaming and file-based recovery
          1. The master configuration
          2. The slave configuration
          3. Error scenarios
            1. Network connection between the master and slave is dead
            2. Rebooting the slave
            3. Rebooting the master
            4. Corrupted XLOG in the archive
        5. Making the streaming-only replication more robust
        6. Efficient cleanup and the end of recovery
          1. Gaining control over the restart points
          2. Tweaking the end of your recovery
        7. Conflict management
        8. Dealing with the timelines
        9. Summary
      12. 5. Setting up Synchronous Replication
        1. Setting up synchronous replication
          1. Understanding the downside of synchronous replication
          2. Understanding the application_name parameter
          3. Making synchronous replication work
          4. Checking replication
          5. Understanding performance issues
          6. Setting synchronous_commit to on
            1. Setting synchronous_commit to remote_write
            2. Setting synchronous_commit to off
            3. Setting synchronous_commit to local
          7. Changing durability settings on the fly
        2. Understanding practical implications and performance
        3. Redundancy and stopping replication
        4. Summary
      13. 6. Monitoring Your Setup
        1. Checking your archive
          1. Checking the archive_command
          2. Monitoring the transaction log archive
        2. Checking pg_stat_replication
          1. Relevant fields in pg_stat_replication
        3. Checking for operating system processes
        4. Dealing with monitoring tools
          1. Installing check_postgres
          2. Deciding on a monitoring strategy
        5. Summary
      14. 7. Understanding Linux High Availability
        1. Understanding the purpose of high availability
        2. Measuring availability
        3. History of high-availability software
          1. OpenAIS and Corosync
          2. Linux-HA (Heartbeat) and Pacemaker
        4. Terminology and concepts
        5. High availability is all about redundancy
        6. PostgreSQL and high availability
          1. High availability with quorum
          2. High availability with STONITH
        7. Summary
      15. 8. Working with pgbouncer
        1. Understanding fundamental pgbouncer concepts
        2. Installing pgbouncer
        3. Configuring your first pgbouncer setup
          1. Writing a simple config file and starting pgbouncer up
            1. Dispatching requests
            2. More basic settings
            3. Authentication
          2. Connecting to pgbouncer
            1. Java issues
          3. Pool modes
          4. Cleanup issues
        4. Improving performance
          1. A simple benchmark
        5. Maintaining pgbouncer
          1. Configuring the admin interface
          2. Using the management database
          3. Extracting runtime information
          4. Suspending and resuming operations
        6. Summary
      16. 9. Working with pgpool
        1. Installing pgpool
          1. Installing pgpool-regclass and insert_lock
        2. Understanding pgpool features
        3. Understanding the pgpool architecture
        4. Setting up replication and load balancing
          1. Password authentication
          2. Firing up pgpool and testing the setup
          3. Attaching hosts
        5. Checking replication
        6. Running pgpool with streaming replication
          1. Optimizing pgpool configuration for master/slave mode
        7. Dealing with failovers and high availability
          1. Using PostgreSQL streaming and Linux HA
          2. pgpool mechanisms for high availability and failover
        8. Summary
      17. 10. Configuring Slony
        1. Installing Slony
        2. Understanding how Slony works
          1. Dealing with logical replication
          2. The slon daemon
        3. Replicating your first database
        4. Deploying DDLs
        5. Adding tables to replication and managing problems
        6. Performing failovers
          1. Planned failovers
          2. Unplanned failovers
        7. Summary
      18. 11. Using Skytools
        1. Installing skytools
        2. Dissecting skytools
        3. Managing pgq-queues
          1. Running pgq
            1. Creating queues and adding data
            2. Adding consumers
            3. Configuring the ticker
            4. Consuming messages
            5. Dropping queues
            6. Using pgq for large projects
        4. Using londiste to replicate data
          1. Replicating our first table
        5. One word about walmgr
        6. Summary
      19. 12. Working with Postgres-XC
        1. Understanding the Postgres-XC architecture
          1. Data nodes
          2. GTM – Global Transaction Manager
          3. Coordinators
          4. GTM Proxy
        2. Installing Postgres-XC
        3. Configuring a simple cluster
          1. Creating the GTM
        4. Optimizing for performance
          1. Dispatching the tables
          2. Optimizing the joins
          3. Optimizing for warehousing
          4. Creating a GTM Proxy
        5. Creating the tables and issuing the queries
        6. Adding nodes
        7. Handling failovers and dropping nodes
          1. Handling node failovers
          2. Replacing the nodes
          3. Running a GTM standby
        8. Summary
      20. 13. Scaling with PL/Proxy
        1. Understanding the basic concepts
          1. Dealing with the bigger picture
          2. Partitioning the data
        2. Setting up PL/Proxy
          1. A basic example
          2. Partitioned reads and writes
        3. Extending and handling clusters in a clever way
          1. Adding and moving partitions
          2. Increasing the availability
          3. Managing the foreign keys
          4. Upgrading the PL/Proxy nodes
        4. Summary
      21. Index