Table of Contents

Resources

Objective

The aim is to add, as a starting point in the dev env (docker containers), a High Availability PostgreSQL cluster with replication, load balancing and failover. This would consist of a Master (primary) for the write operations and 2 secondaries (standby) with load balancing for the read-only operations.

Available strategies

There are several available strategies for different scenarios: comparison of different solutions

Key points for the first experiments

  • Synchronous replication (minimal asynchrony).
  • Minimal risk of data loss.
  • Service level replication: no Shared Disk Failover : material failure or I/O errors would be shared too. File System Replication (see DRBD) could be used as a failover solution but not for replication.
  • Keep as simple as possible : replication of the entire database, no data partitioning (i.e.: tables split into sets) and consequently no Parallel Query Execution.
  • No risk of data loss: discard Trigger-Based Primary-Standby Replication, as according to the documentation “there is possible data loss during fail over”.
  • Hot secondaries: the replication needs to be synchronous for the load balancing.
  • Transparent: no adaptation of the code or queries to the used strategy. This discard the SQL-Based Replication Middleware: adaptation of function like CURRENT_TIMESTAMP and transactions must be managed for all servers independently.
  • No multi masters.

Tested strategy

The first tested strategy is Write-Ahead Log Shipping. In a second step, Logical Replication (table level replication) and Synchronous Multimaster Replication could be tested. NB: This is not implemented directly in Postgres, it has to be implemented in the app code. In a third step, we could experiment Data Partitioning and Multiple-Server Parallel Query Execution.

Write-Ahead Log Shipping

Two implementations: Log-Shipping and Streaming Replication. The last one will be used as there is less latency.

Streaming replication

  • Principle: WALL replication
  • Small delay (smaller than with log-shipping)
  • archive_timeout not required (data loss window)
  • wal_keep_size this has to be set carefully (large enough): if WAL segment are recycled too quickly, the standby server has to be reinitialized.

Implementation steps

Primary

  1. Create a PostgreSQL replication user and allow connections to replication database from secondaries.
  2. Creates replication slots for each secondary.
  3. Activate the continuous archiving. NB.: the folder where the WAL files are copied must not be hosted on the primary.

Secondaries

  1. Restore each secondary from a primary backup
  2. Create a file called standby.signal in the data directory of each secondary.
  3. Modify the file postgresql.conf of each secondary to set the connection info to the primary and the copy command (for WAL restoring)

Docker implementation

  • The archive directory is a docker volume mounted on the primary and secondaries.
  • The modified postgresql.conf and pg_hba.conf (Primary & Secondaries - step 3) are mounted via docker.
  • The sql initialization of the primary is executed via a docker entry point (Primary - step 1).
  • For the secondaries, the database initialization and the creation of standby.signal is handled in the docker command of the service.


See the implementation in the development environment