In simpler terms, logical replication in PostgreSQL is a way to copy data from one database to another in a more flexible and customizable manner compared to physical replication. Instead of copying the entire database, logical replication focuses on replicating changes made to individual rows or transactions. To set up logical replication, you create a publication on the source database and a subscription on the target database. When you enable the subscription, a process called the logical replication launcher identifies it and starts an application worker process. The worker then synchronizes the tables by launching table-sync workers, which copy the changes from the source to the target.

This blog post will examine an alternative method for conducting the initial data synchronization in PostgreSQL. Instead of utilizing the table synchronization process by accelerating the synchronization process through a data dump, we will explore the usage of pg_dump and pg_restore.

Lab environment – The lab environment contains below servers:

  • One primary and two replicas running on Postgres version 12
  • One server running on PostgreSQL version 15
  Server Details  IP Address  PostgreSQL Version
  Primary  10.0.0.25  12
  Replica1  10.0.0.80  12
  Replica2  10.0.0.40  12
  Logical replica (Target Server)  10.0.0.221  15

In this lab, I’ve set up two replicas to make this approach work. I needed to pause the WAL replay process, so having two replicas, including replica2, helps ensure that ongoing replication, which may be important for meeting business requirements, isn’t affected by this approach.

Step 1: Create a publication on the primary server
The first step is to create a publication on the primary server, which defines the tables that will be replicated. Connect to the primary server using the PostgreSQL command line tool psql, and execute the following commands

These commands create a publication named my_pub and add the table table1 to it. The publication is now ready to replicate changes made to table1 on the primary server.

Step 2: Create a logical replication slot on the primary server
Next, we need to create a logical replication slot on the primary server to capture the changes made to the table1. Execute the following command in the psql console on the primary server:

This command creates a logical replication slot named my_pub_slot using the pgoutput output plugin, which is a built-in output plugin for logical replication in PostgreSQL. The slot will capture the changes made to the table1 and store them for replication.

Step 3: Create a subscription on the target server
On the target logical replication server, we need to create a subscription that connects to the primary server and replicates the changes made to table1. Connect to the target server using psql, and execute the following commands. Note that we have omitted the password from the command, as it is located in the .pgpass file

This command creates a subscription named my_pub_subscription that connects to the primary server at IP address 10.0.0.25, using the Percona database, and replicates the publication my_pub, which includes table1. The subscription is created with the following options:

copy_data=false: This option specifies that we will not copy the data from the publisher using logical replication, but instead, we will use the pg_dump tool to copy the data later.
create_slot=false: This option specifies that we will not create a replication slot on the target server, as we have already created a logical replication slot on the primary server.
enabled=false: This option specifies that the subscription will not be enabled immediately, as we want to perform some additional steps before enabling it.
slot_name=my_pub_slot: This option specifies the name of the logical replication slot that we created on the primary server.

Step 4: Pause replication on replica2::10.0.0.40 to pg_dump a data snapshot for table1 and restore it on the target logical replication node

On replica2:

NOTE: For additional details on how to use and apply pg_dump, I suggest taking a look at my colleague’s blog posts, which covers several topics, including simple backup and restore, PostgreSQL upgrade using pg_dump and pg_restore, PostgreSQL upgrade using pg_dumpall, and working with PostgreSQL dump manifests.

Now replication is paused on replica2, and transactions on the primary node won’t be replicated to replica2. To confirm whether the replay is paused on replica2, we insert a new record into table1 on the primary server, increasing the total count to 1,000,001. We then check the record count on replica1 and replica2. Replica1 shows the expected count of 1,000,001, while replica2 has a replay paused status of “t” and a record count of 1,000,000, indicating that the newly inserted record has not been replicated to it due to the pause in replication.

On primary server:  Count and insert records in table1.

We can verify whether the newly inserted record has been replicated to replica1 and replica2. Since replication is paused on replica2, it will not have the record we inserted.

Replica1:

Replica2: 

 

Step 5: Note the replay LSN for replica2 (e.g., 0/1C8BEC40) on the primary

 

Step 6: Dump only the data for table1 from replica2 as we already have the schema on the target node

 

Step 7: Restore the data on the target node

NOTE: The custom compression format employed by pg_dump has several advantages, such as reducing the size of dump files and speeding up backup and restore processes. It also enables restoring multiple tables simultaneously by utilizing the jobs switch. However, when restoring data from a dump file, it’s essential to include constraints, if they exist, on the target tables after restoring the data. This can be done by using the section=post-data switch.

Step 8: Resume the logical replication worker process to capture continuous data changes for table1

On the primary node, there may be more data in table1 compared to replica2 and the target logical replication node. Once we enable logical replication, the delta data (i.e., the changes that occur after the logical replication is set up) will be replicated by the logical replication worker to the target server. This ensures that any new changes made to table1 on the primary node will be accurately replicated to the target server through logical replication.

 

Step 9: On the target server, advance the replication origin on the target to the location from replay_lsn (gathered from replica2) in Step5 for the logical replica target

pg_replication_origin_advance is used for replication progress for the given node to the given location. This is primarily useful for setting up the initial location or setting a new location after configuration changes and similar.

Step 10: Enable the subscription

After successfully enabling the subscription, we can observe that the changes in table1 are being logically replicated to the target server.

This confirms that the data in table1 is being replicated from the source to the target server through logical replication. Going forward, any changes made to table1 on the source server will be replicated to the target server as well. It’s important to note that resuming replication for replica2 to ensure that the changes are properly applied. This can be done using the pg_wal_replay_resume() function in PostgreSQL:

When considering the implementation of this method for sizable tables, it is essential to keep a close eye on the replication slot age to avert any potential wraparound complications. Furthermore, refrain from utilizing the count(*) function on extensive tables, as it can consume a lot of resources and take up a significant amount of time.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Subscribe
Notify of
guest

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
v0112358

replay_lsn in the step 5 and step 9 is different.