How To Start Logical Replication in PostgreSQL for Specific Tables Based on a pg

In simpler terms, logical replication in PostgreSQL is a way to copy data from one database to another in a more flexible and customizable manner compared to physical replication. Instead of copying the entire database, logical replication focuses on replicating changes made to individual rows or transactions. To set up logical replication, you create a publication on the source database and a subscription on the target database. When you enable the subscription, a process called the logical replication launcher identifies it and starts an application worker process. The worker then synchronizes the tables by launching table-sync workers, which copy the changes from the source to the target.

This blog post will examine an alternative method for conducting the initial data synchronization in PostgreSQL. Instead of utilizing the table synchronization process by accelerating the synchronization process through a data dump, we will explore the usage of pg_dump and pg_restore.

Lab environment – The lab environment contains below servers:

One primary and two replicas running on Postgres version 12
One server running on PostgreSQL version 15

Server Details	IP Address	PostgreSQL Version
Primary	10.0.0.25	12
Replica1	10.0.0.80	12
Replica2	10.0.0.40	12
Logical replica (Target Server)	10.0.0.221	15

In this lab, I’ve set up two replicas to make this approach work. I needed to pause the WAL replay process, so having two replicas, including replica2, helps ensure that ongoing replication, which may be important for meeting business requirements, isn’t affected by this approach.

Step 1: Create a publication on the primary server
The first step is to create a publication on the primary server, which defines the tables that will be replicated. Connect to the primary server using the PostgreSQL command line tool psql, and execute the following commands

percona=# create publication my_pub;

CREATE PUBLICATION

percona=# alter publication my_pub add table table1;

ALTER PUBLICATION

These commands create a publication named my_pub and add the table table1 to it. The publication is now ready to replicate changes made to table1 on the primary server.

Step 2: Create a logical replication slot on the primary server
Next, we need to create a logical replication slot on the primary server to capture the changes made to the table1. Execute the following command in the psql console on the primary server:

percona=# SELECT pg_create_logical_replication_slot('my_pub_slot', 'pgoutput');

pg_create_logical_replication_slot

------------------------------------

(my_pub_slot,0/1C8BEB58)

This command creates a logical replication slot named my_pub_slot using the pgoutput output plugin, which is a built-in output plugin for logical replication in PostgreSQL. The slot will capture the changes made to the table1 and store them for replication.

Step 3: Create a subscription on the target server
On the target logical replication server, we need to create a subscription that connects to the primary server and replicates the changes made to table1. Connect to the target server using psql, and execute the following commands. Note that we have omitted the password from the command, as it is located in the .pgpass file

1 2	percona=# CREATE SUBSCRIPTION my_pub_subscription CONNECTION 'host=10.0.0.25 port=5432 dbname=percona' PUBLICATION my_pub WITH (copy_data = false, create_slot=false, enabled=false, slot_name=my_pub_slot); CREATE SUBSCRIPTION

This command creates a subscription named my_pub_subscription that connects to the primary server at IP address 10.0.0.25, using the Percona database, and replicates the publication my_pub, which includes table1. The subscription is created with the following options:

copy_data=false: This option specifies that we will not copy the data from the publisher using logical replication, but instead, we will use the pg_dump tool to copy the data later.
create_slot=false: This option specifies that we will not create a replication slot on the target server, as we have already created a logical replication slot on the primary server.
enabled=false: This option specifies that the subscription will not be enabled immediately, as we want to perform some additional steps before enabling it.
slot_name=my_pub_slot: This option specifies the name of the logical replication slot that we created on the primary server.

Step 4: Pause replication on replica2::10.0.0.40 to pg_dump a data snapshot for table1 and restore it on the target logical replication node

On replica2:

postgres@ip-10-0-0-40:~$ psql -d percona -c 'select pg_wal_replay_pause() '

pg_wal_replay_pause

---------------------

(1 row)

postgres@ip-10-0-0-40:~$ psql -d percona -c 'select pg_is_wal_replay_paused() '

pg_is_wal_replay_paused

-------------------------

(1 row)

NOTE: For additional details on how to use and apply pg_dump, I suggest taking a look at my colleague’s blog posts, which covers several topics, including simple backup and restore, PostgreSQL upgrade using pg_dump and pg_restore, PostgreSQL upgrade using pg_dumpall, and working with PostgreSQL dump manifests.

Now replication is paused on replica2, and transactions on the primary node won’t be replicated to replica2. To confirm whether the replay is paused on replica2, we insert a new record into table1 on the primary server, increasing the total count to 1,000,001. We then check the record count on replica1 and replica2. Replica1 shows the expected count of 1,000,001, while replica2 has a replay paused status of “t” and a record count of 1,000,000, indicating that the newly inserted record has not been replicated to it due to the pause in replication.

On primary server: Count and insert records in table1.

postgres@ip-10-0-0-25:~$ psql -d percona -tc 'select count(*) from table1'

1000000

percona=# insert into table1(name,age) values('KB','29');

INSERT 0 1

percona=# select count(*) from table1;

count

---------

1000001

(1 row)

We can verify whether the newly inserted record has been replicated to replica1 and replica2. Since replication is paused on replica2, it will not have the record we inserted.

Replica1:

1 2	postgres@ip-10-0-0-80:~$ psql -d percona -U postgres -tc 'select count(*) from table1;' 1000001

Replica2:

postgres@ip-10-0-0-40:~$ psql -p 5434 -d percona -U postgres -tc 'select pg_is_wal_replay_paused();'

postgres@ip-10-0-0-40:~$ psql -p 5434 -d percona -U postgres -tc 'select count(*) from table1;'

1000000

Step 5: Note the replay LSN for replica2 (e.g., 0/1C8BEC40) on the primary

postgres@ip-10-0-0-25:~$ psql -d percona -c 'select client_addr, application_name, replay_lsn from pg_stat_replication where application_name = 'replica2';'

client_addr | application_name | replay_lsn

-------------+------------------+------------

10.0.0.40 | replica2 | 0/1C8BEC40

(2 rows)

Step 6: Dump only the data for table1 from replica2 as we already have the schema on the target node

1	postgres@ip-10-0-0-40:~$ pg_dump -d percona --data-only --table=table1 -Fc > table1_dmp.db

Step 7: Restore the data on the target node

1	pg_restore -h 10.0.0.221 -U postgres -d percona table1_dmp.db

NOTE: The custom compression format employed by pg_dump has several advantages, such as reducing the size of dump files and speeding up backup and restore processes. It also enables restoring multiple tables simultaneously by utilizing the jobs switch. However, when restoring data from a dump file, it’s essential to include constraints, if they exist, on the target tables after restoring the data. This can be done by using the section=post-data switch.

Step 8: Resume the logical replication worker process to capture continuous data changes for table1

On the primary node, there may be more data in table1 compared to replica2 and the target logical replication node. Once we enable logical replication, the delta data (i.e., the changes that occur after the logical replication is set up) will be replicated by the logical replication worker to the target server. This ensures that any new changes made to table1 on the primary node will be accurately replicated to the target server through logical replication.

postgres@ip-10-0-0-25:~$ psql -d percona -c 'select count(*) from table1'

count

---------

1000100

(1 row)

Step 9: On the target server, advance the replication origin on the target to the location from replay_lsn (gathered from replica2) in Step5 for the logical replica target

percona=# select roident, subname, roname from pg_subscription sub, pg_replication_origin ro where 'pg_' || sub.oid = ro.roname;

roident | subname | roname

---------+----------------------+----------

1 | my_pub_subscription | pg_16407

(1 row)

percona=# select pg_replication_origin_advance('pg_16407', '0/1C8BEC40');

pg_replication_origin_advance

-------------------------------

(1 row)

pg_replication_origin_advance is used for replication progress for the given node to the given location. This is primarily useful for setting up the initial location or setting a new location after configuration changes and similar.

Step 10: Enable the subscription

1 2	percona=# ALTER SUBSCRIPTION my_pub_subscription ENABLE; ALTER SUBSCRIPTION

After successfully enabling the subscription, we can observe that the changes in table1 are being logically replicated to the target server.

postgres@ip-10-0-0-221:~$ psql -d percona

psql (15.2 (Ubuntu 15.2-1.pgdg22.04+1))

Type "help" for help.

percona=# SELECT COUNT(*) FROM table1;

-[ RECORD 1 ]--

count | 1000100

This confirms that the data in table1 is being replicated from the source to the target server through logical replication. Going forward, any changes made to table1 on the source server will be replicated to the target server as well. It’s important to note that resuming replication for replica2 to ensure that the changes are properly applied. This can be done using the pg_wal_replay_resume() function in PostgreSQL:

postgres=# SELECT pg_wal_replay_resume();

pg_wal_replay_resume

----------------------

(1 row)

When considering the implementation of this method for sizable tables, it is essential to keep a close eye on the replication slot age to avert any potential wraparound complications. Furthermore, refrain from utilizing the count(*) function on extensive tables, as it can consume a lot of resources and take up a significant amount of time.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

v0112358

11 months ago

replay_lsn in the step 5 and step 9 is different.

Bhargav Kamineni

Author

Reply to v0112358

11 months ago

Hi Thanks for pointing this out, It appears to be a copy-paste error, corrected it!

Last edited 11 months ago by Bhargav Kamineni

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How To Start Logical Replication in PostgreSQL for Specific Tables Based on a pg_dump

Related

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

PostgreSQL Database Security Best Practices

Did MyDumper LIKE Triggers?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

How To Start Logical Replication in PostgreSQL for Specific Tables Based on a pg_dump

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Why MariaDB Is “Better” Than MySQL

PostgreSQL Database Security Best Practices

Did MyDumper LIKE Triggers?

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation