An Argument for Logical Failover Slots

These days, the typical PostgreSQL cluster consists not only of a multi-node replication cluster, which is ordinarily an asynchronous streaming replication model but can sometimes include a logical replication component.

logical replication postgresql

Recall that logical replication works by using the PUB/SUB mode, where individual tables are published and are then subscribed by remotely connected databases/tables. DML operations, such as INSERT, UPDATE, DELETE, TRUNCATE, etc., are then replicated from one or more publishers to one or more subscribers.

For the most part it’s a pretty straightforward operation. The only real limitation is that one can only publish tables from a read-write server such as the PRIMARY.

There’s a problem, however, when it comes to executing failovers. While a STANDBY/REPLICA host can quickly take over with a simple select pg_promote() SQL statement, the same cannot be said regarding logical replication. Instead, failing over is, in fact, not a single action but a collection of discrete steps that must be performed in an exact and precise manner.

Consider the following steps typifying a failover:

Promote the REPLICA, thus becoming the new PRIMARY.
When present, redirect other REPLICAs pointing to the new PRIMARY.
Failover the logical replication slot:
1. Block all logical replication from the failed PRIMARY.
2. Create a logical replication slot on the newly promoted PRIMARY.
3. Alter the subscription and point to the newly promoted PRIMARY.
4. Resume/restore logical replication between publisher and subscriber.

The challenge, of course, is reconstituting the logical replication process as quickly as possible, ideally without any data loss.

Two common solutions come to mind:

Block all incoming connections before promoting the STANDBY while simultaneously installing the logical replication slot.
Promote the STANDBY and then flush and recopy the entire contents of the subscribed table(s).

In both cases, these solutions make it problematic for a fast recovery, not to mention the additional effort required by the SRA/DBA/DEV preparing the requisite instructions.

failover postgresql

In an ideal world, the perfect implementation of a logical slot failover would have one already in place on the STANDBY, thus guaranteeing data consistency without fear of data loss no matter how fast or slow the actual promotion or altering the subscription’s connectivity parameters may take.

The good news is that there are hacks that can speed up the entire recovery process, although they all have their own quirky limitations. Patroni, for example, implements one such technique by copying the file named state, which is located in the data cluster’s subdirectory pg_repslot, over to the REPLICA’s data cluster in the same location.

# PGDATA on an Ubuntu install
/var/lib/postgresql/15/main/pg_replslot/pg3
└── state

# PGDATA on an Ubuntu install

/var/lib/postgresql/15/main/pg_replslot/pg3

└── state

However, the caveat is that it requires a full server restart to the REPLICA, in addition to being promoted, before the logical slot is fully active. And, of course, timing is critical.

failover desired state postgresql

So there you have it.

Too bad such a feature doesn’t exist… or does it?

Stay tuned folks; the game is afoot. 😉

For the curious, I’ve included some references in past blogs, courtesy of Jobin:

And here’s another set of references: a valiant attempt to incorporate this feature into PostgreSQL a few years ago.

PostgreSQL wiki: here
Part of a mail thread back in 2016: here
An old blog located on http://webarchive.org from a now-defunct company: here

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open source community in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Andrey

1 year ago

Did you try https://github.com/x4m/pg_tm_aux ? It allows you to recreate slots after faillover.

Robert Bernier

Reply to Andrey

1 year ago

Hi Andrey,
Apologies for not responding sooner but I was on vacation when this blog was published. I’ll certainly look into this project you so kindly provided!

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

An Argument for Logical Failover Slots

Related

Related Blog Articles

RECOMMENDED ARTICLES

Trying out the PostgreSQL pg_tde Tech Preview Release

Benchmarking MongoDB Performance on Kubernetes

Why MariaDB Is “Better” Than MySQL

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

An Argument for Logical Failover Slots

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Trying out the PostgreSQL pg_tde Tech Preview Release

Benchmarking MongoDB Performance on Kubernetes

Why MariaDB Is “Better” Than MySQL

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation