Disaster recovery is not optional for businesses operating in the digital age. With the ever-increasing reliance on data, system outages or data loss can be catastrophic, causing significant business disruptions and financial losses.

With multi-cloud or multi-regional PostgreSQL deployments, the complexity of managing disaster recovery only amplifies. This is where the Percona Operators come in, providing a solution to streamline disaster recovery for PostgreSQL clusters running on Kubernetes. With the Percona Operators, businesses can manage multi-cloud or hybrid-cloud PostgreSQL deployments with ease, ensuring that critical data is always available and secure, no matter what happens.

In this article, you will learn how to set up disaster recovery with Percona Operator for PostgreSQL version 2.

Overview of the solution

Operators automate routine tasks and remove toil. For standby clusters, the Operator provides the following options:

  1. pgBackRest repo-based standby
  2. Streaming replication
  3. Combination of (1) and (2)

We will review the repo-based standby, as it is the simplest option:

1. Two Kubernetes clusters in different regions, clouds, or running in hybrid mode (on-prem + cloud). One is Main, and the other is Disaster Recovery (DR).

2. In each cluster, there are the following components:

    1. Percona Operator
    2. PostgreSQL cluster
    3. pgBackRest
    4. pgBouncer

3. pgBackRest on the Main site streams backups and Write-Ahead Logs (WALs) to object storage.

4. pgBackRest on the DR site reads these backups from object storage and streams them to the standby cluster.

Configure main site

Deploy the Operator using your preferred method from our documentation. Once installed, configure the Custom Resource manifest so that pgBackRest starts using the object storage of your choice. Skip this step if you already have it configured.

Configure the backups.pgbackrest.repos section by adding the necessary configuration. The example below is for Google Cloud Storage (GCS):
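A minimal sketch of the relevant part of the Custom Resource; the cluster name main and the bucket name my-dr-bucket are placeholder values, and field names may vary between Operator versions, so verify them against the documentation:

```yaml
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: main          # placeholder cluster name
spec:
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: main-pgbackrest-secrets   # holds the GCS credentials
      repos:
        - name: repo1
          gcs:
            bucket: "my-dr-bucket"          # placeholder bucket name
```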

main-pgbackrest-secrets contains the keys for GCS; please read more about the configuration in the backup and restore tutorial.

Once configured, apply the custom resource:
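Assuming the manifest is saved as cr.yaml (a placeholder file name), apply it with kubectl:

```shell
# Apply the Custom Resource for the Main cluster
kubectl apply -f cr.yaml
```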

The backups should appear in the object storage. By default, pgBackRest puts them into the pgbackrest folder.

Configure DR site

The configuration of the Disaster Recovery site is similar to the Main site, with the only difference being the standby settings.

The following manifest has standby.enabled set to true and points to the repoName where the backups are stored (GCS in our case):
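A sketch of the standby manifest; the cluster name, secret name, and bucket are placeholders, and the repo configuration must match the one used on the Main site:

```yaml
apiVersion: pgv2.percona.com/v2
kind: PerconaPGCluster
metadata:
  name: standby        # placeholder cluster name
spec:
  standby:
    enabled: true
    repoName: repo1    # must match the repo the Main site writes to
  backups:
    pgbackrest:
      configuration:
        - secret:
            name: standby-pgbackrest-secrets  # placeholder; holds GCS credentials
      repos:
        - name: repo1
          gcs:
            bucket: "my-dr-bucket"            # same bucket as the Main site
```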

Deploy the standby cluster by applying the manifest:
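Assuming the standby manifest is saved as standby-cr.yaml (a placeholder file name):

```shell
# Apply the Custom Resource for the standby cluster in the DR site
kubectl apply -f standby-cr.yaml
```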

Failover

If the Main site fails, or whenever you need to switch over, you can promote the standby cluster. Promotion makes the cluster writable, which means it starts pushing Write-Ahead Logs (WALs) to the pgBackRest repository. This can create a split-brain situation in which two primary instances write to the same repository. To avoid it, make sure the Main cluster is either deleted or shut down before promoting the standby cluster.

Once the primary is down or inactive, promote the standby by changing the corresponding section:
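A sketch of the change: flipping standby.enabled to false in the standby cluster's manifest promotes it. Field names may differ between versions; check the documentation:

```yaml
# In the standby cluster's Custom Resource
spec:
  standby:
    enabled: false   # was true; setting to false promotes the cluster
```

Re-apply the manifest (for example, with kubectl apply) for the change to take effect.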

Now you can start writing to the cluster.

Split brain

There might be a case where your old primary comes up and starts writing to the repository. To recover from this situation, do the following:

  1. Keep only one primary with the latest data running
  2. Stop the writes on the other one
  3. Take a new full backup from the primary and upload it to the repo
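One way to take the full backup in step 3 is an on-demand backup object. A sketch, assuming the surviving cluster is named standby and using the field names described in the backup and restore tutorial; verify them against your Operator version:

```yaml
apiVersion: pgv2.percona.com/v2
kind: PerconaPGBackup
metadata:
  name: full-backup-after-failover   # placeholder name
spec:
  pgCluster: standby                 # the cluster to back up
  repoName: repo1
  options:
    - --type=full                    # force a full (not incremental) backup
```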

Automating the failover

Automated failover consists of multiple steps and is outside of the Operator's scope. Still, there are a few steps you can take to reduce the Recovery Time Objective (RTO). To detect a failure, we recommend running monitoring from a third site that watches both the Main and DR sites. This way, you can be sure that Main has really failed and that it is not a network split situation.

Another aspect of automation is switching application traffic from Main to the standby after promotion. This can be done through various Kubernetes configurations and heavily depends on how your networking and application are designed. The following options are quite common:

  1. Global Load Balancer – various clouds and vendors provide their own solutions
  2. Multi-cluster Services (MCS) – available on most public clouds
  3. Federation or other multi-cluster solutions

Conclusion

Percona Operator for PostgreSQL provides high availability for database clusters by design, making it a robust and production-ready solution for multi-AZ deployments. At the same time, business continuity protocols require disaster recovery plans in place where your vital processes and applications can survive regional outages. In this blog post, we saw how Kubernetes and Operators can simplify your DR design. Try it out yourself, and let us know your experience at the Community Forum.

For more information, visit the Percona Operator for PostgreSQL v2 documentation page. For commercial support, please visit our contact page.

