Our previous blogs discussed configuring and setting up backups with pgBackRest. To briefly recap, pgBackRest is an open source backup tool that takes full, incremental, and differential backups of PostgreSQL databases.

A repository is the location/path, on a server or in the cloud, where the actual copy of the backup resides. In this blog, we will specifically discuss one of the important features of pgBackRest called Multiple Repository (or, in short, multi-repo). This feature helps keep redundant copies of database backups in multiple locations, either remotely on different servers or locally on the same server.

Let’s discuss a few scenarios one by one.

Scenarios:
1. Behavior of pgBackRest with single repo
2. Behavior of pgBackRest with multiple repos
2.1. Configuring asynchronous archiving (archive-async=y)
2.2. Taking backup locally with multiple repos
2.3. Taking backup locally and remotely on the cloud
2.4. Taking backup locally and in multiple clouds

Pre-configured Setup:
>PostgreSQL is installed and configured on the database host.
>pgBackRest is installed and configured on both the dedicated backup host and the database host.

Scenario – 1: Behavior of pgBackRest with single repo

By default, pgBackRest takes backups into a single repository or location, which can reside locally on the same server, on a remote server, or in the cloud.

Below are the configuration files as they are normally set up for taking backups with a single repo:

Backup Host:
[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:
[global]
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main

Each configuration file has a [global] section holding common parameters such as the repo details and log settings. Even though there is a single repository, the parameter names start with “repo1” because repositories are numbered, which leaves room for additional ones. Here, the backup will be stored on the local backup host at the /var/lib/pgbackrest_repo1 path, which has already been created with the proper owner (in this case, the postgres user) and permissions.

[pgstanza] is the name of the stanza for which the backup is taken. For the sake of simplicity, we are considering backups for one DB server only.

Let’s take the backup using the pgbackrest command:

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-03-23 04:19:14.059 P00 INFO: backup command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-03-23 04:19:15.315 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-03-23 04:19:15.822 P00 INFO: backup start archive = 000000010000000000000039, lsn = 0/39000028
2023-03-23 04:19:15.822 P00 INFO: check archive for prior segment 000000010000000000000038
2023-03-23 04:19:23.184 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-03-23 04:19:23.386 P00 INFO: backup stop archive = 000000010000000000000039, lsn = 0/39000138
2023-03-23 04:19:23.394 P00 INFO: check archive for segment(s) 000000010000000000000039:000000010000000000000039
2023-03-23 04:19:23.711 P00 INFO: new backup label = 20230323-041915F
2023-03-23 04:19:23.790 P00 INFO: full backup size = 22.0MB, file total = 961
2023-03-23 04:19:23.790 P00 INFO: backup command end: completed successfully (9733ms)
2023-03-23 04:19:23.791 P00 INFO: expire command begin 2.44: --exec-id=157866-5813ef0e --log-level-console=info --log-level-file=debug --repo1-path=/var/lib/pgbackrest_repo1 --repo1-retention-full=2 --stanza=pgstanza
2023-03-23 04:19:23.792 P00 INFO: repo1: expire full backup 20230323-040330F
2023-03-23 04:19:23.806 P00 INFO: repo1: remove expired backup 20230323-040330F
2023-03-23 04:19:23.829 P00 INFO: repo1: 15-1 remove archive, start = 000000010000000000000035, stop = 000000010000000000000036
2023-03-23 04:19:23.830 P00 INFO: expire command end: completed successfully (39ms)

As shown below, the backup goes into the local repository directory, which contains two subdirectories:

>backup – contains the full, incremental, and differential backups
>archive – contains the archived WAL files that make point-in-time recovery (PITR) possible

postgres@ip-172-31-54-194:~$ cd /var/lib/pgbackrest_repo1
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$ ls -ltr
total 8
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 archive
drwxr-x--- 3 postgres postgres 4096 Feb 15 13:16 backup
postgres@ip-172-31-54-194:/var/lib/pgbackrest_repo1$

Scenario – 2: Behavior of pgBackRest with multiple repos

The multi-repo functionality of pgBackRest supports different combinations of storing redundant backup copies. In this section, we discuss a few of the most useful combinations.

2.1 Configuring asynchronous archiving (archive-async=y)
WAL files can be archived redundantly to two different repos. The archive-async=y parameter makes the archive-push and archive-get commands work asynchronously, which allows archiving to a reachable repo to continue while another repo is down.

With this parameter enabled and two repos configured, pgBackRest copies the WAL files into both repos, as the example below shows.

--repo1-path=/var/lib/pgbackrest_repo1
--repo2-path=/var/lib/pgbackrest_repo2

If the first repo is unavailable while the second repo is available, pgBackRest continues archiving the WAL files to the second repo. The WAL files that have not yet been archived to the first repo accumulate in the pg_wal directory.

Asynchronous archiving needs a spool path (spool-path), where the current WAL archiving status is stored.

We can track the activities of the asynchronous process in the [stanza]-archive-push-async.log file.
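
As a rough sketch of how the asynchronous archiver could be monitored on the DB host, the helper below counts the status files in the spool directory and shows the tail of the async log. The function name is made up for illustration, and the paths are assumptions: spool-path /var/spool/pgbackrest as configured later in this post, the default log directory /var/log/pgbackrest, and an archive/<stanza>/out subdirectory holding .ok and .error status files. Verify them against your pgBackRest version.

```shell
# Hypothetical helper: summarize async archive-push state for one stanza.
# Assumed layout: <spool-path>/archive/<stanza>/out/*.ok and *.error
async_archive_status() {
    stanza="$1"
    spool="${2:-/var/spool/pgbackrest}"   # assumed spool-path
    logdir="${3:-/var/log/pgbackrest}"    # assumed log directory
    outdir="$spool/archive/$stanza/out"

    # Count segments marked as pushed (.ok) and segments that failed (.error).
    ok=$(find "$outdir" -name '*.ok' 2>/dev/null | wc -l | tr -d ' ')
    err=$(find "$outdir" -name '*.error' 2>/dev/null | wc -l | tr -d ' ')
    echo "ok=$ok error=$err"

    # Show the most recent async activity, if the log exists.
    log="$logdir/$stanza-archive-push-async.log"
    if [ -r "$log" ]; then
        tail -n 5 "$log"
    fi
}

# Usage (on the DB host): async_archive_status pgstanza
```

A growing error count while one repo is down would be consistent with the WAL accumulation in pg_wal described above.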

On the Backup Host:
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info check
2023-03-23 04:35:59.074 P00 INFO: check command begin 2.44: --exec-id=158656-d4a8f71e --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --stanza=pgstanza
2023-03-23 04:35:59.920 P00 INFO: check repo1 configuration (primary)
2023-03-23 04:35:59.921 P00 INFO: check repo2 configuration (primary)
2023-03-23 04:36:00.124 P00 INFO: check repo1 archive for WAL (primary)
2023-03-23 04:36:01.327 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo1/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo1
2023-03-23 04:36:01.328 P00 INFO: check repo2 archive for WAL (primary)
2023-03-23 04:36:01.328 P00 INFO: WAL segment 00000001000000000000003C successfully archived to '/var/lib/pgbackrest_repo2/archive/pgstanza/15-1/0000000100000000/00000001000000000000003C-6aa2de4dca50db51592d139010bdfb7a8c2c45ce.gz' on repo2
2023-03-23 04:36:01.430 P00 INFO: check command end: completed successfully (2358ms)
postgres@ip-172-31-54-194:~$

2.2 Taking backup locally with multiple repos:
In this example, we create two repositories on the local backup server itself, namely pgbackrest_repo1 and pgbackrest_repo2. The two repositories can be placed on different storage so that if one storage becomes unavailable, the other still holds a backup.

Below is an example of pgbackrest.conf on each host with two local repositories:

Backup Host:
[global]
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
repo2-path=/var/lib/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:
[global]
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
repo2-host=172.31.54.194
repo2-path=/var/lib/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
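
After a second repository is added to the configuration, the stanza has to exist in that repository before backups or archiving to it can succeed. Below is a minimal sketch, assuming it runs as the postgres user on the backup host and that stanza-create without --repo initializes every configured repository (worth confirming on your pgBackRest release). The function name is illustrative only.

```shell
# Hypothetical helper: initialize the stanza and verify archiving end to end.
init_stanza() {
    stanza="$1"
    # Create the stanza metadata; assumed to cover all configured repos.
    pgbackrest --stanza="$stanza" stanza-create || return 1
    # Confirm that the repo configuration and WAL archiving work.
    pgbackrest --stanza="$stanza" check
}

# Usage: init_stanza pgstanza
```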

Let’s rename pgbackrest_repo1 so that it becomes inaccessible, and then let’s try to take the backup:

On the Backup Host:

ubuntu@ip-172-31-54-194:~$ sudo mv /var/lib/pgbackrest_repo1 /var/lib/pgbackrest_repo1_bkp
ubuntu@ip-172-31-54-194:~$ sudo su - postgres
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup
2023-02-10 12:40:27.160 P00 INFO: backup command begin 2.44: --exec-id=23422-c65cc1d9 --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:27.161 P00 INFO: repo option not specified, defaulting to repo1
ERROR: [055]: unable to load info file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' or '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy':
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info' for read
FileMissingError: unable to open missing file '/var/lib/pgbackrest_repo1/backup/pgstanza/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
2023-02-10 12:40:27.162 P00 INFO: backup command end: aborted with exception [055]
postgres@ip-172-31-54-194:~$

As shown above, the backup fails with error [055]: since no repo was specified, pgBackRest defaulted to repo1, where backup.info can no longer be found. This is expected.

Let’s try to take a backup to repo2 by passing --repo=2 and check whether it allows us to do so.

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2
2023-02-10 12:40:34.605 P00 INFO: backup command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --pg1-host=18.210.15.186 --pg1-path=/var/lib/postgresql/15/main --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza --start-fast --type=full
2023-02-10 12:40:35.949 P00 INFO: execute non-exclusive backup start: backup begins after the requested immediate checkpoint completes
2023-02-10 12:40:36.456 P00 INFO: backup start archive = 000000010000000000000028, lsn = 0/28000028
2023-02-10 12:40:36.456 P00 INFO: check archive for prior segment 000000010000000000000027
2023-02-10 12:40:43.993 P00 INFO: execute non-exclusive backup stop and wait for all WAL segments to archive
2023-02-10 12:40:44.195 P00 INFO: backup stop archive = 000000010000000000000028, lsn = 0/28000138
2023-02-10 12:40:44.201 P00 INFO: check archive for segment(s) 000000010000000000000028:000000010000000000000028
2023-02-10 12:40:45.521 P00 INFO: new backup label = 20230210-124035F
2023-02-10 12:40:45.579 P00 INFO: full backup size = 22.0MB, file total = 961
2023-02-10 12:40:45.580 P00 INFO: backup command end: completed successfully (10978ms)
2023-02-10 12:40:45.580 P00 INFO: expire command begin 2.44: --exec-id=23423-e840ad8d --log-level-console=info --log-level-file=debug --repo=2 --repo1-path=/var/lib/pgbackrest_repo1 --repo2-path=/var/lib/pgbackrest_repo2 --repo1-retention-full=2 --repo2-retention-full=2 --stanza=pgstanza
2023-02-10 12:40:45.592 P00 INFO: repo2: 15-1 remove archive, start = 000000010000000000000020, stop = 000000010000000000000025
2023-02-10 12:40:45.592 P00 INFO: expire command end: completed successfully (12ms)
postgres@ip-172-31-54-194:~$

Excellent… the backup was successful for repo2. Now, let’s check the info and see what it says for repo1.

On the Backup Host:

postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info
stanza: pgstanza
status: mixed
repo1: error (missing stanza path)
repo2: ok
cipher: none
db (current)
wal archive min/max (15): 000000010000000000000026/000000010000000000000028
full backup: 20230210-123819F
timestamp start/stop: 2023-02-10 12:38:19 / 2023-02-10 12:38:28
wal start/stop: 000000010000000000000026 / 000000010000000000000026
database size: 22.0MB, database backup size: 22.0MB
repo2: backup set size: 2.9MB, backup size: 2.9MB
full backup: 20230210-124035F
timestamp start/stop: 2023-02-10 12:40:35 / 2023-02-10 12:40:44
wal start/stop: 000000010000000000000028 / 000000010000000000000028
database size: 22.0MB, database backup size: 22.0MB
repo2: backup set size: 2.9MB, backup size: 2.9MB
postgres@ip-172-31-54-194:~$
postgres@ip-172-31-54-194:~$ pgbackrest --stanza=pgstanza --log-level-console=info info --repo=1
stanza: pgstanza
status: error (missing stanza path)
postgres@ip-172-31-54-194:~$
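
Because each backup command writes to exactly one repo, covering every repo means running the command once per repo number. The loop below is one way this could be scripted; the function name is hypothetical, and it keeps going when one repo fails so that the remaining repos still receive a backup.

```shell
# Hypothetical helper: run the same backup type against each listed repo.
# Arguments: stanza, backup type (full/diff/incr), then repo numbers.
backup_all_repos() {
    stanza="$1"
    btype="$2"
    shift 2
    failed=0
    for repo in "$@"; do
        if ! pgbackrest --stanza="$stanza" --type="$btype" --repo="$repo" backup; then
            echo "backup to repo$repo failed" >&2
            failed=1   # remember the failure but continue with the next repo
        fi
    done
    return "$failed"
}

# Usage: backup_all_repos pgstanza full 1 2
```

A non-zero return value signals that at least one repo missed its backup, which a scheduler or monitoring job could pick up.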

2.3 Taking backup locally and remotely on the cloud:
Let’s consider a scenario where one repo is available locally on the dedicated backup server and another repo is in the cloud. The advantage here is that if either the local repo or the cloud repo is unavailable, the backup can still be obtained from the other. This combination lets us take advantage of both the cloud and on-prem local machines.

Let’s check the main configuration needed in pgbackrest.conf:

Backup Host:
[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user=postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

DB Host:
[global]
## Repo1: Local
repo1-host=172.31.54.194
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user=postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user=postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
archive-async=y
spool-path=/var/spool/pgbackrest
log-level-console=info
log-level-file=debug
[pgstanza]
pg1-path=/var/lib/postgresql/15/main

As we can see, the repo1-related options store the backup in the local repository on the dedicated backup host at the path /var/lib/pgbackrest_repo1.

A few important options for repo2, which resides in AWS S3:
>repo2-type is s3, indicating AWS S3; it would be azure for Azure and gcs for Google Cloud.
>repo2-s3-bucket, repo2-s3-endpoint, repo2-s3-key-secret, and repo2-s3-region: these attributes vary from cloud to cloud.

A bucket or the required repo, with the proper user and permissions, must be created before configuring pgBackRest backups. More information can be found in the pgBackRest User Guide.
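
The access keys above sit in plain text in pgbackrest.conf. pgBackRest can also read options from environment variables named with a PGBACKREST_ prefix, with the option name in upper case and dashes replaced by underscores. The sketch below only illustrates that naming rule; the helper name is hypothetical, and the key value is the placeholder from the configuration above.

```shell
# Hypothetical helper: map a pgBackRest option name to its environment variable name.
opt_to_env() {
    printf 'PGBACKREST_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Usage: look up the variable name, then export the secret instead of
# writing it to pgbackrest.conf (value below is the placeholder from this post):
#   opt_to_env repo2-s3-key-secret
#   export PGBACKREST_REPO2_S3_KEY_SECRET=verySecretKey2
```

The variable has to be visible to every process that runs pgbackrest, which on the DB host presumably includes the PostgreSQL server that calls archive-push.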

2.4 Taking backup locally and in multiple clouds:
Another very useful scenario is creating repositories on multiple clouds plus one locally on the dedicated backup host. Even if one cloud provider is unavailable, a backup is still available from another cloud or from the local repository. In this case, each repository gets its own numbered options: repo1-type, repo2-type, repo3-type, and so on.

[Figure: PostgreSQL backup locally and in multiple clouds]

In the above diagram, four repositories have been created: one is available locally, and the others are in different clouds, namely AWS S3, Azure, and Google Cloud, respectively. In this case, the configuration on the backup host will be as follows.

On the Backup Host:

postgres@ip-172-31-54-194:~$ cat /etc/pgbackrest.conf
[global]
## Repo1: Local
repo1-path=/var/lib/pgbackrest_repo1
repo1-retention-full=2
repo1-host-user = postgres
## Repo2: AWS S3
repo2-type=s3
repo2-path=/pgbackrest_repo2
repo2-retention-full=2
repo2-host-user = postgres
repo2-s3-bucket=s3bucket
repo2-s3-endpoint=s3.us-east-1.amazonaws.com
repo2-s3-key=accessKey2
repo2-s3-key-secret=verySecretKey2
repo2-s3-region=us-east-1
## Repo3: Azure
repo3-type=azure
repo3-path=/pgbackrest_repo3
repo3-retention-full=2
repo3-azure-account=pgbackrest
repo3-azure-container=pgbackrest-container
repo3-azure-key=accessKey3
## Repo4: Google Cloud
repo4-type=gcs
repo4-path=/pgbackrest_repo4
repo4-retention-full=2
repo4-gcs-bucket=pgbackrest-bucket
repo4-gcs-key=/etc/pgbackrest/gcs-key.json
archive-async=y
log-level-console=info
log-level-file=debug
start-fast=y
[pgstanza]
pg1-path=/var/lib/postgresql/15/main
pg1-host=18.210.15.186

The DB host is configured in the same way as in the earlier sections, with the details of each cloud repository added.
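
The point of keeping several repositories is being able to restore from whichever one is still reachable. Below is a minimal sketch of a restore from a chosen repo, assuming it runs as the postgres user on the DB host with PostgreSQL stopped. The function name and the postmaster.pid guard are illustrative additions, not part of pgBackRest.

```shell
# Hypothetical helper: restore a stanza from one specific repo.
# Arguments: stanza, repo number, PostgreSQL data directory.
restore_from_repo() {
    stanza="$1"
    repo="$2"
    pgdata="$3"
    # Refuse to continue while PostgreSQL still appears to be running.
    if [ -e "$pgdata/postmaster.pid" ]; then
        echo "PostgreSQL appears to be running in $pgdata; stop it first" >&2
        return 1
    fi
    # --delta restores into an existing data directory, rewriting only what differs.
    pgbackrest --stanza="$stanza" --repo="$repo" --pg1-path="$pgdata" --delta restore
}

# Usage: restore_from_repo pgstanza 2 /var/lib/postgresql/15/main
```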

Conclusion

To conclude, the major advantage of pgBackRest multi-repo functionality is that redundant backup copies can be taken. With the archive-async=y option, archives are pushed to multiple repositories, and if the default repo is unavailable, the second repository keeps receiving archive files automatically.

The only limitation of this feature is that, by default, a backup goes only to the default repo (repo1), even when repo2 is configured. To take a backup in any other repository, the backup command has to be run again with the repo number mentioned explicitly, once per repository. For example: pgbackrest --stanza=pgstanza --log-level-console=info --type=full backup --repo=2

Despite this limitation, multi-repo functionality can be used to take backups on the secondary repo, even if the dedicated backup server is unavailable.

