In Using Percona Backup for MongoDB in Replica Set and Sharding Environment: Part One, I demonstrated a basic Percona Backup for MongoDB (PBM) setup in a replica set and sharding environment. Here, we will look at some advanced features and the other backup/restore options available with PBM.

Let’s discuss each one.

Taking backups on remote storage (AWS S3/Google Buckets):

In order to take backups on remote cloud storage such as a Google Cloud Storage bucket or AWS S3, we can define the below configuration in the PBM configuration file [/etc/pbm_config.yaml].
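A minimal storage section might look like the following sketch. The bucket name, region, prefix, and credentials here are placeholders for your own values; for Google Cloud Storage, its S3-compatible endpoint can be used.

```yaml
storage:
  type: s3
  s3:
    region: us-east-1
    bucket: pbm-demo-bucket     # placeholder bucket name
    prefix: pbm/backups         # optional path prefix inside the bucket
    # For a Google Cloud Storage bucket via its S3-compatible API:
    # endpointUrl: https://storage.googleapis.com
    credentials:
      access-key-id: <your-access-key-id>
      secret-access-key: <your-secret-access-key>
```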

Once we reload the configurations, we are good to take our backup on the cloud.
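Reloading the configuration and taking a backup can be done as below; the config file path follows the one used throughout this post.

```shell
# Apply the updated configuration
pbm config --file /etc/pbm_config.yaml

# Take a backup, which now lands on the remote storage
pbm backup --type=logical
```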

 


Tweaking the physical backup download/restore process:

For physical backups, we have a few options to tune in order to speed up the download/restore process based on our hardware resources.

In the PBM configuration file [/etc/pbm_config.yaml], we can define the below options in the restore section.

 

  • numDownloadWorkers – The number of workers to download data from the storage. By default, it equals the number of CPU cores.
  • maxDownloadBufferMb – The maximum size of the memory buffer to store the downloaded data chunks for decompression and ordering. It is calculated as numDownloadWorkers * downloadChunkMb * 16
  • downloadChunkMb – The size of the data chunk to download (by default, 32 MB)
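A sketch of the restore section with these options; the worker count below is illustrative and should be tuned to your hardware.

```yaml
restore:
  numDownloadWorkers: 4   # defaults to the number of CPU cores
  downloadChunkMb: 32     # default download chunk size, in MB
  # maxDownloadBufferMb is derived as numDownloadWorkers * downloadChunkMb * 16
```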

 

Doing incremental backups:

Incremental backups are supported for physical backups only. Also, they work only with Percona Server for MongoDB (PSMDB), as the upstream MongoDB Community edition does not support physical backups yet. During incremental backups, Percona Backup for MongoDB saves only the data that was changed after the previous backup was taken.

In order to run PBM incremental backups, we first need a base incremental backup as a seed.
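The base backup is taken with the --base flag:

```shell
pbm backup --type=incremental --base
```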

Now, we can take further incremental backups as below.
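Subsequent incremental backups simply drop the --base flag:

```shell
pbm backup --type=incremental
```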

The restore approach will be the same as we do for full backups. All we need to do is run the below command.
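The restore command takes the backup name, which can be found via pbm list; the name below is a placeholder.

```shell
pbm restore <backup_name>
```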

Note:- PBM automatically recognizes the backup type, finds the base Incremental backup, restores the data from it, and then restores the modified data from applicable incremental backups.

Additionally, there are a few considerations in the case of physical backup restoration. We have to perform the below additional steps after the restore completes.

  • Restart all mongod nodes and pbm-agents.
  • Resync the backup list from the storage using “pbm config --force-resync --file /etc/pbm_config.yaml”.
  • Start the balancer and the mongos nodes.
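On systemd-based hosts, the first two steps above can be sketched as below; the unit names may differ in your environment.

```shell
# Restart mongod and pbm-agent on every node
sudo systemctl restart mongod
sudo systemctl restart pbm-agent

# Resync the backup list from the storage
pbm config --force-resync --file /etc/pbm_config.yaml
```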

Doing PITR via oplog events:

PBM also supports PITR (point-in-time recovery) via the oplog. When PITR is enabled, we can see the oplog slices based on the value of [oplogSpanMin], which defaults to 10 minutes. So, the first oplog chunk will appear after 10 minutes.

Let’s see how we can enable the PITR via the command line.
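From the command line, PITR is toggled via pbm config:

```shell
pbm config --set pitr.enabled=true
```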

In the configuration file [/etc/pbm_config.yaml] we can define the same as below.
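The equivalent configuration file entry would be along these lines, with oplogSpanMin shown at its default:

```yaml
pitr:
  enabled: true
  oplogSpanMin: 10   # minutes of oplog per slice (default)
```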

So we have now the below PITR chunks available.
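The available snapshots and PITR time ranges can be checked with:

```shell
pbm list   # shows backup snapshots and the PITR chunk ranges
```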

In order to restore to a point in time, we can run the below steps.

A) Stop point-in-time recovery if enabled.
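PITR can be disabled the same way it was enabled:

```shell
pbm config --set pitr.enabled=false
```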

B) Restore the oplog as per the required point-in-time.
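Oplog events can be replayed on top of an existing backup with pbm oplog-replay; the timestamps below are placeholders.

```shell
pbm oplog-replay --start="2023-05-02T10:15:00" --end="2023-05-02T11:00:00"
```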

We can also use the direct restore command by specifying the required point in time. This will automatically fetch the events based on the available oplog slices.
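A direct point-in-time restore, again with a placeholder timestamp:

```shell
pbm restore --time="2023-05-02T11:00:00"
```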

Once the restoration is complete, we can enable PITR again as below.

  • Perform a fresh backup to serve as the starting point for oplog updates.

  • Enable point-in-time recovery to resume saving oplog slices.
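The two steps above can be sketched as:

```shell
# Fresh backup to serve as the new base for oplog slices
pbm backup

# Resume saving oplog slices
pbm config --set pitr.enabled=true
```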

Performing partial backups (Technical Preview):

PBM also supports selective/partial backups down to a specific collection.

So, here we are taking a backup of the collection [emp] residing in the [test] database.
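With the --ns option, that looks like:

```shell
pbm backup --ns=test.emp
```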

We can also back up all the collections inside a database using the below command.
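A wildcard in the namespace covers every collection in the database:

```shell
pbm backup --ns=test.*
```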

We can restore the selective backup with the help of the below command.
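The restore takes the backup name (a placeholder below) plus the namespace to restore:

```shell
pbm restore <backup_name> --ns=test.emp
```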

Deciding the backup node or setting the node priority:

By default, a PBM backup will use a secondary node chosen by election, and in case no secondaries respond, the backup will be initiated on the primary. We can also control the election behavior by defining a priority for the mongod nodes in the configuration file [/etc/pbm_config.yaml].
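A sketch of the priority configuration; the hostnames and priority values here are placeholders.

```yaml
backup:
  priority:
    "rs0-node1.example.com:27017": 2.5
    "rs0-node2.example.com:27017": 2.0
```

The change is applied with the usual `pbm config --file /etc/pbm_config.yaml`.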

Then, apply the changes.

Note:-  The other remaining nodes will be automatically assigned priority 1.0. The node with the highest priority initiates the backup. If that node is unavailable, the next priority node is selected. If there are several nodes with the same priority, one of them is randomly elected to make the backup.

Hidden nodes will always have a higher priority in comparison to other secondary nodes if we do not set any priority explicitly.

With the help of the [describe-backup] command, we can also verify which node ran and stored the backup.
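The backup name below is a placeholder for one listed by pbm list:

```shell
pbm describe-backup <backup_name>
```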


Using PBM snapshot-based physical backups (Technical Preview):

PBM also provides an easy interface/mechanism to perform snapshots, or point-in-time copies, of the physical files. Snapshot-based backups are useful in the case of large data sets with terabytes of data, as the restoration is quite fast and allows immediate access to the data.

The flow of snapshot-based backup would be as below:

  • Preparing the database — done by PBM
  • Copying files — done by the user 
  • Completing the backup / restore — done by PBM.

Now, let’s see how we can perform the backup/restoration in case of snapshot-based backup.

Backup:

1) First, we will initiate/prepare a backup.
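This assumes the external backup type available in recent PBM 2.x releases:

```shell
pbm backup --type=external
```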


PBM does the following things behind the scenes:

  • Opens the $backupCursor
  • Prepares the database for file copy
  • Stores the backup metadata on the storage and adds it to the files to copy

2) Next, we can copy the MongoDB data directory contents to the target storage. In our case, we used a simple copy command to local storage, as we had the complete setup in a local environment.
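As a sketch, with placeholder paths for the data directory and the target location:

```shell
# Copy the data directory contents, preserving permissions and timestamps
cp -rp /var/lib/mongodb/ /backup/mongodb-snapshot/
```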

3) Now, we can close the running backup cursor.
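The cursor is closed by finishing the backup, using the backup name returned when the backup was initiated (a placeholder below):

```shell
pbm backup-finish <backup_name>
```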

Restoration:

Before we perform the restore steps we need to ensure the below things.

  • Shut down all mongos nodes. If you have set up the automatic restart of the database, disable it.
  • Stop the arbiter nodes manually since there’s no pbm-agent on these nodes to do that automatically.

1. Then, we can execute the restore command as below. Here, PBM stops the database, cleans up the data directories on all nodes, provides the restore name, and prompts you to copy the data.
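For an externally copied (snapshot-based) backup, the restore is started with:

```shell
pbm restore --external
```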


Post this event, the original data directory will be cleaned completely.

2. Now, we are good to copy back the snapshot or physical file backup we took during the backup process.

Note:- Please also make sure the data directory is owned by the [mongod] user with read/write access.
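The copy-back and permission fix can be sketched as below; the paths are placeholders for your own layout.

```shell
# Copy the snapshot contents back into the data directory
cp -rp /backup/mongodb-snapshot/* /var/lib/mongodb/

# Ensure the mongod user owns the data files
sudo chown -R mongod:mongod /var/lib/mongodb
```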

3. Once the data copy process completes, we can finalize the restoration as below.
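The restore name below is a placeholder for the one provided when the restore was started:

```shell
pbm restore-finish <restore_name> -c /etc/pbm_config.yaml
```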

Once all the above steps are done, we can perform the post-restoration steps.

  • Start all mongod nodes
  • Start all pbm-agents
  • Resync the backup list with the storage via “pbm config --force-resync”
  • Start the balancer and start mongos nodes in case of a sharding environment.
  • Make a fresh backup to serve as the new base for future restores.

Once the service is up, the database is accessible again.

Conclusion

In part two, we have seen some of the other backup options available with PBM. We also discussed how to perform point-in-time recovery using oplog events. Please note that selective and snapshot-based backups are still in the [Technical Preview] phase, so it's better to test them thoroughly before considering them for production.

Percona Distribution for MongoDB is a source-available alternative for enterprise MongoDB. A bundling of Percona Server for MongoDB and Percona Backup for MongoDB, Percona Distribution for MongoDB combines the best and most critical enterprise components from the open source community into a single feature-rich and freely available solution.

 

Download Percona Distribution for MongoDB Today!
