This blog was originally published in September 2020 and was updated in May 2023.

In today’s data-driven world, losing critical data can be catastrophic for any organization. As a MongoDB user, it’s crucial to ensure that your data is safe and secure in the event of a disaster or system failure. That’s why it’s essential to implement the best practices and strategies for MongoDB database backups.

Why are MongoDB database backups important?

Regular database backups are essential to protect against data loss caused by system failures, human errors, natural disasters, or cyber-attacks. In the absence of a proper backup strategy, the data can be lost forever, leading to significant financial and reputational damage.

For organizations that rely on data to operate, database backups are critical for business continuity. With a robust backup and recovery plan in place, companies can restore their systems and data quickly and minimize downtime, which is essential to maintain customer trust and avoid business disruption.

In this blog, we will be discussing different MongoDB database backup strategies and their use cases, along with pros and cons and a few other useful tips.

What are the two different types of MongoDB database backups?

Generally, there are two types of backups used with database technologies like MongoDB:

  • Logical backups
  • Physical backups

Additionally, when working with logical backups, we have the option of taking incremental backups as well, where we capture the deltas (the incremental data changes made between full backups) to minimize the amount of data loss in case of a disaster.

We will be discussing these two backup options, how to proceed with them, and which one is better suited depending on your requirements and environment setup.

Also, we will take a look at our open-source backup utility, custom-built to help you avoid the costs of proprietary software: Percona Backup for MongoDB, or PBM. PBM is a fully supported community backup tool capable of performing cluster-wide consistent backups in MongoDB for replica sets and sharded clusters.

Logical backups

These are the types of backups where data is dumped from the databases into the backup files. A logical backup with MongoDB means you’ll be dumping the data into a BSON-formatted file.

During a logical backup, the data is read from the server via the client API and then serialized and written to “.bson”, “.json”, or “.csv” backup files on disk, depending on the backup utility used.

MongoDB offers the following utility for taking logical backups:

Mongodump: Takes a dump/backup of the databases into “.bson” format, which can later be restored by replaying the same logical statements captured in the dump files back to the databases.
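A typical invocation looks like the following (the host, credentials, database, collection, and output path are placeholders):

    mongodump --host=localhost --port=27017 \
      --username=backup_user --password='secret' \
      --authenticationDatabase=admin \
      --db=mydb --collection=mycoll \
      --out=/backups/dump-$(date +%F)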

Note: If we don’t specify the DB name or collection name explicitly in the “mongodump” syntax above, the backup will be taken for all databases or all collections, respectively. If authorization is enabled, we must also specify “--authenticationDatabase”.

Also, you should use “--oplog” to capture the incremental data changes made while the backup is still running. Keep in mind that it won’t work with --db and --collection, since it only works for full-instance backups.
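For example, a full-instance dump with oplog capture might look like this (connection details and the output path are placeholders):

    mongodump --host=localhost --port=27017 \
      --username=backup_user --password='secret' \
      --authenticationDatabase=admin \
      --oplog --out=/backups/full-$(date +%F)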

Pros of logical backups

  1. It can take the backup at a more granular level, like a specific database or a collection, which is helpful during restoration.
  2. It does not require you to halt writes on the node where you run the backup, so the node remains available for other operations.

Cons of logical backups

  1. As it reads all of the data, it can be slow, and it requires disk reads for databases that are larger than the RAM available for the WiredTiger cache; the cache pressure increases, which slows down performance.
  2. It doesn’t capture the index data in the metadata backup file. Thus, while restoring, all the indexes have to be built again for each collection after it has been reinserted. This is done serially in one pass through the collection after the inserts have finished, which can add a lot of time to large collection restores.
  3. The speed of the backup also depends on the allocated IOPS and the type of storage, since many reads and writes happen during this process.
  4. Logical backups such as mongodump are, in general, very time-consuming for large systems.

Best practice tip: It is always advisable to use secondary servers for backups to avoid unnecessary performance degradation on the PRIMARY node.

As we have different types of environment setups, we should approach each of them as below:

  1. Replica set: It is always preferred to run the backup on a secondary.
  2. Sharded cluster: Take a backup of the config server replica set and of each shard individually, using their secondary nodes.

Since we are discussing distributed database systems like sharded clusters, we should also keep in mind that we want point-in-time consistency in our backups. (Replica set backups using mongodump are generally consistent when using “--oplog”.)

Let’s discuss this scenario where the application is still writing data and cannot be stopped for business reasons. Even if we take a backup of the config server and each shard separately, the backups of each shard will finish at different times because of data volume, data distribution, load, etc. Hence, while restoring, some inconsistencies might occur for the same reason.

Now comes the restoration part when dealing with logical backups. As with backups, MongoDB provides the following utility for restoration:

Mongorestore: Restores the dump files created by “mongodump”. Index recreation takes place only after the data is restored, which consumes additional memory and time.
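A basic restore might look like the following (the host, credentials, and dump path are placeholders):

    mongorestore --host=localhost --port=27017 \
      --username=restore_user --password='secret' \
      --authenticationDatabase=admin \
      /backups/dump-2023-05-01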

To restore the incremental (oplog) portion of the dump, we can add --oplogReplay to the above syntax to replay the oplog entries as well.
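For example (the dump path is a placeholder):

    mongorestore --oplogReplay /backups/full-2023-05-01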

Best practice tip: “--oplogReplay” can’t be used with the --db and --collection flags, as it only works when restoring a full-instance backup.

Learn how to restore a MongoDB logical backup

Percona Backup for MongoDB

Percona Backup for MongoDB (PBM) is a distributed, low-impact solution for achieving consistent backups of MongoDB sharded clusters and replica sets. It helps overcome the consistency issues around taking backups of sharded clusters. PBM is an uncomplicated command-line tool by design that lends itself well to backing up larger data sets. It uses the faster “s2” compression library and parallelized threads to improve speed and performance when extra threads are available as resources.

Some main advantages of PBM include the following:

  • Backups with replica set and sharded cluster consistency via oplog capture
  • Distributed transaction consistency with MongoDB 4.2+
  • Back up anywhere: to the cloud (any S3-compatible storage) or on-premises with a locally mounted remote file system
  • A choice of compression algorithms. In some internal experiments, the “s2” library with snappy compression running parallelized with multiple threads was significantly faster than regular gzip. Caveat: this holds as long as you have the additional resources available for running the parallel threads.
  • Backup progress logging. If you would like to see the speed of the backup (upload MB/s rate) or track the progress of a large backup, you can look at the pbm-agent node’s logs: a line is appended every minute showing bytes copied vs. total size for the current collection.
  • Point-in-time recovery (PITR): restoring a database up to a specific moment. PITR restores data from a backup and then replays, from oplog slices, all actions that happened to the data up to the specified moment.
  • PITR helps you prevent data loss during a disaster such as a crashed database, accidental data deletion or dropped collections, and unwanted updates of multiple fields instead of a single one.
  • PBM is optimized to run backups with minimal impact on your production performance.

Best practice tip: Use PBM to time backups of huge data sets. Many people don’t realize how long it takes to back up very large data sets, and they are generally very surprised at how long it takes to restore them, especially when going into or out of storage types that may throttle bandwidth/network traffic.

Best practice tip: When running PBM from an unsupervised script, we recommend using a replica set connection string. A direct or standalone-style connection string will fail if that particular MongoDB host happens to be unavailable or down temporarily.
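For example, the connection string can be supplied via the PBM_MONGODB_URI environment variable (the hosts, credentials, and replica set name below are placeholders):

    export PBM_MONGODB_URI="mongodb://pbmuser:secret@mongo1:27017,mongo2:27017,mongo3:27017/?replicaSet=rs0&authSource=admin"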

When a PBM backup is triggered, it tails and captures the oplog from the config server replica set and all the shards while the backup is still running, thus providing consistency once the backup is completed.
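Triggering such a backup is a single CLI call; a minimal sketch, with the compression algorithm as an optional choice:

    pbm backup --compression=s2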

Apart from complete database backups, it can also take incremental backups when the PITR feature is enabled. It does all this by running a “pbm-agent” process on the database (“mongod”) nodes of the cluster, which is responsible for performing the backups and restores.
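For reference, incremental oplog slicing for PITR is enabled through PBM’s configuration; a minimal sketch, assuming the remote storage has already been configured in PBM:

    pbm config --set pitr.enabled=true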

As we can see below, the “pbm list” command shows the complete backups in the backup snapshots section and the incremental (oplog) backups in the PITR section.

Below is a sample of the output shape (illustrative only; the exact format varies by PBM version, and the timestamps are placeholders):
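    $ pbm list

    Backup snapshots:
      2023-05-10T10:15:21Z [complete: 2023-05-10T10:18:03Z]

    PITR <on>:
      2023-05-10T10:18:04Z - 2023-05-10T11:30:00Z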

While a backup is running, you can track its progress in the “pbm-agent” logs: once the full backup completes, the agent logs that the incremental (PITR) capture has started with a sleep interval of 10 minutes. This is the backup progress logging mentioned above.

We will be discussing more about Percona Backup for MongoDB in an upcoming blog post.  Until then, you can find more details on the Percona Backup for MongoDB Documentation page on our website.

Physical/filesystem backups

Physical backups involve snapshotting or copying the underlying MongoDB data files (the --dbPath directory) at a point in time and allowing the database to cleanly recover using the state captured in the snapshotted files. They are instrumental in backing up large databases quickly, especially when used with filesystem snapshots such as LVM snapshots or block storage volume snapshots.

There are several general methods of taking filesystem-level backups, also known as physical backups:

  1. Manually copying the entire set of data files (e.g., using rsync; speed depends on network bandwidth)
  2. LVM-based snapshots (see the sketch after this list)
  3. Cloud-based disk snapshots (AWS/GCP/Azure or any other cloud provider)
  4. Percona Server for MongoDB also includes an integrated open-source Hot Backup system that creates a physical data backup on a running server without notable performance or operational degradation. You can find more information about Percona Server for MongoDB Hot Backup here.
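As a rough illustration of the LVM option, a snapshot taken on a secondary might look like the following (a sketch only; the volume group, logical volume names, sizes, and paths are placeholders):

    # Flush data files and block writes on this node
    mongosh --eval 'db.fsyncLock()'
    # Create a point-in-time snapshot of the volume holding the dbPath
    lvcreate --size 10G --snapshot --name mdb-snap /dev/vg0/mongodb
    # Resume writes as soon as the snapshot exists
    mongosh --eval 'db.fsyncUnlock()'
    # Mount the snapshot, copy the files to backup storage, then clean up
    mount /dev/vg0/mdb-snap /mnt/mdb-snap
    rsync -a /mnt/mdb-snap/ /backups/mongodb-$(date +%F)/
    umount /mnt/mdb-snap && lvremove -f /dev/vg0/mdb-snap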

We’ll be discussing all of these options, but first, let’s look at the pros and cons of physical backups compared to logical backups.

Pros of physical backups

  1. They are at least as fast as, and usually faster than, logical backups.
  2. They can easily be copied over or shared with remote servers or attached NAS.
  3. Recommended for large datasets because of speed and reliability.
  4. Convenient while building new nodes within the same cluster or a new cluster.

Cons of physical backups

  1. It is not possible to restore at a more granular level, such as a specific database or collection.
  2. Incremental backups cannot be achieved yet.
  3. A dedicated node is recommended for backups (it might be a hidden one), as achieving consistency requires halting writes or shutting down “mongod” cleanly on that node prior to the snapshot.

Below is the backup time consumption comparison for the same dataset:

DB Size: 267.6GB
Index Size: <1MB (since it was only on _id for testing)

=============================

  1. Percona Server for MongoDB’s hot backup:

Syntax:
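A minimal sketch, run from the mongo shell against a Percona Server for MongoDB node (the backup directory is a placeholder):

    > use admin
    > db.runCommand({ createBackup: 1, backupDir: "/data/backups/hot-2023-05-01" })
    { "ok" : 1 }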

Best practice tip: The backup path (“backupDir”) should be absolute. Hot Backup also supports storing backups on the filesystem or in AWS S3 buckets.

Notice that the time taken by Percona Hot Backup was only about four minutes.

This is very helpful when rebuilding a node or spinning up new instances/clusters with the same dataset. The best part is that it doesn’t compromise performance by locking writes or taking other performance hits.

Best practice tip: It is recommended to run it against the secondaries. 

  2. Filesystem snapshot:
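As an illustration of a cloud-based disk snapshot, on AWS the data volume can be snapshotted with the standard CLI (the volume ID is a placeholder, and writes should be locked or “mongod” cleanly stopped first, as discussed above):

    aws ec2 create-snapshot \
      --volume-id vol-0123456789abcdef0 \
      --description "MongoDB dbPath snapshot $(date +%F)"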

The approx time taken for the snapshot to be completed was only four minutes.

  3. Mongodump:

Results: As you can see from this quick example using the same dataset, both the filesystem-level snapshot and the Percona Server for MongoDB Hot Backup methods took only 3-5 minutes, whereas “mongodump” took almost 15 minutes for just 20% of the dump to complete. The speed of backing up data with mongodump is therefore very slow compared to the other two options discussed. That is where the s2 compression and the parallelized threads of Percona Backup for MongoDB can help.

Learn more about physical backup support in Percona Backup for MongoDB

Key factors to consider when choosing a MongoDB database backup solution

In this section, we will discuss the key factors to consider when choosing a MongoDB database backup solution.

Scalability

To ensure the longevity of a MongoDB database, a backup solution must be created with the database’s growth in mind. MongoDB is a flexible NoSQL database that can expand horizontally by incorporating additional servers or shards, as well as vertically by increasing the resources available on existing servers.

Furthermore, an effective MongoDB backup solution should incorporate scalable storage alternatives, such as cloud storage or distributed file systems. These storage solutions offer the ability to expand storage capacity without requiring significant alterations to your existing backup infrastructure.

Performance

MongoDB backup solutions can have a significant impact on database performance, particularly when you are backing up large databases or using them during peak usage hours. Here are some of the things to consider when choosing a backup solution to minimize its impact on your MongoDB database performance:

  • The type of backup solution: Full backups are time-consuming and resource-intensive, while incremental backups only save changes since the last backup and are typically faster and less resource-intensive.
  • Storage destination: Backups stored on the same disk as the database can impact read and write operations, while backups stored on a remote server can increase network traffic and cause latency.
  • Database size: The larger the database, the longer it will take to back up and restore.
  • Frequency of backups: Frequent backups consume more resources, while infrequent backups increase the risk of data loss. It is important to balance data protection and database performance to achieve optimal results.
  • Backup schedule: To minimize any effect on database users, schedule backups during off-peak hours.
  • Compression and security: Although compression and encryption can reduce the backup size and improve security, they may also impact database performance. Compression necessitates additional CPU resources, while encryption requires additional I/O resources, both of which can potentially affect database performance.

Security

Backing up your MongoDB database is critical to safeguard your data from unauthorized access, damage, or theft. Here are some ways in which a MongoDB backup solution can help:

  • Disaster recovery: In the event of a natural disaster or a cyber-attack, a backup solution helps you recover your data. Regularly backing up your MongoDB database ensures that you can restore your data to a previous state if it gets lost or corrupted.
  • Data encryption: Sensitive data can be kept secure with data encryption at rest and in transit via a backup solution.
  • Access control: A good backup solution lets you regulate data access and set up encryption and authentication protocols to ensure only authorized users have access.
  • Version control: Keeping track of different versions of your data is easier with a backup solution, enabling you to roll back to a previous version (or compare versions over time).
  • Offsite backup: Offsite backups protect data from physical theft or damage. They can also help you comply with any regulations requiring offsite backup storage.

Free your applications with Percona Distribution for MongoDB

Choosing a MongoDB backup solution

The best method for taking backups depends on multiple factors, like the type of infrastructure, environment, available resources, dataset size, load, etc. However, consistency and complexity also play a major role when backing up distributed database systems.

In general, for smaller instances, simple logical backups via mongodump are fine. As you reach somewhat larger database sizes, above around 100GB, use backup methods like Percona Backup for MongoDB that include incremental backups and capture the oplog so that you can perform point-in-time recoveries and minimize potential data loss.

PBM allows you to back up anywhere, in the cloud or on-premises; it can handle your larger backups, and it is optimized to have minimal impact on your production performance. PBM is also faster thanks to the “s2” compression method and parallelized threads. Finally, PBM overcomes the consistency issues often seen with replica sets and sharded clusters by capturing the changes in the oplog.

For very large systems, i.e., once you reach around the 1TB+ range, you should look to utilize physical filesystem-level snapshot backups. One open-source tool for that is Percona Server for MongoDB, which has integrated Hot Backup functionality built in for the default WiredTiger storage engine and takes around the same time as other physical snapshots.


Download Percona Backup for MongoDB
