If you ever had to make a quick ad-hoc backup of your MongoDB databases, but there was not enough disk space on the local disk to do so, this blog post may provide some handy tips to save you from headaches.

It is a common practice that before a backup can be stored in the cloud or on a dedicated backup server, it has to be prepared first locally and later copied to the destination.

Fortunately, there are ways to skip the local storage entirely and stream MongoDB backups directly to the destination. At the same time, the common goal is to save both the network bandwidth and storage space (cost savings!) while not overloading the CPU capacity on the production database. Therefore, applying on-the-fly compression is essential.

In this article, I will show some simple examples to help you quickly do the job.

Prerequisites for streaming MongoDB backups

You will need an account for one of the providers offering object storage compatible with Amazon S3. I used Wasabi in my tests as it offers very easy registration for a trial and takes just a few minutes to get started if you want to test the service.

A second need is a tool allowing you to manage the data from a Linux command line. The two most popular ones — s3cmd and AWS — are sufficient, and I will show examples using both.

Installation and setup will depend on your OS and the S3 provider specifics. Please refer to the documentation below to proceed, as I will not cover the installation details here.

* https://s3tools.org/s3cmd
* https://docs.aws.amazon.com/cli/index.html

Backup tools

Two main tools are provided with the MongoDB packages, and both do a logical backup.

Compression tool

We all know gzip or bzip2 are installed by default on almost every Linux distro. However, I find zstd way more efficient, so I’ll use it in the examples.

Examples

I believe real-case examples are best if you wish to test something similar, so here they are.

Mongodump & s3cmd – Single database backup

  • Let’s create a bucket dedicated to MongoDB data backups:

  • Now, do a simple dump of one example database using the –archive option, which changes the behavior from storing collections data in separate files on disk, to streaming the whole backup to standard output (STDOUT) using common archive format. At the same time, the stream gets compressed on the fly and sent to the S3 destination.
  • Note the below command does not create a consistent backup with regards to ongoing writes as it does not contain the oplog.

  • After the backup is done, let’s verify its presence in S3:

Mongorestore & s3cmd – Database restore directly from S3

The below mongorestore command uses archive option as well, which allows us to stream the backup directly to it:

Mongodump & s3cmd – Full backup

The below command provides a consistent point-in-time snapshot thanks to oplog option:

Mongodump & s3cmd – Full backup restore

By analogy, mongorestore is using the oplogReplay option to apply the log contained in the archived stream:

Mongoexport – Export all collections from a given database, compress, and save directly to S3

Another example is using the tool to create regular JSON dumps; this is also not a consistent backup if writes are ongoing.

Mongoimport & s3cmd – Import single collection under a different name

Mongodump & AWS S3 – Backup database

Mongorestore & AWS S3 – Restore database

In the above examples, I used both mongodump/mongorestore and mongoexport/mongoimport tools to backup and recover your MongoDB data directly to and from the S3 object storage type, while doing it the streaming and compressed way. Therefore, these methods are simple, fast, and resource-friendly. I hope what I used will be useful when you are looking for options to use in your backup scripts or ad-hoc backup tasks.

Additional tools

Here, I would like to mention that there are other free and open source backup solutions you may try, including Percona Backup for MongoDB (PBM), which now offers both logical and physical backups:

With the Percona Server for MongoDB variant, you may also stream hot physical backups directly to S3 storage:

https://docs.percona.com/percona-server-for-mongodb/6.0/hot-backup.html#streaming-hot-backups-to-a-remote-destination

It is as easy as this:

For a sharded cluster, you should use PBM rather for consistent backups.

Btw, don’t forget to check out the MongoDB best backup practices!

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

Download Percona Distribution for MongoDB today!

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Chimera

could you pipe the output of the mongoexport job using the standard AWS s3 command instead, without compression, such as: mongoexport –db=db2 –archive | aws s3 cp – s3://mbackups/$(date +%Y-%m-%d.%H-%M).json