In this blog post, we will discuss how to migrate data from MongoDB Atlas to self-hosted MongoDB. Several third-party tools can migrate data from Atlas to Percona Server for MongoDB (PSMDB), such as MongoPush, Hummingbird, and MongoShake. Today, we will use MongoShake to migrate and sync data from Atlas to PSMDB.

NOTE: These tools are not officially supported by Percona.

MongoShake is a powerful tool that facilitates the migration of data from one MongoDB cluster to another. Below are step-by-step instructions on how to install and use MongoShake for data migration from Atlas to PSMDB. So, let’s get started!

Prerequisites:

A MongoDB Atlas account. I created a test account (replica set) and loaded sample data with one click in Atlas:

  1. Create an account in Atlas.
  2. Create a cluster.
  3. Once the cluster is created, go to Browse Collections.
  4. Atlas will prompt you to load sample data. Once you click on it, you will see the sample data like below.

An EC2 instance with PSMDB installed. I installed PSMDB on the EC2 machine:

Make sure Atlas and PSMDB are running the same MongoDB version (I have also used this tool on MongoDB 4.2, which is already EOL).
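The version check can also be scripted as a small pre-flight step. Below is a minimal sketch in Python; the function names and the sample version strings are hypothetical, and it assumes that matching the major.minor release series is what matters for this migration. The version strings themselves would come from running db.version() against each deployment.

```python
from typing import Tuple


def release_series(version: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '6.0.5' or '6.0.5-4'."""
    core = version.strip().split("-", 1)[0]  # drop a build suffix such as '-4'
    parts = core.split(".")
    if len(parts) < 2:
        raise ValueError(f"unexpected version string: {version!r}")
    return int(parts[0]), int(parts[1])


def same_release_series(source_version: str, target_version: str) -> bool:
    """True when both deployments are on the same major.minor series."""
    return release_series(source_version) == release_series(target_version)


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    print(same_release_series("6.0.5", "6.0.5-4"))
```

If the two series differ, it is probably safer to align the versions first rather than rely on the tool to bridge the gap.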

PSMDB version:

MongoDB Atlas version:

To install MongoShake, follow these steps:

Step 1: Install Go
Ensure that Go is installed on your system. If not, download it from the official website and follow the installation instructions. I used Amazon Linux 2, so I used the command below to install Go:

Step 2: Install MongoShake
Open the terminal and run the following command to install MongoShake:

  1. Untar the file; it will create a folder named MongoShake.
  2. cd MongoShake.
  3. Run the ./build.sh script.

Once you have installed MongoShake, you need to configure it for the migration process. Here’s how:

  1. The configuration file (collector.conf) is in the conf directory under the MongoShake directory.
  2. In the config file, you can set the source and target URIs for either replica sets or sharded clusters, as well as the tunnel method (how the data is migrated). If you are migrating directly, set the tunnel value to direct. You can also edit the log file path and log file name. Below are some important parameters:

    sync_mode options: all/full/incr.
  • all means full synchronization + incremental synchronization (copy the data, then apply the oplog after the copy completes).
  • full means full synchronization only (only copy the data).
  • incr means incremental synchronization only (only apply the oplog).
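Before starting a long migration, it can help to sanity-check the edited configuration. The sketch below is a minimal Python example that reads key = value lines and confirms that sync_mode is one of the three values listed above. The helper names and the sample hosts and credentials are hypothetical; the key names mongo_urls and tunnel.address are assumptions about how collector.conf names the source and target URIs, so check them against the file shipped with your MongoShake version.

```python
VALID_SYNC_MODES = {"all", "full", "incr"}

# Hypothetical sample content, for illustration only.
SAMPLE_CONF = """
# source (Atlas) and target (PSMDB) URIs; key names are assumptions
mongo_urls = mongodb://user:pass@source.example.net:27017/?replicaSet=rs0
tunnel = direct
tunnel.address = mongodb://user:pass@target.example.net:27017
sync_mode = all
"""


def parse_conf(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, since URIs may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_sync_mode(settings: dict) -> str:
    """Return the sync mode, or raise if it is missing or not recognised."""
    mode = settings.get("sync_mode", "")
    if mode not in VALID_SYNC_MODES:
        raise ValueError(f"sync_mode must be one of {sorted(VALID_SYNC_MODES)}, got {mode!r}")
    return mode


if __name__ == "__main__":
    print(check_sync_mode(parse_conf(SAMPLE_CONF)))
```

For a migration with a later cutover, all is the mode that both copies the data and keeps applying the oplog afterwards.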

There are other parameters in the configuration file as well, which you can tune as per your needs. For example, if you want to read data from a Secondary node so that the reads do not overwhelm the Primary, you can set the parameter below:

Step 3: Once you are done with the configuration, run MongoShake in a screen session like the one below:

Step 4: Monitor the log file in the log directory to check the progress of migration.

Below is the sample log when you start MongoShake:

You will see the log below once the full sync completes and the incremental sync starts (incr means it starts syncing live data via the oplog):

You will see logs like this when both nodes are in sync (when the lag is 0, i.e., tps=0):

Once the full data replication process is complete and both clusters are in sync, you can stop pointing the application to Atlas. Check the MongoShake logs, and when the lag is 0, as seen in the logs above, stop the replication/sync from Atlas (that is, stop MongoShake). Then verify that the data has been successfully migrated to PSMDB. You can use the MongoDB shell or any other client to connect to the PSMDB instance to verify this.

MongoDB Atlas databases and their collection count:


PSMDB databases and their collection count:

Above, you can see we have verified data in PSMDB. Now, update the connection string of the application to point to PSMDB.
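For clusters with many databases, comparing the two listings by eye gets error-prone. Here is a minimal Python sketch of the same comparison, assuming you have already gathered a database-to-collection-count mapping from each side (for example with PyMongo's list_database_names() and list_collection_names()). The function name, the default list of ignored system databases, and the sample database names are hypothetical.

```python
from typing import Dict, Iterable, List


def compare_collection_counts(
    source: Dict[str, int],
    target: Dict[str, int],
    ignore: Iterable[str] = ("admin", "local", "config"),
) -> List[str]:
    """Return one message per database whose collection count differs."""
    skip = set(ignore)  # system databases are expected to differ
    problems = []
    for db in sorted((set(source) | set(target)) - skip):
        src, tgt = source.get(db), target.get(db)
        if src is None:
            problems.append(f"{db}: only on target ({tgt} collections)")
        elif tgt is None:
            problems.append(f"{db}: missing on target ({src} collections on source)")
        elif src != tgt:
            problems.append(f"{db}: source has {src} collections, target has {tgt}")
    return problems


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    atlas = {"shop": 3, "blog": 2, "local": 5}
    psmdb = {"shop": 3, "blog": 1, "local": 9}
    for message in compare_collection_counts(atlas, psmdb):
        print(message)
```

An empty result only means the collection counts match; comparing document counts per collection is a reasonable next check.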

NOTE: Sometimes, during the migration process, it is possible for some indexes not to replicate. So, during the data verification process, please verify the indexes, and if an index is missing, create that index before the cutover time.
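The index check can be sketched in the same way. The Python example below assumes you have collected the index names per namespace from both sides (for instance from db.collection.getIndexes() in the shell) and reports what exists on the source but not on the target. The function name and the sample namespaces are hypothetical, and comparing by name is a simplification: an index with the same keys but a different name would be reported as missing.

```python
from typing import Dict, Iterable, List


def missing_indexes(
    source: Dict[str, Iterable[str]],
    target: Dict[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Map each namespace to the index names found on source but not on target."""
    missing = {}
    for namespace, names in source.items():
        gap = set(names) - set(target.get(namespace, ()))
        if gap:
            missing[namespace] = sorted(gap)
    return missing


if __name__ == "__main__":
    # Hypothetical namespaces and index names, for illustration only.
    atlas = {"shop.orders": ["_id_", "customer_id_1"], "shop.items": ["_id_"]}
    psmdb = {"shop.orders": ["_id_"], "shop.items": ["_id_"]}
    print(missing_indexes(atlas, psmdb))
```

Anything this reports is a candidate for manual creation on PSMDB before the cutover.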

Conclusion

MongoShake simplifies the process of migrating MongoDB data from Atlas to self-hosted MongoDB. Percona experts can assist you with migration as well. By following the steps outlined in this blog, you can seamlessly install, configure, and utilize MongoShake for migrating your data from MongoDB Atlas.

To learn more about the enterprise-grade features available in the license-free Percona Server for MongoDB, we recommend going through our blog MongoDB: Why Pay for Enterprise When Open Source Has You Covered? 

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

Download Percona Distribution for MongoDB Today!
