Many MongoDB clusters use storage-level snapshots to provide fast and reliable backups. In this blog post, you’ll learn how to restore such a snapshot from a traditional VM-based sharded MongoDB cluster to a freshly deployed Percona Operator for MongoDB cluster on Kubernetes.

Background story

I recently worked with a company running a large four-shard MongoDB Enterprise Server cluster on VMs on premises that decided to migrate to the Google Cloud Platform. After careful consideration and a pros-and-cons evaluation, the customer chose to go with Percona Distribution for MongoDB running on Kubernetes – specifically using Percona Operator for MongoDB. Four main factors contributed to this decision:

  1. Total cost of ownership: K8s resources are significantly cheaper than running a popular DBaaS on the same cloud
  2. Ease of deployments: Percona Operator for MongoDB makes day-1 and day-2 operations a breeze
  3. Preference for open-source tools and cloud-native solutions: the applications were already migrated to Kubernetes, so running the entire solution that way simplifies operations
  4. Freedom from lock-in: this approach frees the deployment from cloud (and any other vendor) lock-in

To start realistic compatibility and performance testing, we needed to restore the filesystem snapshot backup, stored on a NetApp volume in GCP, into a Google Kubernetes Engine (GKE) cluster running Percona Operator for MongoDB. This is not a trivial task because of the requirements for restoring a sharded cluster backup into a new cluster. I will show you why, and how we did it.

Requirements

Let’s look at the overall requirements for the cluster that we want to restore the snapshot to:

  1. The same major (ideally minor, too) MongoDB version
  2. The same number of shards
  3. The same name of all shards
  4. The same name of the Config RS
  5. The same hostnames of all nodes (this is how Config RS connects to the specific shard)
  6. The same MongoDB configuration with regard to how files are stored on disk

This seems to be straightforward; however, there are a couple of special considerations to make when it comes to the environment controlled by Percona Operator for MongoDB, specifically:

  • You can’t control the name of Config RS
  • You can’t change the hostnames and those will certainly be different within Kubernetes
  • Percona Operator needs specific users to be present in your cluster in order to control it – and those won’t be present in your backup.

All of the above makes it impossible to simply adjust the Operator configuration and copy all the files from your snapshot backup in the specific volumes.

Plan

The high-level plan consists of the following steps:

  1. Deploy the cluster on K8s
  2. Restore snapshot files
    1. Pause the cluster on K8s
    2. Mount storage volumes to a separate VM
    3. Copy snapshot files to respective volumes
  3. Prepare each replica set in standalone mode (hostnames, sharding configuration, users)
  4. Start the cluster on K8s and initialize each replica set

The following approach is applicable regardless of what “flavor” of MongoDB your source environment uses. This can be MongoDB Enterprise Server, MongoDB Community Edition, or Percona Server for MongoDB.

Step one: Deploy the cluster on K8s

In order to deploy Percona Server for MongoDB (PSMDB) on the K8s cluster, follow the documentation. Before you execute the last step (deploying cr.yaml), however, make sure to adjust the following elements of the configuration. This will make the cluster “fit” the one that we took the backup from.

  1. Set spec.image to a specific image version. It needs to be the version that matches the major version of the source cluster, for example:
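    A minimal illustration of the relevant cr.yaml fragment (the tag shown here is only an assumption; use a Percona-certified Percona Server for MongoDB image tag that matches your source cluster's major version):

        spec:
          image: percona/percona-server-mongodb:5.0.11-10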

  2. Create as many shard replica sets as in the source cluster. Copy the default replica set definition (entire section) in spec.replsets[]. For example, if your source cluster has two shards with the names “shard1” and “shard2” (names must match the ones from the source cluster):
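    For illustration, a trimmed sketch of the relevant part of spec.replsets[] (only the names and the number of entries matter here; keep all the other fields from the default replica set definition):

        spec:
          replsets:
            - name: shard1
              size: 3
              # ... keep the rest of the copied default replset section (resources, volumeSpec, etc.)
            - name: shard2
              size: 3
              # ... an identical copy of the same section, with only the name changed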


    Unfortunately, you can’t set the name of Config RS. We’ll deal with that later.
  3. If your source cluster’s WiredTiger configuration is different from the default, adjust the mongod configuration of each replica set to match it. Specifically, two MongoDB configuration items are critical: storage.directoryPerDB and storage.wiredTiger.engineConfig.directoryForIndexes. You can do it in the following way:
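    A sketch of how this could look in cr.yaml, assuming recent operator versions that support the per-replset configuration field and a source cluster with both options enabled (use whatever values your source mongod.conf has, and repeat the same block for spec.sharding.configsvrReplSet if applicable):

        spec:
          replsets:
            - name: shard1
              configuration: |
                storage:
                  directoryPerDB: true
                  wiredTiger:
                    engineConfig:
                      directoryForIndexes: true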

  4. Save changes and start the cluster using the modified cr.yaml file

Step two: Restore snapshot files

Your cluster should be started at this point, and it is important that all the PersistentVolumes it requires have been created. You can check the cluster state with kubectl get psmdb, see all deployed pods with kubectl get pods, or check PVs with kubectl get pv. In this step, you need to mount the volumes of all your database nodes to independent VMs, as we’ll make changes in MongoDB standalone mode. You need those VMs only temporarily, to perform the required operations.

  1. Pause the PSMDB cluster on K8s by setting spec.pause: true in your cr.yaml file and applying it with kubectl. That will cause the entire cluster to stop and make it possible to mount the volumes elsewhere.
  2. Check the zones where your PersistentVolumes were created. You can use kubectl describe pv pv_name or find it in the cloud console.
  3. Create a VM in each zone, then mount volumes corresponding to PersistentVolumes to a VM in its zone. It’s critical that you can easily identify the volume’s purpose (which replica set in your MongoDB cluster it refers to and which node – primary or secondary). As an example, this is how you can mount a volume to your VM using gcloud API:
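    A sketch of the commands, assuming the PersistentVolume is backed by a Compute Engine persistent disk (the VM name, disk name, zone, and mount point below are placeholders; find the actual disk name in the PV description or the cloud console):

        # Attach the disk backing the PV to a VM in the same zone
        gcloud compute instances attach-disk restore-vm-1 \
            --disk pvc-2c7ab3f6-example \
            --zone europe-west4-a \
            --device-name shard1-primary

        # Mount it on the VM under a directory that identifies its purpose
        sudo mkdir -p /mnt/shard1-primary
        sudo mount /dev/disk/by-id/google-shard1-primary /mnt/shard1-primary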


     
  4. Delete all files from each volume mounted to your VMs, for both primaries and secondaries.
  5. Copy files from the snapshot to the respective volume, directly to the main directory of the mounted volume. Do it just for volumes related to primary nodes (leave secondaries empty).
  6. Install Percona Server for MongoDB on each of your VMs. Check the installation instructions and install the same version as your MongoDB cluster on K8s. Don’t start the server yet!

Step three: Prepare each replica set in standalone mode

Now, you need to start PSMDB for each volume with data (each primary of each replica set, including Config RS) separately. We will then log in to the mongo shell and edit the cluster configuration manually so that when we bring the volumes with data back to Kubernetes, the cluster can start successfully.

Execute the steps below for each replica set, including Config RS:

  1. On the VM where the replica set’s primary volume is mounted, edit /etc/mongod.conf. Specifically, adjust storage.dbPath (to the directory where you mounted the volume), storage.directoryPerDB, and storage.wiredTiger.engineConfig.directoryForIndexes.
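    For example, a minimal sketch of the storage section of /etc/mongod.conf (the dbPath is a placeholder for your mount point; directoryPerDB and directoryForIndexes must match the source cluster; keep the rest of the default file as is):

        storage:
          dbPath: /mnt/shard1-primary
          directoryPerDB: true
          wiredTiger:
            engineConfig:
              directoryForIndexes: true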
  2. Start PSMDB with sudo mongod --config /etc/mongod.conf
  3. Connect to PSMDB with the mongo command (authentication is not required). Once you successfully log in, we can start making changes to the cluster configuration.
  4. Delete the local replica set configuration with the following commands:
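    One way to do this, following MongoDB’s documented snapshot-restore procedure, is to drop the local database so the node forgets its old replica set configuration and oplog (it will be re-initiated later on K8s):

        use local
        db.dropDatabase()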

  5. [This step applies only to Config RS]
    Replace the shards configuration. Execute the following command for each shard separately (you can list them all with db.shards.find()). In the string below, replace shard_name, cluster_name, and namespace_name with values specific to your cluster:
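    A sketch of the command, assuming the default headless service naming used by Percona Operator for MongoDB (listing only pod 0 of each shard is enough here, as we will re-initiate each replica set with a single member later):

        use config
        db.shards.updateOne(
          { "_id": "shard_name" },
          { $set: { "host": "shard_name/cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" } }
        )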

  6. [This step applies to shard RS only] Clear shard metadata
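    A sketch based on MongoDB’s documented procedure for restoring a sharded cluster to new hosts: remove the recovery document that references the old config servers, and drop the cached routing metadata so it is rebuilt from the new Config RS (if a given document or collection does not exist in your version, the command is simply a no-op):

        use admin
        db.system.version.deleteOne({ _id: "minOpTimeRecovery" })

        use config
        db.getCollection("cache.collections").drop()
        db.getCollection("cache.databases").drop()
        // also drop any config.cache.chunks.* collections listed by "show collections"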

  7. [This step applies to shard RS only] Replace Config RS connection string in shardIdentity with the following command. Replace cluster_name, namespace_name with your values.
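    For example, assuming the operator’s default naming, where the Config RS is called "cfg" and its first pod is cluster_name-cfg-0:

        use admin
        db.system.version.updateOne(
          { "_id": "shardIdentity" },
          { $set: { "configsvrConnectionString": "cfg/cluster_name-cfg-0.cluster_name-cfg.namespace_name.svc.cluster.local:27017" } }
        )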

  8. Percona Operator for MongoDB requires system-level MongoDB users to be present in the database in order to control the cluster. We must create those users for the operator, as our backup doesn’t have them. If you haven’t changed the default secrets.yaml during the deployment of the cluster, you can find the default passwords either in that file or in the documentation. To create the required users and roles, use the mongo shell commands below:
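    A sketch of the user creation, assuming the default user names and passwords from deploy/secrets.yaml (replace the passwords if you customized the secret, and check the operator documentation for the exact role set your operator version expects; newer versions also use a databaseAdmin user and additional backup-related custom roles):

        use admin
        db.createUser({ user: "userAdmin", pwd: "userAdmin123456", roles: [ "userAdminAnyDatabase" ] })
        db.createUser({ user: "clusterAdmin", pwd: "clusterAdmin123456", roles: [ "clusterAdmin" ] })
        db.createUser({ user: "clusterMonitor", pwd: "clusterMonitor123456", roles: [ "clusterMonitor" ] })
        db.createUser({ user: "backup", pwd: "backup123456", roles: [ "readWrite", "backup", "clusterMonitor", "restore" ] })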

  9. Shut down the server now using db.shutdownServer();
  10. The operator runs the mongod process as the “mongodb” user (not as the standard “mongod” user!). Therefore, we need to fix permissions before we unmount the volume. Add the following line to your /etc/passwd file:
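    The line below is a sketch that assumes the operator’s mongod containers run as UID/GID 1001; verify the actual UID inside a running pod (for example with the id command) and adjust accordingly:

        mongodb:x:1001:1001:mongodb:/home/mongodb:/bin/false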

  11. Set permissions:
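    For example (the mount point is a placeholder; use the directory where the volume is mounted):

        sudo chown -R mongodb:mongodb /mnt/shard1-primary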

  12. Unmount the volume and detach it from the VM. As an example, this is how you can unmount a volume from your VM using gcloud API:
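    A sketch, reusing the placeholder names from step two (unmount on the VM first, then detach the disk):

        sudo umount /mnt/shard1-primary

        gcloud compute instances detach-disk restore-vm-1 \
            --disk pvc-2c7ab3f6-example \
            --zone europe-west4-a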


     

Step four: Start the cluster on K8s and initialize each replica set

You’re ready to get back to K8s. Start the cluster (it will start with the previously used volumes). It will be in a pending state because we intentionally broke the replica sets. Only one pod per replica set will start. You must initialize the replica sets one by one.

To unpause the PSMDB cluster, set spec.pause: false in your cr.yaml file and apply it with kubectl. Then, repeat the steps below for all replica sets, starting with Config RS.

  1. Log in to the shell of “pod 0” of the replica set with kubectl exec --stdin --tty cluster_name-cfg-0 -- /bin/bash (for Config RS)
  2. Log in to PSMDB with the mongo command
  3. Authenticate as clusterAdmin using the following command (assuming you used default passwords):
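    For example (assuming the default password from secrets.yaml):

        use admin
        db.auth("clusterAdmin", "clusterAdmin123456")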

  4. Initialize the replica set as shown below. Replace cluster_name, namespace_name, and shard_name with your own values.
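    A sketch assuming the operator’s default headless service naming; for Config RS, use "cfg" instead of shard_name and also add configsvr: true to the configuration document:

        rs.initiate(
          {
            _id: "shard_name",
            members: [
              { _id: 0, host: "cluster_name-shard_name-0.cluster_name-shard_name.namespace_name.svc.cluster.local:27017" }
            ]
          }
        )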

  5. After a few seconds, your node will become PRIMARY. You can check the health of the replica set using the rs.status() command. Remember that if your dataset is large, the initial synchronization process may take a long time (as with any MongoDB deployment).

That’s it! You have now successfully restored a snapshot backup into Percona Server for MongoDB deployed on K8s with Percona Operator. To verify, run kubectl get pods or kubectl get psmdb; the output should be similar to the example below.
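An illustrative kubectl get psmdb output (the name, endpoint, and age will differ in your environment); the important part is the cluster status showing ready:

    NAME         ENDPOINT                                             STATUS   AGE
    my-cluster   my-cluster-mongos.namespace_name.svc.cluster.local   ready    3h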

 

The Percona Kubernetes Operators automate the creation, alteration, or deletion of members in your Percona Distribution for MySQL, MongoDB, or PostgreSQL environment.

Learn More About Percona Kubernetes Operators
