Backup Databases on Kubernetes With VolumeSnapshotsDatabases on Kubernetes continue their rising trend. We see the growing adoption of our Percona Kubernetes Operators and the demand to migrate workloads to the cloud-native platform. Our Operators provide built-in backup and restore capabilities, but some users are still looking for old-fashioned ways, like storage-level snapshots (i.e., AWS EBS Snapshots).

In this blog post, you will learn:

  1. How to back up and restore from storage snapshots using Percona Operators
  2. What the risks and limitations are of such backups

Overview

Volume Snapshots went GA in Kubernetes 1.20. Both your storage and Container Storage Interface (CSI) must support snapshots. All major cloud providers support them but might require some steps to enable it. For example, for GKE, you must create a VolumeSnapshotClass resource first.

At the high level, snapshotting on Kubernetes looks like this:

As PersistentVolume is represented by the real storage volume, VolumeSnapshot is the Kubernetes resource for volume snapshot in the cloud.

Getting ready for backups

First, we need to be sure that VolumeSnapshots are supported. For the major clouds, read the following docs:

Once you have CSI configured and Volume Snapshot Class is in place, proceed to create a backup.

Take the backup

Identify the PersistentVolumeClaims (PVC) that you want to snapshot. For example, for my MongoDB cluster, I have six PVCs: three x replica set nodes and three x config server nodes.

Each PVC will have its own VolumeSnapshot. Example for mongod-data-my-cluster-name-cfg-0:

I have listed all my VolumeSnapshots objects in one YAML manifest here.

VolumeSnapshotContent is created and bound to every VolumeSnapshot resource. Its status can tell you the name of the snapshot in the cloud and check if a snapshot is ready:

  • snapshot-0e67c3b5-551f-495b-b775-09d026ea3c8f is the snapshot I have in GCP for the volume.
  • readyToUse: true – indicates that the snapshot is ready

Restore

The restoration process, in a nutshell, looks as follows:

  1. Create persistent volumes using the snapshots. The names of the volumes must match the standard that Operator uses.
  2. Provision the cluster

Like any other backup, it must have secrets in place: TLS and users.

You can use this restoration process to clone existing clusters as well, just make sure you change the cluster, PVCs, and Secret names.

Create persistent volumes from snapshots. It is the same as the creation of regular PersistentVolumeClaim, but with a dataSource section that points to the snapshot:

Once done, spin up the cluster as usual. The volumes you created earlier will be used automatically. Restoration is done.

Risks and limitations

Storage support

Both storage and the storage plugin in Kubernetes must support volume snapshots. This limits the choices. Apart from public clouds, there are open source solutions like Ceph (rook.io for k8s) that can provide snapshotting capabilities.

Point-in-time recovery

Point-in-time recovery (PITR) allows you to reduce your Point Recovery Objective by restoring or rolling back the database to a specific transaction or time.

Volume snapshots in the clouds store data in increments. The first snapshot holds all the data, and the following ones only store the changes. This significantly reduces your cloud bill. But snapshots cannot provide you with the same RPO as native database mechanisms.

Data consistency and corruption

Snapshots are not data-aware. When a snapshot is taken, numerous transactions and data modifications can happen. For example, heavy write activity and simultaneous compound index creation in MongoDB might lead to snapshot corruption. The biggest problem is that you will learn about data corruption during restoration.

Locking or freezing a filesystem before the snapshot would help to avoid such issues. Solutions like Velero or Veeam make the first steps towards data awareness and can create consistent snapshots by automating file system freezes or stopping replication.

Percona Services teams use various tools to automate the snapshot creation safely. Please contact us here to ensure data safety.

Cost

Public clouds store snapshots on cheap object storage but charge you extra for convenience. For example, the AWS EBS snapshot is priced at $0.05/GB, whereas S3 is only $0.023. It is a 2x difference, which for giant data sets might significantly increase your bill.

Time to recover

It is not a risk or limitation but a common misconception I often see: recovery from snapshots takes only a few seconds. It does not. When you create an EBS volume from the snapshot, it takes a few seconds. But in reality, the volume you just created does not have any data. You can read more about the internals of EBS snapshots in this nice blog post.

Conclusion

Volume Snapshots on Kubernetes can be used for databases but come with certain limitations and risks. Data safety and consistency are the most important factors when choosing a backup solution. For Percona Operators, we strongly recommend using built-in solutions which guarantee data consistency and minimize your recovery time and point objectives.

Learn More About Percona Kubernetes Operators

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments