When you restore a database, you want it done as soon as possible. In a disaster recovery scenario, the situation is stressful enough on its own, and the database is unusable until the restore finishes. So every minute matters, and that becomes especially crucial with big datasets.

Introducing physical backups in Percona Backup for MongoDB (PBM) was a big step toward faster restores. A physical restore essentially copies data files to the target nodes and starts the database on that data catalog, while a logical restore copies data and runs insert operations on the database, which adds overhead for parsing data, building indexes, etc. Our tests showed physical database restores to be up to 5x faster than logical ones. But can we do better? Let’s try.

The speed of a physical restore comes down to how fast we can copy (download) data from the remote storage, so we decided to try parallel (concurrent) downloads. In physical backups, PBM stores WiredTiger files pretty much as they are in the data directory, just with extra compression. Could we simply download different files in parallel? Not really: each MongoDB collection’s data lives in a single file, so data is not spread evenly across files, and big collections would become bottlenecks. The better approach is to download each file itself concurrently, in chunks.

PBM already downloads files in chunks, but that is done solely for retries: in case of a network failure, we retry only the most recent chunk rather than the whole file. The idea is to download these chunks concurrently. Here’s the problem: reading out of order, we cannot write straight to the file (with a seek offset), as we have to decompress the data first (data in the backup is compressed by default). Hence, although we can read data out of order, we must write it sequentially. For that, we made a special memory buffer: chunks can be put into it concurrently and out of order, but consumers always read the data in order.
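As a rough illustration of that contract (the names below are hypothetical, not PBM’s actual API), the buffer boils down to something like this in Go: workers put decompressed chunks in whatever order they finish, while the consumer reads bytes strictly in file order.

```go
// A minimal sketch of the buffer contract, assuming hypothetical names:
// workers Put() chunks in any order; the consumer reads bytes in file order.
package download

import "io"

// chunk is one downloaded and decompressed piece of a file.
type chunk struct {
	offset int64  // byte offset of this chunk within the file
	data   []byte // decompressed bytes
}

// orderedBuffer is what the concurrent downloader needs from the buffer:
// concurrent, out-of-order writes on one side, a plain sequential reader
// on the other.
type orderedBuffer interface {
	// Put may be called concurrently by download workers, in any order.
	Put(c chunk)
	// Read blocks until the next in-order bytes are available.
	io.Reader
}
```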

The final design

The downloader starts a number of workers equal to the concurrency setting (the number of CPU cores by default). Each worker has a preallocated arena in the arenas pool. An arena is basically a byte buffer with a free-slots bitmap, split into spans; the span size equals the download chunk size. When a worker wants to download a chunk, it first acquires a free span from its arena and downloads the data into it. Once a consumer has read that data, the span is marked as free and can be reused for the next chunk. The worker doesn’t wait for the data to be read: as soon as it has downloaded a chunk, it takes the next chunk from the task queue, acquires the next free span, and downloads into it. To keep memory consumption under control, the number of spans in each arena is limited, and if all spans are busy, the worker has to wait for a free one before downloading the next chunk.
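Here is a simplified Go sketch of how such an arena could be organized; the names, and representing the bitmap as a plain []bool, are illustrative rather than PBM’s exact code.

```go
// A rough sketch of an arena split into spans with a free-slots bitmap.
package download

import "sync"

// arena is a preallocated byte buffer split into equally sized spans.
// One arena belongs to one download worker.
type arena struct {
	mu       sync.Mutex
	cond     *sync.Cond
	buf      []byte // the whole preallocated memory block
	spanSize int
	free     []bool // free-slots bitmap: true means the span is available
}

func newArena(spanSize, spans int) *arena {
	a := &arena{
		buf:      make([]byte, spanSize*spans),
		spanSize: spanSize,
		free:     make([]bool, spans),
	}
	for i := range a.free {
		a.free[i] = true
	}
	a.cond = sync.NewCond(&a.mu)
	return a
}

// acquireSpan returns a free span to download the next chunk into.
// If all spans are busy, the worker blocks here, which caps memory use.
func (a *arena) acquireSpan() (buf []byte, idx int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for {
		for i, ok := range a.free {
			if ok {
				a.free[i] = false
				off := i * a.spanSize
				return a.buf[off : off+a.spanSize], i
			}
		}
		a.cond.Wait() // wait for the consumer to release a span
	}
}

// releaseSpan is called once the consumer has read the chunk held in the span.
func (a *arena) releaseSpan(idx int) {
	a.mu.Lock()
	a.free[idx] = true
	a.mu.Unlock()
	a.cond.Signal()
}
```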

On the other hand, for each file we keep track of what has already been handed to the consumer – the offset of the last written byte. If a downloaded chunk is out of order, it is pushed onto a heap; otherwise, it is given to the consumer right away. On each following iteration (the next downloaded chunk), we check the top of the heap and pop chunks off, handing them to the consumer for as long as they are in order.
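A simplified sketch of that reordering step, again with illustrative names: a min-heap keyed by file offset holds the chunks that arrived early, and chunks are handed to the consumer only while they continue the sequence.

```go
// A sketch of in-order delivery of out-of-order chunks via a min-heap.
package download

import "container/heap"

type fileChunk struct {
	offset int64  // byte offset of this chunk within the file
	data   []byte // decompressed chunk data
}

// chunkHeap is a min-heap ordered by chunk offset.
type chunkHeap []fileChunk

func (h chunkHeap) Len() int           { return len(h) }
func (h chunkHeap) Less(i, j int) bool { return h[i].offset < h[j].offset }
func (h chunkHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *chunkHeap) Push(x any)        { *h = append(*h, x.(fileChunk)) }
func (h *chunkHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// sequencer tracks the last byte handed to the consumer and buffers anything
// that arrived ahead of it.
type sequencer struct {
	written int64 // offset of the next byte the consumer expects
	pending chunkHeap
}

// push accepts a downloaded chunk and returns every chunk that is now in
// order and can be given to the consumer.
func (s *sequencer) push(c fileChunk) []fileChunk {
	heap.Push(&s.pending, c)
	var ready []fileChunk
	// Pop chunks while the top of the heap continues the written sequence.
	for s.pending.Len() > 0 && s.pending[0].offset == s.written {
		next := heap.Pop(&s.pending).(fileChunk)
		s.written += int64(len(next.data))
		ready = append(ready, next)
	}
	return ready
}
```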

See the commit with changes for more details.

Config options

A few new options were added to the PBM config file to tweak concurrent downloads. 

numDownloadWorkers – sets the concurrency. Default: the number of CPU cores.

downloadChunkMb – the size of a download chunk, in MB. Default: 32.

maxDownloadBufferMb – the upper limit, in MB, of the memory that can be used for download buffers (arenas). Default: numDownloadWorkers * downloadChunkMb * 16. If set, the chunk size may be adjusted to fit this limit. It doesn’t mean that all of this memory will be used and actually allocated in physical RAM.
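For example, to run eight download workers with 32 MB chunks and a 4 GB buffer cap, the relevant part of the config could look like the snippet below (the restore: section placement follows the PBM config layout; double-check the option paths against the documentation for your PBM version). The same values can also be set with pbm config --set, e.g., pbm config --set restore.numDownloadWorkers=8.

```yaml
restore:
  numDownloadWorkers: 8
  downloadChunkMb: 32
  maxDownloadBufferMb: 4096
```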

Results

PBM supports different storage types, but for this implementation, we decided to start with the most widely used one – S3-compatible storage. We aim to port it to Azure Blob and filesystem storage types in subsequent releases.

Our tests on AWS S3 show up to a 19x improvement in restore speed:

Concurrent download:

Instance                           | Backup size | Concurrency | Span size | Restore time
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 4           | 32 MB     | 45 min
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 8           | 32 MB     | 32 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | 12          | 32 MB     | 168 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | 24          | 32 MB     | 117 min

Release v2.0.3 (without concurrent download):

Instance                           | Backup size | Restore time
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 227 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | ~2280 min

 

(Chart: AWS S3 MongoDB backup restore times)

* Tests were run on AWS i3en instances with S3 storage in the same region.

** We didn’t wait for the 5 TB restore on v2.0.3 to finish and extrapolated the result from the “time per uploaded GB” ratio.

Try Percona Backup for MongoDB for faster restores

This is a significant improvement that comes, along with other features, in the new Percona Backup for MongoDB (PBM) release. Give it a try, and leave your feedback!

 

Get Percona Backup for MongoDB
