When you restore a database, you want it done as soon as possible. In a disaster recovery scenario, the situation is stressful enough on its own, and the database is unusable until the restore finishes. So every minute matters, and that becomes especially crucial with big datasets.

Introducing physical backups in Percona Backup for MongoDB (PBM) was a big step toward faster restores. A physical restore essentially copies data files to the target nodes and starts the database on that data catalog, while a logical restore copies data and runs insert operations on the database, which adds overhead for parsing data, building indexes, etc. Our tests showed physical database restores to be up to 5x faster than logical ones. But can we do better? Let’s try.

The speed of a physical restore comes down to how fast we can copy (download) data from the remote storage, so we decided to try parallel (concurrent) downloads. In physical backups, PBM stores WiredTiger files pretty much as they are in the data directory, just with extra compression. Could we simply download different files in parallel? Not really: each MongoDB collection’s data lives in a single file, so data is not spread evenly across files, and big collections would become bottlenecks. The better approach is to download each file itself concurrently, in chunks.

PBM already downloads files in chunks, but that is done solely for retries: in case of a network failure, we retry only the most recent chunk rather than the whole file. The idea is to download these chunks concurrently. Here’s the problem: reading out of order, we cannot write straight to the file (with a seek offset), as we have to decompress the data first (data in the backup is compressed by default). Hence, although we can read data out of order, we must write it sequentially. For that, we made a special memory buffer: chunks can be put into it concurrently and out of order, but consumers always read the data in order.
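As a rough illustration of that contract (the names below are hypothetical, not PBM’s actual API), the buffer boils down to something like this in Go: workers put decompressed chunks in whatever order they finish, while the consumer reads bytes strictly in file order.

```go
// A minimal sketch of the buffer contract, assuming hypothetical names:
// workers Put() chunks in any order; the consumer reads bytes in file order.
package download

import "io"

// chunk is one downloaded and decompressed piece of a file.
type chunk struct {
	offset int64  // byte offset of this chunk within the file
	data   []byte // decompressed bytes
}

// orderedBuffer is what the concurrent downloader needs from the buffer:
// concurrent, out-of-order writes on one side, a plain sequential reader
// on the other.
type orderedBuffer interface {
	// Put may be called concurrently by download workers, in any order.
	Put(c chunk)
	// Read blocks until the next in-order bytes are available.
	io.Reader
}
```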

The final design

The downloader starts a number of workers equal to the concurrency setting (the number of CPU cores by default). Each worker has a preallocated arena in the arenas pool. An arena is basically a byte buffer with a free-slots bitmap, split into spans; the span size equals the download chunk size. When a worker wants to download a chunk, it first acquires a free span from its arena and downloads the data into it. Once a consumer has read that data, the span is marked as free and can be reused for the next chunk. The worker doesn’t wait for the data to be read: as soon as it has downloaded a chunk, it takes the next chunk from the task queue, acquires the next free span, and downloads into it. To keep memory consumption under control, the number of spans in each arena is limited, and if all spans are busy, the worker has to wait for a free one before downloading the next chunk.
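Here is a simplified Go sketch of how such an arena could be organized; the names, and representing the bitmap as a plain []bool, are illustrative rather than PBM’s exact code.

```go
// A rough sketch of an arena split into spans with a free-slots bitmap.
package download

import "sync"

// arena is a preallocated byte buffer split into equally sized spans.
// One arena belongs to one download worker.
type arena struct {
	mu       sync.Mutex
	cond     *sync.Cond
	buf      []byte // the whole preallocated memory block
	spanSize int
	free     []bool // free-slots bitmap: true means the span is available
}

func newArena(spanSize, spans int) *arena {
	a := &arena{
		buf:      make([]byte, spanSize*spans),
		spanSize: spanSize,
		free:     make([]bool, spans),
	}
	for i := range a.free {
		a.free[i] = true
	}
	a.cond = sync.NewCond(&a.mu)
	return a
}

// acquireSpan returns a free span to download the next chunk into.
// If all spans are busy, the worker blocks here, which caps memory use.
func (a *arena) acquireSpan() (buf []byte, idx int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	for {
		for i, ok := range a.free {
			if ok {
				a.free[i] = false
				off := i * a.spanSize
				return a.buf[off : off+a.spanSize], i
			}
		}
		a.cond.Wait() // wait for the consumer to release a span
	}
}

// releaseSpan is called once the consumer has read the chunk held in the span.
func (a *arena) releaseSpan(idx int) {
	a.mu.Lock()
	a.free[idx] = true
	a.mu.Unlock()
	a.cond.Signal()
}
```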

On the other hand, for each file we keep track of what has already been handed to the consumer – the offset of the last written byte. If a downloaded chunk is out of order, it is pushed onto a heap; otherwise, it is given to the consumer right away. On each following iteration (the next downloaded chunk), we check the top of the heap and pop chunks off, handing them to the consumer for as long as they are in order.
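A simplified sketch of that reordering step, again with illustrative names: a min-heap keyed by file offset holds the chunks that arrived early, and chunks are handed to the consumer only while they continue the sequence.

```go
// A sketch of in-order delivery of out-of-order chunks via a min-heap.
package download

import "container/heap"

type fileChunk struct {
	offset int64  // byte offset of this chunk within the file
	data   []byte // decompressed chunk data
}

// chunkHeap is a min-heap ordered by chunk offset.
type chunkHeap []fileChunk

func (h chunkHeap) Len() int           { return len(h) }
func (h chunkHeap) Less(i, j int) bool { return h[i].offset < h[j].offset }
func (h chunkHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *chunkHeap) Push(x any)        { *h = append(*h, x.(fileChunk)) }
func (h *chunkHeap) Pop() any {
	old := *h
	n := len(old)
	c := old[n-1]
	*h = old[:n-1]
	return c
}

// sequencer tracks the last byte handed to the consumer and buffers anything
// that arrived ahead of it.
type sequencer struct {
	written int64 // offset of the next byte the consumer expects
	pending chunkHeap
}

// push accepts a downloaded chunk and returns every chunk that is now in
// order and can be given to the consumer.
func (s *sequencer) push(c fileChunk) []fileChunk {
	heap.Push(&s.pending, c)
	var ready []fileChunk
	// Pop chunks while the top of the heap continues the written sequence.
	for s.pending.Len() > 0 && s.pending[0].offset == s.written {
		next := heap.Pop(&s.pending).(fileChunk)
		s.written += int64(len(next.data))
		ready = append(ready, next)
	}
	return ready
}
```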

See the commit with changes for more details.

Config options

A few new options were added to the PBM config file to tweak concurrent downloads. 

numDownloadWorkers – sets the concurrency. Default: the number of CPU cores.

downloadChunkMb – the size of a download chunk, in MB. Default: 32.

maxDownloadBufferMb – the upper limit, in MB, of the memory that can be used for download buffers (arenas). Default: numDownloadWorkers * downloadChunkMb * 16. If set, the chunk size may be adjusted to fit this limit. It doesn’t mean that all of this memory will be used and actually allocated in physical RAM.
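For example, to run eight download workers with 32 MB chunks and a 4 GB buffer cap, the relevant part of the config could look like the snippet below (the restore: section placement follows the PBM config layout; double-check the option paths against the documentation for your PBM version). The same values can also be set with pbm config --set, e.g., pbm config --set restore.numDownloadWorkers=8.

```yaml
restore:
  numDownloadWorkers: 8
  downloadChunkMb: 32
  maxDownloadBufferMb: 4096
```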

Results

PBM supports different storage types, but for this implementation, we decided to start with the most widely used one – S3-compatible storage. We aim to port it to Azure Blob and filesystem storage types in subsequent releases.

Our tests on AWS S3 show up to a 19x improvement in restore speed:

Concurrent download:

Instance                           | Backup size | Concurrency | Span size | Restore time
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 4           | 32 MB     | 45 min
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 8           | 32 MB     | 32 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | 12          | 32 MB     | 168 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | 24          | 32 MB     | 117 min

Release v2.0.3 (without concurrent download):

Instance                           | Backup size | Restore time
i3en.xlarge (4 vCPU, 16 GB RAM)    | 500 GB      | 227 min
i3en.3xlarge (12 vCPU, 96 GB RAM)  | 5 TB        | ~2280 min

 

(Chart: AWS S3 MongoDB backup restore times)

* Tests were run on AWS i3en instances with S3 storage in the same region.

** We didn’t wait for the 5 TB restore on v2.0.3 to finish and extrapolated the result from the “time per uploaded GB” ratio.

Try Percona Backup for MongoDB for faster restores

This is a significant improvement that comes, along with other features, in the new Percona Backup for MongoDB (PBM) release. Give it a try, and leave your feedback!

 

Get Percona Backup for MongoDB
