The life of a jumbo chunk

MongoDB marks a chunk as “jumbo” when it grows past the configured maximum chunk size. The default maximum is 128 MB starting with MongoDB 6.0 (it was 64 MB in earlier versions).
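A cluster-wide override of the chunk size is stored in the config.settings collection as a document with _id "chunksize", whose value is expressed in megabytes. Size checks are easier to do in bytes, so here is a minimal sketch of turning that setting into a byte limit. It assumes the settings document has already been fetched (None when no override exists); the helper name max_chunk_size_bytes is hypothetical.

```python
from typing import Optional

MIB = 1024 * 1024


def max_chunk_size_bytes(settings_doc: Optional[dict], server_major: int) -> int:
    """Return the maximum chunk size in bytes (hypothetical helper).

    settings_doc is the config.settings document with _id "chunksize"
    (its "value" field is in MB), or None if no override was ever set.
    """
    if settings_doc is not None and "value" in settings_doc:
        return int(settings_doc["value"]) * MIB
    # No explicit override: use the version default,
    # 128 MB from MongoDB 6.0 onwards and 64 MB before that.
    default_mb = 128 if server_major >= 6 else 64
    return default_mb * MIB
```

For example, max_chunk_size_bytes(None, 6) gives 134217728 bytes (128 MB), while an override of {"_id": "chunksize", "value": 256} gives 268435456 bytes.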

The most common reason for jumbo chunks to appear is that the auto-splitter cannot find a split point for a chunk. This can happen, for example, when all documents in the chunk have the same shard key value.

Over time, a chunk might “lose weight” if some of its data is deleted, so it can eventually drop below the maximum chunk size. Unfortunately, MongoDB does not remove the jumbo flag automatically in this case.

The problem is that the balancer ignores jumbo chunks. Eventually, the data can become imbalanced across the shards, with some shards holding more data than others. Also, until recently, the jumbo flag wasn’t properly cleared after a successful chunk split.

Clearing the jumbo flag

To prevent the situation described above, we should check for jumbo chunks and remove the jumbo flag for chunks that are below the maximum chunk size.

The idea is to go through each document in the config.chunks collection that has the { jumbo: true } property and run the dataSize command to check the actual chunk size. If the actual size is below the maximum, we clear the jumbo flag on the chunk.
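As an illustration of the two commands involved, here is a sketch that builds them as Python dicts, which pymongo's Database.command accepts (the first key is the command name). It assumes the chunk document carries its min and max bounds, as config.chunks documents do, and that the caller supplies the namespace and shard key pattern (for example, looked up in config.collections). The post does not show which clearing command its script prints; clearJumboFlag is one option available on MongoDB 5.0. The helper names are hypothetical.

```python
def datasize_cmd(ns: str, key_pattern: dict, chunk: dict, estimate: bool) -> dict:
    """Build a dataSize command for one chunk's range (hypothetical helper).

    The post runs dataSize on a secondary member of the shard that owns
    the chunk. The reported "size" is in bytes.
    """
    return {
        "dataSize": ns,
        "keyPattern": key_pattern,
        "min": chunk["min"],
        "max": chunk["max"],
        "estimate": estimate,  # True: use average doc size; False: scan documents
    }


def clear_jumbo_cmd(ns: str, chunk: dict) -> dict:
    """Build a clearJumboFlag command for one chunk (hypothetical helper).

    Intended to be run through mongos against the admin database.
    """
    return {"clearJumboFlag": ns, "bounds": [chunk["min"], chunk["max"]]}
```

With pymongo, the first would be sent as client[dbname].command(...) on a connection to the shard, and the second as client.admin.command(...) through mongos.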

The following script can be used on MongoDB 5.0 (or newer) to find all such chunks in a specific namespace and print the commands to remove the jumbo flag:

The script does a few things to reduce overhead and run faster. First of all, we run the data size estimation on a secondary member to avoid impacting the primary.

In certain use cases, document size varies greatly across documents. Running the dataSize command with { estimate: true } can then produce a poor estimate, because it assumes that every document in the range has the collection’s average document size.
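To see how far off an average-based estimate can be, here is a small worked example with hypothetical numbers. It models the estimate as the document count in the chunk multiplied by the collection-wide average document size, which is the assumption described above; the collection shape is made up for illustration.

```python
KB, MB = 1024, 1024 * 1024


def estimated_size(num_docs_in_chunk: int, collection_avg_doc_size: float) -> float:
    # Average-based estimate: document count in the range times the
    # collection-wide average document size.
    return num_docs_in_chunk * collection_avg_doc_size


# Hypothetical collection: 1,000,000 documents of 1 KB plus 1,000 documents
# of 1 MB, with all the large documents concentrated in a single chunk.
small_docs, large_docs = 1_000_000, 1_000
avg = (small_docs * KB + large_docs * MB) / (small_docs + large_docs)

# Size of the chunk that holds only the large documents.
real = large_docs * MB
est = estimated_size(large_docs, avg)

print(f"average doc size: {avg:.0f} bytes")
print(f"real chunk size:  {real / MB:.0f} MB")
print(f"estimated size:   {est / MB:.2f} MB")
```

Here the estimate puts a chunk of roughly 1,000 MB at about 2 MB, so trusting the estimate alone could lead to clearing the flag on a chunk that is still far above the limit.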

On the other hand, running dataSize with { estimate: false } (the default, which scans every document in the range) for all chunks could be slow and costly in terms of resources, so we can devise a compromise.

We first run the dataSize command with { estimate: true }, which is fast. Only if the estimated chunk size is less than twice the maximum chunk size do we calculate the real chunk size by scanning all the documents in the chunk.
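The two-pass decision described above can be sketched as follows. This is a minimal sketch, not the post’s script: estimate_size and exact_size are hypothetical caller-supplied callables (for instance, wrappers around dataSize with estimate set to true and false respectively), and chunks are plain dicts such as those read from config.chunks.

```python
from typing import Callable, Iterable, List


def chunks_to_unflag(
    chunks: Iterable[dict],
    max_bytes: int,
    estimate_size: Callable[[dict], int],
    exact_size: Callable[[dict], int],
) -> List[dict]:
    """Return the jumbo chunks whose exact size is below max_bytes."""
    result = []
    for chunk in chunks:
        if not chunk.get("jumbo"):
            continue  # only chunks currently flagged jumbo are candidates
        # Cheap pass: if the estimate is at least twice the limit, assume the
        # chunk is still too big and skip the expensive scan.
        if estimate_size(chunk) >= 2 * max_bytes:
            continue
        # Expensive pass: scan the documents to get the real size.
        if exact_size(chunk) < max_bytes:
            result.append(chunk)
    return result
```

The factor of two gives the estimate room to be wrong in either direction: a chunk whose estimate is misleadingly low still gets an exact check before its flag is cleared, and only chunks that look clearly oversized skip the scan.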

Closing thoughts

We have seen how chunks can silently drop below the maximum chunk size while keeping their “jumbo” status, and how this can impact the balancing process. We have also seen a way to detect and remediate this situation.

On a related note, an interesting fact is that chunks can also grow beyond the maximum size without being marked jumbo.

It is a good practice to check for jumbo chunks periodically and remove the jumbo flag if a chunk has effectively gone below the maximum chunk size.


Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments