Dealing With Chunks That "Lost Weight" in MongoDB

The life of a jumbo chunk

MongoDB marks a chunk as “jumbo” when it grows past the configured maximum chunk size. The default maximum is 128 MB as of MongoDB 6.0; earlier versions used 64 MB.
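Since the threshold depends on the cluster configuration, it helps to know the effective value before comparing chunk sizes against it. The following is a minimal sketch, assuming a custom chunk size (if any) is stored in config.settings under the _id "chunksize" with a value in megabytes. The function name getMaxChunkSizeBytes and its defaultMB parameter are illustrative, not part of MongoDB.

```javascript
// Sketch: resolve the effective maximum chunk size in bytes.
// `configDb` is expected to be db.getSiblingDB("config") when run in the shell.
// `defaultMB` is supplied by the caller (128 for MongoDB 6.0+, 64 for older versions)
// and is used when no custom chunk size has been configured.
function getMaxChunkSizeBytes(configDb, defaultMB) {
  const setting = configDb.settings.findOne({ _id: "chunksize" });
  const sizeMB = setting && setting.value != null ? Number(setting.value) : defaultMB;
  return sizeMB * 1024 * 1024;
}

// Usage in the shell, connected to a mongos:
//   getMaxChunkSizeBytes(db.getSiblingDB("config"), 128)
```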

The most common reason for jumbo chunks to appear is that the auto-splitter process cannot find a way to split a chunk. This can happen, for example, when all documents in the chunk have the same shard key value, so there is no point in the key range where the chunk could be divided.
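To illustrate why such a chunk cannot be split, here is a simplified sketch. It is not how the auto-splitter is implemented; it only shows the underlying condition, assuming a single-field shard key. The function name hasSplitPoint and the sample documents are hypothetical.

```javascript
// Sketch: a range of documents can only be divided if it contains
// at least two distinct shard key values.
function hasSplitPoint(docs, shardKeyField) {
  const distinctValues = new Set(docs.map((doc) => JSON.stringify(doc[shardKeyField])));
  return distinctValues.size > 1;
}

// Every document shares customerId 42: no split point exists.
const sameKey = [{ customerId: 42, n: 1 }, { customerId: 42, n: 2 }];
// Two distinct customerId values: a split between them is possible.
const mixedKeys = [{ customerId: 42, n: 1 }, { customerId: 43, n: 2 }];

const sameKeySplittable = hasSplitPoint(sameKey, "customerId");
const mixedKeysSplittable = hasSplitPoint(mixedKeys, "customerId");
```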

Over time, a chunk might “lose weight” if some of its data is deleted, and eventually drop below the maximum chunk size. Unfortunately, MongoDB does not remove the jumbo flag automatically in this case.

The problem is that jumbo chunks are ignored by the balancer process. Eventually, the data on the shards might become imbalanced, with some shards having more data than others. Also, until recently, the jumbo flag wasn’t properly cleared after a successful chunk split.

Clearing the jumbo flag

To prevent the situation described above, we should check for jumbo chunks and remove the jumbo flag for chunks that are below the maximum chunk size.

The idea is to go through each document in the config.chunks collection that has the { jumbo: true } property, and run the dataSize command to check the actual chunk size. If the actual size is below the maximum, we clear the jumbo flag on the chunk.

The following script can be used on MongoDB 5.0 (or newer) to find all such chunks in a specific namespace and print the commands to remove the jumbo flag:

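var clearJumbo = function(ns){
    // Thresholds assume a 64 MB maximum chunk size; use 128 * 1024 * 1024
    // on clusters running with the MongoDB 6.0+ default.
    var maxChunkSize = 64 * 1024 * 1024;
    var db1 = db.getSiblingDB(ns.split(".")[0]);
    // Run the size checks on a secondary to avoid impacting the primary.
    db1.getMongo().setReadPref("secondary");
    // Everything after the first dot is the collection name (it may contain dots).
    var collName = ns.substring(ns.indexOf(".") + 1);
    // config.chunks references collections by UUID in MongoDB 5.0 and newer.
    var uuid = db1.getCollectionInfos({ name: collName })[0].info.uuid;
    var chunks = db.getSiblingDB("config").chunks.find({"uuid" : uuid, "jumbo": true}).sort({min:1}).noCursorTimeout();
    var key = db.getSiblingDB("config").collections.findOne({_id: ns}).key;
    var totalChunks = 0;
    var totalJumbo = 0;
    var startTime = new Date();

    print(startTime);
    chunks.forEach(
        function printChunkInfo(chunk) {
          // Fast pass: estimate based on the collection's average document size.
          var dataSizeResult = db1.runCommand({dataSize:ns, keyPattern:key, min:chunk.min, max:chunk.max, estimate:true});
          if(dataSizeResult.size < 2 * maxChunkSize) {
            // Slow pass: scan the documents in the chunk to get the real size.
            var dataSizeResult2 = db1.runCommand({dataSize:ns, keyPattern:key, min:chunk.min, max:chunk.max, estimate:false});
            if(dataSizeResult2.size < maxChunkSize) {
              totalJumbo++;
              // bounds takes the chunk's min and max documents as a two-element array.
              print('db.getSiblingDB("admin").runCommand({ clearJumboFlag: "' + ns + '", bounds: [ ' + JSON.stringify(chunk.min) + ', ' + JSON.stringify(chunk.max) + ' ] })');
            }
          }
          totalChunks++;
        }
    )
    var endTime = new Date();
    print("***********Summary Chunk Information***********");
    print("Total Jumbo Chunks: "+totalChunks);
    print("Total Jumbo Flags To Remove: "+totalJumbo);
    print("Total Duration: "+((endTime - startTime)/1000) + " seconds");
}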
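
The script above only prints the commands so they can be reviewed before running. As a possible variation, the command can be built as an object instead of a string, which avoids quoting issues with the chunk bounds. This is a minimal sketch: buildClearJumboCommand is a hypothetical helper name, and "mydb.mycoll" is a placeholder namespace.

```javascript
// Sketch: build the clearJumboFlag command document for one chunk.
// `chunk` is a document from config.chunks; its min and max fields
// are passed through unchanged as the bounds of the chunk.
function buildClearJumboCommand(ns, chunk) {
  return { clearJumboFlag: ns, bounds: [chunk.min, chunk.max] };
}

// Usage in the shell, connected to a mongos (after reviewing the chunk):
//   db.adminCommand(buildClearJumboCommand("mydb.mycoll", chunk))
```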

The script does a few things to reduce the overhead and process faster. First of all, we run the data size estimation on a secondary member to avoid impacting the primary.

In certain use cases, the document size varies greatly across documents. Running the dataSize command with { estimate: true } produces an inaccurate result when document sizes are not uniform, because it assumes every document in the range has the collection's average document size.
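
A small worked example shows how far off the estimate can be. The numbers below are hypothetical, and the arithmetic assumes the estimate is simply the number of documents in the range multiplied by the collection-wide average document size.

```javascript
// Sketch of the arithmetic behind { estimate: true }:
// estimated size = documents in range * collection-wide average document size.
function estimatedRangeSize(docsInRange, avgObjSize) {
  return docsInRange * avgObjSize;
}

const KiB = 1024;
const MiB = 1024 * KiB;

// Hypothetical collection: 9,000 documents of 1 KiB and 1,000 documents of 1 MiB.
const totalBytes = 9000 * KiB + 1000 * MiB;
const avgObjSize = totalBytes / 10000; // a little over 100 KiB

// A chunk that happens to hold 100 of the large documents.
const actual = 100 * MiB;                            // real size: 100 MiB
const estimated = estimatedRangeSize(100, avgObjSize); // estimate: roughly 10 MiB

console.log("actual MiB:", actual / MiB, "estimated MiB:", (estimated / MiB).toFixed(1));
```

In this case the estimate is well under a 64 MB maximum while the real size is well over it, so the estimate alone is not enough to decide whether the flag can be cleared.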

On the other hand, running dataSize with { estimate: false } for all chunks could be slow and costly in terms of resources, because it scans every document in each chunk. So we settle on a compromise.

We run the dataSize command with { estimate: true } first, which is fast. Only if the estimated chunk size is less than twice the maximum chunk size do we calculate the real chunk size by scanning all the documents in the chunk.
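
The two-step check can be sketched as a small function that is independent of the shell. This is a minimal illustration: shouldClearJumbo is a hypothetical name, and the two callbacks stand in for the dataSize calls with estimate set to true and false.

```javascript
// Sketch: decide whether a chunk's jumbo flag can be cleared.
// getEstimatedSize: cheap call, e.g. dataSize with { estimate: true }.
// getExactSize: expensive call, e.g. dataSize with { estimate: false }.
function shouldClearJumbo(getEstimatedSize, getExactSize, maxChunkSizeBytes) {
  // If even the rough estimate is at least twice the maximum,
  // skip the full scan: the chunk is very likely still jumbo.
  if (getEstimatedSize() >= 2 * maxChunkSizeBytes) {
    return false;
  }
  // Otherwise pay for the exact size and compare it with the real limit.
  return getExactSize() < maxChunkSizeBytes;
}

// Usage in the shell (db1, ns, key, and chunk as in the script):
//   shouldClearJumbo(
//     () => db1.runCommand({ dataSize: ns, keyPattern: key, min: chunk.min, max: chunk.max, estimate: true }).size,
//     () => db1.runCommand({ dataSize: ns, keyPattern: key, min: chunk.min, max: chunk.max, estimate: false }).size,
//     64 * 1024 * 1024
//   )
```

Passing the size lookups as callbacks means the expensive scan only runs when the cheap estimate leaves the answer open.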

Closing thoughts

We have seen how chunks might shrink below the maximum chunk size while still carrying the “jumbo” flag, and how this might impact the balancing process. We have also seen a way to detect and remediate this situation.

On a related note, an interesting fact is that chunks can grow beyond the maximum size and not be marked jumbo.

It is a good practice to check for jumbo chunks periodically and remove the jumbo flag if a chunk has effectively gone below the maximum chunk size.
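
For a periodic check, a quick per-shard count of flagged chunks can show whether a closer look is needed. The following is a minimal sketch, assuming chunk documents from config.chunks carry the jumbo and shard fields; countJumboByShard is a hypothetical helper name.

```javascript
// Sketch: count chunks flagged as jumbo, grouped by the shard that owns them.
// `chunkDocs` is an array of documents read from config.chunks.
function countJumboByShard(chunkDocs) {
  const counts = {};
  for (const chunk of chunkDocs) {
    if (chunk.jumbo === true) {
      counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
    }
  }
  return counts;
}

// Usage in the shell, connected to a mongos:
//   countJumboByShard(db.getSiblingDB("config").chunks.find({ jumbo: true }).toArray())
```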

