Indexes in MongoDB play a crucial role in improving query performance and overall database efficiency. Without an index, MongoDB performs a collection scan: it reads every document to check whether it matches the condition specified in the query, which leads to slow response times. Indexes let MongoDB navigate directly to the relevant documents instead of scanning the whole collection. They are especially handy when a collection holds many documents and you want to improve query performance, application response times, and the user experience.
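To make the difference concrete, here is a minimal Python sketch that compares a full scan with a lookup through a sorted index. It is only a toy model: the `docs` list, the `age` field, and the sorted list standing in for MongoDB's B-tree are all assumptions made for illustration, not how the server stores data.

```python
from bisect import bisect_left

# Hypothetical in-memory "collection": a plain list of documents.
docs = [{"_id": i, "age": (i * 7) % 50} for i in range(1000)]

def collection_scan(docs, field, value):
    """Examine every document; return (matches, documents_examined)."""
    matches = [d for d in docs if d[field] == value]
    return matches, len(docs)

def build_index(docs, field):
    """Sorted list of (key, position) pairs, a stand-in for a B-tree."""
    return sorted((d[field], pos) for pos, d in enumerate(docs))

def index_lookup(docs, index, value):
    """Binary-search the index, then fetch only the matching documents."""
    i = bisect_left(index, (value, -1))  # positions are >= 0, so this is the first match
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(docs[index[i][1]])
        i += 1
    return matches, len(matches)

age_index = build_index(docs, "age")
scan_hits, scan_examined = collection_scan(docs, "age", 21)
index_hits, index_examined = index_lookup(docs, age_index, 21)
print(scan_examined, index_examined)
```

Both paths return the same documents, but the scan touches all 1,000 documents while the indexed lookup touches only the matches.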

Here, we will discuss the different stages of an index build in MongoDB. While building an index, MongoDB goes through multiple phases.

Below are the stages of an index build:

  1. X lock
  2. Initialization
  3. IX lock
  4. Collection scan
  5. Process side writes table
  6. Vote and wait for commit quorum
  7. S lock
  8. Finish processing the temporary side writes table
  9. X lock
  10. Drop side writes table
  11. Process constraint violation table
  12. Set the index as ready to use
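As a quick orientation, the stages can be laid out next to the collection lock that is held while each one runs. The mapping below is my reading of the stage descriptions in this post (for instance, that initialization runs under the first X lock), so treat it as a sketch; the `Lock` enum and `STAGES` list are illustrative names, not MongoDB APIs.

```python
from enum import Enum

class Lock(Enum):
    X = "exclusive"
    IX = "intent exclusive"
    S = "shared"

# (stage name, collection lock assumed to be held during that stage)
STAGES = [
    ("X lock", Lock.X),
    ("Initialization", Lock.X),
    ("IX lock", Lock.IX),
    ("Collection scan", Lock.IX),
    ("Process side writes table", Lock.IX),
    ("Vote and wait for commit quorum", Lock.IX),
    ("S lock", Lock.S),
    ("Finish processing the side writes table", Lock.S),
    ("X lock", Lock.X),
    ("Drop side writes table", Lock.X),
    ("Process constraint violation table", Lock.X),
    ("Set the index as ready to use", Lock.X),
]

def lock_transitions(stages):
    """Return the sequence of lock modes with consecutive repeats collapsed."""
    modes = []
    for _, lock in stages:
        if not modes or modes[-1] is not lock:
            modes.append(lock)
    return modes

for number, (name, lock) in enumerate(STAGES, start=1):
    print(f"{number:>2}. {name:<42} {lock.name}")
```

Collapsing the repeats gives the lock sequence X, IX, S, X, which is the backbone of the whole build.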

X lock: Upon receiving a request for an index build, MongoDB acquires an exclusive (X) lock on the collection. This blocks all read/write operations on the collection, including the application of any replicated write operations or metadata commands that target that collection.

Initialization: In this phase, MongoDB creates these three data structures:

  • An initial index metadata entry.
  • Side writes table, a temporary table that keeps the keys generated from writes during the index build process.
  • Constraint violation table, another temporary table for all documents that may cause a key generation error. A key generation error happens when a document has an invalid key for the indexed field, for example, a document with a duplicate field value when creating a unique index.
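The three structures can be pictured with a small Python sketch. Everything here is hypothetical: `IndexBuildState`, its field names, and the `email_1` index are invented for illustration and do not mirror MongoDB's internal types.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class IndexBuildState:
    # Initial index metadata entry: the index exists but is not usable yet.
    name: str
    key_pattern: dict
    unique: bool = False
    ready: bool = False
    # Side writes table: keys from writes that arrive while the build runs.
    side_writes: deque = field(default_factory=deque)
    # Constraint violation table: documents whose keys could not be generated.
    constraint_violations: list = field(default_factory=list)

def initialize_build(name, key_pattern, unique=False):
    """Create the empty bookkeeping structures for a new index build."""
    return IndexBuildState(name=name, key_pattern=key_pattern, unique=unique)

state = initialize_build("email_1", {"email": 1}, unique=True)
print(state)
```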

IX lock: Now, MongoDB downgrades the exclusive X lock on the collection to an Intent Exclusive (IX) lock. MongoDB yields this lock at regular intervals so that read/write operations can interleave with the build.

Collection scan: MongoDB scans the collection and sorts the index keys in memory or in temporary disk files. MongoDB yields to interleaving read/write operations during this stage. It generates a key for every document in the collection and dumps that key into an external sorter. If MongoDB hits a key generation error while producing a key during the collection scan, it stores that key in the constraint violation table to process later. Once the collection scan is complete, MongoDB dumps the sorted keys into the index.
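The scan, sort, and load sequence can be sketched in a few lines of Python. This is a toy model under stated assumptions: sorted chunks stand in for the sorter's temporary files, `heapq.merge` stands in for the merge step, and a duplicate key on a unique index is the only kind of key generation error modeled. The function names and the `email` field are hypothetical.

```python
import heapq

def sorted_runs(docs, field, chunk_size):
    """Yield sorted chunks of (key, _id) pairs, like spill files of a sorter."""
    chunk = []
    for doc in docs:
        chunk.append((doc[field], doc["_id"]))
        if len(chunk) == chunk_size:
            yield sorted(chunk)
            chunk = []
    if chunk:
        yield sorted(chunk)

def bulk_load(docs, field, unique=False, chunk_size=3):
    """Merge the sorted runs into the index; set duplicates aside for later."""
    index, violations = [], []
    for key, doc_id in heapq.merge(*sorted_runs(docs, field, chunk_size)):
        if unique and index and index[-1][0] == key:
            violations.append((key, doc_id))  # goes to the constraint violation table
        else:
            index.append((key, doc_id))
    return index, violations

docs = [
    {"_id": 1, "email": "d@example.com"},
    {"_id": 2, "email": "a@example.com"},
    {"_id": 3, "email": "c@example.com"},
    {"_id": 4, "email": "a@example.com"},
    {"_id": 5, "email": "b@example.com"},
]
index, violations = bulk_load(docs, "email", unique=True)
print(index)
print(violations)
```

Note that the violation does not fail the build at this point; it is only recorded so it can be re-checked near the end.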

[Log excerpt: key generation errors due to constraint violations]

Process side writes table: Now, MongoDB drains the side writes table in first-in, first-out (FIFO) order. MongoDB generates a key for every document written to the collection during the index build and stores it in the side writes table to process later. MongoDB uses a snapshot system to limit the number of keys it processes.
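A minimal sketch of a FIFO drain with a snapshot-style limit might look like this. It assumes a very simplified model: the side writes table is a `deque` of `(operation, key, _id)` tuples, the index is a Python set, and the "snapshot" is just the queue length when the drain starts. None of these names come from MongoDB.

```python
from collections import deque

def drain_side_writes(side_writes, index):
    """Apply the entries queued so far, oldest first; return how many were applied."""
    limit = len(side_writes)  # toy "snapshot": only entries present when the drain starts
    for _ in range(limit):
        op, key, doc_id = side_writes.popleft()
        if op == "insert":
            index.add((key, doc_id))
        elif op == "delete":
            index.discard((key, doc_id))
    return limit

index = set()
side_writes = deque([
    ("insert", "x", 1),
    ("delete", "x", 1),  # must be applied after the insert above
    ("insert", "y", 2),
])
applied = drain_side_writes(side_writes, index)

# A write that arrives after the drain waits for the next pass.
side_writes.append(("insert", "z", 3))
applied_later = drain_side_writes(side_writes, index)
print(sorted(index))
```

FIFO order matters here: applying the delete before the matching insert would leave a stale key in the index.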

Vote and wait for commit quorum: This phase is skipped if the MongoDB instance is not part of a replica set. From MongoDB v4.4 onwards, the mongod submits a “vote” to the primary to commit the index. It writes the “vote” to an internal replicated collection on the primary. To learn more about the CommitQuorum in index creation, please refer to the blog CommitQuorum in Index Creation.
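For reference, the commit quorum is supplied when the index build is requested. The sketch below only builds a `createIndexes` command document as I understand its shape in MongoDB 4.4 and later (`createIndexes`, `indexes`, and `commitQuorum` fields); it does not contact a server. The `users` collection and `email_1` index are hypothetical names.

```python
def create_indexes_command(collection, key_pattern, name, commit_quorum="votingMembers"):
    """Build a createIndexes command document (nothing is sent anywhere here)."""
    return {
        "createIndexes": collection,  # the command name must be the first key
        "indexes": [{"key": key_pattern, "name": name}],
        "commitQuorum": commit_quorum,
    }

cmd = create_indexes_command("users", {"email": 1}, "email_1", commit_quorum="majority")
print(cmd)
```

With PyMongo, a document like this could presumably be passed to `Database.command` against a replica set running 4.4 or later; verify the accepted `commitQuorum` values against the documentation for your server version.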

If the node is a primary, it waits until it has a commit quorum of votes before continuing the index build process. If it’s a secondary node, it waits until it replicates either a “commitIndexBuild” or an “abortIndexBuild” oplog entry:

  • If it replicates the “commitIndexBuild” oplog entry, it finishes draining the side writes table and moves to the next phase of the index build process.
  • If it replicates the “abortIndexBuild” oplog entry, it aborts the index build and discards the index build task.
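The waiting logic on each side can be sketched as two small functions. This is a simplification that assumes a numeric commit quorum and represents oplog entries as plain strings; the function names and member names are hypothetical.

```python
def quorum_reached(votes, commit_quorum):
    """Primary side: True once enough distinct members have voted."""
    return len(set(votes)) >= commit_quorum

def secondary_action(oplog_entry):
    """Secondary side: decide what to do with the next replicated oplog entry."""
    if oplog_entry == "commitIndexBuild":
        return "finish draining side writes and continue"
    if oplog_entry == "abortIndexBuild":
        return "abort and discard the index build"
    return "keep waiting"

votes = ["node-a", "node-b", "node-b"]  # a repeated vote counts once
print(quorum_reached(votes, commit_quorum=3))
print(secondary_action("commitIndexBuild"))
```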

S lock: Now, MongoDB upgrades the Intent Exclusive IX lock on the collection to a shared (S) lock. This blocks all write operations to the collection, including the application of any replicated write operations or metadata commands that target that collection.

Finish processing the temporary side writes table: At this stage, MongoDB continues draining the remaining records in the side writes table. If it finds a key generation error while processing keys in the side writes table, it stores those keys in the constraint violation table to process later.

X lock: At this stage, MongoDB upgrades the shared S lock to an exclusive X lock on the collection. This blocks all read/write operations on the collection, including the application of any replicated write operations or metadata commands that target that collection.
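Why each lock mode blocks what it blocks follows from a lock compatibility matrix. The sketch below uses the standard multi-granularity locking matrix and assumes that reads take an intent shared (IS) lock on the collection and writes take an IX lock; the IS mode and this mapping are not spelled out in this post, so treat them as assumptions.

```python
# For each lock held by the index build, the set of modes other operations may still acquire.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}

# Assumed collection-level lock requested by ordinary operations.
OPERATION_LOCK = {"read": "IS", "write": "IX"}

def allowed(held_by_build, operation):
    """True if the operation can proceed while the build holds the given lock."""
    return OPERATION_LOCK[operation] in COMPATIBLE[held_by_build]

for held in ("IX", "S", "X"):
    print(held, "read:", allowed(held, "read"), "write:", allowed(held, "write"))
```

Under this model, IX lets both reads and writes through, S lets only reads through, and X blocks everything, which lines up with the stage descriptions above.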

Drop side writes table: In this phase, MongoDB applies any remaining operations in the side writes table before dropping it. If the mongod finds a key generation error while processing keys in the side writes table, it stores those keys in the constraint violation table to process later. If any other error occurs while processing the keys, the index build fails with an error. At this point, the index includes all the data written to the collection.

Process constraint violation table: At this stage, if the MongoDB node is the primary, it drains the constraint violation table in FIFO order.

  • If any key in the constraint violation table still produces a key generation error, the index build fails with an error such as ‘E11000 duplicate key error collection’. The primary node adds an “abortIndexBuild” oplog entry to inform the secondaries that they should abort and discard the index build task.

  • Otherwise (no key produces an error, or the table is empty), MongoDB drops the constraint violation table and adds a “commitIndexBuild” oplog entry. The secondary nodes complete the related index build after replicating the oplog entry.
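The final check can be sketched as a re-validation of every recorded violation against the current state of the collection. This toy model assumes a unique index on a single field, represents the collection as a dict keyed by `_id`, and returns the oplog entry name as a string; the error text is made up for the example and is not MongoDB's actual message format.

```python
from collections import deque

def process_violations(violations, docs, field):
    """Drain recorded violations oldest first and pick the oplog entry to write."""
    pending = deque(violations)
    while pending:
        doc_id = pending.popleft()
        doc = docs.get(doc_id)
        if doc is None:
            continue  # the offending document was deleted during the build
        owners = [i for i, d in docs.items() if d[field] == doc[field]]
        if len(owners) > 1:  # the key is still duplicated: the build cannot commit
            return "abortIndexBuild", f"duplicate key on {field}: {doc[field]!r}"
    return "commitIndexBuild", None

docs = {
    2: {"email": "a@example.com"},
    4: {"email": "a@example.com"},
    5: {"email": "b@example.com"},
}
print(process_violations([4], docs, "email"))

# If the duplicate was removed while the build was running, the build can commit.
del docs[4]
print(process_violations([4], docs, "email"))
```

This is why violations are collected rather than failing the build immediately: a conflict seen early may already be gone by the time the table is drained.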

Set the index as ready: MongoDB now marks the index as ready to use by updating the index metadata. Then, it releases the exclusive X lock on the collection.

Summary

I hope this blog helps you understand the index build process and the different stages of creating an index. As stated earlier, having proper indexes in a database improves query performance and overall database efficiency. To learn more about index creation, please go through the blog Index building in Replica Set and Sharded cluster.

Percona Server for MongoDB offers all of the functionality of MongoDB Enterprise edition with a non-licensed model. This means no need to worry about purchasing licenses for production or non-production environments. You can ensure consistent deployment across all environments by utilizing non-licensed, open-source software, all while ensuring that the security standards required by your organization are being met. If support is needed, Percona has you covered there as well.

To know more about what Percona Server for MongoDB covers, please visit the blog “Why Pay for Enterprise When Open Source Has You Covered?”.
