Esat Erkec

SQL Server Clustered Indexes internals with examples

October 14, 2020 by

In this article, we will learn the SQL Server clustered index concept and some internal details. Indexes are the database objects that accelerate the performance of data accessing when are designed properly. A clustered index is one of the main index types in SQL Server and the working principle is a bit complicated but in the next sections of this article, we are going to simply learn the clustered index working principle and uncover the secrets.

What is a clustered index?

SQL Server clustered index creates a physical sorted data structure of the table rows according to the defined index key.

    Secret:
  • The physical word is mostly used in clustered index descriptions, but the clustered index does not guarantee the physical order of the rows

This sorted data structure is called a B-tree (balanced tree). B-tree structure enables us to find the queried rows faster to using the sorted key value(s). Table data can be sorted physically in only one direction for this reason we can define only one clustered index per table. The following image illustrates a logical structure of the clustered index.

Structure of the clustered index

The root and intermediate levels contain the index key values and page pointers. The page pointers point to the previous and subsequent index pages of their own. These two levels don’t store any row data. At the same time, index pages hold information about the ahead and behind index page numbers.B-tree structure based-on three different levels:

  • Root level: The top level of the B-tree is called as root level. The root level is the starting point of the data searching
  • Intermediate level: This level provides a connection between root and leaf levels. SQL Server does not create an intermediate level when the amount of data rows are too small
  • Leaf Level: This level is the lowest level of the clustered index and all records are stored at this level

For example, when we want to query a row that’s record number 10, the storage engine will travel across the following red dotted path. The searching mechanism begins its travel at the root level and reaches the data row at the leaf level.

SQL Server clustered index record searching mechanism

As we can see, the searched row was found with the minimum read operation by the storage engine otherwise it has to read whole table rows.

SQL Server Clustered Index and Singleton Seek

After briefly discussing the clustered index structure, let’s go into the details of how a record in a table is searched by the clustered index. Firstly, we will create a Cars table and then we will populate some sample data into it.

We will define a unique clustered index to the Cars table.

SQL Server performs a clustered index seek process when it accesses data to using the B-tree structure. This operation type is represented by the following icon in the execution plans.

Clustered index seek operator icon

When we execute the following query, it will perform a clustered index seek operation. The clustered index seek operation uses the structure of the B-tree structure very efficiently and easily finds the qualified row(s).

Seek predicates details in an execution plan

In this execution plan, the seek predicates indicate that the storage engine uses the B-tree structure and finds the leaf level that stores the data rows. For this query, the uniqueness of the clustered index is very important because this constraint guarantees that only one row will return from the query. This data searching concept is called singleton seek.

    Secret:
  • Defining a clustered index as unique can gain performance improvements when the indexed column(s) is used after the WHERE clause

SQL Server Clustered Index and Range Scan

When we explain the structure of the clustered index, we mentioned an interconnection between the index pages and its backward and forward pages. This connection is very useful for queries that have upper or lower boundaries or have both of them. For example, the following query will perform singleton seek firstly and reaches the leaf level that contains the data row (Id=12). According to retrieve the index keys, the range scans operation has been performed either in forward or backward directions.

Range scan and seek predicates

In the execution plan, the ScanDirection attributes show the direction of the range seek process.

Scan direction of a query

The following image illustrates how the range scan process works.

Range scan working principle and SQL Server clustered index

Most of the time, we can hear that the SQL Server clustered index seek operator is super faster than the other data searching operations. However, this myth may not be exactly true for some queries that perform a range scan. For example, we create a table and insert some synthetic data.

Now we execute the following query and examine the execution plan.

It has read 49 pages and has performed a table scan.

Table scan operator and logical read

In this step, we will add a unique non-clustered index and re-execute the previous query.

SQL Server clustered index and performance

For this query, the clustered index seek operator does not change the performance of the query and both of them have read the same amount of the data. The problem of this query is it reads whole leaf pages of the index so it doesn’t make any difference in performance between the table scan and index seek search. On the other hand, if we repeat the same test for the following query, we will not obtain the same result.

Without the clustered index:

Table scan and logical read

With the clustered index:

Clustered index seek and logical read

Besides that, the singleton seeks principle can also work multiple times for some queries as we can see in the below query.

Multiple singleton seeks

The following image illustrates how the multiple singleton seek process works:

Multiple singleton seeks and SQL Server clustered index

The following query is another example of the multiple singleton seeks:

Multiple singleton seeks example

SQL Server Clustered Index and Primary Key

The primary key ensures that the values of a column in the table are unique so that all rows are identified with a unique key. By default when we create a primary key SQL Server creates a unique clustered index. However, we can create a primary key without a clustered index because the only mandatory requirement is uniqueness for the primary key. So the main differences between the primary key and clustered index are:

  • A primary key is a logical structure and it provides the uniqueness for a table
  • A clustered index is a physical structure and it provides the physical order of the records on the storage

Perhaps, this question is the right one to ask:

Why we use the primary key and clustered indexes on the same key column?

SQL Server adds a 4-byte uniquefier value for every duplicate index key on the non-unique clustered indexes.

At first, we will create a very basic table, it will include only two-column and then we create a non-unique clustered index for this table.

Now, we will insert a row to this table and querying the size of max and minimum record size of the index pages with help of the sys.dm_db_index_physical_stats view.

Record sizes of the clustered indexes

We will add a duplicate row and re-examine the max record size column.

dm_db_index_physical_stats DMV usage

Now let’s look at the metadata of the index page. At first, we will determine the page id of the clustered index page.

Finding the SQL Server clustered index  page number

DBCC PAGE command shows the contents of the data and index page. We will use this command to find out detailed information about the index page.

Metadata of a clustered index

As we can see the first index page uniquifier field value is 0 which means there isn’t a uniquifier defined for this index page. However, the second index page includes an uniquifier field value is 1 which means SQL Server adds an additional KeyHashValue.

Conclusion

In this article, we have uncovered some secrets of SQL Server clustered indexes. Actually clustered indexes are widely used by the SQL developers or DBAs but some details can go unnoticed. Singleton seek and range scan have to be considered for the performance of the clustered index.

Esat Erkec
Latest posts by Esat Erkec (see all)
Indexes, Performance

About Esat Erkec

Esat Erkec is a SQL Server professional who began his career 8+ years ago as a Software Developer. He is a SQL Server Microsoft Certified Solutions Expert. Most of his career has been focused on SQL Server Database Administration and Development. His current interests are in database administration and Business Intelligence. You can find him on LinkedIn. View all posts by Esat Erkec

168 Views