Esat Erkec
Database Auto Update Statistics

Deep Dive into SQL Server Statistics

August 23, 2021 by

In this article, we will go through some details about SQL Server statistics.

The first step into SQL Server statistics

The query plan is a set of instructions that describes how a query will be executed by the database engine. The query plans are created by the query optimizer in SQL Server and it requires some inputs to generate a query plan before executing a query. At this point, SQL Server statistics play a key role in the query plan generation process because it is one of the most important inputs as they store the distribution of the column data of the tables in a histogram. This statistical meta-data is used to estimate how many rows will be returned by the executed query and according to this estimation, the I/O, CPU, and memory resource desired from the database engine.

Auto Create Statistics and Auto Update Statistics

SQL Server allows us to create and update statistics operations automatically. When we create an index, SQL Server automatically creates statistics for the indexed columns. At the same time, the query optimizer can decide to create statistics for the single columns during the execution of the query. This option can be easily enabled or disabled using SQL Server Management Studio (SSMS) on the database level. We can find out this feature on the Options tab of the database properties.

Database Auto Create Statistics

On the other hand, statistics will be out of date (stale) after the data modification operations because data modifications will change the data value distributions of the columns. In this case, the statistics will need to be updated so that the query optimizer can generate more accurate execution plans. The statistics update operation can be completed either manually or automatically. Auto Update Statistics feature can be enabled or disabled under the Options tab of the database properties.

Database Auto Update Statistics

The query optimizer will update the SQL Server statistics automatically when the following circumstances occur:

  • The table row number has gone from 0 rows to more than 0 rows
  • The table had 500 rows or less when the statistics were last sampled and since had more than 500 modifications
  • The table had more than 500 rows when the statistics were last updated and the number of row modifications is more than MIN ( 500 + (0.20 * n), SQRT(1,000 * n) ) formula result after the statistics were last sampled

What’s going on behind the scenes

As we stated, SQL Server has the ability to automatically create or update statistics during the execution of the query. However, the SQL Server database engine performs some tasks when it automatically creates or updates SQL Server statistics. In order to understand this concept better, we will show an example. As a first step, we will create a sample table in the Adventureworks database and will populate it with some data.

After that, we will create an extended event that will capture:

  • When an automatic statistics create operation is performed
  • When an automatic statistics update operation is performed

The extended event which will be created includes the Auto Stats event and this event can capture index and column statistics updates automatically. At the same time, this event can capture when SQL Server statistics are being loaded for use by the optimizer. Now we will execute the following query and look at the captured events.

  • Tip: We will add 1=(SELECT 1) expression add the end of the query to avoid trivial query plans

After executing the query, we will see two captured events. The first one describes that new statistics are created by the query optimizer and the second event specifies that the created SQL Server statistics are loaded and used by the query optimizer.

Track the SQL Server statistics events

At the same time, the performed events can be tracked with SQL Server Profiler but don’t forget that profiler is a deprecated tool. The profiler clearly shows all events during the execution of the query.

Track the statistics events with a profiler

As can be seen from the illustration above, during the execution of the query, the SQL Server query optimizer decides to create the statistics, and also it recompiles the query. The creation of the new statistic process is related to the Auto Create Statistics option. After creating or updating statics, the query optimizer recompiles the executed query, assuming there will be more up-to-date histogram data.

To simulate the Auto Update Statistics statistics option we will insert some new rows into the TestNewProduction table and the table rows number will reach over 500 so the query optimizer will update the statistic when we execute the select statement.

After inserting the new rows into the table, we will execute the same query and re-check the result of the captured event list.

Database Auto Update Statistics events track

As we can see, the statistic has updated during the execution of the query, and in this context, this question may appear in your minds:

“Do creating or updating statistic operations affect the query performance?”

We will answer this question with a little test. For this test, I have created an extended version of the TestNewProduction and have populated 70 million rows into this table. The statistic creation process has taken about 500 (500.000 microseconds) milliseconds on my laptop.

Creating statistics operations affect the query performance

However, the performance impact of this operation will also strictly depend on how many rows are used to calculate the histogram data. The rows_sampled column indicates how many rows are used in the last statistics create or update operation and it can be found out in the statistic header.

Statistics and rows sampled

For our example, the sampled number of rows 503.654, and it corresponds to %0,7. In order to create a more advanced histogram, the statistics can be computed by using all rows in the table. The FULL SCAN expression helps to create or update SQL Server statistical operations by reading all the data of the relevant column or columns. This time, creating the statistic operation will lock the table for a long time and process duration will also increase. Another point that we need to consider is that more I/O may occur updating statistics with FULL SCAN expression because SQL Server will read whole data of the column or columns.

Create statistics with FULL SCAN

Auto Update Statistics Asynchronously

Auto Update Statistics Asynchronously is another option about the SQL Server statistics that can be configured at the database level. When we enable the Auto Update Statistics Asynchronously option query optimizer does not wait for the update of statistics like the synchronous mode and uses the outdated statistics. On the other hand, another thread starts to update statistics operations. In this option, query processing does not wait for the time that will be spent to update statistic histogram but the generated query plan will use the stale statistic data. We can enable or disable under the Options tab of the database properties.

Auto Update Statistics Asynchronously

After enabling this feature, we will add some new rows into the TestNewProduction so that the data modification counter exceeds the update statistics threshold.

As a last, we will execute our sample select query and it leads to update SQL Server statistics asynchronously.

All statistics events will be captured by the Capture_StatsUpdate extended event.

Tracking events of the auto update statistics asynchronously

When we look at the captured events, we can see that the firstly updated statistics process assigned a queue and then it is processed in the queue. The sys.dm_exec_background_job_queue_stats dynamic management view (DMV) help to monitor this queue.

sys.dm_exec_background_job_queue_stats result

Conclusion

In this article, we have explored some secret tips about SQL Server statistics. Statistics are game-changers for query performance because query optimizers use statistics to estimate how many rows will be returned from the executed queries.

Esat Erkec
Latest posts by Esat Erkec (see all)
168 Views