COUNT(*) vs COUNT(col) in MySQLLooking at how people are using COUNT(*) and COUNT(col), it looks like most of them think they are synonyms and just use what they happen to like, while there is a substantial difference in performance and even query results. Also, we find a difference in execution on InnoDB and MyISAM engines.

NOTE: All tests were applied for MySQL version 8.0.30, and in the background, I ran every query three to five times to make sure that all of them were fully cached in the buffer pool (for InnoDB) or by the filesystem (for MyISAM).

Count function for Innodb engine:

Let’s have look at the following series of examples for InnoDB engine:

In this InnoDB engine, we can see that it requires some time to get COUNT(*) and COUNT(val_no_null) of rows for the table, and as we will see further, MyiSAM is significantly faster compared to InnoDB table in the sense of getting an answer for COUNT(*).

But why we can’t just cache the actual number of the rows? InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction. By the way, we can use information schema to instantly get approximately the number of rows of the table in question:

As you can see it’s not the exact number of rows. However, sometimes a rough count might be sufficient.

Let’s take a look into COUNT(val_with_nulls); 

And there, as you can see we have differences in the result of COUNT(*) vs COUNT(val_with_nulls)

Why? Because the val_with_nulls column is not defined as NOT NULL there can be some NULL values in it and so MySQL has to perform a table scan to find out. This is also why the result is different for the second query

So COUNT(*) and COUNT(col) queries not only could have substantial performance differences but also ask different questions.

Let’s have another round of queries, there let’s take a look at how InnoDB manages to do COUNT(*), COUNT(val_no_null), COUNT(val_with_nulls) with the same WHERE clause:

We can see the performance of the query is equal for both cases, and it has only differences of 10%, and if you pay closer attention to EXPLAIN for COUNT(*) query, you will notice Using index. This means that MySQL can use only the index and does not touch the rest table data, which might be sufficient to get the count of rows for huge tables.

You might want to use columns that already have an index to speed up the query for huge tables.

Will we have any surprises with  COUNT(val_with_nulls)? Let’s see:

No surprises; we can see the performance of the query is pretty even across all COUNT(*), COUNT(val_with_nulls), COUNT(val_with_nulls).

Count function for MyISAM engine:

Now let’s take a look into COUNT() function for the MyISAM engine:

What flash speed we saw there!

As this is a MyISAM table, we have cached the number of rows inside of the table, this is how the MyISAM engine works. That is why it can instantly answer COUNT(*) and COUNT(val_no_null) queries.

Please, pay attention to the difference between engines: InnoDB is a transaction engine, and MyISAM is a non-transactional storage engine.

But when it comes to COUNT(val_with_nulls) for MyISAM table we can see that’s a slower InnoDB in 7 times; what a huge difference. Also, we can see the same behavior for COUNT(val_with_nulls), as NULL values obviously will not be considered. MySQL Optimizer does a good job in this case, doing a full table scan only if it is needed because the column can be NULL.

Now let’s try a few more queries for MyISAM table with WHERE clause:

As you can see, even if you have a WHERE clause, performance for COUNT(*) and COUNT(col) can be significantly different. In fact, this example shows a five times performance difference because all data fits in memory (for your information, as it’s the MyISAM engine, caching of data happens in the filesystem cache level). For IO-bound workloads, you frequently can see even a 100 times performance difference in this case.

The COUNT(*) query can use a covering index even while COUNT(col) can’t. Of course, you can extend the index to be (id,val_with_nulls) and get the query to be index covered again, but I would use this workaround only if you can’t change the query (ie, it is a third-party application) or case of when the column name is in the query for a reason, and you need a count of non-NULL values.

It is worth to note in this case, MySQL Optimizer does not do a good job of optimizing the query. One could notice (val_no_null) column is not null, so COUNT(val_no_null) is the same as COUNT(*), and so the query could be run as an index-covered query. It does not, and both queries have to perform row reads in this case.

As you can see, extending the index helps improve COUNT(val_with_nulls) query for null values about seven times compared to COUNT(val_with_nulls) without index. But also, you can see that COUNT(*) becomes around 0,6 times slower, probably because the index becomes about two times longer in this case.

At last, I want to dispel some of the delusions about COUNT(0) and COUNT(1).

As you can see, the performance and explain of the queries are the same, and it does not really matter what number you will put inside brackets in COUNT() function. It can be whatever number you want and it will be fully equal to COUNT(*) by performance and by the actual output of this query.

Subscribe
Notify of
guest

2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Zonglei Dong

It is worth to note in this case, MySQL Optimizer does not do a good job of optimizing the query. One could notice (val_with_nulls) column is not null, so COUNT(val_with_nulls) is the same as COUNT(*), and so the query could be run as an index-covered query. It does not, and both queries have to perform row reads in this case.

I think it should be “COUNT(val_no_null)“, not COUNT(val_with_nulls)