Historically, MySQL does not require explicit primary key defined on tables, and it’s like that by default till this day (MySQL version 8.3.0). Such a requirement is imposed through two replication methods, though: Group Replication and Percona XtraDB Cluster (PXC), where using tables without a primary key is not allowed by default. There are many well-known negative performance implications for tables lacking a primary key, where the most painful is terrible replication speed.

Today, I would like to make a quick point about one more reason to have a primary key: data size on disk!

Let’s consider this very simple example table:

Populated with 10M test rows, it takes 748M on disk. Now, given that my test table has column “a” with unique values:

I will change the (secondary) index type to primary:

The table was re-created as a result, and its size on disk was reduced to 588M, quite significantly! Why has this happened? We have exactly the same data, and both columns are indexed in both cases! Let’s check more details about the table before and after the change.

Before, without PK, when both columns were indexed via a secondary key, we could see the following:

So, there is a third index! As seen in a more detailed view of each index via the innodb_ruby tool, it’s the biggest one in terms of size (id=230):

This is how the InnoDB engine works; if no explicit PK is defined, it will add an internal one named GEN_CLUST_INDEX. As it contains the whole data row, it’s size overhead is significant.

After replacing the secondary index with the explicit primary key, the hidden one is no longer needed. Therefore, we’re left with two indexes in total:

Hidden (internal) clustered index (GEN_CLUST_INDEX) vs. generated invisible primary key (GIPK)

Every InnoDB table has a clustered key, so not defining one will not save any disk space, and sometimes, on the contrary, as demonstrated above. So, even if none of the existing columns of your problematic table is unique, it is still best to add another unique column to act as the primary key. The internal GEN_CLUST_INDEX one is not exposed to the upper MySQL layer, and only the InnoDB engine is aware of it, hence it is useless for replication speed. Therefore, an explicit PK is always a better solution.
However, if adding a new PK column is out of the question due to a legacy application problem, you should still enforce the primary key by using an invisible one. This way, you will gain performance benefits, and at the same time, the change is transparent to applications.

Let’s see how it works in practice:

So, our application is not aware of the new column at all. But we can still use it if needed, for example, to easily split table reads or writes into predictable chunks:

Please note that for existing schemas lacking a primary key, before you enforce the sql_require_primary_key variable, best to first enable the sql_generate_invisible_primary_key and re-create the data using logical dump and restore. A simple table optimization will not add the invisible PK. In any case, having an invisible PK should be a win-win solution for legacy applications.

To summarize:
• It may be worth checking if changing the index type can save disk space!
• If adding a primary key wasn’t possible due to the application limits, consider using an invisible one!

Percona Distribution for MySQL is a complete, stable, scalable, and secure open source MySQL solution, delivering enterprise-grade database environments for your most critical business applications. Deploy anywhere and implement easily with one-to-one compatibility with MySQL Community Edition.

 

Try Percona Distribution for MySQL today!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments