In a previous blog, One Index, Three Different PostgreSQL Scan Types: Bitmap, Index, and Index Only, we reviewed three ways PostgreSQL can use an index to retrieve data as quickly as possible. In the same blog, we saw the fastest and most efficient method is the Index-Only Scan because this saves time by avoiding the second read to the heap (table pages) and getting all the data requested by the query from a single read to the index only.

By definition, the queries leveraging on the index-only scan are faster than other methods, but the benefits are more visible when discussing transaction volume. The more query workload your systems generate, the more benefit you will get when going through the index-only scan.

Workload tests

To see this more clearly, consider the same table from the previous blog, this time with 500K rows. We will perform the following query:

We will run it first with a “regular” index on the age column, which will have to perform the index and table reads to retrieve the register_date and is_active columns data. Then, we will repeat the exercise with a covering index, which is declared over the age column but includes the other columns. 

The first index looks as follows:

The second one (covering index) has the following definition:

To verify the consistency of the results, we will run the load with different numbers of concurrent users: 20, 100, and 200. The results are plotted in the following graphs:

leverage index-only scans postgresql index-only scans

We got a consistent result through all three scenarios. In all of them, the workload levering on the index-only scan performed about 2.25x more transactions per second (tps) than the workload going via the index scan.

Index-only scan requirements

Using covering or composite indexes for most of our workload can look like a good idea. However, as we saw in the previous blog, a few conditions have to be met to get these benefits:

  • The index type has to support the index-only scan method. The B-tree indexes always do; the GiST and SP-GiST support the index-only scan operations for some operators, but not all, and any other index type doesn’t support it.
  • The query must select only columns covered by the index. In our previous tests, the query asked for the columns age, register_date, and is_active, which are included in the covering index. If any other column is in the SELECT, the planner won’t choose the index-only scan. 
  • PostgreSQL scans must ensure retrieved rows are “visible” to the query’s snapshot. 

Reference: https://www.postgresql.org/docs/current/indexes-index-only-scans.html#INDEXES-INDEX-ONLY-SCANS

The visibility condition

The first two conditions are easy to understand; we should be ready if we use an index type that supports the index-only scan feature and our queries select only those columns covered by the index. However, the third condition demands some extra caution to ensure we will get the benefits. Let’s expand on the third condition.

The goal of the index-only scan is to avoid the second read access to the table pages (the heap) to save that access time and retrieve the data directly from the index. Still, PostgreSQL must ensure that the index data is up-to-date with the data in the heap. 

The visibility information is not stored in the index; it belongs to the table pages, so the table reads are unavoidable at first glance. However, there is a workaround: the visibility map. If all the rows in a given page are old enough to be visible to the current and future transactions, a bit for that page is marked in the visibility map. 

Verifying the visibility of the row pages from the visibility map is much cheaper than reading the heap because it is small and mostly cached all the time. 

In other words, the more the data changes in a table, the lower the ratio of visibility in the map it will have. So, to ensure we will benefit from the index-only scan feature, we need to verify this condition.

How to check the table’s page visibility

As described in the PostgreSQL documentation, the visibility map stores only two bits for every table page, one to indicate the visibility and the other to mark if the rows have been frozen. This file can not be examined as a traditional file; luckily, we can count on the pg_visibility extension for this purpose.

This extension is delivered with the contrib modules of any base PostgreSQL installation, so adding it to a database is as easy as performing the following statement:

Working with the same table as the previous examples, we can check the table person pages as follows:

So, we can see the values set for the two bits of every table page, which can be false or true.

As soon as a table starts performing updates in its data, the visibility bit changes for the corresponding page. For example, with the following, we can know the page (blkno) where the row with id 10000 is:

So, we can check the visibility map for page 204, then update that row and check again, i.e.:

After updating a value in the row with id 10000, the visibility map shows the all_visible column as false for the corresponding page. 

As the update operations keep going against the table, the visibility map will change, and the number of pages marked as all_visible will reduce.

Checking the visibility percentage

Knowing why the visibility map changes and how to verify its contents, we can check the percentage of visible pages and thus keep the index-only scan feature working. 

The following query will show the percentage of pages marked as all visible or not:

In a freshly loaded table with no updates or just after vacuuming it (the vacuum is the process in charge of updating the visibility map), we can get an output similar to the following:

When the updates happen, the percentage of pages marked as visible will change as well:

PostgreSQL will keep trying to perform the index-only scan, but when the pages are marked as “no visible” in the visibility map, it will have to switch to a different technique and read from the heap, losing all the performance gains. And, the more pages are marked as “no visible,” the fewer index-only scan attempts will succeed.

In this case, after vacuuming the table, we will see all the pages marked as visible again:

Conclusion

An index-only scan is undoubtedly a fantastic approach to getting the best possible performance for our queries; however, before adding composite or covering indexes, we also need to pay attention to how the tables behave regarding their updates and page visibility.

With the pg_visibility extension and some queries like the one we saw in this blog, we can monitor our tables and decide if they are good to work with the index-only scan or if some tuning for the vacuum cadence or workload type can be implemented.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together. Run PostgreSQL in your production and mission-critical environments and easily deploy and orchestrate reliable PostgreSQL in Kubernetes.

 

Download Percona Distribution for PostgreSQL Today!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments