Since Postgres table bloat degrades database performance, we can improve its performance by removing the table bloat. We can use the pgstattuple extension to identify the bloated tables.

This extension provides several functions for obtaining tuple-level statistics. Because the pgstattuple functions produce extensive page-level information, access to them is, by default, limited. Only the pg_stat_scan_tables role has the ability to EXECUTE the pgstattuple functions by default. The pgstattuple functions are not restricted to the superuser.

Using the pgstattuple function, we can list the tables with a high percentage of dead tuples and run a manual VACUUM to reclaim the space occupied by the dead tuples.

In this blog, we will discuss the pgstattuple extension, which provides various functions to obtain tuple-level statistics.

First, create the pgstattuple extension. The latest version of pgstattuple is 1.5.

By default, only the superuser has access to the pgstattuple functions; however, you can grant non-superuser access by granting the pg_stat_scan_tables role to the non-superuser.

Next, before using the pgstattuple functions, let’s create a table and index for demonstration.

pgstattuple functions

pgstattuple(regclass): This function returns a relation’s physical length, percentage of “dead” tuples, and other info. This may help users to determine whether a vacuum is necessary or not. The argument is the target relation’s name (optionally schema-qualified) or OID. 

For example:

pgstattuple only obtains a read lock on the relation. As a result, the pgstattuple output does not represent an instantaneous snapshot. The concurrent updates will change the output of the pgstattuple.

The example below shows the tuple statistics of the workshop table.

Below are the pgstattuple output columns and their description.

ColumnTypeDescription
table_lenbigintPhysical relation length in bytes
tuple_countbigintNumber of live tuples
tuple_lenbigintTotal length of live tuples in bytes
tuple_percentfloat8Percentage of live tuples
dead_tuple_countbigintNumber of dead tuples
dead_tuple_lenbigintTotal length of dead tuples in bytes
dead_tuple_percentfloat8Percentage of dead tuples
free_spacebigintTotal free space in bytes
free_percentfloat8Percentage of free space

 

Let’s check the tuple statistics when we delete or update the rows. Below are the current tuple statistics for the reference.

Delete the few ROWS from the table.

In the below output, the dead_tuple_count shows that Postgres marked the rows as deleted but did not remove them from the table, as the length of the table is the same after the deletion of the rows.

Now execute the VACUUM on the table; after running the plain VACUUM, we see that the:

  1. free_space has increased. Vacuum has reclaimed the space and made it available for re-use at the database level.
  2. Dead rows have been removed. 
  3. The table_len is the same as O.S. level space is not reclaimed.

After running the VACUUM FULL, we can see that table_len has decreased. It shows that O.S. level space is reclaimed after running the VACUUM FULL.

pgstattuple queries to check table bloat

We can list the tables where the dead tuple percentage is high by using the query below.

The below query will show you the tuple statistics of the tables in detail where the dead tuple percentage is high.

pgstatindex(regclass): This function returns a record showing information about a B-tree index. 

For example:

The output columns and their description.

ColumnTypeDescription
versionintegerB-tree version number
tree_levelintegerTree level of the root page
index_sizebigintTotal index size in bytes
root_block_nobigintLocation of root page (zero if none)
internal_pagesbigintNumber of “internal” (upper-level) pages
leaf_pagesbigintNumber of leaf pages
empty_pagesbigintNumber of empty pages
deleted_pagesbigintNumber of deleted pages

 

pgstatginindex(regclass): This function returns a record showing information about a GIN index.

For example:

pgstathashindex(regclass): This function returns a record showing information about a HASH index. 

For example:

In this blog, we have seen how the pgstattuple can be used for tuple-level analysis and to find bloated tables or vacuuming candidates. https://www.postgresql.org/docs/current/pgstattuple.html has some additional information on the extension.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Yeshas Siddi MS

Hi Shivam,

Thank you for the blog, really helpful. I tried pgstattuple and noticed that the iops consumption goes through the roof as the size of the tables increases, which made me curious about pgstattuple_approx(regclass). This function does not do a full table scan, with the caveat that the table should be vacuumed frequently so that the visibility map provides up to date info about visible tuples.

Regards,
Yeshas Siddi M S