This blog was originally published in June 2022 and was updated in April 2024.

PostgreSQL’s built-in autovacuum – the housekeeper – keeps improving with every release. It is becoming more capable while its overhead shrinks and more edge cases are handled. I cannot recall a PostgreSQL release that shipped without some autovacuum improvement, and it is undoubtedly good enough for small systems that are not performance-critical.

But autovacuum is far from a perfect fit for many environments. While working with customer environments, we keep seeing cases where the built-in logic is not sufficient. Since I keep applying fixes for many Percona customers, I thought of noting down the important points for everyone.

Most Common Problems and Limitations in PostgreSQL Autovacuum

Autovacuum algorithm/strategy: DML activity is the only trigger

There are some serious limitations to the algorithm/strategy used by autovacuum.
As discussed in the blog posts Tuning Autovacuum in PostgreSQL and Autovacuum Internals, tables become candidates for autovacuum based on parameters like:

  • autovacuum_vacuum_scale_factor 
  • autovacuum_vacuum_threshold
  • autovacuum_analyze_scale_factor
  • autovacuum_analyze_threshold
  • autovacuum_vacuum_insert_scale_factor
  • autovacuum_vacuum_insert_threshold

 

Basically, all these parameters are based on the DML operations performed on the table. The first four are based on UPDATEs and DELETEs, because those operations create the “dead” tuples; the fifth and sixth are obviously based on INSERTs.
Considering only DML activity for triggering autovacuum solves just part of the problem, because autovacuum has much more to do than handle dead tuples. The most crucial piece is the FREEZE operation, driven by table age. So it is quite normal to see tables age until they hit the autovacuum_freeze_max_age limit, which triggers the aggressive autovacuum to prevent transaction ID wraparound. Those runs are heavy on the system and cause other side effects at unexpected times, which is not what we want to see in production systems.
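To see which tables are approaching that limit, a quick check along these lines can help (a minimal sketch; the limit of 10 rows is arbitrary):

-- List the ten most-aged user tables along with their dead tuple counts.
-- age(relfrozenxid) approaching autovacuum_freeze_max_age (200 million by default)
-- means a wraparound-prevention autovacuum is imminent for that table.
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age,
       s.n_dead_tup
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;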

Candidate tables are ordered alphabetically

This is not widely discussed, but it has another serious side effect: a few tables always get priority, while a few others never get a chance to be vacuumed. They starve until they reach the autovacuum_freeze_max_age limit, and again the aggressive wraparound-prevention autovacuum kicks in, with all its side effects.

Tables become candidates for autovacuum during peak hours

The autovacuum settings are based on scale factors/thresholds and, indirectly, on DML operations. The chance of a table crossing those limits is highest when there is a high number of transactions on the table – which is exactly the peak hours. Effectively, autovacuum kicks in at the very worst time.

No consideration for long starving tables

It is very common to see a few tables become candidates for vacuuming too frequently and occupy all the workers repeatedly, while other tables further down the candidate list remain unvacuumed for a long time. There is no way to give extra weight to tables that have not been considered for a long while.

No way to control the throttle of autovacuum workers dynamically

This is probably the worst limitation. Suppose there is an informed DBA who wants to adjust autovacuum_vacuum_cost_limit based on need or a time window and signal PostgreSQL to pick up the change.
For example:

ALTER SYSTEM set autovacuum_vacuum_cost_limit = 2000;
select pg_reload_conf();

This has no effect on the currently running autovacuum workers. Only workers that start afterward will consider the new setting, so it cannot be used to address the problem.

The attempt by DBAs to tune parameters often backfires

After seeing the aged and starving tables, desperate DBAs often apply aggressive settings and a higher number of workers. Many times this pushes the system well beyond its limits, because everything happens at the wrong time, with high aggression, when the system already has a high number of active sessions. Multiply that by the maintenance_work_mem allocation of each worker, and system performance suffers to a great extent. The worst I have seen is autovacuum workers occupying up to 50% of server resources.

Autovacuum during the active time window defeats its own purpose

The autovacuum worker will be referring to an old xid/snapshot if it takes time to complete during the high activity window. So effectively it won’t be cleaning the dead tuples generated during the same duration, which is against the very purpose of autovacuum.

Many starved tables reach the wraparound prevention condition at the same time

It is very common to see that tables which have been starved of autovacuum for a long time reach autovacuum_freeze_max_age together, so the wraparound-prevention aggressive vacuums get triggered concurrently, making the situation much worse. Because of this ineffectiveness, we keep seeing DBAs tempted to disable autovacuum altogether, inviting a bigger set of problems and even outages. At the very least, my request to anyone who is new to PostgreSQL is: please never turn off autovacuum. That is not the way to address autovacuum-related issues.

Tuning PostgreSQL Autovacuum

Tuning PostgreSQL autovacuum is obviously the first line of action. Autovacuum settings can be adjusted at the global level or at the table level.

Global level settings

The parameters autovacuum_vacuum_cost_limit and autovacuum_vacuum_cost_delay are the two main knobs for controlling the throttle of the autovacuum workers, and autovacuum_max_workers controls how many workers can work at a time on different tables. By default, autovacuum_vacuum_cost_limit is disabled (-1), which means the value of the other parameter, vacuum_cost_limit, is in effect. So the very first suggestion is to set an explicit value for autovacuum_vacuum_cost_limit, which lets us control the autovacuum workers alone.
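As an illustration, the relevant settings can be reviewed with a query like this (a simple sketch, nothing specific to a particular installation):

-- autovacuum_vacuum_cost_limit showing -1 means the generic vacuum_cost_limit
-- value is what actually throttles the autovacuum workers.
SELECT name, setting
FROM pg_settings
WHERE name IN ('autovacuum_vacuum_cost_limit', 'autovacuum_vacuum_cost_delay',
               'vacuum_cost_limit', 'autovacuum_max_workers');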

One common mistake I see across many installations is that autovacuum_max_workers is set to a very high value like 15, assuming that this makes autovacuum run faster. Please remember that autovacuum_vacuum_cost_limit is divided among all workers, so the higher the number of workers, the slower each worker runs. And slower workers mean ineffective cleanup, as mentioned above. Moreover, each of them can occupy up to maintenance_work_mem. In general, the default value of autovacuum_max_workers, which is 3, will be sufficient. Please consider increasing it only if it is an absolute necessity.

Table-level settings

A blanket setting at the instance level might not work well for at least a few tables. These outliers need special treatment, and tuning the settings at the table level might become unavoidable. I would start with the tables that become candidates for autovacuum too frequently.

The log_autovacuum_min_duration setting gives great detail about which tables frequently become candidates and which autovacuum runs take considerable time and effort; personally, I prefer this as the starting point. A summary of autovacuum runs can also be obtained by comparing the autovacuum_count column of pg_stat_all_tables at two different timestamps. Another thing to consider is HOT (Heap Only Tuple) updates and fillfactor: HOT update information can be analyzed using the n_tup_hot_upd column of the same view (pg_stat_all_tables), and tuning fillfactor can bring down the vacuum requirements drastically.
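As an illustration, a query along these lines can show which tables are autovacuumed most often and how many of their updates are HOT (the ordering and the limit of 10 are arbitrary choices):

-- Tables with the highest autovacuum activity, with the share of HOT updates.
-- A low HOT percentage on an update-heavy table may indicate that fillfactor tuning can help.
SELECT schemaname, relname, autovacuum_count, n_tup_upd, n_tup_hot_upd,
       round(100.0 * n_tup_hot_upd / NULLIF(n_tup_upd, 0), 2) AS hot_update_pct
FROM pg_stat_all_tables
ORDER BY autovacuum_count DESC
LIMIT 10;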

Equipped with this analysis, specific table-level settings can be adjusted.

For example, a table-level override might look like the following (the table name and the values are only illustrative):
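-- "orders" is a hypothetical table: make it a vacuum candidate after a fixed number
-- of dead tuples instead of a fraction of the table, and leave room in pages for HOT updates.
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0,
                        autovacuum_vacuum_threshold = 10000,
                        fillfactor = 85);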

Tools/scripts like pg_gather can be handy to assess autovacuum frequency and to suggest settings like fillfactor as well as table-level autovacuum settings.

Supplementary Scheduled Vacuum Job

Our aim is not to disable autovacuum but to supplement it with our knowledge about the system. It need not be complex at all. The simplest approach is to run a VACUUM FREEZE on the tables with the highest age, considering both the table itself and its TOAST table.

For example, we can have a vacuumjob.sql file with content along the lines of the following sketch (the schema filters and the 100-table limit are illustrative choices):
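-- vacuumjob.sql (sketch): pick the 100 most-aged tables (considering their TOAST
-- tables as well) that are not already being vacuumed, and build a VACUUM FREEZE
-- statement for each of them; \gexec executes every generated statement.
SELECT 'VACUUM FREEZE ' || quote_ident(n.nspname) || '.' || quote_ident(c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.oid NOT IN (SELECT relid FROM pg_stat_progress_vacuum)
ORDER BY GREATEST(age(c.relfrozenxid), age(t.relfrozenxid)) DESC
LIMIT 100
\gexec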

The query picks 100 aged tables that are not currently undergoing autovacuum and runs a “VACUUM FREEZE” on each of them. (The \gexec at the end executes the query output as statements.)

This can be scheduled using cron for a low-activity window, for example:
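# Illustrative crontab entry only; the path, database name, and schedule are assumptions.
# Run the supplementary vacuum job at 01:00 every day, assumed to be a low-activity window.
0 1 * * * psql -X -d mydb -f /path/to/vacuumjob.sql >> /var/log/vacuumjob.log 2>&1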

If there are multiple low-impact windows, all of them can be used with multiple schedules.

A copy of the above SQL script is available on GitHub for easy download.

Practically, we have seen that supplementary, scheduled vacuum jobs based on the table-age approach have the following positive effects:

  1. The chance of those tables becoming candidates again during peak times is drastically reduced.
  2. Server resources are utilized very effectively during off-peak times for the vacuum and freeze operations.
  3. Since candidates are selected on totally different criteria (the age of the table) than the defaults (scale factor and threshold), the chance of a few tables starving forever is eliminated. It also removes the possibility of the same table becoming a candidate for vacuum again and again.
  4. In customer/user environments, the wraparound-prevention autovacuum is almost never reported again.

Best Practices for PostgreSQL Autovacuum Settings

It is not rare to see systems where autovacuum remains untuned, or where poor settings are used from the instance level down to the table level. To summarize:

  1. Default settings may not work well for most systems. Repeated autovacuum runs on a few tables while other tables starve for autovacuum is very common.
  2. Poor settings can result in autovacuum workers taking up a considerable part of the server resources with little gain.
  3. Autovacuum has a natural tendency to start at the wrong time, when the system is undergoing heavy transactions.
  4. Practically, a scheduled vacuum job becomes necessary for systems that undergo heavy transactions, have a large number of transactional tables, and are expected to have load spikes and peak time periods.

Clear analysis and tuning are important. It is always highly recommended to have a custom vacuum job that incorporates your knowledge about the specific system and the time windows of least impact.

Learn how to use Vacuum for Better PostgreSQL Performance in our Free Webinar

When done correctly, vacuuming can greatly speed up performance and optimize dead space. However, vacuuming can also degrade performance when done incorrectly. Our expert-led webinar, “Using Vacuum to Clean Up PostgreSQL for Performance” will teach you how to vacuum your PostgreSQL database without negatively impacting performance. You’ll learn why you need to vacuum, how to schedule vacuuming, and tools you can leverage to monitor for related performance issues.

 

Watch the Recording

FAQs

What is PostgreSQL autovacuum?

PostgreSQL’s autovacuum is a background process that automatically cleans up database tables by removing dead tuples (obsolete row versions) in order to preserve database efficiency and reclaim space. It helps prevent table bloat and ensures that the database performs optimally without manual intervention.

What is the difference between VACUUM and autovacuum in PostgreSQL?

The primary difference between VACUUM and autovacuum in PostgreSQL is that VACUUM is a manual maintenance operation performed by the user to reclaim space after data deletions, while autovacuum is an automated process managed by the PostgreSQL system to perform the same task automatically. Autovacuum keeps an eye on database activity in order to trigger VACUUM and ANALYZE operations as necessary.

How do I turn on autovacuum in PostgreSQL?

To turn on autovacuum in PostgreSQL, ensure that the autovacuum parameter is set to on in the PostgreSQL configuration file (postgresql.conf). This setting is enabled by default in recent PostgreSQL versions. After this, a database server restart or reload may be required for the changes to take effect.
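For instance, the current state can be checked and, if needed, the parameter enabled like this (a simple illustration):

SHOW autovacuum;                   -- returns 'on' when autovacuum is enabled
ALTER SYSTEM SET autovacuum = on;  -- persists the setting in postgresql.auto.conf
SELECT pg_reload_conf();           -- reload the configuration to apply the change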

Is vacuum necessary in PostgreSQL?

Yes, maintaining the database’s health and performance requires running VACUUM in PostgreSQL. It cleans up unused space from deleted or updated rows, known as dead tuples, and helps to prevent table bloat. This process is crucial for databases with frequent data modifications, as it ensures efficient space utilization and maintains the accuracy of PostgreSQL’s statistics for query optimization.
