Attempts to compress PostgreSQL WAL at different levels have been around since the early days. The built-in wal_compression parameter has been available since 2016, and almost all backup tools compress WAL segments before shipping them to the backup repository. But it is time to take another look at the built-in wal_compression because PostgreSQL 15 has much more to offer. There is a considerable gain if we combine it with the other great improvements in WAL archiving in PostgreSQL 15, as discussed in the previous posts New WAL Archive Module/Library in PostgreSQL 15 and Speed Up of the WAL Archiving in PostgreSQL 15.

I am not planning to look at WAL segment file compression by backup tools because it is external to PostgreSQL and is primarily a feature of the backup tooling. Even without specialized backup tools, we can still compress WAL segment files as part of archiving, which can yield substantial savings.

Let’s look at what PostgreSQL offers internally for WAL compression. The built-in WAL compression kicks in (if we opt in) while writing full pages into the WAL, which can save a lot of I/O overhead. The reduced WAL volume has further gains in replication and backup because less data needs to be transmitted.

What is Full Page Writes?

Novice users might be wondering what “Full Page Writes” are. The first thing to remember is that PostgreSQL uses 8k pages.

But the host machine might be dealing with pages of smaller size, say 4k pages.

PostgreSQL treats the 8k page as the “atomic unit” for reading and writing. But since the host machine has a smaller page size, it splits the 8k page and treats each OS page as its own atomic unit. This causes a problem if there is an abrupt disruption: one part of the 8k page can be persisted while another part is lost, because the host machine does not treat them as a single unit. This is generally referred to as “partial page writes” or “torn pages.”
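
For illustration, here is a quick way to compare the two page sizes on a Linux host (just a sketch, assuming a local instance reachable with default psql connection settings):

# OS page size in bytes (commonly 4096)
getconf PAGESIZE

# PostgreSQL page (block) size in bytes (8192 by default)
psql -c "SHOW block_size;"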

Such “torn pages” are corruption from the database point of view. If a data file contains such a torn page, PostgreSQL loses the consistency of that page. This is not a problem for PostgreSQL alone; every database system needs to deal with it. For example, MySQL/InnoDB handles it using a doublewrite buffer, from which a copy of the non-corrupt page can be obtained during recovery. PostgreSQL’s approach is slightly different: it writes a copy of the full page into the WAL the first time that page is modified after a checkpoint. Since the WAL is synced frequently, and PostgreSQL can decide the point up to which recovery should happen, it is a safe place to keep the copy of the “full page.”
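
To see how much of the WAL actually consists of these full-page images, pg_waldump can summarize a segment; the segment name and the $PGDATA path below are only examples:

# The "FPI size" column in the summary shows the share of WAL taken by full-page images
pg_waldump --stats $PGDATA/pg_wal/000000010000000000000012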

The performance penalty of Full Page Writes

As mentioned above, when a database page is modified for the first time after a checkpoint, it needs to be safely written to the WAL as the trusted reference, so that during crash recovery PostgreSQL can restore consistent pages from the WAL. But this comes with a considerable performance implication.

Now we know there is a burst of full-page writes immediately after a checkpoint. This is clearly visible in PostgreSQL performance benchmarks as a “sawtooth wave” pattern, as observed by Vadim in his tests:

[Figure: Sawtooth wave pattern of PostgreSQL throughput between checkpoints]

As we can see, the throughput suddenly drops after every checkpoint due to heavy WAL writing and gradually picks up until the next checkpoint.

Full Page compression and recent improvements

PostgreSQL 14 and older

Full-page images are relatively large and self-contained, which makes them good candidates for compression before they are written to the WAL segments. This feature landed way back in PostgreSQL 9.5; it uses the built-in LZ compression implementation, often referred to as “pglz.” But it was never very popular because of its CPU overhead, so it has not been part of the general optimization advice. Users who opt in for full-page compression after analyzing their workload can simply set the parameter wal_compression on any supported PostgreSQL version and reload the server configuration as a superuser.
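
For example, on PostgreSQL 14 and older, where the parameter is a boolean, something like the following enables it without a restart (a sketch, assuming superuser access as the postgres user):

# Enable full-page-write compression and reload the configuration
psql -U postgres -c "ALTER SYSTEM SET wal_compression = on;"
psql -U postgres -c "SELECT pg_reload_conf();"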

PostgreSQL 15+

Modern compression algorithms offer much better compression while taking fewer CPU cycles; LZ4 is a good example. PostgreSQL 15 extends the same parameter wal_compression to take the enumerated values pglz, lz4, and zstd, in addition to on and off, which are retained for backward compatibility. The boolean-style values on, true, yes, and 1 are treated as synonyms for pglz.
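
On PostgreSQL 15, the same parameter simply takes the name of the compression method; a minimal sketch, again assuming superuser access:

# Pick one of pglz, lz4, or zstd (lz4/zstd require binaries built with their support)
psql -U postgres -c "ALTER SYSTEM SET wal_compression = 'lz4';"
psql -U postgres -c "SELECT pg_reload_conf();"
psql -U postgres -c "SHOW wal_compression;"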

Unlike pglz, which is a compression library built into PostgreSQL, the new compression methods are provided by external libraries, so they need to be enabled at build time: the configure flags are --with-lz4 and --with-zstd for lz4 and zstd, respectively.

Official builds from the Percona distribution have these configuration flags enabled during the build process. For packages and binaries from other repositories, or for builds from source, it is always good to check before attempting to use these methods; one way to do so is shown below.
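
One quick check (assuming pg_config from the same installation is in the PATH) is to inspect the configure options, or simply try to set the parameter and see whether the server complains:

# Look for --with-lz4 / --with-zstd among the build options
pg_config --configure | tr ' ' '\n' | grep -E 'with-(lz4|zstd)'

# A build without lz4 support will reject this setting with an error
psql -U postgres -c "SET wal_compression = 'lz4';"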

Why compression of FP writes only?

WAL records are generated by individual backend processes, and latency is crucial for transactions, so a coordinated effort to compress the WAL records produced by multiple sessions may not add much value. But there are areas like index creation or bulk data loads that could potentially benefit from broader WAL compression if such a feature is added in the future.

A quick look at compression options

When it comes to compression, the question is automatically about the savings in I/O, since, as we know, compression comes at the expense of CPU usage. My objective was to quickly check whether there is any benefit when CPU utilization is already high, and whether there is any adverse (negative) effect on overall TPS.

There is a considerable saving if there are a lot of full-page writes. I artificially triggered checkpoints to see the end effect on total WAL generation.
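
A rough sketch of how such a measurement can be done, assuming a scratch database named postgres that has already been initialized for pgbench (pgbench -i) and default connection settings:

# Force a checkpoint so that the following writes produce fresh full-page images
psql -c "CHECKPOINT;"
START=$(psql -Atc "SELECT pg_current_wal_lsn();")

# Any write-heavy workload will do; pgbench is used here for convenience
pgbench -c 8 -T 60 postgres

# Amount of WAL generated during the run
END=$(psql -Atc "SELECT pg_current_wal_lsn();")
psql -Atc "SELECT pg_size_pretty(pg_wal_lsn_diff('$END', '$START'));"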

[Figure: Total WAL generation with different wal_compression settings]

Lz4 gives compression comparable to the default pglz at a much lower CPU cost. Zstd can provide the highest compression (about 30% more than lz4).

In a system already on the brink of too much WAL generation, the uncompressed WAL can trigger many more checkpoints, resulting in even more WAL generation.

Summary

Some of the key points/takeaways I have from the discussions in the community as well as from my simple tests:

  • The compression method pglz available in older versions is not very efficient. It uses more CPU and can affect the TPS for specific workloads and machine configurations.
  • Modern compression algorithms and libraries are excellent. They do a much better job than what was available in PostgreSQL (pglz).
  • I couldn’t see any adverse effect on TPS in my quick tests. Instead, I observed 10-15% better throughput with compression enabled, probably because of less I/O wait.
  • Lz4 compression can be the choice if the database workload is CPU bound because it is light on the CPU. It can give compression very close to pglz without causing high CPU overhead.
  • Zstd can be chosen if the server load is not CPU bound because it can give us better compression at the expense of more CPU utilization.
  • An indirect benefit of WAL compression is that it reduces the chance of checkpoints being triggered by the volume of WAL generated (max_wal_size); the query below gives a hint of how often that happens on a given system.
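
A quick way to check this on PostgreSQL 15 (assuming access to the standard statistics views) is to compare checkpoints driven by checkpoint_timeout against requested ones, which include those forced by WAL volume:

# checkpoints_req counts requested checkpoints, including those triggered when the
# WAL volume approaches max_wal_size; checkpoints_timed are driven by checkpoint_timeout
psql -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"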

The actual benefit of compression depends on many factors: the configured checkpoint frequency, the chance of checkpoints being triggered by WAL volume, storage performance, acceptable CPU overhead, CPU architecture, and so on. With the introduction of the new options, the entry barrier is significantly lower.

Word of caution

If you are using lz4 or zstd compression, for all its greatness, please make sure that the PostgreSQL binaries on your standby servers are also built with the same compression support, since they need to decompress those WAL records during replay. This warning mainly applies to users who build PostgreSQL from source.

Thanks to community

Thanks to Michael Paquier, who did the major part of the development of the PostgreSQL 15 improvements, Justin Pryzby, who played a key role in this work, Andrey Borodin, who initiated the discussion, and many others who participated in the discussions.
