PostgreSQL traditionally uses shell commands to achieve continuous WAL archiving, which is essential for backups and stable standby replication.  In the past, we blogged about the inefficiency in that design and how some of the backup tools like PgBackRest solve that problem. It is a well-known problem in the PostgreSQL community, and many discussions happened in the past about the same.

It is also unfortunate that PostgreSQL documentation gives an unsafe command string, employing a combination of the test and cp commands as an example:

Even though the documentation mentions it as an example to explain how the archive_command executes shell scripts/commands, we keep seeing many users copying this approach, falling into this documentation trap, and running into archiving issues.

The good news is that PostgreSQL 15 came up with a set of features to address problems associated with WAL archiving. PostgreSQL can now do WAL archiving using loadable modules. Please refer to the following commit id for the implementation details: https://git.postgresql.org/gitweb/?p=postgresql.git;h=5ef1eefd7 for details. The ability to use an archive module/library is a revolutionary step in this area.

Specifying the archive_library

Specifying the archive_library can be performed when the system is up and running. Just send a signal SIGHUP afterward to get the configuration reloaded::

It is important to keep an eye on PostgreSQL logs because anything going wrong will not be displayed on the psql session, but it will be there in the PostgreSQL logs:

We can remove the library specification if we want to use the shell commands as it was in PostgreSQL 14 or older.

And make sure that no archive_library specification is not in effect.

At this stage, PostgreSQL will use the plain old archive_command for WAL archiving. And you may specify the same.

So, in summary, the archive_command will be considered only if the archive_library spec is empty.

basic_archive: The sample archive library

PostgreSQL 15 provides a simple, sample archive library for a reference architecture, which is part of the contrib modules. If the contrib modules are already present, we can specify the archive_library like:

But this library needs further input about where to place the archive files. Which can be specified using the module-specific parameter basic_archive.archive_directory.

Otherwise, we may start seeing messages as follows in PostgreSQL logs:

You may want to refer to the official documentation also.

However, this archive library will be loaded only by the archiver process, not by the backend process of regular sessions. So any attempt to set the parameter or see the parameter can give you errors.

But regular sessions can LOAD the library if they want.

Here is a couple of important observations:

1) Instead of loading the library to a specific session, one might use other methods like shared_preload_libraries, but not advisable

2) The module-specific settings may not show up in pg_settings unless the module is loaded on the session.

If there are multiple archive libraries available, we can switch from one library to another when the system is live. But the archiver process will restart behind the scenes, and the new library specified will be loaded by the new archiver process. A message might appear in the PostgreSQL log like:

if the attempt to archive the WAL by the basic_arhive fails, there will be corresponding ERROR entries in the PostgreSQL log.

One who is closely observing the above error message might be wondering what this file: archtemp.00000001000000000000007B.2323.1669289071330, which the error message mentions, is. The name doesn’t appear to be a regular WAL segment file.  This happens because the basic_archive  module first creates a temporary file in the destination with a name like that, then it fsyncs the file, and then moves the file to its final name. This is to ensure better protection during crashes.

When the archiver is terminated by a signal (other than SIGTERM that is used as part of a server shutdown) or an error by the shell with an exit status greater than 125 (such as command not found), the archiver process aborts and it gets restarted by the postmaster. In such cases, the failures will be recorded in the pg-stat-archiver view. The errors in archive library execution also can be monitored using pg-stat-archiver.

Advantage of basic_archive

The basic_archive module does pretty much the same thing as the shell command:

But in a much better way. We already discussed one of the advantages:

It creates a temporary file, and it is fsynced and durably moved to the final archive file copy in the destination. This durability is something cp cannot provide. This drastically reduces the chance that a broken file in the archive destination can result in archive failures and sometimes even database recoverability.

Another important functional advantage of basic_archive is that it has a built-in feature to compare the file in the source (pg_wal) and archive destination when the same file exists. It compares the content of the files and verifies that they are exactly the same, and report back to the archiver a “success” so that the archiver process can continue to the next WAL segment. This also reduces the chance of archive failures. Because if the file is archived by the module, but the server crashes before recording it, PostgreSQL will try to archive the same WAL segment again. The new sample basic_archive module silently succeeds the second attempt if the file is copied and has the same content.

Additionally, the basic_archive module has a custom exception handler. This allows the archiver to treat an ERROR as a normal failure and avoid restarting the archiver again.

Potential risks/problems of using archive library

Since the archive modules are loaded into PostgreSQL, unlike shell commands executed by archive_command, they have access to server resources. So please pay attention and avoid modules from unknown sources and sources that are not trustable enough as they possess risk.

Module authors must also remember that the archiving callback runs in its transient memory context.  If the module needs to palloc() something that needs to stick around for a while, it will need to do so in a different memorycontext. One might refer to basic developer documentation and the source of base_archive source code.

The sample module: basic_archive may leave archtemp.* files behind in the archive directory if there were crashes or due to other strange failure cases. If such things happen, users may have to clean them before starting PostgreSQL again.

What can be expected in future

We can see a bright future with all these improvements, as the default archive_command  can become yet another archive module in future versions of PostgreSQL.

I wish the basic_archiveor new modules will come into existence with more features and configuration options like compression because Compression of PostgreSQL WAL Archives is Becoming More Important.

I would expect PostgreSQL support vendors to develop many more powerful libraries and add to the PostgreSQL ecosystem.

Database as a Service (DBaaS) vendors will be one of the biggest beneficiaries as they will have a very specific place and method for WAL archiving, which a module can handle in a much more efficient way.

I would expect the backup tools and methods to evolve. Now a backup tool can remain in demon mode, waiting for messages from the PostgreSQL archiver. walg_archive is an example in this direction.

I think it’s okay to expect restore_libary support also in future PostgreSQL.

Thanks to Community

Special thanks to Nathan Bossart, who took the major role in improving WAL archiving in PostgreSQL 15, and others like Robert Haas for active participation from the beginning till the final commit.

Thanks to Many others who were involved in the discussions and reviews, namely Julien Rouhaud, Andrey Borodin, Robert Haas, Stephen Frost, Magnus Hagander, David Steele, Fujii Masao, Michael Paquier, Alvaro Herrera, Tom Lane, Benoit Lobréau, Peter Eisentraut, Bharath Rupireddy, Kyotaro Horiguchi, Ian Lawrence Barwick, and many others.

Great to see that PostgreSQL 15 started addressing many of the areas of inefficiency, starting from the removal of the stats collector and progress in asynchronous I/O.

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments