In this blog post, we will discuss how to extend Percona Monitoring and Management (PMM) to get PostgreSQL metrics on checkpointing activity, internal buffers, and WAL usage. With this data, we’ll be able to better understand and tune our Postgres servers.

We’ll assume there are working PostgreSQL and PMM environments set up already. You can search the blog for more information on this if needed. For generating load, we used pgbench with the following commands:
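The exact parameters will depend on your environment, but as a minimal sketch (the host, database name, scale factor, client count, thread count, and duration below are illustrative), an initialization plus a sustained read/write run could look like this:

```bash
# Initialize the pgbench schema; -s sets the scale factor (illustrative value)
pgbench -h 127.0.0.1 -U postgres -i -s 100 pgbench_db

# Run a read/write workload: 8 clients, 4 worker threads, for one hour,
# printing progress every 60 seconds (all values are illustrative)
pgbench -h 127.0.0.1 -U postgres -c 8 -j 4 -T 3600 -P 60 pgbench_db
```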

Creating a custom query collector

Note: This step will not be needed with soon-to-come postgresql_exporter PMM versions!

The first step is to include a new query that gathers data on WAL usage, since this is not yet collected by default in the latest version of PMM. Note, however, that only three graphs use these custom metrics, so if you want to try out the dashboard without this customization, you can skip this step (at the expense of a bit less data).

For this, it’s as easy as executing the following (in the PMM client node that is monitoring the Postgres servers):
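A minimal sketch of such a custom query file follows, assuming the default PMM2 custom-queries directory and using an illustrative query name (pg_wal_position) and column names; adjust the path and names to your setup:

```bash
# Directory where pmm-agent looks for custom PostgreSQL queries
# (the exact path may vary with your PMM client version)
QUERIES_DIR=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution

sudo tee "$QUERIES_DIR/queries-postgresql-wal.yml" > /dev/null <<'EOF'
# Illustrative custom query: expose WAL pointer locations as byte offsets
pg_wal_position:
  query: >-
    SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') AS current_lsn_bytes,
    pg_wal_lsn_diff(pg_current_wal_insert_lsn(), '0/0') AS insert_lsn_bytes,
    pg_wal_lsn_diff(pg_current_wal_flush_lsn(), '0/0') AS flush_lsn_bytes
  metrics:
    - current_lsn_bytes:
        usage: "COUNTER"
        description: "WAL write location, in bytes since cluster initialization"
    - insert_lsn_bytes:
        usage: "COUNTER"
        description: "WAL insert location, in bytes since cluster initialization"
    - flush_lsn_bytes:
        usage: "COUNTER"
        description: "WAL flush location, in bytes since cluster initialization"
EOF

# Depending on the PMM version, the exporter may need a restart to pick up the new file
sudo systemctl restart pmm-agent
```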

This will result in the following new metrics on PMM:
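With the illustrative file above, the exporter builds metric names from the query key plus each column, so we would expect new series such as pg_wal_position_current_lsn_bytes, pg_wal_position_insert_lsn_bytes, and pg_wal_position_flush_lsn_bytes (your names will differ if you chose different keys or columns).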

Importing the custom PMM dashboard

To view these new metrics, we will import a new custom PMM dashboard that presents this information in a structured and easy-to-understand way. Since I’ve uploaded it to Grafana Labs, this takes only a few easy steps.

1. Click on the Dashboards -> Import menu: 

Percona Monitoring and Management Dashboard

2. Import via grafana.com with ID number (19600):

Import via Grafana

3. Select the PostgreSQL folder and Metrics, and import it:

Import dashboard from Grafana

After this, you’ll be taken to the dashboard, which I generally “star” for easy access later on:

Using the new dashboard

The dashboard is divided into three main areas, all of which have relevant configurations at the top and graphs following them. The main idea is to be able to easily detect what the server is doing at any point in time so we can better understand how it reacts to the workload and tune accordingly. These graphs and panels not only show changes in usage and workload patterns but also changes in configuration, which can help during root cause analysis and performance reviews.

Checkpointing section

Here, we will find everything related to checkpointing: when checkpoints are started (and what triggers them) and when they finish. Checkpointing has a big impact on Postgres’ write throughput and I/O utilization, so it’s important to make sure it’s not being triggered before it’s needed.

In the following example, we can see that at the beginning there was a lot of forced checkpointing: not only was the checkpointer running more often than once every checkpoint_timeout seconds, but the Checkpoints graph also classifies those checkpoints as requested rather than scheduled. We can’t see the cause yet because the WAL graphs are at the bottom of the dashboard, but this was fixed after we increased the value of max_wal_size.

Checkpointing PostgreSQL

Additionally, after the metrics refresh, we can see that the Checkpoint Timeout stat panel shows another change closer to the end of our currently selected time range:

This means that we have also changed the checkpoint_timeout value. This was done only after our workload was more stable and there were no more forced checkpoints being issued.
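If you want to cross-check what the dashboard shows from the database side, a quick sketch like the following can help; the counters live in pg_stat_bgwriter (they moved to pg_stat_checkpointer in recent PostgreSQL versions), and the target values below are purely illustrative, not recommendations:

```bash
# Compare scheduled (timed) vs. requested (forced) checkpoints
psql -U postgres -c "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter;"

# Raise max_wal_size so WAL growth between checkpoints no longer forces them;
# this setting only needs a reload, not a restart (value is illustrative)
psql -U postgres -c "ALTER SYSTEM SET max_wal_size = '4GB';"
psql -U postgres -c "SELECT pg_reload_conf();"

# Once forced checkpoints are gone, checkpoint_timeout can be adjusted as well
psql -U postgres -c "ALTER SYSTEM SET checkpoint_timeout = '15min';"
psql -U postgres -c "SELECT pg_reload_conf();"
```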

Buffers section

This section holds configurations and metrics related to reads into and writes from the shared buffers. Again demonstrating by example, let’s see how much of an impact correctly sizing shared_buffers has on our test workload. It’s set to 128MB by default, which is hardly enough for any serious workload and means we will see a lot of churn in both reads and writes until we increase it. The only downside is that modifying shared_buffers requires a restart, so before doing that, we can increase bgwriter_lru_maxpages (which doesn’t need a restart) so the background writer is no longer stopped each round, gaining a bit more performance at the expense of I/O.
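As a rough sketch of that sequence (the values are illustrative, not recommendations), bgwriter_lru_maxpages only needs a configuration reload, while shared_buffers requires a restart:

```bash
# bgwriter_lru_maxpages can be raised with a simple reload
psql -U postgres -c "ALTER SYSTEM SET bgwriter_lru_maxpages = 1000;"
psql -U postgres -c "SELECT pg_reload_conf();"

# shared_buffers only takes effect after a restart (value is illustrative;
# a common starting point is around 25% of the server's RAM)
psql -U postgres -c "ALTER SYSTEM SET shared_buffers = '4GB';"
sudo systemctl restart postgresql
```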

We can clearly see how, at first, there were constant reads into the shared buffers, and the background writer was doing most of the writes. We can also see the positive impact of increasing bgwriter_lru_maxpages, because that process is no longer being throttled. After we increased shared_buffers and restarted Postgres, reads dropped to almost nothing (little has to be fetched from the page cache or disks), and the writes are no longer done by the background writer but by the checkpointer.

WAL usage section

The last section pertains to write-ahead logs, which are another source of potential performance bottlenecks. Note that (for now) this is the only section that needs data coming from the custom query collector we added at the beginning, so you can use the rest of the dashboard even if you decide not to include it. PMM already collects the configurations, so we will only miss data from the graphs.

The WAL Writes Per Checkpoint Timeout graph will group the number of bytes that were written in chunks of checkpoint_timeout seconds so we can have a clear view of the expected max_wal_size value. If the chunks (shown as orange bars) exceed the max_wal_size (shown as a red line), it means that there will be forced checkpointing. We can easily dimension max_wal_size knowing that it should be larger than any orange bar, and we can tell when exactly it was changed, following changes in the red line. The WAL Writes graph uses the same metric but is shown as a rate, which can help us pinpoint heavy write times more granularly. Lastly, the WAL Locations graph is included for completeness and shows the different locations for WAL pointers (such as insertion and flush locations), which, according to the documentation, are mainly used for debugging purposes.
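If you want to sanity-check the same idea outside the dashboard, a rough sketch is to sample the WAL write location twice, extrapolate the bytes generated per checkpoint_timeout period, and compare that against max_wal_size (the 60-second interval below is illustrative):

```bash
# Sample the current WAL location, wait, and sample again to estimate
# how many WAL bytes are generated over the interval
START=$(psql -U postgres -At -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');")
sleep 60
END=$(psql -U postgres -At -c "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0');")
echo "WAL bytes generated in 60s: $((END - START))"
```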

Hints and tips

Most graphs contain additional information and links on either the configuration variables or the metrics they show, so they are useful in case we need a bit of help interpreting them. Just hover the mouse over the “i” icon on each of them:

Checkpointing PostgreSQL

Getting even more data

Starting from PostgreSQL 14, we have a new view with WAL metrics: pg_stat_wal.
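For reference, a quick look at that view can be as simple as the query below (wal_write_time and wal_sync_time are only populated when track_wal_io_timing is enabled):

```bash
# Cluster-wide WAL activity since the last statistics reset
psql -U postgres -c "SELECT wal_records, wal_fpi, wal_bytes, wal_buffers_full,
                            wal_write, wal_sync, wal_write_time, wal_sync_time, stats_reset
                     FROM pg_stat_wal;"
```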

The metrics used in this custom collector and dashboard will work for older versions, but in the future, it will be nice to see what data we can extract from this new view and how we can use it to tune our servers.

Conclusion

This is another example of how tunable and powerful PMM is: not only can we easily add our own custom metrics, but it also already collects many metrics we can use out of the box. Additionally, it’s easy to share new PMM dashboards via the Grafana Labs page by simply publishing them and sharing the ID number. With all this new information at hand, we can now better tune our PostgreSQL instances!

Lastly, we are working closely with the PMM Dev team to include this new dashboard in the PMM server itself, so all these custom steps won’t be needed in future releases. Let us know if you have any comments on it.

Percona Monitoring and Management is a best-of-breed open source database monitoring solution. It helps you reduce complexity, optimize performance, and improve the security of your business-critical database environments, no matter where they are located or deployed.

 

Download Percona Monitoring and Management Today
