This PoC demonstrates how to install and configure pg_stat_monitor in order to extract useful and actionable metrics from a PostgreSQL database and display them on a Grafana dashboard.

About the environment

  • Grafana: version 10.0.0
  • Grafana database backend: Prometheus version 2.15.2+d
  • PostgreSQL version 13
  • pgbench version 13

In order to explore which useful, actionable metrics can be derived from PostgreSQL and presented in Grafana, it is necessary to generate load using pgbench.

Configuring Grafana

For our purposes, the Grafana datasource used in this PoC is also the Postgres data cluster that is generating the data to be monitored.
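
Since the datasource is simply a PostgreSQL connection, the only database-side preparation is a login role that Grafana can use for read-only monitoring. A minimal sketch, assuming a hypothetical role name and password (adjust to suit your environment):

    -- hypothetical monitoring role for the Grafana datasource
    CREATE ROLE grafana WITH LOGIN PASSWORD 'changeme';
    -- pg_monitor includes pg_read_all_stats, which lets the role read
    -- monitoring views such as pg_stat_monitor for all users
    GRANT pg_monitor TO grafana;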

[Image: Grafana datasource configuration for the pgbench database]

pg_stat_monitor

About

pg_stat_monitor is a Query Performance Monitoring tool for PostgreSQL. It collects statistics such as query statistics, query plans, SQL comments, and other performance insights. The collected data is aggregated and presented in a single view.

pg_stat_monitor takes its inspiration from pg_stat_statements. Unlike pg_stat_statements, which aggregates its metrics from the last time its counters were reset, pg_stat_monitor can bucket its output into a set number of aggregated results, saving users the effort of doing that themselves.

pg_stat_monitor tracks the following operations:

  • statements
  • queries
  • functions
  • stored procedures and other non-utility statements

Features

  • Time Interval Grouping: Instead of supplying one set of ever-increasing counts, pg_stat_monitor computes stats for a configured number of time intervals – time buckets. This allows for much better data accuracy, especially in the case of high-resolution or unreliable networks.
  • Multi-Dimensional Grouping: While pg_stat_statements groups counters by userid, dbid, queryid, pg_stat_monitor uses a more detailed group for higher precision. This allows a user to drill down into the performance of queries.
  • Capture Actual Parameters in the Queries: pg_stat_monitor allows you to choose whether you want to see queries with placeholders for parameters or with the actual parameter data. This simplifies debugging and analysis by letting users re-execute the same query with the values it actually ran with.
  • Query Plan: Each SQL statement is now accompanied by the actual plan that was constructed for its execution. That’s a huge advantage if you want to understand why a particular query is slower than expected.
  • Tables Access Statistics for a Statement: This allows us to easily identify all queries that accessed a given table. This set is on par with the information provided by pg_stat_statements.
  • Histogram: Visual representation is very helpful as it can help identify issues. With the help of the histogram function, one can now view a timing/calling data histogram in response to an SQL query. And yes, it even works in PostgreSQL.

Installation (example: CentOS 8, PostgreSQL 14)

The simplest way to get pg_stat_monitor is to install it via Percona Distribution for PostgreSQL.

The following instructions demonstrate installing Percona Distribution for PostgreSQL and pg_stat_monitor on CentOS 8:
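
A minimal sketch of those steps, assuming the standard Percona repositories; repository and package names may differ slightly between releases, so treat this as a guide rather than a definitive recipe:

    # install the percona-release repository manager
    sudo dnf install -y https://repo.percona.com/yum/percona-release-latest.noarch.rpm
    # enable the Percona Distribution for PostgreSQL 14 repository (label may vary)
    sudo percona-release setup ppg14
    # disable the stock PostgreSQL module shipped with CentOS 8
    sudo dnf module disable -y postgresql
    # install the server packages and the pg_stat_monitor extension
    # (package names may vary slightly by release)
    sudo dnf install -y percona-postgresql14-server percona-pg_stat_monitor14
    # initialize and start the cluster
    sudo /usr/pgsql-14/bin/postgresql-14-setup initdb
    sudo systemctl enable --now postgresql-14
    # pg_stat_monitor must be preloaded, then the server restarted
    sudo -u postgres psql -c "ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_monitor'"
    sudo systemctl restart postgresql-14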

Create extension

The pg_stat_monitor extension can be created in any database, but for the purposes of this PoC, it is placed in the database pgbench.
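
From a psql session, assuming the benchmark database is named pgbench as described below:

    -- create the benchmark database and enable the extension in it
    CREATE DATABASE pgbench;
    \c pgbench
    CREATE EXTENSION pg_stat_monitor;
    -- confirm the extension is active
    SELECT pg_stat_monitor_version();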

About pgbench

pgbench is a simple program executing benchmark tests on PostgreSQL by running the same sequence of SQL commands over and over. pgbench is capable of executing multiple concurrent database sessions and can calculate the average transaction rate (TPS) at the end of a run. Although the default configuration simulates loading based loosely upon TPC-B, it is nevertheless easy to test other use cases by writing one’s own transaction script files.
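
For reference, a typical initialization and run against the pgbench database looks like the following; the scale factor, client count, and duration are arbitrary choices for this sketch:

    # populate the pgbench tables at scale factor 10
    pgbench -i -s 10 pgbench
    # run the default TPC-B-like workload: 4 clients, 2 threads, 5 minutes
    pgbench -c 4 -j 2 -T 300 pgbench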

Querying the data

While it is reasonable to create panels showing real-time load in order to better explore the types of queries that can be run against pg_stat_monitor, it is more practical to copy the data into tables and query it there once the benchmarking run has completed.

Table: pg_stat_monitor_archive

Save the data generated from a recently completed benchmark run into an archive table:
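
A minimal version of that copy, using the table name referenced throughout this PoC:

    -- snapshot the entire view into an archive table right after the run,
    -- before pg_stat_monitor recycles its buckets
    CREATE TABLE pg_stat_monitor_archive AS
        SELECT * FROM pg_stat_monitor;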

Table: pg_stat_monitor_qry

Extract this metric of interest, i.e., time vs. total execution time:
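
A sketch of that extraction; the column names follow recent pg_stat_monitor releases (older versions name some of these differently, e.g., total_time instead of total_exec_time):

    -- time vs. total execution time, one row per query per bucket
    CREATE TABLE pg_stat_monitor_qry AS
        SELECT bucket_start_time,
               pgsm_query_id,
               query,
               calls,
               total_exec_time
          FROM pg_stat_monitor_archive
         ORDER BY bucket_start_time;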

Table: pg_stat_monitor_shared_blk_io

Extract this metric of interest, i.e., time vs. shared block I/O:
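
A sketch along the same lines, keeping the shared-block counters:

    -- time vs. shared block I/O
    CREATE TABLE pg_stat_monitor_shared_blk_io AS
        SELECT bucket_start_time,
               pgsm_query_id,
               query,
               shared_blks_hit,
               shared_blks_read,
               shared_blks_dirtied,
               shared_blks_written
          FROM pg_stat_monitor_archive
         ORDER BY bucket_start_time;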

Table: pg_stat_monitor_blk_io

Note: this metric requires the runtime parameter track_io_timing to be set to on.

Extract this metric of interest, i.e., time vs. block I/O:
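
A sketch of that extraction, including the one-time step of enabling I/O timing:

    -- enable I/O timing once (takes effect after a configuration reload)
    ALTER SYSTEM SET track_io_timing = on;
    SELECT pg_reload_conf();

    -- time vs. block I/O timings
    CREATE TABLE pg_stat_monitor_blk_io AS
        SELECT bucket_start_time,
               pgsm_query_id,
               query,
               blk_read_time,
               blk_write_time
          FROM pg_stat_monitor_archive
         ORDER BY bucket_start_time;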

Table: pg_stat_monitor_uniq_id

Save a copy of all unique query IDs in order to parse out future queries from the view. 

Column pgsm_query_id identifies the query in such a manner that the same query can still be recognized even when it is generated on other platforms, under different loading conditions, and with different data:
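
A sketch of that table:

    -- one row per distinct query, keyed by the platform-independent pgsm_query_id
    CREATE TABLE pg_stat_monitor_uniq_id AS
        SELECT DISTINCT pgsm_query_id,
                        query
          FROM pg_stat_monitor_archive;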

For example, consider the set of queries generated by pgbench: the values in column pgsm_query_id are always the same irrespective of hosts or environments.
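
That set can be listed at any time with a query such as:

    SELECT pgsm_query_id, query
      FROM pg_stat_monitor_uniq_id
     ORDER BY query;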

Benchmarking

Two types of performance monitoring are profiled:

  1. Real-time loading performance
  2. Aggregate performance over a specific time period, i.e., a snapshot.

Although the results of the benchmarking can be viewed by querying the view pg_stat_monitor, you will note, as demonstrated by the bash script and SQL statements below, that the contents of the view are immediately copied into a collection of tables. This is because the data will disappear over time as pg_stat_monitor cycles through its allotted number of buckets.

A script executing a benchmarking run:
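
Such a script boils down to running pgbench and then saving the view before its buckets are recycled; a minimal sketch under the same assumptions (database pgbench, the archive table created above):

    #!/bin/bash
    # run a pgbench load, then snapshot pg_stat_monitor before the buckets expire
    set -e
    DB=pgbench

    # start from a clean slate
    psql -d "$DB" -c "SELECT pg_stat_monitor_reset();"

    # default TPC-B-like workload: 4 clients, 2 threads, 5 minutes
    pgbench -c 4 -j 2 -T 300 "$DB"

    # copy the view into the archive table while the data is still available
    psql -d "$DB" -c "INSERT INTO pg_stat_monitor_archive SELECT * FROM pg_stat_monitor;"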

Dashboard example 1: Querying saved data

Top panel (Query execution time vs. DML)

Five (5) SQL statements are used to create this panel:
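
Each of the five statements follows the same pattern, one per pgbench DML statement, filtered by its pgsm_query_id. As a sketch, here is what the series for the first UPDATE might look like, using Grafana's $__timeFilter macro for the PostgreSQL datasource; the "UPDATE 1" alias and the LIKE pattern are assumptions based on the default pgbench script:

    -- execution time over time for the UPDATE against pgbench_accounts;
    -- the other four series follow the same pattern with their own query IDs
    SELECT bucket_start_time AS "time",
           sum(total_exec_time) AS "UPDATE 1"
      FROM pg_stat_monitor_qry
     WHERE pgsm_query_id IN (SELECT pgsm_query_id
                               FROM pg_stat_monitor_uniq_id
                              WHERE query LIKE 'UPDATE pgbench_accounts%')
       AND $__timeFilter(bucket_start_time)
     GROUP BY 1
     ORDER BY 1;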

Bottom panel (Query execution time vs. shared blocks)
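
The bottom panel uses the same shape of query against the shared-block table; a sketch for the same UPDATE series:

    -- shared-block activity over time for the UPDATE against pgbench_accounts
    SELECT bucket_start_time AS "time",
           sum(shared_blks_hit + shared_blks_read) AS "UPDATE 1"
      FROM pg_stat_monitor_shared_blk_io
     WHERE pgsm_query_id IN (SELECT pgsm_query_id
                               FROM pg_stat_monitor_uniq_id
                              WHERE query LIKE 'UPDATE pgbench_accounts%')
       AND $__timeFilter(bucket_start_time)
     GROUP BY 1
     ORDER BY 1;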

Analysis

Here are some example patterns that can be discerned:

  1. The SELECT statements are the fastest DML operations (top panel).
  2. Although SQL statement UPDATE 1 (top panel) takes up the most time, it accounts for relatively little shared-buffer activity compared to the other update statements.
  3. Inserts (top) are the second-slowest set of statements, yet they generate very little shared-buffer activity compared to the UPDATEs (bottom).

Dashboard example 2: Monitoring in real time

These two panels show read/write I/O performance against persistent storage during a live benchmarking run.

Top panel (Execution time vs. DML)

Bottom panel (Time vs. IO)
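
For real-time monitoring, the panels read straight from the live pg_stat_monitor view rather than from the archived tables; the top panel follows the same pattern as in Dashboard example 1. A sketch of the bottom panel, again assuming Grafana's $__timeFilter macro:

    -- read/write I/O time per bucket, taken directly from the live view
    SELECT bucket_start_time AS "time",
           sum(blk_read_time)  AS "read (ms)",
           sum(blk_write_time) AS "write (ms)"
      FROM pg_stat_monitor
     WHERE $__timeFilter(bucket_start_time)
     GROUP BY 1
     ORDER BY 1;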

Analysis

It’s quite easy to observe that SQL statement UPDATE 1 represents the bulk of the read operations.

Interestingly, writes are not as significant as reads.

Conclusion

I’m excited about pg_stat_monitor. Not only can it be used in Grafana, but it’s easily implemented in any monitoring solution, including our own Percona Monitoring and Management. It’s also incorporated in our latest version of Percona Operator for PostgreSQL.

pg_stat_monitor is an obvious, common-sense improvement over pg_stat_statements' greatest limitation, i.e., its inability to bucket metrics over time intervals. And to be frank, I can see the pg_stat_monitor extension eventually replacing pg_stat_statements as the de facto extension for monitoring Postgres when it comes to real-time analysis.

Happy monitoring!

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.


Download Percona Distribution for PostgreSQL Today!

Comments

pcpg

Wondering where Prometheus is used, or what it is used for, because it seems that the datasource is PG itself.

Dima

+1
And PMM uses VictoriaMetrics instead of Prometheus anyway.

Robert Bernier

That is correct: pg_stat_monitor is capable of standalone servicing, since the metrics reside on the production machine itself.

Hope this helps.