Please note: This blog refers to Percona Monitoring and Management v1. For information on v2, please visit Understanding Processes Running on Linux Host with Percona Monitoring and Management.

A few months ago I wrote a blog post on How to Capture Per Process Metrics in PMM. Since that time, Nick Cabatoff has made a lot of improvements to Process Exporter and I’ve improved the Grafana Dashboard to match.

I will not go through the installation instructions, as they are well covered in the original blog post. This post covers features available in release 0.4.0. Here are a few new features you might find of interest:

Used Memory

Memory usage in Linux is complicated. You can look at resident memory, which shows how much space is used in RAM. However, if a substantial part of the process has been swapped out because of memory pressure, you would not see it. You can also look at virtual memory, but it includes a lot of address space that was allocated and never mapped to either RAM or swap space. For processes written in Go especially, the difference can be extreme. Let’s look at the process exporter itself: it uses 20MB of resident memory but over 2GB of virtual memory.

top processes by resident memory

top processes by virtual memory

Meet the Used Memory dashboard, which shows the sum of resident memory used by the process and swap space used:

used memory dashboard

There is also a dashboard that shows processes by swap space used, so you can see whether processes you expect to be resident have been swapped out.
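To make the "resident plus swap" idea concrete, here is a minimal Python sketch that computes the same sum from the text of /proc/<pid>/status, using the kernel's VmRSS and VmSwap fields. The function names are hypothetical, and this is not necessarily how Process Exporter or the dashboard computes the value.

```python
import re


def used_memory_kb(status_text):
    """Return VmRSS + VmSwap (in kB) parsed from /proc/<pid>/status text."""
    fields = {}
    for line in status_text.splitlines():
        match = re.match(r"(VmRSS|VmSwap):\s+(\d+)\s+kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    # Kernel threads have no Vm* lines; treat missing values as zero.
    return fields.get("VmRSS", 0) + fields.get("VmSwap", 0)


def read_used_memory_kb(pid="self"):
    """Hypothetical helper: read the status file for a PID on a Linux host."""
    with open(f"/proc/{pid}/status") as status_file:
        return used_memory_kb(status_file.read())
```

On a Linux host, read_used_memory_kb(1234) would report the used memory for PID 1234, which you can compare with the resident and virtual figures shown by top.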

Processes by Disk IO

processes by disk io

Processes by Disk IO is another graph I often find very helpful. It is most useful for catching the unusual suspects, when it is not obvious which process is causing the IO.
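If you want to cross-check a graph like this by hand, the kernel exposes per-process IO counters in /proc/<pid>/io. The sketch below parses two samples of that file and turns the read_bytes and write_bytes counters into rates. The function names are hypothetical, reading the file usually requires the same user or root, and the exporter may collect these numbers differently.

```python
def parse_proc_io(io_text):
    """Parse the 'name: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in io_text.splitlines():
        name, _, value = line.partition(":")
        if value.strip().isdigit():
            counters[name.strip()] = int(value)
    return counters


def io_rates(before, after, seconds):
    """Bytes per second read from and written to storage between two samples."""
    return {
        key: (after[key] - before[key]) / seconds
        for key in ("read_bytes", "write_bytes")
    }
```

Taking two samples a few seconds apart for each PID and sorting by the resulting rates gives a rough text version of the same "top processes" view.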

Context Switches

Context switches, as shown by VMSTAT, are often seen as an indication of contention. With per-process context switch statistics, you can see which processes are incurring those context switches.

top processes by voluntary context switches

Note: while a large number of context switches can indicate high contention, some applications and workloads are simply designed that way. You are better off looking at the change in the number of context switches, rather than at the raw number.
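As a rough illustration of looking at the change rather than the raw number, the sketch below takes two samples of /proc/<pid>/status and converts the kernel's cumulative voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters into per-second rates. The function names are hypothetical and this is only one way to do it.

```python
def parse_ctxt_switches(status_text):
    """Extract the cumulative context switch counters from /proc/<pid>/status text."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    counts = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name in wanted:
            counts[name] = int(value)
    return counts


def ctxt_switch_rates(before, after, seconds):
    """Context switches per second between two samples taken `seconds` apart."""
    return {name: (after[name] - before[name]) / seconds for name in after}
```

Comparing the rate for the same process across quiet and busy periods tells you more than the cumulative counter, which only ever grows.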

CPU and Disk IO Saturation

As Brendan Gregg tells us, utilization and saturation are not the same. While CPU usage and Disk IO usage graphs show us resource utilization by different processes, they do not show saturation.

top running processes graph

For example, if you have four CPU cores, a process cannot use more than four cores' worth of CPU time, whether it has four or four hundred concurrent threads trying to run.

Although they are rather volatile as gauge metrics, top running processes and top processes waiting on IO are good metrics for understanding which processes are prone to saturation.

These graphs roughly provide a per-process breakdown of the “r” and “b” VMSTAT columns.
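To see where such a breakdown can come from, note that each thread's scheduler state is the field right after the command name in /proc/<pid>/task/<tid>/stat: R means running or runnable, and D means uninterruptible sleep, which usually means waiting on IO. The sketch below counts both for a list of stat lines. The function names and the output keys are hypothetical, and the exporter's own accounting may differ.

```python
from collections import Counter


def thread_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/task/<tid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def saturation_counts(stat_lines):
    """Count runnable (R) and uninterruptible (D) threads, like vmstat r and b."""
    states = Counter(thread_state(line) for line in stat_lines)
    return {"running": states["R"], "waiting_io": states["D"]}
```

Because these are point-in-time counts, a single sample is noisy; sampling repeatedly and averaging gives a steadier picture.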

Kernel Waits

Finally, you can see which kernel function (WCHAN) the process is sleeping on, which can be very helpful for assessing processes that are not using a lot of CPU but are not making much progress either.

I find this graph most useful if you pick the single process in the dashboard picker:

kernel waits for sysbench

In this graph we can see that sysbench has most of its threads sleeping in unix_stream_read_generic, which corresponds to reading the response from MySQL over a UNIX socket: exactly what you would expect!
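For a quick look at the same information without a dashboard, Linux exposes the wait channel of each thread in /proc/<pid>/task/<tid>/wchan. The sketch below tallies those values for one process. The function names are hypothetical, poll_schedule_timeout in the usage note is just sample data, and whether symbol names are visible depends on kernel configuration and permissions.

```python
import glob
from collections import Counter


def summarize_wchan(wchan_values):
    """Count threads per kernel wait function, most common first."""
    counts = Counter()
    for raw in wchan_values:
        name = raw.strip()
        # "0" (or an empty value) means the thread is not sleeping in the kernel.
        if name and name != "0":
            counts[name] += 1
    return counts.most_common()


def read_wchan_values(pid):
    """Hypothetical helper: collect wchan for every thread of a PID on Linux."""
    values = []
    for path in glob.glob(f"/proc/{pid}/task/*/wchan"):
        try:
            with open(path) as wchan_file:
                values.append(wchan_file.read())
        except OSError:
            # Threads can exit between listing and reading; skip them.
            continue
    return values
```

For example, summarize_wchan(["unix_stream_read_generic", "unix_stream_read_generic", "poll_schedule_timeout"]) ranks unix_stream_read_generic first with two threads.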

Summary

If you ever need to understand what different processes are doing on your system, then Nick’s Process Exporter is a fantastic tool to have. It takes just a few minutes to add it to your PMM installation.

If you enjoyed this post…

You might also like my pre-recorded webinar MySQL troubleshooting and performance optimization with PMM.