Analyzing Amazon Aurora Slow Logs with pt-query-digest

Amazon Aurora MySQL slow query logs with pt-query-digest slow In this blog post we shall discuss how you can analyze slow query logs from Amazon Aurora for MySQL, (referred to as Amazon Aurora in the remaining blog). The tools and techniques explained here apply to the other MySQL compatible services available under Amazon Aurora. However, we’ll focus specially on analyzing slow logs from Amazon Aurora version 2 (MySQL 5.7 compatible) using pt-query-digest. We believe there is a bug in Aurora where it logs really big numbers for query execution and lock times for otherwise really fast queries.

So, the main steps we need are:

Enable slow query logging on your Amazon Aurora DB parameter group, apply the change when appropriate.
Download the slow log(s) that match the time that you are interested to investigate, and optionally concatenate them.
Run pt-query-digest on the downloaded logs and check the results.

Enable slow query logging

For our testing we decided to capture all the SELECT queries that were hitting our Amazon Aurora instance, mainly because we had a sysbench OLTP read only workload and that wouldn’t really have a lot of slow queries. An easy way to do so is to enable the capture of slow query logs and set long_query_time to 0 — you will need to enable slow query logging. To achieve that, we created a new DB parameter group and applied it to our test Aurora instance with the following three parameters set as below:

slow_query_log=1
long_query_time=0
min_examined_row_limit=0

slow_query_log=1

long_query_time=0

min_examined_row_limit=0

Once you have the above configuration applied to Amazon RDS, you will be able to see slow query logs being created in the Amazon RDS console.

Download the log file

You can download the log file of your choice using either the Amazon RDS console OR you can use the following AWS CLI command to achieve the same:

$ aws rds download-db-log-file-portion --db-instance-identifier perconasupport  --starting-token 0 --output text --log-file-name slowquery/mysql-slowquery.log.2018-09-03.09 > mysql-slowquery.log.2018-09-03.09

1	$ aws rds download-db-log-file-portion --db-instance-identifier perconasupport --starting-token 0 --output text --log-file-name slowquery/mysql-slowquery.log.2018-09-03.09 > mysql-slowquery.log.2018-09-03.09

Depending on the size of the chosen log file, the above command will take some time to complete the download.

Run pt-query-digest on the log file

Once the file has been downloaded you can analyse that using the following pt-query-digest command.

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum mysql-slowquery.log.2018-09-03.09

1	$ pt-query-digest --group-by fingerprint --order-by Query_time:sum mysql-slowquery.log.2018-09-03.09

On our Aurora test slow log file, the initial results didn’t look right so we had to apply a workaround. Here is the header of the initial results from pt-query-digest:

# 456.2s user time, 2.5s system time, 43.80M rss, 141.48M vsz
# Current date: Tue Sep 4 15:54:21 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 507.43Gx concurrency _______
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1826227663297288s 1us 18446744073710s 355917782s 761us 80127878922s 93us
# Lock time 1401952549601936s 0 18446744073710s 273229812s 44us 70205933577s 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69

# Profile
# Rank Query ID Response time Calls R/Call 
# ==== ====================== =========================== ======= ========
# 1 0xE81D0B3DB4FB31BC5... 1346612317380813.0000 73.7% 3194111 421592210.5966 18... SELECT sbtest?
# 2 0x9934EF6887CC7A638... 147573952589685.0625 8.1% 319381 462062403.8051 18... SELECT sbtest?
# 3 0x8D589AFA4DFAEEED8... 110680464442264.1094 6.1% 319411 346514254.1812 18... BEGIN
# 4 0xFF7C69F51BBD3A736... 92233720368565.1875 5.1% 319388 288782673.0139 18... SELECT sbtest?
# 5 0xFFFCA4D67EA0A7888... 73786976294861.9844 4.0% 321238 229695665.8143 18... COMMIT
# MISC 0xMISC 55340232221335.8281 3.0% 657509 84166501.4796 0.0 <43 ITEMS>

# 456.2s user time, 2.5s system time, 43.80M rss, 141.48M vsz

# Current date: Tue Sep 4 15:54:21 2018

# Hostname: aahmed-GL503VD

# Files: mysql-slowquery.log.2018-09-03.09

# Overall: 5.13M total, 60 unique, 1.43k QPS, 507.43Gx concurrency _______

# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03

# Attribute total min max avg 95% stddev median

# ============ ======= ======= ======= ======= ======= ======= =======

# Exec time 1826227663297288s 1us 18446744073710s 355917782s 761us 80127878922s 93us

# Lock time 1401952549601936s 0 18446744073710s 273229812s 44us 70205933577s 23us

# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99

# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99

# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69

# Profile

# Rank Query ID Response time Calls R/Call

# ==== ====================== =========================== ======= ========

# 1 0xE81D0B3DB4FB31BC5... 1346612317380813.0000 73.7% 3194111 421592210.5966 18... SELECT sbtest?

# 2 0x9934EF6887CC7A638... 147573952589685.0625 8.1% 319381 462062403.8051 18... SELECT sbtest?

# 3 0x8D589AFA4DFAEEED8... 110680464442264.1094 6.1% 319411 346514254.1812 18... BEGIN

# 4 0xFF7C69F51BBD3A736... 92233720368565.1875 5.1% 319388 288782673.0139 18... SELECT sbtest?

# 5 0xFFFCA4D67EA0A7888... 73786976294861.9844 4.0% 321238 229695665.8143 18... COMMIT

# MISC 0xMISC 55340232221335.8281 3.0% 657509 84166501.4796 0.0 <43 ITEMS>

What’s wrong with the above results is that the total query Exec time and Lock time are very large numbers. Digging deeper into the logs revealed a problem with the slow logs themselves that had very large numbers for Query time & Lock time for some queries. For instance in our case, of 5.13 million queries in the log file, only 111 had the anomaly. Even so, it was enough to skew the results.

# Time: 2018-09-03T08:41:47.363522Z
--
SELECT c FROM sbtest1 WHERE id=24278;
# Time: 2018-09-03T08:41:49.363224Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20869
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964109;
SELECT c FROM sbtest2 WHERE id=989322;
# Time: 2018-09-03T08:41:49.363296Z
--
BEGIN;
# Time: 2018-09-03T08:41:53.362947Z
# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20873
# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1
SET timestamp=1535964113;
SELECT c FROM sbtest1 WHERE id=246889;
# Time: 2018-09-03T08:41:53.363003Z

# Time: 2018-09-03T08:41:47.363522Z

SELECT c FROM sbtest1 WHERE id=24278;

# Time: 2018-09-03T08:41:49.363224Z

# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20869

# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1

SET timestamp=1535964109;

SELECT c FROM sbtest2 WHERE id=989322;

# Time: 2018-09-03T08:41:49.363296Z

BEGIN;

# Time: 2018-09-03T08:41:53.362947Z

# User@Host: perconasupport[perconasupport] @ [172.30.2.111] Id: 20873

# Query_time: 18446744073709.550781 Lock_time: 18446744073709.550781 Rows_sent: 1 Rows_examined: 1

SET timestamp=1535964113;

SELECT c FROM sbtest1 WHERE id=246889;

# Time: 2018-09-03T08:41:53.363003Z

Incorrect logging

The above two queries are, in fact, really fast, but for some reason the execution time & lock times are wrongly logged in the slow query log. Since the number of such query log records is statistically negligible compared to the total number of queries, we decided to ask pt-query-digest to ignore them using the command line parameter –attribute-value-limit . The default value of this parameter is 0. We decided to increase that to 2^32, and make it ignore the large numbers from the slow query log. So, the pt-query-digest command became:

$ pt-query-digest --group-by fingerprint --order-by Query_time:sum --attribute-value-limit=4294967296 mysql-slowquery.log.2018-09-03.09

1	$ pt-query-digest --group-by fingerprint --order-by Query_time:sum --attribute-value-limit=4294967296 mysql-slowquery.log.2018-09-03.09

This caused the 111 queries with the bad log times to be ignored and the results looked good. In our case, the ignored queries were bad variants of queries for which good versions existed. You can tell this because the number of unique queries remained the same as before after the bad variants were ignored. However, this may not always hold true and one should expect to lose some fidelity, especially if you are analyzing a smaller slow log.

# 441s user time, 450ms system time, 38.19M rss, 111.76M vsz
# Current date: Tue Sep 4 16:23:33 2018
# Hostname: aahmed-GL503VD
# Files: mysql-slowquery.log.2018-09-03.09
# Overall: 5.13M total, 60 unique, 1.43k QPS, 0.30x concurrency __________
# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 1096s 1us 198ms 213us 761us 431us 93us
# Lock time 180s 0 103ms 34us 44us 161us 23us
# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99
# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99
# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69

# Profile
# Rank Query ID Response time Calls R/Call V/M Ite
# ==== =========================== ============== ======= ====== ===== ===
# 1 0xE81D0B3DB4FB31BC558CAE... 400.1469 36.5% 3194111 0.0001 0.00 SELECT sbtest?
# 2 0xF0C5AE75A52E847D737F39... 161.4065 14.7% 319453 0.0005 0.00 SELECT sbtest?
# 3 0xFFFCA4D67EA0A788813031... 155.8740 14.2% 321238 0.0005 0.00 COMMIT
# 4 0x8D589AFA4DFAEEED85FFF5... 107.9827 9.9% 319411 0.0003 0.00 BEGIN
# 5 0x9934EF6887CC7A6384D1DE... 94.1002 8.6% 319381 0.0003 0.00 SELECT sbtest?
# 6 0xFF7C69F51BBD3A736EEB1B... 79.9279 7.3% 319388 0.0003 0.00 SELECT sbtest?
# 7 0xA729E7889F57828D3821AE... 75.3969 6.9% 319398 0.0002 0.00 SELECT sbtest?
# MISC 0xMISC 21.1212 1.9% 18658 0.0011 0.0 <41 ITEMS>

# Query 1: 1.27k QPS, 0.16x concurrency, ID 0xE81D0B3DB4FB31BC558CAEF5F387E929 at byte 358647353
# Scores: V/M = 0.00
# Time range: 2018-09-03T08:00:04 to 2018-09-03T08:42:00
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 62 3194111
# Exec time 36 400s 10us 198ms 125us 332us 300us 80us
# Lock time 74 134s 0 26ms 42us 49us 154us 27us
# Rows sent 3 3.01M 0 1 0.99 0.99 0.11 0.99
# Rows examine 1 3.01M 0 1 0.99 0.99 0.11 0.99
# Query size 57 112.37M 32 38 36.89 36.69 0.53 36.69
# String:
# Databases perconasupport
# Hosts 172.30.2.111
# Users perconasupport
# Query_time distribution
# 1us
# 10us ################################################################
# 100us ##############
# 1ms #
# 10ms #
# 100ms #
# 1s

# 441s user time, 450ms system time, 38.19M rss, 111.76M vsz

# Current date: Tue Sep 4 16:23:33 2018

# Hostname: aahmed-GL503VD

# Files: mysql-slowquery.log.2018-09-03.09

# Overall: 5.13M total, 60 unique, 1.43k QPS, 0.30x concurrency __________

# Time range: 2018-09-03T08:00:04 to 2018-09-03T09:00:03

# Attribute total min max avg 95% stddev median

# ============ ======= ======= ======= ======= ======= ======= =======

# Exec time 1096s 1us 198ms 213us 761us 431us 93us

# Lock time 180s 0 103ms 34us 44us 161us 23us

# Rows sent 94.71M 0 100 19.35 97.36 37.62 0.99

# Rows examine 216.26M 0 300 44.19 299.03 84.74 0.99

# Query size 196.24M 5 1.24k 40.08 72.65 18.90 36.69

# Profile

# Rank Query ID Response time Calls R/Call V/M Ite

# ==== =========================== ============== ======= ====== ===== ===

# 1 0xE81D0B3DB4FB31BC558CAE... 400.1469 36.5% 3194111 0.0001 0.00 SELECT sbtest?

# 2 0xF0C5AE75A52E847D737F39... 161.4065 14.7% 319453 0.0005 0.00 SELECT sbtest?

# 3 0xFFFCA4D67EA0A788813031... 155.8740 14.2% 321238 0.0005 0.00 COMMIT

# 4 0x8D589AFA4DFAEEED85FFF5... 107.9827 9.9% 319411 0.0003 0.00 BEGIN

# 5 0x9934EF6887CC7A6384D1DE... 94.1002 8.6% 319381 0.0003 0.00 SELECT sbtest?

# 6 0xFF7C69F51BBD3A736EEB1B... 79.9279 7.3% 319388 0.0003 0.00 SELECT sbtest?

# 7 0xA729E7889F57828D3821AE... 75.3969 6.9% 319398 0.0002 0.00 SELECT sbtest?

# MISC 0xMISC 21.1212 1.9% 18658 0.0011 0.0 <41 ITEMS>

# Query 1: 1.27k QPS, 0.16x concurrency, ID 0xE81D0B3DB4FB31BC558CAEF5F387E929 at byte 358647353

# Scores: V/M = 0.00

# Time range: 2018-09-03T08:00:04 to 2018-09-03T08:42:00

# Attribute pct total min max avg 95% stddev median

# ============ === ======= ======= ======= ======= ======= ======= =======

# Count 62 3194111

# Exec time 36 400s 10us 198ms 125us 332us 300us 80us

# Lock time 74 134s 0 26ms 42us 49us 154us 27us

# Rows sent 3 3.01M 0 1 0.99 0.99 0.11 0.99

# Rows examine 1 3.01M 0 1 0.99 0.99 0.11 0.99

# Query size 57 112.37M 32 38 36.89 36.69 0.53 36.69

# String:

# Databases perconasupport

# Hosts 172.30.2.111

# Users perconasupport

# Query_time distribution

# 1us

# 10us ################################################################

# 100us ##############

# 1ms #

# 10ms #

# 100ms #

# 1s

That number looks familiar

The really big number 18446744073709.550781 seemed to ring a bell. A quick web search revealed that it could be a regression of an old bug in MySQL’s code. The following bugs were found to have the same value being reported for query exec time & query lock time.

Once slow logs were enabled, we used this sysbench command to generate the workload for the Amazon Aurora instance. You might like to try it yourselves. Please note that this used sysbench version 1.0.14.

$ sysbench --db-driver=mysql --mysql-user=perconasupport --mysql-host=perconasupport-1234567.cgmobiazycdv.eu-west-1.rds.amazonaws.com --mysql-password=XXXXXXX  --mysql-db=perconasupport --range_size=100 --table_size=1000000 --tables=2 --threads=6 --events=0 --time=600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

$ sysbench --db-driver=mysql --mysql-user=perconasupport --mysql-host=perconasupport-1234567.cgmobiazycdv.eu-west-1.rds.amazonaws.com --mysql-password=XXXXXXX --mysql-db=perconasupport --range_size=100 --table_size=1000000 --tables=2 --threads=6 --events=0 --time=600 --rand-type=uniform /usr/share/sysbench/oltp_read_only.lua run

If you are an Amazon Aurora user, have you found any problems analyzing slow query logs? You are welcome to use the comments section, below, to let me know.

Percona Toolkit

pt-query-digest is part of Percona Toolkit, a collection of advanced open source command-line tools, developed and used by the Percona technical staff. Percona Toolkit is open source and free to download and use.

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Rahul Sinha

4 years ago

Hi ,I have done same but output is blank.
Is pt-query-digest is not work with procedures.call procedure_name doesnot captured by this.
below is the details.

Complete!
[root@ip-172-31-6-146 tmp]# pt-query-digest –group-by fingerprint –order-by Query_time:sum slowquery.log

# 220ms user time, 20ms system time, 37.44M rss, 232.91M vsz
# Current date: Thu Aug 1 11:05:47 2019
# Hostname: ip-172-31-6-146.ap-southeast-1.compute.internal
# Files: slowquery.log
# Overall: 0 total, 1 unique, 0 QPS, 0x concurrency ______________________
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Query size 1.03M 1.03M 1.03M 1.03M 1.03M 0 1.03M

# Profile
# Rank Query ID Response time Calls R/Call V/M Item
# =========== =========== =========== =========== =========== ===== ======
[root@ip-172-31-6-146 tmp]#

below is the slow log output
[root@ip-172-31-6-146 tmp]# head -20 slowquery.log
b/34888351875705522600949060125748799650296161739669241886 f/34893005952025475147876458366857132852441865462075490304
EVENTS 1564447805084 # Time: 190730 0:50:05
# User@Host: mxusr_lsqapps[mxusr_lsqapps] @ ec2-35-154-42-9.ap-south-1.compute.amazonaws.com [35.154.42.9] Id: 760584
# Query_time: 2.056237 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 215758
SET timestamp=1564447805;
CALL mxradon_config.udsp_SetSubscriptionSentTimeStamp(‘d0dc6b17-94c1-11e9-8ffe-02ceb8e07d66’); 1564447805000
EVENTS 1564447815084 # Time: 190730 0:50:10
# User@Host: mxusr_lsqapps[mxusr_lsqapps] @ ec2-52-66-172-24.ap-south-1.compute.amazonaws.com [52.66.172.24] Id: 760634
# Query_time: 2.052549 Lock_time: 0.000100 Rows_sent: 0 Rows_examined: 215758
SET timestamp=1564447810;
CALL mxradon_config.udsp_SetSubscriptionSentTimeStamp(‘00620051-659a-11e9-8ffe-02ceb8e07d66’); 1564447810000
EVENTS 1564447815084 # Time: 190730 0:50:11
# User@Host: mxusr_lsqapps[mxusr_lsqapps] @ ec2-35-154-42-9.ap-south-1.compute.amazonaws.com [35.154.42.9] Id: 760633
# Query_time: 2.074477 Lock_time: 0.000084 Rows_sent: 0 Rows_examined: 215758
SET timestamp=1564447811;
CALL mxradon_config.udsp_SetSubscriptionSentTimeStamp(‘50964ae7-a15a-11e9-8ffe-02ceb8e07d66’); 1564447811000
EVENTS 1564447815084 # User@Host: mxusr_lsqapps[mxusr_lsqapps] @ ec2-52-66-172-24.ap-south-1.compute.amazonaws.com [52.66.172.24] Id: 760623
# Query_time: 2.197263 Lock_time: 0.000000 Rows_sent: 0 Rows_examined: 215758
SET timestamp=1564447811;
CALL mxradon_config.udsp_SetSubscriptionSentTimeStamp(‘eae30df7-40b2-11e9-8ffe-02ceb8e07d66’); 1564447811000

Andres

3 years ago

I’ve used it for years for slow logs, but I just tried with 3.2.0 to read Aurora MySQL 5.7 general logs and it always returns: No events processed. I’ve read things about MySQL log format change, but not sure why it works for slow and not general.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Analyzing Amazon Aurora Slow Logs with pt-query-digest

Enable slow query logging

Download the log file

Run pt-query-digest on the log file

Incorrect logging

That number looks familiar

Percona Toolkit

Related

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Trying out the PostgreSQL pg_tde Tech Preview Release

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Analyzing Amazon Aurora Slow Logs with pt-query-digest

Enable slow query logging

Download the log file

Run pt-query-digest on the log file

Incorrect logging

That number looks familiar

Percona Toolkit

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

MySQL 8.4 First Peek

LDAP Authentication in PgBouncer Through PAM

Trying out the PostgreSQL pg_tde Tech Preview Release

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation