Monitor and Optimize Slow Queries with PMM and EverSQL

EverSQL (use code PERCONA-EVERSQL-10OFF for a 10% discount on the first billing cycle) is a platform that intelligently tunes your SQL queries by providing query optimization recommendations, and feedback on missing indexes. This is the second post of our EverSQL series, if you missed our introductory post take a look there first and then come back to this article.

We’ll use the Stackoverflow data set again as we did in our first post.

Diving into query optimization

We’ll grab the worst performing query in the list from PMM and optimize it. This query builds a list of the top 50 most recent posts which have a score greater than two, and involves joining two large tables – posts and comments. The original runtime of that query is above 20 minutes and causes high load on the server while running.

worst-query-in-PMM

Assuming you have EverSQL’s chrome extension installed, you’ll see a new button in the PMM Query Analytics page, allowing you to send the query and schema structure directly to EverSQL, to retrieve indexing and query optimization recommendations.

eversql recommendations

eversql-dashboard1

After implementing EverSQL’s recommendations, the query’s execution duration significantly improved:

improved-query-response-time

Optimization Internals

So what was the actual optimization in this specific case? And why did it work so well? Let’s look at the original query:

SELECT
   p.title
FROM
   so.posts p 
       INNER JOIN
   so.comments c ON p.id = c.postid
WHERE
c.score > 2
GROUP BY p.id
ORDER BY p.creationdate DESC
LIMIT 100;

SELECT

p.title

FROM

so.posts p

INNER JOIN

so.comments c ON p.id = c.postid

WHERE

c.score > 2

GROUP BY p.id

ORDER BY p.creationdate DESC

LIMIT 100;

The tables’ structure:

CREATE TABLE `posts` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  ...
  PRIMARY KEY (`Id`),
  KEY `posts_idx_creationdate` (`CreationDate`),
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `comments` (
  `Id` int(11) NOT NULL,
  `CreationDate` datetime NOT NULL,
  `PostId` int(11) NOT NULL,
  `Score` int(11) DEFAULT NULL,
  ....
  PRIMARY KEY (`Id`),
  KEY `comments_idx_postid` (`PostId`),
  KEY `comments_idx_postid_score` (`PostId`,`Score`),
  KEY `comments_idx_score` (`Score`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `posts` (

`Id` int(11) NOT NULL,

`CreationDate` datetime NOT NULL,

...

PRIMARY KEY (`Id`),

KEY `posts_idx_creationdate` (`CreationDate`),

) ENGINE=InnoDB DEFAULT CHARSET=latin1

CREATE TABLE `comments` (

`Id` int(11) NOT NULL,

`CreationDate` datetime NOT NULL,

`PostId` int(11) NOT NULL,

`Score` int(11) DEFAULT NULL,

....

PRIMARY KEY (`Id`),

KEY `comments_idx_postid` (`PostId`),

KEY `comments_idx_postid_score` (`PostId`,`Score`),

KEY `comments_idx_score` (`Score`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

This query will return the post title of the latest 100 stackoverflow posts, which had at least one popular comment (with a score higher than two). The posts table contains 39,646,923 records, while the comments table contains 64,510,258 records.

This is the execution plan MySQL (v5.7.20) chose:

original-execution-plan

One of the challenges with this query is that the GROUP BY and ORDER BY clauses contain different fields, which prevent MySQL from using an index for the ORDER BY. As MySQL’s documentation states:

“In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it may still use indexes to find the rows that match the WHERE clause. Examples: … The query has different ORDER BY and GROUP BY expressions.”.

Now let’s look into the optimized query:

SELECT 
   p.title
FROM
   so.posts p
WHERE
   EXISTS( SELECT 
           1
       FROM
           so.comments c
       WHERE
           p.id = c.postid AND c.score > 2)
ORDER BY p.creationdate DESC
LIMIT 100;

SELECT

p.title

FROM

so.posts p

WHERE

EXISTS( SELECT

FROM

so.comments c

WHERE

p.id = c.postid AND c.score > 2)

ORDER BY p.creationdate DESC

LIMIT 100;

Since the comments table is joined in this query only to check for the existence of matching records in the posts table, we can use an EXISTS subquery instead. This will allow us to avoid inflating the results (by using JOIN) and then deflating them (by using GROUP BY), which are costly operations.

Now that the GROUP BY is redundant and removed, the database can optionally choose to use an index for the ORDER BY clause.

The new execution plan MySQL chooses is:

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms. We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

As mentioned above, this transformation reduced the query execution duration from ~20 minutes to 370ms.

We hope you enjoyed this post, please let us know your experiences using the integration between PMM Query Analytics and EverSQL!

Co-Author: Tomer Shay

Tomer Shay is the Founder of EverSQL. He loves being where the challenge is. In the last 12 years, he had the privilege to code a lot and lead teams of developers, while focusing on databases and performance. He enjoys using technology to bring ideas into reality, help people and see them smile.

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Jouni Järvinen

5 years ago

How is this even functional when there’s no command being used to join the data ? It should just run through the data and return nothing, or return an unorganized list of post titles and comments.

Tomer Shay

Reply to Jouni Järvinen

5 years ago

Thank you for your input. can you please explain which part you think is missing? In the modified query, the data of both tables is “joined” (though without an actual join) inside the subquery, as it is a correlated query. The relevant part is: p.id = c.postid.

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2

Diving into query optimization

Optimization Internals

Co-Author: Tomer Shay

Related

Related Blog Articles

RECOMMENDED ARTICLES

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

How to Improve Database Performance: The Ultimate Guide

A Guide to Better Understanding MySQL Charset Levels

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

Monitor and Optimize Slow Queries with PMM and EverSQL – Part 2

Diving into query optimization

Optimization Internals

Co-Author: Tomer Shay

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

How to Improve Database Performance: The Ultimate Guide

A Guide to Better Understanding MySQL Charset Levels

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation