Identify Active Databases and Users in MySQLDatabase administrators often need to identify inactive databases and users to save resources.  This can be done using various methods to determine which databases and users are frequently accessed. We can save disk space and other resources by purging inactive databases and users. This task is part of regular maintenance and helps to manage disk space and other resources effectively.

This post will explain easy ways to identify active databases and users in MySQL.

Identifying active databases in MySQL

There are various ideas we can use to identify the active databases. We can identify based on the timestamp of the data file update, an entry in a slow log, scan binary logs, and parse the audit log or general log. But we can ignore the general log from this investigation as it consumes a lot of disk space, which is an overhead.

We have the below methods to identify active database objects:

  • Review slow logs to identify database objects being queried.
  • Find the updated timestamp on datadir files.
  • Scan binary logs for identifying tables that are being modified.
  • Scan audit logs for identifying database objects.

During our tests, we observed that the above methods helped us identify the active databases; however, we sometimes had to use two or more ways to get the best result. Also, if there are monthly read or written tables, the scope of the review increases, and you will have to maintain the logs for a more extended period.

We also have two stats table features present in Percona Server for MySQL and MySQL Community versions as follows:

  • User Statistics
  • Sys-schema-table-statistics

User Statistics: We have another way to identify the active databases, users, and other information in Percona Server for MySQL (and MariaDB). Both have userstat, a new plugin feature that is disabled by default and must be enabled by setting userstat to ON. This ensures that the statistics collection doesn’t cause any extra load on the server unless desired.

Sys-schema-table-statistics: sys-schema-table-statistics summarizes table statistics. By default, rows are sorted by descending total wait time. Sys schema stats table will help you to know the table statistics until MySQL Service Restart. Now, as MySQL restarts, the stats are lost; hence, we will not cover it in this blog.

We have a great utility called pt-table-usage from Percona Toolkit, which can identify database objects from logs.

Identify database tables using pt-table-usage

The pt-table-usage tool reads queries from a log and analyzes how they use tables. This tool can generate a report based on the slow query log and show you what tables were used in queries. Here you can explore more options for pt-table-usage.

To identify the databases, you must depend on slow logs where we can find the read queries. Make sure to configure slow_query_log and long_query_time parameters as below:

Slow_query_log: MySQL’s slow query log feature enables you to log queries that exceed a predefined time limit. When we use SET GLOBAL slow_query_log=1; the slow log is enabled, and to disable the slow_query_log, it needs to be set as 0. 

Long_query_time: When the slow_query_log is enabled, it logs any query that takes longer than long_query_time seconds to run. When configured with SET GLOBAL long_query_time=0;, all queries will be logged. 

SET global slow_query_log_use_global_control=’long_query_time’;: This variable is helpful to apply the global settings for current running sessions in Percona Server for MySQL. You can take an overview of slow_query_log_use_global_control with other options.

So, the above parameter settings will help to log all queries in the slow log file. Also, we can add the below parameter if we want to reduce the overhead of logging all queries and disk space.

Note: If we set the value to 10, then 10% of sessions/queries will be logged overhead of logging all queries and disk space. Once we have the slow log collection done for a sufficient time, we can parse the slow log and fetch the database objects actively used.

pt-table-usage reads queries from a log and analyzes how they use tables. This tool can generate a report based on the slow query log and show you what tables were used in queries. Here you can explore more options for pt-table-usage.

The tool prints a usage report for each table in every query, similar to the following:

From the above, the output used pt-table-usage with slow log (/var/lib/mysql/ip-172-31-92-72-slow.log) directed to a log file containing query and database object details.

Now parse the output table_usage.log using the below command to identify unique database objects.

Identify databases using slow logs

Once we have a collection of slow logs for a certain period with the above slow log configuration to capture all queries. We can easily fetch the databases read from a slow log file through Unix commands. 

Below command will list out all database names based on usage.

Example for 8.0 Version:

We can grep tables as well by exploring with Unix grep or awk commands.

Identifying active users in MySQL

We have two methods to identify active users for a specific period. 

  1. Parse Audit Logs to identify active users.
  2. Identify active users from processlist logs.

Disclaimer: Though we have user_stat plugin feature for identifying the active users in Percona Server for MySQL (and MariaDB). But the statistics in the INFORMATION_SCHEMA.CLIENT_STATISTICS table are available only from the last MySQL service restart. 

Parse Audit Logs to identify active users

The Percona Audit Log Plugin monitors and logs connection and query activity performed on a specific server. Information about the activity is stored in a log file. The Audit Log plugin is installed but, by default, is not enabled when you install Percona Server for MySQL. To check if the plugin is enabled, run the following commands:

Follow this link to enable audit_log_plugin and other options in Percona Server for MySQL.

We use audit_log_rotations and audit_log_rotate_on_size parameters to manage the disk space.

 Audit_log_rotate_on_size: This variable is measured in bytes and specifies the maximum size of the audit log file. Upon reaching this size, the audit log will be rotated. The rotated log files are present in the same directory as the current log file. The sequence number is appended to the log file name upon rotation. For this variable to take effect, set the audit_log_handler variable to FILE.

Audit_log_rotations: The variable is used to specify how many log files should be kept when the audit_log_rotate_on_size variable is set to a non-zero value. This variable has an effect only when audit_log_handler is set to FILE.

Once these variables are set, we can fetch the user details from these logs. You can use more filters to get specific users, database objects, host details, etc.

Identify Active Users from processlist logs:

To identify active users from processlist, we will need the data collected over time to have the most possibly accurate information. The processlist will collect the information only at a specific interval; if the user is not executing the query at that time, there’s a high chance of missing it. Thus the granularity of collecting the processlist is critical to get the most accurate user.

So, here we will collect a processlist every five seconds using a chron, save it in a daily file, and analyze it.

processlist_DB1_20221215.log – Processlist file name collected for one day.

Once we have a collection of processlist logs for a particular period, we will fetch the active users information from them.

processlist_DB1*202208* – Processlist file names collected for a specific time.

This command lists the active users on this node/database. From this output, we can filter out the user and then, with awk, sort commands to fetch the user list using a specific database.

At last, comes the general log, the ruler of all logs. But yes, it is logging everything that happens in MySQL.

General Log: The General Log is another option for retrieving information about database activity. It logs all activity on the instance, but it has some limitations. Enabling the General Log can consume a lot of disk space and generate a high I/O load due to the constant writing of operations to the log.

Therefore, it is generally better to avoid using the General Log and to use the methods described above instead. It would also be helpful if MySQL had a system table that stored active/inactive information about databases, users, and other items, even after MySQL restarts.

Conclusion

Following the process described above, we can identify the most active databases and users. We can collect information over a specific period and follow the same process to find the list of inactive databases. Any databases and users that do not appear in the output can be considered inactive. However, it is essential to note that we should consult with the application team before marking any databases or users as inactive, as some jobs may only run periodically (e.g., quarterly or semi-annually).

Percona Distribution for MySQL is the most complete, stable, scalable, and secure, open-source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

Download Percona Distribution for MySQL Today

Subscribe
Notify of
guest

1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Federico Razzoli

Very nice and useful summary. But it’s important to note some things:

  • What is inactive on the master might be used on the replicas, or just one replica.
  • An inactive user might still be used as a DEFINER for stored procedures, triggers, or events.
  • User Statistics (and probably tables in the sys schema) won’t track the usage of special storage engines that don’t store local rows, such as SPIDER or BLACKHOLE.