ChatGPT vs. MySQL DBA Challenge

Given the increasing popularity of ChatGPT in solving questions of humankind, I decided to take a better look at the current state of AI. At the same time that I see database engineers relying on the tool, sites such as StackOverflow are banning ChatGPT. As the question comes from time to time, is my profession at risk of being extinct due to AI? As my mentor said when I started working in the tech area, the answer is: it depends. I say this because, at this moment in time, my answer is a big No. As we will see right after, the answers are quite generic, but they are at least in the right direction. Nothing, however, prohibits the following AI models from being smarter and providing precise answers.

Let’s look at some questions I did that a MySQL DBA usually needs to answer in an interview process.

Questions

Q: I have a MySQL server with 500 GB of RAM; my data set is 100 GB. How large my InnoDB buffer pool needs to be?

ChatGPT: The InnoDB buffer pool is used by MySQL to cache frequently accessed data in memory. It is recommended to set the buffer pool size to a value that is large enough to hold your entire dataset in memory. Since your dataset is 100 GB and you have 500 GB of RAM, you can allocate a significant portion of the available memory to the InnoDB buffer pool. A commonly recommended value for the buffer pool size is between 50% to 75% of the available memory. In your case, a buffer pool size of around 375 GB (75% of 500 GB) should be a good starting point. However, the actual value that you should use will depend on the specific characteristics of your workload. Keep in mind that setting the buffer pool size too high may result in other processes on your server competing for memory, which can impact performance. It is also important to monitor your server’s memory usage regularly to ensure that it is not being exhausted by the buffer pool.

So this answer was inaccurate and evasive. Since my data set is 100GB, I would like to see ChatGPT explicitly mention that a good starting point would be 100GB. If we expand the cache concept more, the buffer pool could be even less if the working set (hot data) is smaller.

Q: I have a MySQL server, and my application is writing at a rate of 100 MB/hour in my redo logs. How large does my innodb_redo_log_capacity need to be?

ChatGPT: The innodb_redo_log_capacity parameter specifies the maximum size of the InnoDB redo log buffer, which is used to store changes made to the database before they are written to disk. The size of the redo log buffer should be large enough to handle the maximum amount of changes that can occur within the time it takes to write the changes to disk. To calculate the required size of your redo log buffer, you can use the following formula:

innodb_redo_log_capacity = (max_rate_of_change * max_redo_log_write_time) / redo_log_block_size

In your case, you have provided the maximum rate of change, which is 100 MB/hour. You will need to estimate the maximum time it takes to write the changes to disk and the redo log block size. Assuming a redo log block size of 512 bytes (which is the default value for innodb_log_block_size), and a maximum time of 10 minutes (0.16667 hours) to write the changes to disk, the calculation would be as follows: […]

I didn’t paste the full answer on purpose because it was very wrong. The innodb_redo_log_capacity is a setting from the MySQL server, while the innodb_log_block_size is a setting from MariaDB. Also, the redo log capacity is not the same as the redo log buffer, so it mixes different technologies and concepts. If you want to know more about how to calculate a good redo log size, check out this blog post:

MySQL 8.0 Dynamic Redo Log Sizing

Q: I have the following table in my MySQL server:

CREATE TABLE `joinit` (
  `i` int(11) NOT NULL AUTO_INCREMENT,
  `s` varchar(64) DEFAULT NULL,
  `t` time NOT NULL,
  `g` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

CREATE TABLE `joinit` (

`i` int(11) NOT NULL AUTO_INCREMENT,

`s` varchar(64) DEFAULT NULL,

`t` time NOT NULL,

`g` int(11) NOT NULL,

PRIMARY KEY (`i`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Q: How do I optimize the following query:

SELECT * FROM joinit WHERE g = 10;

ChatGPT: To optimize the query SELECT * FROM joinit WHERE g = 10;, you can follow these steps:

Add an index to the g column: Since the query is filtering by the g column, adding an index on that column will improve the query’s performance. You can add the index using the following command:

ALTER TABLE joinit ADD INDEX (g);

[…]

The answer had a few more comments, but the fact that ChatGPT gave the first answer the “ADD INDEX” command is impressive.

Q: I have a MySQL server with 10 CPUs. What is the ideal number for my innodb_thread_concurrency?

ChatGPT: The ideal value for the innodb_thread_concurrency setting in MySQL depends on several factors, including the workload on the server, the number of CPU cores, and the amount of available memory. In general, the recommended value for innodb_thread_concurrency on a server with 10 CPUs is typically between 16 and 32. However, it’s important to note that the optimal value may vary depending on your specific workload and hardware configuration. To determine the ideal value for your server, you can start by setting the innodb_thread_concurrency value to a moderate number (e.g., 16) and monitoring the server’s performance. You can then gradually increase or decrease the value to optimize the server’s performance based on your workload.[…]

The answer here follows a general rule of thumb. If you are interested in more details about this setting, you can check the following blog post:

InnoDB Thread Concurrency

Also, during the review of this blog post, one of my colleagues was not convinced by the suggestion (which is great!). He said:

Indeed but 16 to 32 in a 10 core server may lead to context switch contention so the number IMO is not right, a good starting point is setting a number smaller than the amount of CPUs. So the actual answer is highly dependent on the workload (i.e. concurrent threads running, duration of queries, etc)

As we can see, the decision is not only based on a formula or documentation. Much of the knowledge relies on the experience and lessons learned during the problems faced. Also, the debate with other engineers allows you to gather other points of view, which improves the solution to a point that is nowhere near ChatGPT can reach.

Let’s make it harder now.

Q: How can I refresh statistics for my spatial indexes in MySQL?

ChatGPT: In MySQL, you can refresh the statistics for your spatial indexes by using the ANALYZE TABLE statement. […]

This answer is wrong. Only when the table is rebuilt the statistics for spatial indexes are refreshed. Also, there is a documented bug:

OPTIMIZE TABLE does not sort R-tree indexes, such as spatial indexes on POINT columns. (Bug #23578)

Q: I have installed MySQL 8 in Ubuntu 22.04 and moved my datadir to another location. I adjusted my datadir in my.cnf file, but my database is not starting. What could it be?

ChatGPT: If you have moved the MySQL data directory to another location and updated the datadir parameter in the my.cnf configuration file, but MySQL is still not starting, there might be other configuration parameters that need to be updated.

[…]

The answer here is generic. However, if I modify my question to make it more accurate:

Q: I have installed MySQL 8 in Ubuntu 22.04 and moved my datadir to another location. I adjusted my datadir in my.cnf file, but my database is not starting. The datadir is correct. What could it be?

Then I get a more plausible answer:

ChatGPT: If the data directory is correct and there are no errors in the MySQL error log file, it’s possible that the file permissions are not set correctly for the new data directory.[…]

Q: I have the following iostat sample:

2020_10_06_14_22_25-iostat:Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
2020_10_06_14_21_55-iostat:sda               2.32     0.84  386.08 1004.05  5570.21  7514.67    18.83     4.45    12.05    0.79    1.14   0.49  8.16
2020_10_06_14_21_55-iostat:sda               0.00     0.00  113.00 1535.00  1808.00 13727.50    18.85     1.61    10.98    0.93    0.98   0.42  8.40
2020_10_06_14_21_55-iostat:sda               0.00     1.00   79.00 1494.00  1264.00  7203.00    10.77     2.04    20.66    0.65    0.67   0.45  1.30
2020_10_06_14_21_55-iostat:sda               0.00     0.00  132.00 1500.00  2112.00 14508.50    20.37     5.87    30.15    0.68    1.19   0.47  6.10

2020_10_06_14_22_25-iostat:Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util

2020_10_06_14_21_55-iostat:sda 2.32 0.84 386.08 1004.05 5570.21 7514.67 18.83 4.45 12.05 0.79 1.14 0.49 8.16

2020_10_06_14_21_55-iostat:sda 0.00 0.00 113.00 1535.00 1808.00 13727.50 18.85 1.61 10.98 0.93 0.98 0.42 8.40

2020_10_06_14_21_55-iostat:sda 0.00 1.00 79.00 1494.00 1264.00 7203.00 10.77 2.04 20.66 0.65 0.67 0.45 1.30

2020_10_06_14_21_55-iostat:sda 0.00 0.00 132.00 1500.00 2112.00 14508.50 20.37 5.87 30.15 0.68 1.19 0.47 6.10

Q: Am I having I/O issues?

ChatGPT: To determine if you are having I/O issues, you need to analyze the %util column. This column shows the percentage of time that the disk was busy during the sampling period. A high %util value indicates that the disk is busy for a significant portion of the time, which could indicate an I/O bottleneck. In your sample, the %util values range from 1.30 to 8.40, which are relatively low values and may not necessarily indicate I/O issues.

The answer could be better. The answer does not consider the queue or latency of the sample, which could indicate a disk with issues. Basically, ChatGPT is taking into consideration only the %util column to conclude the answer.

Q: I have a server with 2 NUMA cores. Which setting can I use in my Percona Server to have better memory usage?

ChatGPT: If you have a server with 2 NUMA cores, you can use the numa-interleave option in Percona Server to improve memory usage.

[…]

This answer applies only to older versions of the Percona Server. Nowadays, you can use the innodb_numa_interleave. I found it relevant to add it here because, as we can see, the data set used to feed the AI needs to be always refreshed to provide relevant answers.

Conclusion

The answers provided by ChatGPT are accurate, but in general, there is too much generic information that could not help to work on specific issues. Also, the data set used by ChatGPT is very important because we might receive outdated or misleading information. We cannot take the answer blindly, and the fact we need to check the answer requires previous knowledge in the area.

Still, ChatGPT can provide a good starting point, especially for those starting with MySQL and Linux.

Many may say that I need to provide precise questions to obtain my desired answer. That is correct. However, working with our customers, most of the time during an emergency, the only information they have is that the application is slow or not connecting to the database. So, in the real world, asking a precise question is difficult if you are not a database expert. Being able to know the right question is half of the path to finding the right answer. And to finish with this post, I did one last question to ChatGPT:

Q: Is ChatGPT going to replace a MySQL DBA job?

Answer: No, ChatGPT is an AI language model developed by OpenAI and is not designed to replace a MySQL DBA job. While ChatGPT can provide some level of assistance and guidance for MySQL-related questions, it cannot replace the knowledge, expertise, and decision-making abilities of a human MySQL DBA. A MySQL DBA is responsible for maintaining, securing, and optimizing MySQL database systems, which involves a range of complex tasks and decision-making processes that require human expertise and experience.

So, as for now, we are safe ? !

Jokes and fears aside, I don’t believe technology is designed to replace but rather to help us to be more effective and to dedicate our time to tasks that really bring value to our customers. I’m positive and excited to see what is coming for the next years! And if ChatGPT does not give you a proper answer, feel free to contact us on social media or our forum!

Useful resources

Finally, you can reach us through social networks, our forum, or access our material using the links presented below:

2 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

fander

11 months ago

use GPT3.5 or GPT4?

Vinicius Grippa

Author

Reply to fander

11 months ago

Hi,

I tested with GPT 3.5 (the free version).

MySQL 5.7
End of Life

Compare Percona to Leading Database Solutions

Software
Downloads

Product
Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

ChatGPT vs. MySQL DBA Challenge

Questions

Conclusion

Useful resources

Related

Related Blog Articles

RECOMMENDED ARTICLES

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

How to Improve Database Performance: The Ultimate Guide

A Guide to Better Understanding MySQL Charset Levels

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7 End of Life

Compare Percona to Leading Database Solutions

Software Downloads

Product Documentation

Resource Hub

Financial Services

Driving Database Success

Percona Blog

Percona Community Hub

Percona Events Hub

About Percona

Percona in the News

Our Customers

Our Partners

Careers

Contact Us

ChatGPT vs. MySQL DBA Challenge

Questions

Conclusion

Useful resources

Related

Share This Post!

Want to get weekly updates listing the latest blog posts?

Related Blog Articles

RECOMMENDED ARTICLES

Should You Deploy Your Databases on Kubernetes? And What Makes StatefulSet Worthwhile?

How to Improve Database Performance: The Ultimate Guide

A Guide to Better Understanding MySQL Charset Levels

MOST POPULAR ARTICLES

Auditing login attempts in MySQL

Deploy Django on Kubernetes With Percona Operator for PostgreSQL

MySQL “Got an error reading communication packet”

MySQL 5.7
End of Life

Software
Downloads

Product
Documentation