In this blog post, we will discuss how to validate, at the operating system level, the effects of changing innodb_flush_method to a value other than the default (particularly O_DIRECT, which is the most commonly used) and of enabling innodb_use_fdatasync.

Introduction

First, let’s define what the innodb_flush_method parameter does. It dictates how InnoDB opens its data and log files and flushes them to disk. I won’t detail what each valid value does, but you can check the MySQL documentation for that. The possible values are listed below (Unix only):

  • fsync
  • O_DSYNC
  • littlesync
  • nosync
  • O_DIRECT
  • O_DIRECT_NO_FSYNC

As said, we will focus on O_DIRECT. As a best practice, we recommend O_DIRECT because it bypasses the OS cache for data files. This avoids double buffering (the same pages held in both the InnoDB buffer pool and the OS cache) and can improve performance when writing data. Below is the InnoDB architecture extracted from the official documentation:

InnoDB architecture

On platforms that support fdatasync() system calls, the innodb_use_fdatasync variable, introduced in MySQL 8.0.26, permits innodb_flush_method options that use fsync() to use fdatasync() instead. An fdatasync() system call does not flush changes to file metadata unless required for subsequent data retrieval, providing a potential performance benefit.

Because I mentioned the term system call (or syscall), let’s define it, since it is essential to this blog post.

To manipulate a file, MySQL and any other software must invoke syscalls. Whenever a process requires a system resource, it sends a request for that resource to the kernel by making a system call. At a high level, system calls are “services” offered by the kernel to user applications. They resemble library APIs, described as function calls with a name, parameters, and return value. The diagram below is a high-level illustration of this process:

syscalls
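To make the idea concrete, here is a minimal sketch in Python (not part of the original test case) that issues the same kind of syscalls discussed in this post. Python's os.open(), os.write(), os.fsync(), and os.fdatasync() are thin wrappers around the syscalls of the same name. The helper name write_and_sync is hypothetical, and the sketch assumes a Linux system.

```python
import os


def write_and_sync(path, payload, data_only=False):
    """Write payload to path, then flush it with fsync() or fdatasync()."""
    # open(2): ask the kernel for a file descriptor.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # write(2): the data normally lands in the OS page cache first.
        written = os.write(fd, payload)
        if data_only and hasattr(os, "fdatasync"):
            # fdatasync(2): flush the data, skipping non-essential metadata.
            os.fdatasync(fd)
        else:
            # fsync(2): flush the data and the file metadata.
            os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Running this script under strace should show the openat(), write(), and fsync() or fdatasync() calls, which is the same technique we apply to mysqld later in this post.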

Question: Why not directly access the resource we want (memory, disk, etc.)?

This is because Linux divides process execution into two spaces. User-run processes (generally referred to as user space processes) rely on services provided by the kernel. The kernel is the part of the operating system that handles low-level operations in a privileged running mode. The concept of User and Kernel space is described in detail here. System security and stability would be compromised if applications could directly read and write to the kernel’s address space. In that scenario, one process could access the memory area of another process, which would break memory isolation and could lead to security vulnerabilities.

Question: How do I check if my Operating System supports a specific syscall?

You can check the syscalls(2) manual page with the command man syscalls.

It lists the available syscalls and the Linux kernel version in which each one appeared.
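As a complementary check, the sketch below asks the C library loaded in the current process whether it exports a wrapper for a given syscall. This is only a proxy: it assumes a Linux system with a standard libc, and it proves that the wrapper exists, not that the running kernel implements the call. The function name libc_exports is hypothetical.

```python
import ctypes


def libc_exports(name):
    """Return True if the C library in this process exports the symbol `name`."""
    # CDLL(None) exposes the symbols already loaded into the process, libc included.
    libc = ctypes.CDLL(None)
    # ctypes raises AttributeError when the symbol cannot be resolved.
    return hasattr(libc, name)
```

For example, libc_exports("fdatasync") tells us whether a program on this host can call fdatasync() through the C library at all.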

Test case

We will use the strace utility and the information presented in /proc/<pid>/fdinfo/<fd> to prove the theory described before. First, I will start a MySQL 8.0.33 instance with default settings.

We can list the files opened by the mysqld process by checking the /proc/<pid>/fd/:
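The same listing can be scripted. Below is a minimal sketch, assuming a Linux /proc filesystem and enough privileges to read the target process's entries (the same user as mysqld, or root). The function name list_open_files is hypothetical.

```python
import os


def list_open_files(pid):
    """Map each file descriptor number of a process to the path it points to."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for entry in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the open file, socket, or pipe.
            result[int(entry)] = os.readlink(os.path.join(fd_dir, entry))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result
```

Filtering the returned paths for names ending in .ibd narrows the result to InnoDB tablespace files.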

We can check each file descriptor by running cat /proc/<pid>/fdinfo/<file descriptor number>:
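Each fdinfo entry is a short list of "key: value" lines (pos, flags, mnt_id, and so on). A small sketch for reading it programmatically, assuming a Linux /proc filesystem; the function names are hypothetical:

```python
import os


def parse_fdinfo(text):
    """Parse the 'key: value' lines of an fdinfo entry into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def read_fdinfo(pid, fd):
    """Read and parse /proc/<pid>/fdinfo/<fd> for one descriptor."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as handle:
        return parse_fdinfo(handle.read())
```

The value stored under the flags key is the octal number we interpret next.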

We are interested in the flags description, represented by the octal number 0100002. To interpret the flags, we can use the fdflags repository from GitHub or the command below in the shell:
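For readers who prefer to see the arithmetic, here is a hedged sketch of the decoding in Python. The bit values are the ones from the Linux asm-generic headers used on x86 and x86-64; this is an assumption, since other architectures (ARM, for example) assign some flags differently. The table is not exhaustive, and the function name decode_flags is hypothetical.

```python
# Open-flag bits as defined for x86/x86-64 Linux (assumption: adjust the
# table for other architectures, where some values differ).
FLAG_BITS = {
    0o100: "O_CREAT",
    0o200: "O_EXCL",
    0o400: "O_NOCTTY",
    0o1000: "O_TRUNC",
    0o2000: "O_APPEND",
    0o4000: "O_NONBLOCK",
    0o10000: "O_DSYNC",
    0o40000: "O_DIRECT",
    0o100000: "O_LARGEFILE",
    0o200000: "O_DIRECTORY",
    0o400000: "O_NOFOLLOW",
    0o1000000: "O_NOATIME",
    0o2000000: "O_CLOEXEC",
}
# The two lowest bits hold the access mode rather than independent flags.
ACCESS_MODES = {0: "O_RDONLY", 1: "O_WRONLY", 2: "O_RDWR"}


def decode_flags(octal_text):
    """Translate the octal 'flags' value from fdinfo into flag names."""
    value = int(octal_text, 8)
    names = [ACCESS_MODES.get(value & 0o3, "unknown access mode")]
    names += [name for bit, name in FLAG_BITS.items() if value & bit]
    return names
```

With this table, 0100002 decodes to O_RDWR plus O_LARGEFILE, so the file was opened for reading and writing without O_DIRECT.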

And using the fdflags project to avoid manual work:

The output shows the file descriptor number, the file name, and the flags applied to it when it opened.

Next, we can confirm MySQL is using fsync() to write data with strace:
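One possible way to capture the calls is to attach strace to the running server, for example with strace -f -y -e trace=fsync,fdatasync -p <pid> -o sync.trace, where -y makes strace print the path behind each file descriptor and sync.trace is a hypothetical output file. The sketch below then summarizes which files were flushed with which syscall. It assumes the usual strace line layout, such as fsync(10</path/to/file>) = 0, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches successful calls such as: fsync(10</path/to/file>) = 0
SYNC_CALL = re.compile(r"\b(fsync|fdatasync)\((\d+)(?:<([^>]*)>)?\)\s*=\s*0")


def count_sync_calls(lines):
    """Count successful fsync()/fdatasync() calls per (syscall, file) pair."""
    counts = Counter()
    for line in lines:
        match = SYNC_CALL.search(line)
        if match:
            syscall, fd, path = match.groups()
            # Fall back to the descriptor number when strace ran without -y.
            counts[(syscall, path or f"fd {fd}")] += 1
    return counts
```

Calling count_sync_calls(open("sync.trace")) returns a counter keyed by syscall and file, which makes it easy to see whether data files, redo logs, or binary logs are behind each flush.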

Even without enabling innodb_use_fdatasync, you will notice the fdatasync() syscall in the strace output. MySQL uses fdatasync() by default for the binary logs when sync_binlog > 0. We can confirm this in strace:

Suggestion: Try setting sync_binlog=0 and check if the fdatasync() syscall is still requested by MySQL for the binary logs.

Now, we are going to set innodb_flush_method=O_DIRECT and innodb_use_fdatasync=ON in the MySQL configuration and restart the instance.

Checking again, we can see that a new flag, O_DIRECT, was added to the files:
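To avoid inspecting every descriptor by hand, the check can be automated. The sketch below scans /proc/<pid>/fdinfo and returns the descriptors whose flags include O_DIRECT. It relies on Python's os.O_DIRECT constant, which is available on Linux and carries the correct value for the local architecture. The function names are hypothetical, and reading another process's entries requires the same user or root.

```python
import os


def has_o_direct(flags_text):
    """True if the octal 'flags' value from fdinfo has the O_DIRECT bit set."""
    return bool(int(flags_text, 8) & os.O_DIRECT)


def fds_with_o_direct(pid):
    """Return {fd: path} for the descriptors of `pid` opened with O_DIRECT."""
    matches = {}
    for entry in os.listdir(f"/proc/{pid}/fdinfo"):
        try:
            with open(f"/proc/{pid}/fdinfo/{entry}") as handle:
                for line in handle:
                    if line.startswith("flags:") and has_o_direct(line.split(":", 1)[1]):
                        matches[int(entry)] = os.readlink(f"/proc/{pid}/fd/{entry}")
        except (OSError, ValueError):
            # The descriptor vanished or the entry could not be parsed.
            continue
    return matches
```

Running fds_with_o_direct() against the mysqld pid before and after the restart gives a quick before-and-after comparison of which files are opened with the flag.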

And checking with strace, we will see our table files (*.ibd) using fdatasync():

Conclusion

We investigated the technical nuances of InnoDB’s data-flushing mechanisms and how they interact with the operating system. This gives us a clearer picture of what actually changes at the OS level when we adjust the innodb_flush_method parameter and innodb_use_fdatasync to optimize MySQL performance.

Our experiments with the strace utility and the file descriptors in /proc/<pid>/fdinfo/ have provided concrete evidence of the behavior changes when these settings are tweaked. The use of O_DIRECT can lead to more efficient data writing operations. Additionally, the introduction of innodb_use_fdatasync in MySQL 8.0.26, which lets InnoDB use fdatasync() instead of fsync() in specific scenarios, illustrates the ongoing evolution of MySQL to exploit specific system call advantages for performance gains.

