Quick Data Archival in MySQL Using PartitionsSpace constraint has been an endless and painstaking process for many of us, especially in systems that have a high number of transactions and data growth exceeding hundreds of GBs in a matter of days. In this blog, I will share a solution to remove this space and remove rows from tables in a few seconds regardless of the size of a table without causing any additional load on the database using table partitions.

The first approach that comes to anyone’s mind for deleting the row is using a DELETE query in SQL. Suppose, one wants to delete rows from a table that are older than one year—the query for such operations would be like this:

The above query is pretty straightforward but there are a few caveats:

  • Server business will grow exponentially and could impact the usual traffic on the server.
  • To speed up the above query we need to create an appropriate index so that the query can be executed in minimal time and have less impact on server performance.
  • If we are using binlog_format as ROW, a huge number of bin logs would be created which could choke I/O of servers and require extra cleanup.
  • Even after deleting all the rows, space won’t be freed. MySQL won’t shrink tablespace and storage won’t be released to the file system. To release it to the file system, we need to recreate the table by running ANALYZE or an ALTER.

One way to get around this is using the Percona pt-archiver. There is an excellent blog you may want to follow for mysql data archival with minimal disruption. But the archiver process will take time as it also considers system load, replica lag, and specified parameters to throttle the archiving process without affecting ongoing traffic.

What I propose here is using MySQL partitioning, which is a much faster approach.

What is partitioning?

In MySQL, the InnoDB storage engine has long supported the notion that a tablespace and the MySQL Server, even prior to the introduction of partitioning, could be configured to employ different physical directories for storing different databases. Partitioning takes this notion a step further, allowing users to save portions of the table according to a user-defined rule. The user-selected rule by which data can be divided is known as a partitioning function, which could be a simple rule against a set of ranges or value lists, an internal hashing function, or a linear hashing function.

When we partition the table data-file is split across multiple partitions of smaller data-files. The operation we do against that specific range of data, will not affect the whole table as only one data file is touched.

Table without partition:

Table with partition:

How will partitioning help in quickly deleting rows and releasing space?

To archive old data in a partitioned table, we will create an empty table that is identical to the original table in structure but does not have multiple partitions like the original table. Once this table is created we will swap the newly created empty table with one of the partitions of the original table in a matter of seconds.

In this example, we are using a table partitioned on the basis of a date range.

The size of this table is around 105 GB.

We will create a table that is identical to the partitioned table but we will remove partitioning from this newly created table.

At this moment, this partition of the table has all the data and is occupying the space in the file system.

Space reserved by the file system:

To exchange partitions we need to execute a query that requires metadata lock on the table and is done almost instantaneously.

Since this operation requires metadata lock and is very fast, it is advised to run this operation in low-traffic periods.

After the swapping, the new table has all the data of the partition and the partition of the table is empty.

Space freed by filesystem:

Now, you can proceed to drop the swapped table without locking the main table or can use this technique as well to speed up the drop: Speed Up Your Large Table Drops in MySQL.

Conclusion

What we did here is use the MySQL partitioning-related functions to answer a customer challenge of purging data older than x days, sooner. But, partitioning is not a one size fits all solution, partitioning comes with its own caveats. If there are a lot of secondary indexes on a table and the search queries are not only limited to the partition key, the query performance could deteriorate exponentially. If partitioning the table is not an option, pt-archiver is a great tool that would automate the DELETE statement on your databases with minimal monitoring.

 

Further Reference:

https://dev.mysql.com/doc/refman/5.7/en/partitioning.html

https://dev.mysql.com/doc/refman/5.7/en/partitioning-management-exchange.html

Want to archive tables? Use Percona Toolkit’s pt-archiver

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments