MySQL is an outstanding open source transactional database used by most web-based applications and is very good at handling OLTP workloads. However, modern business is very much dependent on analytical data. ClickHouse is a columnar database that handles analytical workloads quickly. I recommend you read our previous blog, Using ClickHouse as an Analytic Extension for MySQL, from Vadim to know more about this.

In this blog post, I will show you how to replicate MySQL data in real-time to ClickHouse. I am going to use the MaterializedMySQL engine for this purpose. The blog post contains the following topics.

  • MaterializedMySQL Engine – Overview
  • Prerequisites for the Replication
  • Replication setup
  • Replication testing
  • Understanding ReplacingMergeTree
  • What happens if the Replication event fails?
  • What happens if MySQL or ClickHouse restarted?
  • Conclusion

MaterializedMySQL Engine – Overview

  • MaterializedMySQL engine is an experimental release from the ClickHouse team.
  • By default, it will create a database on ClickHouse with all the tables and their data. (We can do the filter as well.)
  • Schema conversion from MySQL to ClickHouse will be taken care of by the engine. No manual work!
  • Once we configured the setup, the ClickHouse server will act as a replica.
  • ClickHouse will read the MySQL binary logs and perform the replication.
  • Replication will work well with DDL and DML statements.
  • ReplacingMergeTree will be used to perform the replication.

Prerequisites for the replication

We must meet certain requirements to configure the replication from MySQL to ClickHouse. I have split this into two parts, which have to be done from the MySQL end and the ClickHouse end. 

From the MySQL end

MySQL should be configured with the following parameters:

To configure the replication, we have to create a MySQL user with “REPLICATION SLAVE, REPLICATION CLIENT, SELECT, RELOAD” privileges. Make sure the user should be created with “mysql_native_password” authentication. When you initiate the replication, ClickHouse will copy the existing data from the MySQL table to the ClickHouse table and then start the replication from Binary logs. That is the reason we need SELECT privilege as well.

From the ClickHouse end

Make sure you have MySQL port access from the ClickHouse server to the MySQL server.

MaterializedMySQL engine is the experimental release. So, you should enable the parameter (allow_experimental_database_materialized_mysql) in the ClickHouse config file. This can be enabled in the User profile. (file: /etc/clickhouse-server/users.xml)

Activating this parameter does not need a ClickHouse restart. You can verify it using the following command.

Now, you can configure the replication from MySQL to ClickHouse.

Replication setup

To configure the replication, I have created two servers with the following hostnames. Both servers are running their respective databases.

  • mysql
  • clickhouse

At MySQL, I have created the table “percona.herc” which has five records. 

To configure the replication, we have to run the following command in ClickHouse instance.

Syntax:

In action:

Let’s verify the data in ClickHouse.

As you can see, the table “herc” was created, and the data has been copied. (Note: You can use a different database name in ClickHouse.)

ClickHouse has a different data type structure. The data types will be automatically converted by ClickHouse when we initiate the replication process. The relevant data types will be chosen. For example, below, I am sharing the table structure from MySQL and ClickHouse. You can compare the data types for the table we created. You can find more details here. 

MySQL

ClickHouse

Note: There are two new columns, “_sign” and “_version.” They are created to manage the data deduplication by ReplacingmergeTree. We will see about it later in this post.

To understand the mechanism of how ClickHouse copies the existing data from MySQL, I am sharing the logs that were collected from MySQL’s general logs. You can see the SELECT statement, which is used to copy the data. Initially, before copying the data, it used “FLUSH TABLES WITH READ LOCK → REPEATEBLE_READ → START TRANSACTION – UNLOCK TABLES”.

The following two threads will be initiated at the MySQL end to copy the data and for replication purpose.

From the ClickHouse end, ClickHouse logs are very verbose trace logs. We can find some useful information there. For example, (file: /var/log/clickhouse-server/clickhouse-server.log).

You can see the CREATE TABLE statement from MySQL and how ClickHouse rewrites it from the above logs.

So, all set! We were able to configure the replication, and the data has been copied from MySQL to ClickHouse. Next step, we can start to test the replication.

UPDATE on MySQL

In MySQL

At ClickHouse

We can see the data has been updated in ClickHouse. Let’s try to get more details from the ClickHouse log file regarding the UPDATE statement (file: /var/log/clickhouse-server/clickhouse-server.log).

As you see above, the logs were generated for the single UPDATE statement. You can find more useful information related to Binary logs, position, GTID, XID number, etc. This can be used to debug and monitor the replication events.

And, as you see above, the UPDATE statement was converted to an INSERT statement with “_sign” and “_version” columns. To understand this, we have to understand How ReplacingMergeTree works.

Understanding ReplacingMergeTree

ReplacingMergeTree deduplicates the data using the “_version” and “_sign” columns in the background. To understand this better in a practical way:

Creating a new record in MySQL

In ClickHouse, this time, I will use all the columns in the SELECT statement instead of “SELECT *.”

You can see this is the first version of data “_version = 1”.

Let’s do UPDATE on MySQL:

At ClickHouse:

You can see two records for “id = 7”. This means it is keeping the previous version of data “_version = 1, name = testRep”, as well as the latest version “_version = 2, name = UPDtestRep”.

Now, let’s execute the DELETE statement on MySQL.

At ClickHouse:

Now, you can see three records (three versions).

  • The first record is the one we initially created with INSERT ( _sign = 1, _version = 1, name = testRep )
  • Second record is created when doing UPDATE ( _sign = 1, _version = 2, name = UPDtestRep)
  • The third record is created when doing DELETE. You can see the _sign with negative value when doing DELETE ( _sign = -1, _version = 3, name = UPDtestRep)

So, here, the latest version is “3” with negative value “_sign = -1”. Which means the data got deleted. You can find this by executing the following query. 

When doing without “_sign, _version” columns OR “SELECT *”, you can find the latest value ?

What happens if the replication event fails?

All the MySQL statements are not supported for ClickHouse with the MaterialisedMySQL engine.  So, it is necessary to monitor the replication failure events. (I will create another blog to cover more on this topic.) You can use the ClickHouse server and error logs to monitor the replication failure events. For example, at MySQL:

At ClickHouse logs:

You can see in the log it mentioned: “Query wasn’t finished successfully.” This can be used to monitor the failure events. The replication chain will not be broken if the event is not executed. It will just skip that event.

What happens if MySQL or ClickHouse restarted?

In case of failures of MySQL or ClickHouse, replication will be reinitiated when they are alive. For example, If MySQL restarted, you can find the following logs:

From the above logs, you can see that ClickHouse keeps checking if MySQL is alive or not. Once MySQL is alive, it will initiate the connection to MySQL. When the replication starts, the executed events will not be executed again. 

If ClickHouse restarted

As you can see below, ClickHouse was terminated. After some time, when it starts again, ClickHouse is trying to initiate the connection to MySQL and then resume the replication.

Conclusion

Having data in real-time for analytics is helpful in improving your business. The MaterializedMySQL engine is one of the components that can be used to replicate the MySQL data to ClickHouse. But remember, this is still in the experimental phase. Altinity’s Sink connector is another good solution to explore. I will try to write about that in the future.

Percona Distribution for MySQL is the most complete, stable, scalable, and secure open source MySQL solution available, delivering enterprise-grade database environments for your most critical business applications… and it’s free to use!

 

Try Percona Distribution for MySQL today!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments