These days, there’s been a lot of talk in the industry about setting up one’s database system on PostgreSQL. Most times, these are Greenfield projects where the bulk of the effort is on its design and architecture. But sometimes, more often than you’d think, decisions are being made to move from an existing platform to Postgres for purely business and economic reasons. This is where the decision to sunset an otherwise successful stack has less to do with performance and more about changing realities and opportunities.

For example, suppose you are tasked to migrate a well-performing, albeit aging, DBMS to a cutting-edge PostgreSQL one; how would you do it? Though the existing system may be mature and stable, they will have distinct differences in architecture, feature set, and performance capabilities. Even differences as simple as data types can present a challenge.

About pgLoader

Enter pgloader, a command line interface (CLI) designed to facilitate the conversion and migration from one DBMS into PostgreSQL. pgLoader can extract data from MySQL, MSSQL, Oracle, and even another PostgreSQL host. It can perform ETL transforms from one database to another and, under certain special circumstances, can even be implemented as a poor man’s replication solution.

Database migration, a POC in action

Objective

Perform a database migration from MySQL to PostgreSQL.

Attention: For the sake of brevity, I’ve NOT included the database schema in this blog.

Migration environment

The migration environment used for this blog is a production MySQL database, version 5, with several hundred tables. Some of the tables have a large number of number of columns with indexes. The primary keys are comprised of both single and combined columns. Finally, many of the tables and indexes have very long names exceeding the Postgres maximum of 63 characters.

Method

There are quite a few ways to perform this migration, but being more of a PostgreSQL person than a MySQL one (just ask my coworkers!) I’ve chosen to minimize the pgloader ETL process by opting to alter the source, MySQL, database’s columnar, and table definitions instead.

TIP: One should ensure that any changes made to the source database are compatible with the existing application stack before proceeding with the migration.

Using the aforementioned development environment:

  1. Determine steps by performing trial migrations
  2. Observe and note the required updates to be made on the MySQL database in order to perform a successful database migration.
  3. Document a synthesis of the development process in order to perform the database migration currently in production.
  4. The data migration is performed using the CLI pgloader.

Steps

PostgreSQL host

  • Install
    • postgres server
    • mysql client
    • pgloader CLI
  • Create target database
  • Perform the database migration: execute pgloader with the appropriate configuration parameters
  • Post database migration
    • move all tables, indexes, and sequences to schema PUBLIC
    • ANALYZE the database

MySQL host

  • Upload database
  • alter/update the database as required: refer to section PRE-Migration for more details

pgLoader configuration file

A number of iterative passes are made using different pgloader configuration parameters in order to determine what changes are needed on the MySQL database before a final database migration is possible.

The first step is to migrate the database schema without any data in order to identify failures related to data types, column and table names, and foreign constraints.

Here’s my initial pgloader configuration file; the default behavior is to execute all migration steps until the process is completed. In the case of an ERROR, it is logged, and the migration continues. Note that it is possible to halt the migration as soon as it encounters an ERROR with the parameter on error stop; refer to the reference documentation for more information.

pgLoader invocation

Here’s a very simple execution of the migration process. Note that all messages are logged in the file DEV_migration.log:

Pre-migration: MySQL database updates

The following issues are documented in the order of their discovery in this POC. Remember that your own experiences will, of course, vary:

Issue #1: Incompatible values/data types, MySQL (datetime) -> Postgres (timestamp)

The values in many columns were changed from “0000-00-00 00:00:00” to “1970-01-01 00:00:00“.  The values were explicitly updated in order to allow Postgres to accept the value. For the MySQL DBA, this is a known issue in older MySQL DBMS. Newer versions don’t allow this behavior.

Issue #2: Incompatible value/datatype,  MySQL (time) -> Postgres (timestamp)

The table.column “pro_game_reports.game_time_tomorrow” was switched from datatype “time” to “integer“.

Issue #3: MySQL table names are too long

Since MySQL can have longer names than is legally acceptable for Postgres, pgLoader must rename them to shorter ones. There is a caveat, however, when one can encounter errors when the first 63+ characters of the source relation are the same, i.e., an attempt is made to generate duplicate names.

Issue #4: MySQL index names are too long

Similar to issue #3, pgloader automatically renames indexes as they are reconstituted into PostgreSQL, i.e., duplicate named indexes which are illegal in PostgreSQL. Assuming the name length is legal in Postgres, this issue can be resolved using the option preserve index names, preserving the original index names.

Issue #5: MySQL index names duplicated

Many indexes with the same name were duplicated because multiple columns were combined, forming a PRIMARY KEY constraint. Duplicate-named indexes in PostgreSQL, unlike MySQL, are not permitted. Consequently, they are manually renamed to unique ones:

Issue #6: Missing data detected in tables resulting in Foreign Key Constraint failures

Two values in table “pro_scouting_reports” were seen to be missing, thus preventing the creation of a number of Foreign Key constraints:

These two INSERT statements permit the creation of the FK constraints:

ATTENTION: Prior to adding column and table constraints, it is suggested that upon a successful data migration, all inconsistent records be deleted first.

Loader data migration configuration file

After all the passes of identifying and updating the MySQL database have been performed, the final version of the loader configuration file results in a pretty simple invocation; see below:

References

About pgloader

About Percona Monitoring and Management

About MySQL

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!

Subscribe
Notify of
guest

0 Comments
Inline Feedbacks
View all comments