Recently, one of our customers approached us with a unique challenge: they needed to migrate their entire PostgreSQL cluster from DigitalOcean’s Database as a Service (DBaaS) to a DigitalOcean Droplet. The reason for their migration from DBaaS to Droplets was to lower their cost. This task proved to be quite challenging, as DigitalOcean’s documentation clearly states that “We do not currently support migrating databases from clusters inside of DigitalOcean to other clusters inside of DigitalOcean.”

In short, we had to migrate the database per the client’s request, and we offered them two options:

1. pg_dump

2. Logical replication

The pg_dump method requires downtime as we must take the dump and restore it on the new server. Logical replication keeps the source database operational while transferring data to the target database. Once we reach the desired state, we can cut over to the target database.
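For reference, the pg_dump route is typically a two-step dump and restore like the sketch below. Hostnames, user names, and the database name here are placeholders, not values from this migration:

```shell
# Take a consistent dump of the source database (custom format, compressed)
pg_dump -h source-host -U doadmin -d mydb -Fc -f mydb.dump

# Restore it on the target Droplet
pg_restore -h target-host -U postgres -d mydb --no-owner mydb.dump
```

The application must stop writing to the source before the dump is taken, which is why this method requires downtime.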

To use the logical replication method, every table to be replicated must have a primary key or a unique key.

Prerequisites for migration

To migrate an existing database, we need to ensure logical replication is enabled on the source database, have the source database’s connection credentials, and disable or update any firewalls between the databases.

Have superuser permissions: To prepare a database for migration and to migrate it, we need superuser permissions on the source database.

Make the database publicly accessible: To migrate a database, the source database’s hostname or IP address must be accessible from the public internet. Public connection information for DigitalOcean databases is in the database’s Connection Details in the control panel.

Allow remote connections: First, verify that the database allows all remote connections. This is determined by the database’s listen_addresses variable, which allows all remote connections when its value is set to *. To check its current value, run the following query in the PostgreSQL (psql) terminal:
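The exact query from the original post is not preserved here; the standard check in a psql session is:

```sql
SHOW listen_addresses;
```

The value should be `*`.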

If the output is different, allow remote connections in your database by running the following query:
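The query was presumably along these lines:

```sql
ALTER SYSTEM SET listen_addresses = '*';
-- listen_addresses only changes at server start, so restart PostgreSQL afterwards
```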

We must also change our local IPv4 connection settings to allow all incoming IPs. To do this, find the configuration file pg_hba.conf with the following query:
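The query in question is presumably:

```sql
SHOW hba_file;
```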

Open pg_hba.conf in your text editor, such as nano: nano pg_hba.conf

Under IPv4 local connections, find and replace the IP address with 0.0.0.0/0, which allows all IPv4 addresses:
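The resulting entry looks something like this (the md5 method is an assumption here; keep whatever authentication method the file already uses):

```
# TYPE  DATABASE        USER            ADDRESS                 METHOD
host    all             all             0.0.0.0/0               md5
```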

Enable logical replication:

Most cloud database providers have logical replication enabled by default. It may not be enabled if you are migrating a database from an on-premises server. If the source database is not set up for logical replication, the migration process will not work, because only the schemas can be moved, not the data.

To verify that logical replication has been enabled, run the following query in the PostgreSQL (psql) terminal:
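The usual check, likely what the original post showed, is:

```sql
SHOW wal_level;
-- Logical replication requires this to return: logical
```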

Change max replication slots:

After enabling logical replication, we need to verify that the database’s max_replication_slots value is equal to or greater than the number of databases in our PostgreSQL server. To check the current value, run the following query in the PostgreSQL (psql) terminal:
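The check is presumably:

```sql
SHOW max_replication_slots;
```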

The output returns the current max_replication_slots value.

If it is smaller than the number of databases in our PostgreSQL server, adjust it by running the following query, where use_your_number is the number of databases in our server:
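The adjustment is presumably of this form, with use_your_number as the placeholder to substitute:

```sql
ALTER SYSTEM SET max_replication_slots = use_your_number;
```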

Then restart the server for the change to take effect.

Challenges we faced during migration

There are challenges when implementing logical replication for tables that lack a primary key. There are two different ways to handle such tables, the second being to use a unique key.

This can be implemented with a set of steps similar to the ones we perform for a primary key, and it functions the same way: the unique key, instead of the primary key, identifies rows for updates.

Caveats

  • DELETE/UPDATE are not supported on tables without a replica identity.
  • A unique index cannot be used as a replica identity if its columns allow NULLs.
  • Setting REPLICA IDENTITY to FULL: when no appropriate index is found for the replica identity, we may set it to FULL. In this case, all the table columns collectively act as a primary key.
  • Because FULL logs the entire old row (supplemental logging), it generates a huge amount of WAL.
  • It may be slower than a key-based replica identity.

Things to consider

We need to set the replica identity for the tables that are logically replicated using only a UNIQUE key; otherwise, DELETE/UPDATE will not be replicated.
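A sketch of what that looks like; the table and index names here are hypothetical:

```sql
-- Table replicated via a non-null unique index: use that index as the identity
ALTER TABLE orders REPLICA IDENTITY USING INDEX orders_order_no_key;

-- No suitable index at all: fall back to FULL (all columns act as the key)
ALTER TABLE audit_log REPLICA IDENTITY FULL;
```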

After data gets synced from the DBaaS fork to the new Droplet VM, we need to use the pg_dump and pg_restore method for sequences. A question arises here: why do we need to dump the sequences, and why can’t we replicate them via logical replication?

Logical replication is designed to track WAL changes and report the current states and values to subscribers. Replicating a sequence would be contradictory because the current sequence value does not equal the value stored in the WAL. To remedy this, the PostgreSQL documentation suggests manually copying over the sequence values or using a utility such as pg_dump to do the copying.

  • Dump the sequences from the DBaaS DB fork
  • Stop the DBaaS DB fork
  • Restore the sequences on the new droplet
  • Disable the logical subscriptions

Below is a short summary of the process we followed to migrate the environment:

Source: DigitalOcean DBaaS
Destination: DigitalOcean Droplet
Process:

  • The client has chosen migration via a logical replication process to reduce downtime.
  • On the target VM, we installed Percona Distribution for PostgreSQL 13.7.
  • Dumped the roles from the source (DBaaS) and restored them on the destination.
  • Listed the tables that don’t have a PK and informed the client.
  • The client added the PK for some tables and the UNIQUE key for some tables.
  • Installed the extensions on the VM that were present on the source cluster.
  • Dumped only the schema from the source (DBaaS).
  • Restored the schema on the destination (Droplet).
  • Adjusted the logical replication-related parameters on the source and destination like max_replication_slots, max_logical_replication_workers, and max_wal_senders.
  • Configured the logical replication by creating the publication and subscription between the source and destination.
  • Once the destination was in sync, disabled the subscriptions.
  • Dumped the sequences from the source and restored them on the destination.
  • Adjusted listen_addresses and the pg_hba.conf file on the destination.
  • Dropped the subscriptions on the destination.
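The publication/subscription wiring in the steps above reduces to something like this sketch; all names and connection details are placeholders:

```sql
-- On the source (DBaaS):
CREATE PUBLICATION mig_pub FOR ALL TABLES;

-- On the destination (Droplet):
CREATE SUBSCRIPTION mig_sub
    CONNECTION 'host=source-host port=25060 dbname=mydb user=doadmin password=placeholder sslmode=require'
    PUBLICATION mig_pub;

-- Once the destination is in sync, at cutover:
ALTER SUBSCRIPTION mig_sub DISABLE;
DROP SUBSCRIPTION mig_sub;
```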

Conclusion

PostgreSQL is an open source, object-relational database built with a focus on extensibility, data integrity, and speed, and its concurrency support makes it fully ACID-compliant. We migrated the customer’s data from DBaaS to Droplets by using one of PostgreSQL’s great features, logical replication, and then dumped the sequences from the source and restored them on the destination.

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community, in a single distribution, designed and tested to work together.

Download Percona Distribution for PostgreSQL Today!
