Changing the data type of a column in a big table is almost always a pain. It can lock the entire table for both reads and writes, because the ALTER TABLE statement that changes the data type acquires an ACCESS EXCLUSIVE lock and, for a binary-incompatible change like this one, rewrites the whole table. In this blog post, we will look at one way of performing such an operation with minimal impact and minimal locking. It works for any data type; let’s look at an Integer to Big Integer change as an example.

Example use case

Let’s say we have a table with many columns. One of them is of type Integer (INT, 4 bytes), with values that have been growing incrementally over time. After a while, we started getting ERROR: integer out of range messages in the log, which means the value we are trying to insert is bigger than the integer limit (max value 2147483647). The easy choice would be to change its type to Big Integer (BIGINT, 8 bytes). Here is how we can accomplish this almost “online.”

First, we will need to create a new column with the new data type. The new column is created empty (NULL for every row), so the ALTER is a metadata-only change and should complete in milliseconds.
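A minimal sketch of that step, using the big_table and order_id_tmp names that appear later in this post:

-- Adds an empty (all-NULL) bigint column; no default, so this is a
-- metadata-only change and does not rewrite the table
ALTER TABLE big_table ADD COLUMN order_id_tmp BIGINT;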

To ensure it does not end up waiting on a lock and blocking other sessions, we can write a small SQL script that times the command out if it takes too long to complete, i.e.:
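A sketch of such a script, assuming the ALTER above is saved in a file called alter_column.sql (a hypothetical name):

-- alter_column.sql
-- statement_timeout also counts time spent waiting on a lock,
-- so the ALTER is cancelled if it cannot proceed quickly
SET statement_timeout TO '100ms';
ALTER TABLE big_table ADD COLUMN order_id_tmp BIGINT;

And the retry loop around it:

# keep retrying until the ALTER succeeds within the timeout
while true; do
    psql -v ON_ERROR_STOP=1 -f alter_column.sql && break
done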

The statement_timeout setting will terminate the ALTER TABLE command if it runs for more than 100 milliseconds, and because ON_ERROR_STOP is added to the psql parameters, the psql process will then exit with a non-zero code. Therefore, “break” will not be executed after a timeout, because the && operator only runs the next command when the first part exits with code 0, which means success; the loop simply retries until the ALTER gets through.

Next, we will have to create a function and a trigger that copy the value of order_id to order_id_tmp for all newly inserted and updated rows.
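A sketch of what these could look like; the copy_order_id and order_id_copy_trigger names are placeholders for illustration:

CREATE OR REPLACE FUNCTION copy_order_id()
RETURNS trigger AS $$
BEGIN
    -- keep the new bigint column in sync for every written row
    NEW.order_id_tmp := NEW.order_id;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER order_id_copy_trigger
    BEFORE INSERT OR UPDATE ON big_table
    FOR EACH ROW
    EXECUTE FUNCTION copy_order_id();  -- use EXECUTE PROCEDURE on PostgreSQL 10 and older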

And we are ready to populate the new column with the data. For bigger tables, we can create a support table to help with this process. It should contain the primary key of big_table (in this case, the id column) for all rows where the new bigint column is still empty.
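A sketch of the support table (the temp_order_id_bigint name comes up again below):

-- Collect the primary keys of every row that still needs backfilling
CREATE TABLE temp_order_id_bigint AS
    SELECT id
      FROM big_table
     WHERE order_id_tmp IS NULL;

-- A primary key on the support table keeps the chunked deletes fast
ALTER TABLE temp_order_id_bigint ADD PRIMARY KEY (id);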

We will populate the data in chunks, as Postgres does not like massive updates because of how the MVCC (Multi-Version Concurrency Control) mechanism and autovacuum work. A single UPDATE would touch all tuples at once, leaving behind a dead version of every row, which would cause huge bloat and roughly double the table in size.
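One way to write such a chunked update, as a sketch (the LIMIT value is discussed below):

BEGIN;

WITH chunk AS (
    -- take one batch of ids from the support table
    SELECT id
      FROM temp_order_id_bigint
     LIMIT 5000
),
upd AS (
    -- backfill the new column for that batch
    UPDATE big_table b
       SET order_id_tmp = b.order_id
      FROM chunk c
     WHERE b.id = c.id
)
-- remove the processed ids so the loop makes progress
DELETE FROM temp_order_id_bigint t
 USING chunk c
 WHERE t.id = c.id;

COMMIT;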

Run the above transaction in a loop until the support table, temp_order_id_bigint, is empty, for example like this:
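A sketch of the loop, assuming the transaction above is saved in populate.sql (a hypothetical name):

while true; do
    # stop on error so a failed batch does not spin forever
    psql -v ON_ERROR_STOP=1 -f populate.sql || break
    # stop once the support table is empty
    rows=$(psql -tA -c "SELECT count(*) FROM temp_order_id_bigint;")
    [ "$rows" -eq 0 ] && break
done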

Use the LIMIT that works best for you; depending on row size, you may achieve better results with lower or higher values. In my test case, 5000 was optimal, and every chunk of 5,000 rows finished in less than 300 ms.

Once the data is populated, we are ready to perform the column switchover. We can use a similar approach with the while loop, as previously mentioned, to make sure the switchover never waits on a lock and blocks other sessions for too long.
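A sketch of the switchover, run through the same timeout-and-retry loop as the ALTER at the beginning (the trigger and function names match the placeholder sketch above):

SET statement_timeout TO '100ms';

BEGIN;
-- the trigger is no longer needed once the columns are swapped
DROP TRIGGER order_id_copy_trigger ON big_table;
DROP FUNCTION copy_order_id();
ALTER TABLE big_table DROP COLUMN order_id;
ALTER TABLE big_table RENAME COLUMN order_id_tmp TO order_id;
COMMIT;

Note that any indexes, constraints, defaults, or sequence ownership tied to the old column have to be recreated on the new one; that is outside the scope of this sketch.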

Et voilà, we just changed the column from Integer to Big Integer with less than one second of locking. The whole operation takes much longer than simply altering the data type of the existing column because of all the preparation work, data population, etc., but we avoided the downtime or maintenance window that the standard approach would require.

Enjoy!

Percona Distribution for PostgreSQL provides the best and most critical enterprise components from the open-source community in a single distribution, designed and tested to work together.

 

Download Percona Distribution for PostgreSQL Today!
