In this blog post, we are going to briefly discuss why column order matters when creating tables (or relations) in PostgreSQL.

Alignment

In its internal representation of tuples, Postgres places each column at an offset that depends on the alignment defined for its data type, keeping the columns in the same sequential order as in the table definition used to create it. We can read more about it in the following online documentation sections:

https://www.postgresql.org/docs/current/catalog-pg-type.html 

Quoting the relevant excerpt here:

typalign is the alignment required when storing a value of this type. It applies to storage on disk as well as most representations of the value inside PostgreSQL. When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary. The alignment reference is the beginning of the first datum in the sequence. Possible values are:

c = char alignment, i.e., no alignment needed.

s = short alignment (2 bytes on most machines).

i = int alignment (4 bytes on most machines).

d = double alignment (8 bytes on many machines, but by no means all).

As we will check in more depth later on, this means that the exact position of each column within the tuple will vary depending on the data type used: some types require less alignment and can be packed closer together, while others cannot.

And from https://www.postgresql.org/docs/current/storage-page-layout.html:

> […] Interpreting the actual data can only be done with information obtained from other tables, mostly pg_attribute. The key values needed to identify field locations are attlen and attalign.

Additionally, we can say that choosing a good column order will not only impact disk usage but also memory and, potentially, CPU usage.

Padding

The first documentation quote brings us to padding. Alignment is the reason why column order matters, and padding is the reason why we could be wasting space in each tuple. This is best explained with a simple real-life example.

Let’s say we have two relations, each with the exact same columns but in different order.
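
As a sketch, the definitions could look like this (the table and column names are just illustrative; t1 is the one using the less favorable order described below):

    -- t1: columns in a padding-heavy order
    CREATE TABLE t1 (
        c1 SMALLINT,   -- 2 bytes, 2-byte alignment
        c2 BIGINT,     -- 8 bytes, 8-byte alignment
        c3 INT,        -- 4 bytes, 4-byte alignment
        c4 BIGINT      -- 8 bytes, 8-byte alignment
    );

    -- t2: same columns, ordered from widest alignment to narrowest
    CREATE TABLE t2 (
        c1 BIGINT,
        c2 BIGINT,
        c3 INT,
        c4 SMALLINT
    );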

We have two BIGINT columns, one INT and one SMALLINT in each, so we can say they are exactly the same in terms of the data they can hold. If we go back to the documentation, it mentions:

When multiple values are stored consecutively, such as in the representation of a complete row on disk, padding is inserted before a datum of this type so that it begins on the specified boundary.

Ok, so how do we know each data type's boundary? We can use queries like the following to check the alignment of the data types used in our tables:
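
A sketch of such queries, assuming the illustrative t1 from above (pg_attribute and pg_type are the catalogs that hold this information):

    -- Alignment and length of the data types used by t1's columns
    SELECT a.attname, t.typname, t.typalign, t.typlen
    FROM pg_attribute a
    JOIN pg_type t ON t.oid = a.atttypid
    WHERE a.attrelid = 't1'::regclass
      AND a.attnum > 0
      AND NOT a.attisdropped
    ORDER BY a.attnum;

    -- Or directly by type name
    SELECT typname, typalign, typlen
    FROM pg_type
    WHERE typname IN ('int2', 'int4', 'int8');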

In summary:

  • BIGINT has an alignment of eight bytes
  • INT has an alignment of four bytes
  • SMALLINT has an alignment of two bytes

Alignment and padding example

Now let's think of the tuple in terms of an array, where each position represents one byte. If the first column takes two bytes, and the second one has an alignment of eight, we will need six more bytes added (as padding) before we can store its data. This leaves us with the following representations for the tuples in each table:

    t1 (SMALLINT, BIGINT, INT, BIGINT) - 32 bytes of data per tuple:
    tuple: [c1 c1] [0 0 0 0 0 0] [c2 c2 c2 c2 c2 c2 c2 c2] [c3 c3 c3 c3] [0 0 0 0] [c4 c4 c4 c4 c4 c4 c4 c4]

    t2 (BIGINT, BIGINT, INT, SMALLINT) - 22 bytes of data per tuple:
    tuple: [c1 c1 c1 c1 c1 c1 c1 c1] [c2 c2 c2 c2 c2 c2 c2 c2] [c3 c3 c3 c3] [c4 c4]

The zeros in the "tuple" rows mean that padding is needed, and that space is wasted with no data at all. As we can see in this small example, we need ten more bytes per tuple in table t1, even though both tables hold the exact same information! Of course, you could say that ten more bytes per row is negligible, and it may be; but this could be a very different story in environments with large datasets. You can also think of it as roughly 45% overhead (10 extra bytes on top of 22 bytes of actual data) wasted on padding.

An easy way to check how many bytes each tuple takes in each table is to use pg_column_size. In this example, we are subtracting 24 bytes to discard the size of the row header, but do note that the header could be larger in other cases.
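
A sketch of those queries, assuming the illustrative t1 and t2 from above, each holding a single row:

    -- 24 bytes discarded for the row header in this example
    SELECT pg_column_size(t1.*) - 24 AS t1_data_bytes FROM t1;
    SELECT pg_column_size(t2.*) - 24 AS t2_data_bytes FROM t2;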

The difference between them is 10 bytes, as we calculated above.

We are not considering variable-length data types here because they are not as straightforward. If there is interest in a second part exploring this in more depth, let me know.

Using the pageinspect extension

Another interesting way to check this is to use the pageinspect extension. With it, we can get the actual contents (bytes) for each page used in a table and see exactly how the tuples are being stored. First, we need to enable it (note: this is not suggested for use in a production environment):
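
Enabling it is a single statement (assuming the contrib modules are installed on the server):

    CREATE EXTENSION IF NOT EXISTS pageinspect;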

We will use two functions:

https://www.postgresql.org/docs/current/pageinspect.html#PAGEINSPECT-GENERAL-FUNCS

get_raw_page(relname text, fork text, blkno bigint) returns bytea

get_raw_page reads the specified block of the named relation and returns a copy as a bytea value. This allows a single time-consistent copy of the block to be obtained. fork should be ‘main’ for the main data fork, ‘fsm’ for the free space map, ‘vm’ for the visibility map, or ‘init’ for the initialization fork.

https://www.postgresql.org/docs/current/pageinspect.html#PAGEINSPECT-HEAP-FUNCS

heap_page_items(page bytea) returns setof record

heap_page_items shows all line pointers on a heap page. For those line pointers that are in use, tuple headers as well as tuple raw data are also shown. All tuples are shown, whether or not the tuples were visible to an MVCC snapshot at the time the raw page was copied.

Let’s now insert MAX values for the respective types of each column, so we can clearly see when padding is added (before these inserts, I recreated the tables, so we have only one row in each):
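
Assuming the illustrative t1 and t2 definitions from above, the inserts would look something like this (32767, 2147483647, and 9223372036854775807 being the maximum values for SMALLINT, INT, and BIGINT, respectively):

    INSERT INTO t1 VALUES (32767, 9223372036854775807, 2147483647, 9223372036854775807);
    INSERT INTO t2 VALUES (9223372036854775807, 9223372036854775807, 2147483647, 32767);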

We can use a query like the following to inspect all the page data available:
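
A sketch of such a query, reading block 0 of the main fork of t1:

    SELECT * FROM heap_page_items(get_raw_page('t1', 'main', 0));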

But in this case, we’ll focus on t_data (the actual tuple data):
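
For example, selecting only t_data for the single row in each table:

    SELECT t_data FROM heap_page_items(get_raw_page('t1', 'main', 0));
    SELECT t_data FROM heap_page_items(get_raw_page('t2', 'main', 0));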

Since the output is in hexadecimal notation, each digit represents four bits (which means every two consecutive zeros make up one byte of padding). Since we are using the max values for each data type, we know that any zeros are padding, and this is why we don't get any zeros printed in the second query (the one against t2). In t1's output, we have 12 zeros after the first column and eight after the third, which gives 20 zeros in total, amounting to 10 bytes of padding (20 hex digits at 4 bits each = 80 bits = 10 bytes).

This is the same number we got with the different approaches above. This tool gives us a more visual confirmation of the theory we discussed, which is always helpful.

Conclusion

Taking some time to think about the low-level row structure can save us resources and improve performance when designing our tables. Data will need less disk space, and the overall query and server performance should be positively impacted by it. On the other hand, it could make table definitions less clear if we enforce minimizing padding, so we might need to be a bit lax at times (for instance, we could make the exception for primary key columns and have those at the beginning).

If you are interested in this topic, there is a Percona Live recording that covers it in more depth: "Reducing Costs and Improving Performance With Data Modeling in Postgres" by Charly Batista.


Comments

Aravinth Manogaran

Ideally, it should have been designed to store all the fixed-length data types serially, handled via the metadata definition at run time.

David Burns

I would be very interested in a follow-on article on variable-length columns.

Rishabh

“We are not considering variable-length data types here because they are not as straightforward” – Please write an article on this.

Also, does the same concept apply to MySQL (InnoDB)?