This blog was originally published in July 2023 and was updated in April 2024.

Partitioning is the concept of logically splitting a large table into smaller pieces to improve database performance. In this article, we’ll discuss how to use the traditional (built-in) partitioning methods in PostgreSQL to split tables effectively and improve database performance.

Built-in PostgreSQL partitioning methods

  • Range partitioning
  • List partitioning
  • Hash partitioning

When to use partitioning in PostgreSQL

  • Bulk operations such as data loads and deletes can be performed efficiently by attaching and detaching partitions (ATTACH PARTITION / DETACH PARTITION).
  • The exact point at which a table will benefit from partitioning depends on the application. However, a rule of thumb is that the size of the table should exceed the physical memory of the database server.
  • As data grows, sub-partitions can be created to maintain performance, and old partitions can be removed, either by detaching them as standalone tables or by dropping them entirely.

Benefits of partitioning in PostgreSQL

  • Performance of queries and DML operations can be improved in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions, as explained below.
  • When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.
  • A whole partition can be removed or emptied with DROP TABLE or TRUNCATE TABLE, respectively, which avoids the load of bulk DELETE operations.

Range partitioning

Range partitioning is a partitioning method in which each partition holds a specific range of values of the partition key, typically dates or numeric values.

Here, as an example, I created a range-partitioned table on a date column, with one partition per quarter.
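
Below is a minimal sketch of such a setup; the table and partition names are illustrative, not the exact ones from the original example:

  -- parent table, partitioned by the date-of-birth column
  CREATE TABLE tbl_range (
      id   int,
      name text,
      dob  date
  ) PARTITION BY RANGE (dob);

  -- one partition per quarter; FROM is inclusive, TO is exclusive
  CREATE TABLE tbl_range_q1 PARTITION OF tbl_range
      FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
  CREATE TABLE tbl_range_q2 PARTITION OF tbl_range
      FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
  CREATE TABLE tbl_range_q3 PARTITION OF tbl_range
      FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
  CREATE TABLE tbl_range_q4 PARTITION OF tbl_range
      FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');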

The range partitions are visible in the table structure (for example, in psql’s \d+ output).

I then inserted some sample data, one row for each of the 365 days of the year.
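
Something along these lines generates one row per day, assuming the illustrative table above:

  -- one row for every day of 2023 (365 rows)
  INSERT INTO tbl_range (id, name, dob)
  SELECT g, 'user_' || g, DATE '2023-01-01' + (g - 1)
  FROM generate_series(1, 365) AS g;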

The range-partitioned data is distributed among its partitions, which can be checked as shown below.
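
A hypothetical query for this, using the illustrative table name from above:

  SELECT tableoid::regclass AS partition, count(*)
  FROM   tbl_range
  GROUP  BY 1
  ORDER  BY 1;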

Performance of SELECT operations

Here, I created a table without partitions and inserted the same data as in the partitioned table.
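
A sketch of the comparison table, again with illustrative names:

  CREATE TABLE tbl_nopart (
      id   int,
      name text,
      dob  date
  );

  INSERT INTO tbl_nopart SELECT * FROM tbl_range;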

The query plan is better when the query touches only a single partition or a small number of partitions.

Here, we can see a better query plan when data is fetched from the partitioned table than when it is fetched from the non-partitioned table.
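
To reproduce the comparison, the same predicate can be run through EXPLAIN against both tables (a sketch; plan output omitted). With the partitioned table, partition pruning means only the matching quarter should be scanned:

  EXPLAIN SELECT * FROM tbl_range  WHERE dob BETWEEN '2023-02-01' AND '2023-02-28';
  EXPLAIN SELECT * FROM tbl_nopart WHERE dob BETWEEN '2023-02-01' AND '2023-02-28';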

List partitioning

List partitioning is a database partitioning method based on discrete key values. A table can also be partitioned on an expression of a column, as in PARTITION BY LIST (expression), which is explained below:

For example, I created a list-partitioned table with a few list partitions and inserted 1,000 rows of random data.
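
A minimal sketch, assuming a status column as the partition key (names and values are illustrative):

  CREATE TABLE tbl_list (
      id     int,
      name   text,
      status text
  ) PARTITION BY LIST (status);

  CREATE TABLE tbl_list_active   PARTITION OF tbl_list FOR VALUES IN ('ACTIVE');
  CREATE TABLE tbl_list_inactive PARTITION OF tbl_list FOR VALUES IN ('INACTIVE', 'PENDING');

  -- 1,000 rows with a randomly chosen status
  INSERT INTO tbl_list (id, name, status)
  SELECT g, 'user_' || g,
         (ARRAY['ACTIVE', 'INACTIVE', 'PENDING'])[1 + floor(random() * 3)::int]
  FROM generate_series(1, 1000) AS g;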

The list partitions are visible in the table definition (for example, via \d+).

The distribution of the list-partitioned data among its partitions is shown below.
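
The same kind of check as before, against the illustrative list-partitioned table:

  SELECT tableoid::regclass AS partition, count(*)
  FROM   tbl_list
  GROUP  BY 1
  ORDER  BY 1;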

List partitioning using expression

For example, I created a table that is list-partitioned on an expression of a column.
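
A sketch of what that could look like, partitioning on the first letter of a name column (the table names and the expression are hypothetical):

  CREATE TABLE tbl_list_expr (
      id   int,
      name text
  ) PARTITION BY LIST (left(upper(name), 1));

  CREATE TABLE tbl_list_expr_ab PARTITION OF tbl_list_expr FOR VALUES IN ('A', 'B');
  CREATE TABLE tbl_list_expr_cd PARTITION OF tbl_list_expr FOR VALUES IN ('C', 'D');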

The partitions are visible in the table definition (for example, via \d+).

Here, I inserted 100 rows of random data.
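
In this sketch, the random names are drawn from a small fixed set so that every row maps to an existing partition:

  INSERT INTO tbl_list_expr (id, name)
  SELECT g, (ARRAY['Alice', 'Bob', 'Carol', 'Dave'])[1 + floor(random() * 4)::int]
  FROM generate_series(1, 100) AS g;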

The distribution of the expression-partitioned data among its partitions is shown below.
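
Again, the per-partition row counts can be checked with:

  SELECT tableoid::regclass AS partition, count(*)
  FROM   tbl_list_expr
  GROUP  BY 1
  ORDER  BY 1;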

Performance of DML operations

To compare, I created a table without partitions and loaded it with the same 1,000 rows as the list-partitioned table, then measured query performance; a sketch of that setup is shown below.
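
The comparison table, reusing the illustrative list-partitioned table from above:

  CREATE TABLE tbl_list_nopart (
      id     int,
      name   text,
      status text
  );

  INSERT INTO tbl_list_nopart SELECT * FROM tbl_list;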

UPDATE Query Performance
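
A sketch of the comparison. Note that EXPLAIN ANALYZE actually executes the statement, so it is wrapped in a transaction and rolled back to keep the data unchanged:

  BEGIN;
  EXPLAIN ANALYZE UPDATE tbl_list        SET name = name || '_upd' WHERE status = 'ACTIVE';
  ROLLBACK;

  BEGIN;
  EXPLAIN ANALYZE UPDATE tbl_list_nopart SET name = name || '_upd' WHERE status = 'ACTIVE';
  ROLLBACK;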

DELETE Query Performance
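
The DELETE comparison can be sketched the same way:

  BEGIN;
  EXPLAIN ANALYZE DELETE FROM tbl_list        WHERE status = 'INACTIVE';
  ROLLBACK;

  BEGIN;
  EXPLAIN ANALYZE DELETE FROM tbl_list_nopart WHERE status = 'INACTIVE';
  ROLLBACK;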

The examples above show that UPDATE and DELETE operations touching only a single partition get a better query plan than the same operations on the table with no partitions.

Hash partitioning

A hash-partitioned table is created by specifying a modulus and a remainder for each partition.

  • Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.
  • Hash partitioning works best when each partition is placed in a different tablespace residing on a separate physical disk, so the I/O is spread evenly across more devices.

For example, I created a hash-partitioned table with five partitions, using a modulus of five.
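
A minimal sketch (names are illustrative):

  CREATE TABLE tbl_hash (
      id   int,
      name text
  ) PARTITION BY HASH (id);

  CREATE TABLE tbl_hash_p0 PARTITION OF tbl_hash FOR VALUES WITH (MODULUS 5, REMAINDER 0);
  CREATE TABLE tbl_hash_p1 PARTITION OF tbl_hash FOR VALUES WITH (MODULUS 5, REMAINDER 1);
  CREATE TABLE tbl_hash_p2 PARTITION OF tbl_hash FOR VALUES WITH (MODULUS 5, REMAINDER 2);
  CREATE TABLE tbl_hash_p3 PARTITION OF tbl_hash FOR VALUES WITH (MODULUS 5, REMAINDER 3);
  CREATE TABLE tbl_hash_p4 PARTITION OF tbl_hash FOR VALUES WITH (MODULUS 5, REMAINDER 4);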

The table structure now shows the five created partitions (for example, in psql’s \d+ output).

Here, I inserted 100,000 rows of random data.
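
For instance, using the illustrative table above:

  INSERT INTO tbl_hash (id, name)
  SELECT g, 'user_' || g
  FROM generate_series(1, 100000) AS g;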

The hash-partitioned data is distributed among its partitions as shown below; the row counts should be roughly even.
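
As before, the per-partition counts can be checked with:

  SELECT tableoid::regclass AS partition, count(*)
  FROM   tbl_hash
  GROUP  BY 1
  ORDER  BY 1;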

Benefits of hash partitioning

  • The primary benefit is to ensure an even distribution of data among a predetermined number of partitions.
  • Hash keys work effectively and efficiently in cases where ranges are not applicable, such as employee numbers or product IDs.

What if the data is out of range or list?

For this purpose, we use default partitions on range and list partitioned tables.

For both range and list partitioning, out-of-range data can be stored temporarily by creating a default partition, and an appropriate partition can be created later.

Hash-partitioned tables may not have a default partition; since every hash value maps to exactly one partition, a default partition would not make sense and is not needed.

We see here what happens when I try to insert data for which a partition doesn’t exist and how the default partition helps in this case.
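
Continuing the illustrative range-partitioned table from above (whose partitions only cover 2023), a sketch of the behavior might look like this:

  -- no partition accepts a 2024 date yet, so this fails with something like:
  -- ERROR:  no partition of relation "tbl_range" found for row
  INSERT INTO tbl_range VALUES (1000, 'late_user', '2024-02-15');

  -- add a default partition, and the same INSERT now succeeds
  CREATE TABLE tbl_range_default PARTITION OF tbl_range DEFAULT;
  INSERT INTO tbl_range VALUES (1000, 'late_user', '2024-02-15');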

So the data we inserted lands in the default partition; appropriate partitions can be created later based on the data sitting in the default partition, and the rows moved into them, as sketched below.
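
One common approach (sketched here with the illustrative names used earlier) is to detach the default partition, create the proper partition, move the matching rows, and re-attach the default:

  BEGIN;
  ALTER TABLE tbl_range DETACH PARTITION tbl_range_default;

  CREATE TABLE tbl_range_2024q1 PARTITION OF tbl_range
      FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');

  -- route the stranded rows into the new partition, then remove them from the old copy
  INSERT INTO tbl_range SELECT * FROM tbl_range_default
      WHERE dob >= '2024-01-01' AND dob < '2024-04-01';
  DELETE FROM tbl_range_default
      WHERE dob >= '2024-01-01' AND dob < '2024-04-01';

  ALTER TABLE tbl_range ATTACH PARTITION tbl_range_default DEFAULT;
  COMMIT;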

Learn More About PostgreSQL Partitioning with our Free Resources

Here, we discussed the built-in partitioning techniques in PostgreSQL using single-column partition keys; multi-column partition keys are also possible. PostgreSQL Partition Manager (pg_partman) can also be used to create and manage partitions effectively. Further details will be explained in upcoming blogs.

Also, please find below the related blogs for reference:

PostgreSQL Sharding: An Overview and MongoDB Comparison

Performing ETL Using Inheritance in PostgreSQL

Partitioning in PostgreSQL With pg_partman (Serial-Based & Trigger-Based)

PostgreSQL Partitioning Made Easy Using pg_partman (TimeBased)

Our PostgreSQL Performance Tuning guide condenses years of database expertise into practical guidance for optimizing your PostgreSQL databases. Get it today.

FAQs

What is the difference between partitioning and indexing in PostgreSQL?

In PostgreSQL, partitioning involves dividing a large table into smaller, more manageable pieces, known as partitions, based on certain criteria like ranges or list values. This can significantly improve query performance on large datasets and simplify data management. Indexing, on the other hand, involves creating a data structure that improves the speed of data retrieval operations on a table. While indexing speeds up query processing by providing quick lookups, partitioning optimizes performance by limiting the number of rows to scan.

What is the difference between sharding and partitioning in PostgreSQL?

Sharding and partitioning in PostgreSQL both deal with distributing data but in subtly different ways. Sharding typically refers to distributing data across multiple databases or servers, potentially across different physical locations, to balance load and improve performance. Partitioning, however, occurs within a single database and involves dividing a table into segments that are easier to manage and query. While both techniques can help with scaling, sharding is more about the horizontal distribution of data across a cluster, whereas partitioning is about organizing data within a database.

When is it most appropriate to partition a database?

Partitioning a database is most appropriate when dealing with large tables that significantly impact query performance and maintenance tasks. It is particularly beneficial when a database frequently executes queries that scan large segments of data based on specific keys or ranges, such as dates or geographic locations. Partitioning can also be advantageous when archival operations are routine, allowing for easier management of historical data by segregating older entries into separate partitions.
