Breaking Down PostgreSQL Partitioning- Range, List, Hash: Use Cases, Limitations, and Practical Recommendations

"Partitioning refers to splitting what is logically one large table into smaller physical pieces" - PostgreSQL.

It is a technique used to organize large tables into smaller, more manageable pieces. Partitioning can be thought of as, organizing a book collection in a book cabinet. Instead of dumping all of your books together in one big pile, you might sort them by category or author and store them in different sections(Partitions) of a bookshelf(Table). For example, you might keep all of your maths books in one section, and all of your history books in another section.

This similar book organization concept can be applied to real-world data stored in our PostgreSQL table.

What are the use cases?

Let's say you have a large database table that stores customer orders for an e-commerce website. The table has columns for the customer name, order date, order ID, product ID, and product quantity etc. As your user base grows and you start processing more orders, the table becomes very large and it takes longer and longer to query update, or delete the data.

Based on this scenario let's understand the use case in retrieving, updating, and deleting the data.

Select and Update:

In the use case mentioned above, suppose you have a query pattern where you need to fetch or update all the orders from a specific month or the current month. In such a scenario, the queries will take longer to execute as they have to scan through the entire order table to retrieve/update the relevant data. This can lead to slower query response times and decreased database performance.

To make things more efficient, you could partition the table by date, creating separate partitions for each month of orders. For example, you could create a new partition for orders placed in January, February, March, and so on. And when you query the table, the system knows which partition to target based on the date range you specify in the query thus limiting the search scans.

Delete or Archive:

If your order data table goes back several years, you might not need to keep all of that data readily available in your active database. By partitioning your data by date as discussed above, you could easily delete or archive old partitions that are no longer needed, without affecting the rest of the data.

Now that we understand the use cases and scenarios where partitioning can be helpful, let's list the benefits of partitioning:

  1. Improved query performance: As PGSQL query planner knows which all partitions to target

  2. Dropping an individual partition using the "DROP TABLE" command or detaching a partition using "ALTER TABLE DETACH PARTITION" is much faster compared to performing a bulk delete operation. Moreover, these commands completely bypass the overhead associated with the "VACUUM" the operation, which can be a major performance bottleneck in the case of a bulk "DELETE" operation.

  3. Archival: Infrequently accessed data or older data can be migrated to different storage

  4. When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.

Type of partitioning supported in PostgreSQL

There are three types of partitioning supported in PostgreSQL. Let's discuss each of them in detail.

Partition Visualisation

List Partitioning

In this partitioning, the table is partitioned based on a list of values in a specified column.

Suppose we want to partition the table into two partitions based on the region column:

  • Orders from the United States will be stored in Partition 1

  • Orders from Asia will be stored in partition 2

Partition creation involves two steps 1. Create a table with partition declaration- Also called declarative partitioning 2. Define the partitions with range criteria.

Let's create an order table and define two partitions on it for our discussion. However, You can create as many as you need.

-- Create the table with partitioning
CREATE TABLE customer_orders (
    order_id SERIAL,
    customer_id INTEGER,
    order_date DATE,
    order_amount NUMERIC NOT NULL,
    region VARCHAR(50),
    CONSTRAINT customer_orders_pkey PRIMARY KEY (region, order_id)
)
PARTITION BY LIST (region);

-- Create the partitions
CREATE TABLE customer_orders_p1 PARTITION OF customer_orders
    FOR VALUES IN ('USA');

CREATE TABLE customer_orders_p2 PARTITION OF customer_orders
    FOR VALUES IN ('Asia');

-- Insert few records 
INSERT INTO customer_orders (customer_id, order_date, order_amount, region) VALUES (123, '2023-04-17', 100.00, 'USA');

INSERT INTO customer_orders (customer_id, order_date, order_amount, region) VALUES (456, '2023-04-17', 250.00, 'Asia');

After inserting the record if we search all records from regions "Asia" this is how the query plan looks like

You would notice in above it scans only "customer_orders_p2" partition.

Let us discuss now range partitioning.

Range Partitioning

In range partitioning, the table is partitioned based on a range of values in a specified column. Let's partition our customer orders table by the order date column using range partitioning.

  • Orders placed before 2023-01-01 will be stored in partition 1

  • Orders placed after 2023-01-01 will be stored in partition 2

/* declare partition while creating table */
CREATE TABLE customer_orders (
    order_id SERIAL,
    customer_id INTEGER,
    order_date DATE,
    order_amount NUMERIC NOT NULL,
    CONSTRAINT customer_orders_pkey PRIMARY KEY (order_id, order_date)
)
PARTITION BY RANGE (order_date);

/* create and attach partitions */

CREATE TABLE customer_orders_p1 PARTITION OF customer_orders
    FOR VALUES FROM ('0001-01-01') TO ('2023-01-01');

CREATE TABLE customer_orders_p2 PARTITION OF customer_orders
    FOR VALUES FROM ('2023-01-01') TO ('9999-12-31');

Note: Adjacent partitions can share a bound value since range upper bounds are treated as exclusive bounds. For example, the first partition range is
<= '0001-01-01' and < '2022-01-01'

Hash Partitioning

In this partitioning, the table is partitioned based on a hash function applied to a specified column. Let's partition our customer orders table by the customer_id column using hash partitioning.

Suppose we want to partition the table into two partitions based on the customer_id column:

  • Orders placed by customers with even ID number

  • Orders placed by customers with odd ID numbers

      -- Create the table with partitioning
      CREATE TABLE customer_orders (
         order_id serial,
         customer_id int NOT NULL,
         order_date date NOT NULL,
         order_amount numeric NOT NULL,
         CONSTRAINT customer_orders_pkey PRIMARY KEY (order_id, customer_id)
      )
      PARTITION BY HASH (customer_id);
    
      -- Create the partitions
      CREATE TABLE customer_orders_1 PARTITION OF customer_orders
      FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    
      CREATE TABLE customer_orders_2 PARTITION OF customer_orders
      FOR VALUES WITH (MODULUS 2, REMAINDER 1);
    

Limitations

Let's discuss the few important limitations of the partitioning features.

  • The columns used in the primary key or unique key constraint must include all of the partition key columns. If you run the below query you will get an error.

      /* declare partition while creating table */
      CREATE TABLE customer_orders (
          order_id SERIAL,
          customer_id INTEGER,
          order_date DATE,
          order_amount NUMERIC NOT NULL,
          CONSTRAINT customer_orders_pkey PRIMARY KEY (order_id)
      )
      PARTITION BY RANGE (order_date);
    
      -- output--- 
      ERROR:  unique constraint on partitioned table must include all partitioning columns
      DETAIL:  PRIMARY KEY constraint on table "customer_orders" lacks column "order_date" which is part of the partition key.
    

    This is the biggest limitation and you have to adjust your data model based on this requirement(Notice our previous example, which intentionally includes a partition key in its composite keys). As per PGSQL, This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions; therefore, the partition structure itself must guarantee that there are no duplicates in different partitions.

  • It is not possible to turn a regular table into a partitioned table or vice versa. However, it is possible to add an existing regular or partitioned table as a partition of a partitioned table or remove a partition from a partitioned table turning it into a standalone table.

  • There is no way to create an exclusion constraint spanning the whole partitioned table. You have to put such a constraint on each partition individually.

Attach and Detach Partition

You can attach a new partition on the partitioned table as well as detach any partition

Detach Partition

This is how our customer orders table looks after range partitions

run the detach command

ALTER TABLE customer_orders DETACH PARTITION customer_orders_p1

After detach it is left with only one partition

Attach Partition

Let's attach the customer_orders_p1 partition again

ALTER TABLE customer_orders ATTACH PARTITION customer_orders_p1
    FOR VALUES FROM ('0001-01-01') TO ('2023-01-01');

It's again back to the same stage.

Recommendations

  • When to not use:

    • If improving query performance is the only goal - First figure out the bottleneck and Try other mechanisms first

    • When the table size is not big or very large except archival use case

  • Do not confuse it with Sharding - Sharding is putting data on a separate node

  • Partition Pruning: Partition pruning is a feature of PostgreSQL partitioning that allows the query planner to eliminate partitions that don't need to be scanned to satisfy a query. Make sure it is enabled which is the default behavior. Without this query, the planner will call all the partitions even though they do not have the data SET enable_partition_pruning = on;

  • PostgreSQL does not create partitions automatically - Use scripts or application code to create it

  • Use the latest version of PG as older versions had a few more limitations like Foreign keys referencing partitioned tables, as well as foreign key references from a partitioned table to another table was not supported

That's all you needed to know about PG partitioning. I hope these insights were helpful to you.