
PostgreSQL is the most popular database among developers for good reason: It’s known for its rock-solid stability, extensibility, ACID compliance, and excellent support for complex queries and concurrent workloads. Whether you’re building a transactional system or a data-intensive application, PostgreSQL scales remarkably well thanks to its robust architecture.
A big part of that scalability comes from PostgreSQL’s use of Multi-Version Concurrency Control (MVCC)—a technique that allows multiple queries to interact with the database at the same time, without blocking each other.
But this concurrency model comes with tradeoffs, especially when it comes to deleting large volumes of data.
The Problem With DELETE
To support concurrent reads and writes without locking, MVCC never modifies rows in place: each UPDATE writes a new row version, and each DELETE marks the existing version rather than removing it. This allows reads to return a consistent snapshot of the data, even while it's being modified by concurrent transactions.
As a result, a DELETE doesn't actually remove the row from disk. Instead, it marks the tuple as "dead," making it invisible to future transactions, but the data still physically exists until it's cleaned up.
That cleanup is handled by PostgreSQL’s background process: autovacuum.
In most cases, autovacuum runs quietly in the background, cleaning up dead tuples and keeping things efficient. But in high-churn tables—like logs or events, with millions or billions of inserts and deletes per week—dead tuples can accumulate faster than autovacuum can clean them.
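If you suspect a table is accumulating dead tuples faster than autovacuum can reclaim them, the cumulative statistics views make it easy to check (swap in the table you care about):

-- Dead-tuple buildup and the last time autovacuum touched the table.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_autovacuum,
       last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'analytics';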
When that happens, PostgreSQL may trigger a more aggressive vacuum to prevent a catastrophic event known as transaction ID wraparound failure. Unlike a routine autovacuum, this aggressive vacuum won't cancel itself when other sessions request conflicting locks, so it can hold up schema changes and other operations for as long as the cleanup takes. And if wraparound gets close enough, PostgreSQL stops accepting new transactions entirely until the table has been vacuumed.
Needless to say, if your application relies on that table, this is a recipe for major downtime.
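You can see how close each table is to forcing one of these aggressive vacuums by comparing the age of its oldest unfrozen transaction ID against autovacuum_freeze_max_age (200 million by default):

-- Tables closest to an anti-wraparound vacuum: the aggressive vacuum kicks in
-- once age(relfrozenxid) exceeds autovacuum_freeze_max_age.
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM pg_class c
WHERE c.relkind = 'r'
ORDER BY age(c.relfrozenxid) DESC
LIMIT 10;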
Partitions: A Better Data Deletion Strategy
Enter partitions.
Partitioning splits one large logical table into many smaller physical tables. Each partition holds a subset of the data, typically segmented by time (e.g. by day, week, or month) or by numeric range (e.g. on an auto-incrementing serial id), while the parent table serves as the unified interface for queries and writes.
Partitioning is typically used for performance: it can speed up queries by limiting scans to just the relevant partitions, and reduce index bloat by keeping indexes smaller and more focused.
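As a quick illustration (the monthly-partitioned events table below is hypothetical, not part of the migration later in this post), pruning is visible directly in the query plan: when the WHERE clause constrains the partition key, EXPLAIN lists only the partitions that could contain matching rows.

-- Hypothetical table partitioned by month on created_at.
-- The plan should reference only the March 2024 partition.
EXPLAIN
SELECT *
FROM events
WHERE created_at >= '2024-03-01' AND created_at < '2024-04-01';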
But partitioning offers another powerful advantage: Efficient data deletion.
Instead of issuing a DELETE and waiting for autovacuum to clean up the dead tuples, you can simply drop an entire partition (see the sketch after this list), an operation that:
- Is near-instant.
- Doesn't generate dead tuples.
- Doesn't invoke autovacuum.
- Doesn't block your table the way a long-running DELETE can.
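Using the same hypothetical events table partitioned by month, expiring a month of data becomes a single DDL statement instead of a long-running, vacuum-generating DELETE:

-- The slow way: row-by-row deletion that leaves dead tuples behind
-- for autovacuum to reclaim later.
DELETE FROM events WHERE created_at < '2024-01-01';

-- The partitioned way: detach and drop the whole child table.
-- No dead tuples, no vacuum debt.
ALTER TABLE events DETACH PARTITION events_2023_12;
DROP TABLE events_2023_12;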
This makes partitioning an excellent strategy for managing data retention, especially in high-churn systems like those running on Aurora PostgreSQL.
Partitioning in Aurora PostgreSQL with pg_partman
Amazon Aurora PostgreSQL includes the pg_partman extension, which makes it easy to manage the creation and maintenance of partitions. While AWS offers useful documentation on how to configure pg_partman for a new table, the steps below show how to use pg_partman to partition an existing table, one that is perhaps overdue for a better deletion strategy.
Let’s assume that you have a simple `analytics` table that you’d like to partition by range, using a serial `id` column. You cannot turn an existing non-partitioned table into the parent of a set of partitions, so you need to employ a migration strategy that creates a new table and migrates data from the original table.
To get started with pg_partman, first connect to your RDS instance and enable the pg_partman extension. It’s recommended to create a new schema to manage the partman configuration.
CREATE SCHEMA partman;
CREATE EXTENSION pg_partman WITH SCHEMA partman;
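It's also worth checking which pg_partman version your Aurora engine provides. The create_parent call below uses the pg_partman 5.x calling convention (p_type => 'range'); older 4.x releases expect different p_type values, so confirm the version before copying the arguments verbatim.

-- Versions available to install, and the version currently installed (if any).
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'pg_partman';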
Next, you’ll want to employ the following strategy to create a new partitioned `analytics` table and then migrate your data.
DO
$$
DECLARE
  current_serial_id int8;
BEGIN
  LOCK TABLE public.analytics IN ACCESS EXCLUSIVE MODE;

  SELECT max(id) INTO current_serial_id FROM public.analytics;

  ALTER TABLE public.analytics RENAME TO analytics_backup;

  CREATE TABLE public.analytics (
    id BIGSERIAL NOT NULL PRIMARY KEY,
    event JSON,
    created_at TIMESTAMPTZ DEFAULT now()
  ) PARTITION BY RANGE (id);

  PERFORM setval(pg_get_serial_sequence('public.analytics', 'id'), current_serial_id);

  PERFORM partman.create_parent(
    p_parent_table => 'public.analytics',
    p_control => 'id',
    p_type => 'range',
    p_interval => '100000000',
    p_premake => '10'
  );

  INSERT INTO public.analytics
  SELECT * FROM analytics_backup;
END;
$$ LANGUAGE plpgsql;
Let’s break this down, piece by piece. First, we’re creating an anonymous code block to execute the migration as a single transaction. We declare a `current_serial_id` variable to store the current sequence value for the `id` column.
DO
$$
DECLARE
current_serial_id int8;
BEGIN
...
Next, we’re locking the `analytics` table so that our transaction has exclusive access to the table. This ensures that no changes can be made until the migration is complete.
LOCK TABLE public.analytics IN ACCESS EXCLUSIVE MODE;
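Because this lock blocks all other access to the table for the duration of the copy, it's worth watching from a separate session for queries that pile up behind the migration. A minimal monitoring query (standard system views, nothing Aurora-specific):

-- Sessions currently waiting on another backend, e.g. behind our migration's lock.
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;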
Since we’ll be creating a new `analytics` table, we need a way to preserve the current sequence value of our serial `id` column. Otherwise, it would begin at the default value of `1`, which we probably do not want!
SELECT max(id) INTO current_serial_id FROM public.analytics;
From there, we’ll want to rename our existing table and create a new table with an identical schema.
ALTER TABLE public.analytics RENAME TO analytics_backup;
CREATE TABLE public.analytics (
  id BIGSERIAL NOT NULL PRIMARY KEY,
  event JSON,
  created_at TIMESTAMPTZ DEFAULT now()
) PARTITION BY RANGE (id);
Then, using our stored `id` sequence value from the original table, we can update the new table’s sequence value.
PERFORM setval(pg_get_serial_sequence('public.analytics', 'id'), current_serial_id);
Now it’s time to create our pg_partman configuration, using the `partman.create_parent` function.
PERFORM partman.create_parent(
  p_parent_table => 'public.analytics',
  p_control => 'id',
  p_type => 'range',
  p_interval => '100000000',
  p_premake => '10'
);
- `p_parent_table` – The parent table of our partition set. Note: This must be schema-qualified!
- `p_control` – The column that the partitions will be based on. This must be either an integer or a time-based column type. We want our partitions to be based on the `id` column.
- `p_type` – Specifies the type of partitioning. Must be either `range` or `list`.
- `p_interval` – The interval that determines each partition's size. Here, each partition covers 100M id values.
- `p_premake` – The number of partitions to create ahead of the current one, both at setup time and on each maintenance run. With ten 100M-id partitions premade, we have room for roughly 1B ids.
You'll want to tweak your `p_interval` and `p_premake` values to ensure that the premade partitions cover all of the data in your existing table. Because partitions are defined on id ranges rather than row counts, what matters is the maximum id: in this example, the original `analytics` table's highest id must fall below roughly 1B. Ideally, you'll give yourself a bit of headroom in the form of empty partitions.
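Once the migration has committed, one way to double-check the sizing is to compare the backup table's maximum id against the bounds of the partitions that create_parent produced. This is a plain catalog query; only the table names are specific to this example.

-- Highest id that needs a home.
SELECT max(id) FROM analytics_backup;

-- Bounds of every partition attached to the new parent table.
SELECT c.relname,
       pg_get_expr(c.relpartbound, c.oid) AS bounds
FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = 'public.analytics'::regclass
ORDER BY c.relname;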
Lastly, we’re migrating the data from the backup table into our new partitioned `analytics` table.
INSERT INTO public.analytics
SELECT * FROM analytics_backup;
You now have a partitioned table!
You'll continue calling the same `public.analytics` table for reads and writes, so your application code doesn't need to change. Under the hood, PostgreSQL's native partitioning routes each read and write to the correct partition based on the record's `id` value, while pg_partman handles creating and retiring the partitions themselves.
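You can see this routing for yourself by asking PostgreSQL which child table each row physically lives in, and by confirming the new table matches the backup:

-- Which partition holds each row (tableoid resolves to the child table's name).
SELECT tableoid::regclass AS partition, count(*)
FROM public.analytics
GROUP BY 1
ORDER BY 1;

-- The partitioned table should contain exactly as many rows as the backup.
SELECT (SELECT count(*) FROM public.analytics) AS new_count,
       (SELECT count(*) FROM analytics_backup) AS old_count;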
Automating Partition Maintenance with pg_cron
Once you’ve established your partition schema, you’ll want a way to automate partition maintenance, including the creation of new partitions and the removal of old partitions. This is where pg_cron comes into play.
Before you can use pg_cron, you’ll first need to modify the parameter group of your Aurora instance, adding pg_cron to the `shared_preload_libraries` value. Note: This requires a database restart to take effect!
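After the restart, two quick checks confirm that the library was loaded and which database pg_cron will run its jobs in:

-- pg_cron must appear here before CREATE EXTENSION pg_cron will succeed.
SHOW shared_preload_libraries;

-- The database where pg_cron stores and runs its jobs (defaults to postgres).
SHOW cron.database_name;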
Next, you can create the pg_cron extension and use it to configure periodic partition maintenance.
CREATE EXTENSION pg_cron;
UPDATE partman.part_config
SET infinite_time_partitions = true,
retention = '1000000000',
retention_keep_table = false
WHERE parent_table = 'public.analytics';
SELECT cron.schedule('0 0 * * 1', $$CALL partman.run_maintenance_proc()$$);
First, we’re updating our partition configuration `part_config` with our desired settings:
- `infinite_time_partitions = true` – For time-based partition sets, this tells pg_partman to keep creating future partitions even when no new data is arriving. It has little effect on an id-based set like ours (new partitions are created as the maximum id grows), but it's harmless to leave enabled.
- `retention = '1000000000'` – The retention window, expressed in units of the control column. For our id-based set, any partition whose id range falls entirely more than 1B below the current maximum id becomes eligible for removal.
- `retention_keep_table = false` – Tells pg_partman to actually drop expired partitions, rather than merely detaching them from the parent and keeping the data around.
Lastly, we're leveraging `cron.schedule` to automatically call the partman `run_maintenance_proc` procedure on a defined schedule (here, every Monday at midnight). `run_maintenance_proc` handles creating new partitions and enforcing the configured retention policy on old partitions.
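To confirm everything is wired up, you can list the scheduled job and, rather than waiting for the next Monday run, invoke maintenance by hand once:

-- Jobs registered with pg_cron.
SELECT jobid, schedule, command FROM cron.job;

-- Run partition maintenance immediately instead of waiting for the schedule.
CALL partman.run_maintenance_proc();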
This is the key point, so it bears repeating: the table now drops old partitions automatically, eliminating both the expensive bulk DELETE operations and the costly autovacuum work that follows them.
As a final cleanup, once you’re confident in your new partitioned schema, you can safely drop the original table.
DROP TABLE analytics_backup;
Takeaways
Deleting data at scale in PostgreSQL is tricky. The default path—DELETE statements followed by autovacuum cleanup—can lead to bloat, degraded performance, and in extreme cases, even downtime.
By using partitioning as a deletion strategy and leveraging tools like pg_partman and pg_cron on Aurora PostgreSQL, you can skip the pain entirely. Dropping old partitions is fast, clean, and doesn’t interfere with your application’s performance.
If you’re fighting with autovacuum, or just want a scalable retention strategy, this method is worth trying.