Partitioning of tables in PostgreSQL

Disclaimer

This document is only collection of author’s notes, experiences and point of views. It is not, in any meaning, either complete description of the topic nor official RTB documentation. It may be inaccurate, incomplete, obsolete, misleading or completely wrong. It may even cause loss of data or damage system integrity. It may not comply with company codex, values, presentation style or economic interests or may reveal company secrets.

As such, it is absolutely prohibited to distribute this document outside of RTB & Co. GmbH. Such an action may result into legal acts against both the sender and the company. It is only intended, after review of technical and presentational correctness and accuracy, to be used as an information source for official documentation.

Always contact documentation department for information about current presentation style and allowed formats before creating customer’s documentation.

If queries to payment table running too slow, the database is too big and performance suffers, it is time to split big table to multiple small ones. We use PostgreSQL integrated partitioning, guide.

1. Motivation

The table where payments are stored grows rapidly (currently it has ~ 110 000 000 rows). This make query slower because of big indexes and other data structures. To address this issue PostgreSQL offers an instrument called table partitioning.

To run queries more effectively it is desired to split payment table to multiple partitions by payment timestamp. It was not possible so far, because the column on which is the partitioning done must be a part of primary key. Up to now primary key of payment was an artificial key, auto incrementing number.

The goal of this feature is to split payment table to multiple smaller sections. It allows to efficiently skip partitions which do not participate on the result data set.

Figure 1. Current payment table

Figure 2. Partitioned payment table

Although the task seems to be simple, it is necessary to solve few issues before.

2. Partition key

The requirement for partitioning is that the column on which partitioning is done must be part of primary key. We want to partition on payment timestamp. Currently the primary key is is auto incrementing number.

We need to find a new way how to uniquely identify a payment. A natural key composed from payment_timestamp, and pdm_id is good candidate and it will be used. PDM should not sell more than one ticket per at the same time.

However there are cases when one PDM generate more payments with the same timestamp. They are described in the next chapter Exceptions from uniqueness.

2.1. Exceptions from uniqueness

2.1.1. PDM 5

Precision of PDM 5 clock is one minute. Therefore it is certain that PDM 5 can sell more then one ticket per minute and send more then one payment message with the same timestamp.

2.1.2. PDM 7

Precision of PDM 7 clock is one second. Theoretically it should not happen that more then one payment message is sent with the same time. However it happens in following cases.

Mix payment. If a ticket is payed from multiple sources there is multiple payment messages with same date and ticket number. Only tracer field is different.
Reconciliation. If PDM sends money reconciliation message it is always composition of multiple payment (one per payment type) messages having exactly same time stamp. The difference is again tracer number. Ticket number are usually different but may be also created at the same. It depends on clearing synchronization.

3. Remove duplicities from database

Currently, trying to set composite key defined above will generate lot of conflicts. There are two types of conflicts which is necessary to solve, before a new primary key can be used.

3.1. Duplications by error

For some reason there are some duplication in payment messages. It may be caused by some error in application logic. To find such a duplicities we have to look for records which have same pdm_id, payment_date_time and tracer_number.

Such a duplicities will be removed during database migration.

See Remove duplicates to learn more, how are the duplications removed.

3.2. Solution for natural keys duplicated by design

To get rid of such a duplications (described in chapter Exceptions from uniqueness) a solution to slightly adjust timestamp field in payment will be used. To get unique number within one minute will the payments will be sorted according to tracer number. It should grow by each payment. Seconds timestamp fraction will be set to order number. E.g. the payments with timestamp '…T10:10:00' and '…T10:10:01', '…T10:10:02' … . It is not expected that PDM could create more then 60 payments per minute.

Time stamps will be adjusted during database migration.

Timestamps of new payments coming from PDM will be adjusted at runtime in case of conflict by applying same logic as it is described above.

See Adjust time stamps for duplication by design to learn more, how are the duplications removed.

4. Partitioned table

Now when the problem of duplicity is solved, we can start to partition the table. The table is partitioned based on payment timestamp. It is necessary to decide how big a partition should be to keep balance between system performance and number of partitions. The exact size depends on the database server performance and PostgreSQL configuration. Size of one partition should not be bigger then available RAM. We use partition size about 30 - 40 million record. Therefore it is recommended to create partition range according to expected number of rows for selected period. E.g. there can be one partition for first two year. As number of PDMs (and payments) grow there can be one partition of one year. Later a partition can be e.g. 6 months or so.

There is also necessary to create one default partition for the cases, where payment timestamp does not fall to any existing partition.

Be sure to create at least one future partition to be sure that new records will not go unexpectedly to default partition.

If you want to move rows to different partition later it is necessary to work on detached partition. It usually mean downtime. See section How to move data to some other partition for more details.

Appendix A: Mitigation of duplicated records in the database

A.1. Remove duplicates

Run following SQL to check if there area duplicities.

Check presents of duplicities

SELECT tl_pdm_id, tl_tracer, tl_pay_date_time, count (*) as duplication_count
FROM control.translog
group by tl_pdm_id, tl_tracer, tl_pay_date_time
HAVING count (*) > 1
LIMIT 10

It will provide a result of (at most 10 rows - to make execution faster) duplicated entries with count.

If table is not empty delete duplicated entries by following SQL (source). It will leave only oldest version of the row. Execution may take longer (10 - 30 minutes) depending on table size and computer speed.

Delete duplicated rows

DELETE FROM control.translog a USING control.translog b
WHERE a.tl_id < b.tl_id
AND a.tl_pdm_id = b.tl_pdm_id AND a.tl_pay_date_time = b.tl_pay_date_time AND a.tl_tracer=b.tl_tracer;

A.2. Adjust time stamps for duplication by design

As it is described above, some payment are (under composite key defined above) falsely positive evaluated as duplicates because of reasons. The solution described there can be used to fix them.

Make unique timestamps

DO
$$
DECLARE
	pay_group record;
BEGIN
	-- Loop over payment id groups having same PDM and timestamp.
	FOR pay_group IN
		-- Find payments groups where multiple payment was done within same time
		WITH pay_group AS (
			SELECT tl_pdm_id as pdm_id, tl_pay_date_time as pay_ts, count (*) as duplication_count
			FROM control.translog
			group by tl_pdm_id, tl_pay_date_time
			HAVING count (*) > 1
		)
		-- We need to adjust all payments within tis minute. Re-group by minute start of minute
		SELECT pdm_id, date_trunc('minute', pay_ts) AS pay_ts_from
		FROM pay_group
		GROUP BY pdm_id, date_trunc('minute', pay_ts)
	LOOP
		-- Update each group by setting seconds, based on order (by tracer) to achive unique time stamp for each payment
		-- Update from select: https://stackoverflow.com/questions/6256610/updating-table-rows-in-postgres-using-subquery
		UPDATE control.translog
		SET tl_pay_date_time = subq.new_ts
		FROM(
			-- Prepare modified timestamps - base minute + number of seconds from order.
			SELECT tl_id as pay_id, date_trunc('minute', tl_pay_date_time) + (INTERVAL '1 second' * RANK() OVER w) as new_ts
			FROM control.translog
			WHERE tl_pdm_id = pay_group.pdm_id
			AND tl_pay_date_time >= pay_group.pay_ts_from
			AND tl_pay_date_time < pay_group.pay_ts_from + INTERVAL '1 minute'
			WINDOW w AS (PARTITION BY tl_pdm_id ORDER BY tl_tracer)
		) AS subq
		WHERE tl_id = subq.pay_id;
	END LOOP;
END;
$$;

Appendix B: Create and use partitioned table

An example of partitioned table for payments

CREATE TABLE IF NOT EXISTS control.payment
(
    pay_ts timestamp(0) with time zone NOT NULL,
    pdm_id integer NOT NULL,
    amount numeric(8,2),
    currency character varying(3),
    expiration_ts timestamp with time zone,
    ticket_nr integer,
    pay_reason control.payment_reason,
    pay_type control.payment_type,
    tracer integer,
    lpn character varying(255),
    psn integer,
    card_nr character varying(255),
    card_code integer,
    card_type smallint,
    auth_code character varying,
    tariff_info_id integer,
    rtp_session integer,
    custom_data jsonb,
    server_ts timestamp with time zone,
    pay_id bigint NOT NULL DEFAULT nextval('control.payment_id_seq'::regclass),
    CONSTRAINT pk_payment PRIMARY KEY (pay_ts, pdm_id),
    CONSTRAINT fk_payment__pdm FOREIGN KEY (pdm_id) REFERENCES control.pdm (pdm_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE,
    CONSTRAINT fk_payment__tariff_info FOREIGN KEY (tariff_info_id) REFERENCES control.tariff_info (tai_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT,
    CONSTRAINT fk_payment__payment_rtp FOREIGN KEY (rtp_session) REFERENCES control.payment_rtp (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT
)
-- Partitions by payment timestamp
PARTITION BY RANGE (pay_ts);
CREATE INDEX IF NOT EXISTS fki_payment__pdm ON control.payment USING btree (pdm_id ASC NULLS LAST);
CREATE INDEX IF NOT EXISTS fki_payment__rtp ON control.payment USING btree (rtp_session ASC NULLS LAST);

-- Add partitions as needed
CREATE TABLE control."payment_2000_2019" PARTITION OF control.payment FOR VALUES FROM ('2000-01-01 +00') TO ('2020-01-01 +00');
CREATE TABLE control."payment_2020_2021" PARTITION OF control.payment FOR VALUES FROM ('2020-01-01 +00') TO ('2022-01-01 +00');
CREATE TABLE control."payment_2022" PARTITION OF control.payment FOR VALUES FROM ('2022-01-01 +00') TO ('2023-01-01 +00');
CREATE TABLE control."payment_2023" PARTITION OF control.payment FOR VALUES FROM ('2023-01-01 +00') TO ('2024-01-01 +00');
CREATE TABLE control."payment_2024" PARTITION OF control.payment FOR VALUES FROM ('2024-01-01 +00') TO ('2025-01-01 +00');
-- Default partition for payments which do not match any existing partition.
CREATE TABLE control."payment_default" PARTITION OF control.payment DEFAULT;

In our case there was not too much payments before year 2018 therefore it is OK to put them all to default partition. Then we can combine years two years to one partition and then just one year.

Example of copying data from temporary table control.payment2

INSERT INTO control.payment SELECT
    pay_ts, pdm_id, amount, currency, expiration_ts, ticket_nr, pay_reason, pay_type, server_ts,
    tracer, lpn, psn, card_nr, card_code, card_type, auth_code, tariff_info_id, rtp_session,
    custom_data, pay_id
FROM control.payment2;

B.1. How to move data to some other partition

If there is a need to move data to new partition (e.g. because of bad estimation is one of them too big) do following.

Stop PDM.control
Detach source partition
Detach or create new (table must exactly match columns of main table) target partition. E.g. copy and execute SQL from pgAdmin. Rename primary key and indexes to match table name. Remove partitioning part.
Copy relevant data to new table

insert INTO control.payment_2023_h2
(select * from control.payment_2023 WHERE pay_ts >= '2023-07-01+00');

Remove data from original partition

DELETE FROM control.payment_2023 WHERE tl_pay_date_time >= '2023-07-01+00';

Attach partition for required time range

ALTER TABLE control.payment ATTACH PARTITION control.payment_2023 FOR VALUES FROM ('2023-01-01+00') TO ('2023-08-01+00');

Appendix C: Migrate payments to partitioned table

Make a plan for partition sizes. To get a raw idea how many payments do the PDMs can be used following query:

SELECT EXTRACT('year' from pay_ts) AS res_year, count(*)
FROM control.payment
GROUP BY res_year
ORDER BY res_year;

Choose partition size that fits in memory of your system.

All SQLs below are valid for version 24.0. In future release may the data structure be changed. Please always use pgAdmin to inspect current database structure.

C.1. Rename old structures

Rename table control.payment to control.payment2.

ALTER TABLE control.payment RENAME TO payment2;

Rename pk_payment TO pk_payment2

ALTER TABLE control.payment2 RENAME CONSTRAINT pk_payment TO pk_payment2;

Create new payment table with the same structure as the old one and define partitions. Do not create any constraint or index yet. E.g.

CREATE TABLE control.payment
(
    pay_ts timestamp(0) with time zone NOT NULL,
    pdm_id integer NOT NULL,
    amount numeric(8,2),
    currency character varying(3),
    expiration_ts timestamp with time zone,
    ticket_nr integer,
    pay_reason control.payment_reason,
    pay_type control.payment_type,
    tracer integer,
    lpn character varying(255),
    psn integer,
    card_nr character varying(255),
    card_code integer,
    card_type smallint,
    auth_code character varying,
    tariff_info_id integer,
    rtp_session integer,
    custom_data jsonb,
    server_ts timestamp with time zone,
    CONSTRAINT pk_payment PRIMARY KEY (pay_ts, pdm_id)
)
-- Partitions by payment timestamp
PARTITION BY RANGE (pay_ts);

-- Add partitions as needed. For example
CREATE TABLE control."payment_2000_2019" PARTITION OF control.payment FOR VALUES FROM ('2000-01-01 +00') TO ('2020-01-01 +00');
CREATE TABLE control."payment_2020_2021" PARTITION OF control.payment FOR VALUES FROM ('2020-01-01 +00') TO ('2022-01-01 +00');
CREATE TABLE control."payment_2022" PARTITION OF control.payment FOR VALUES FROM ('2022-01-01 +00') TO ('2023-01-01 +00');
CREATE TABLE control."payment_2023" PARTITION OF control.payment FOR VALUES FROM ('2023-01-01 +00') TO ('2024-01-01 +00');
CREATE TABLE control."payment_2024" PARTITION OF control.payment FOR VALUES FROM ('2024-01-01 +00') TO ('2025-01-01 +00');
-- Default partition for payments which do not match any existing partition.
CREATE TABLE control."payment_default" PARTITION OF control.payment DEFAULT;

Copy data to new table

INSERT INTO control.payment
(
	SELECT
		pay_ts, pdm_id, amount, currency, expiration_ts, ticket_nr, pay_reason, pay_type, tracer, lpn,
		psn, card_nr, card_code, card_type, auth_code, tariff_info_id, rtp_session, custom_data, server_ts
	FROM control.payment2
);

Find out all objects depending on table control.payment2 (use pgAdmin).
Re-create it for control.payment
- Copy SQL of the object (pgAdmin)
- Delete objects
- Replace payment2 by payment in create statement.

-- Constraints
ALTER TABLE control.payment2 DROP CONSTRAINT fk_payment__payment_rtp;
ALTER TABLE control.payment ADD CONSTRAINT fk_payment__payment_rtp FOREIGN KEY (rtp_session) REFERENCES control.payment_rtp (id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT;

ALTER TABLE control.payment2 DROP CONSTRAINT fk_payment__pdm;
ALTER TABLE control.payment ADD CONSTRAINT fk_payment__pdm FOREIGN KEY (pdm_id) REFERENCES control.pdm (pdm_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE control.payment2 DROP CONSTRAINT fk_payment__tariff_info;
ALTER TABLE control.payment ADD CONSTRAINT fk_payment__tariff_info FOREIGN KEY (tariff_info_id) REFERENCES control.tariff_info (tai_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE RESTRICT;

ALTER TABLE control.bonus_disposable DROP CONSTRAINT fk_bonus_disposable__payment;
ALTER TABLE control.bonus_disposable ADD CONSTRAINT fk_bonus_disposable__payment FOREIGN KEY (pay_ts, pay_pdm_id) REFERENCES control.payment (pay_ts, pdm_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE control.payment_bonus DROP CONSTRAINT fk_payment_bonus__payment;
ALTER TABLE control.payment_bonus ADD CONSTRAINT fk_payment_bonus__payment FOREIGN KEY (pay_ts, pay_pdm_id) REFERENCES control.payment (pay_ts, pdm_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE control.payment_ext DROP CONSTRAINT "fk_payment_ext__payment-base";
ALTER TABLE control.payment_ext ADD CONSTRAINT "fk_payment_ext__payment-base" FOREIGN KEY (base_pdm_id, base_pay_ts) REFERENCES control.payment (pdm_id, pay_ts) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE control.payment_ext DROP CONSTRAINT "fk_payment_ext__payment-ext";
ALTER TABLE control.payment_ext ADD CONSTRAINT "fk_payment_ext__payment-ext" FOREIGN KEY (ext_pdm_id, ext_pay_ts) REFERENCES control.payment (pdm_id, pay_ts) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE control.payment_recent DROP CONSTRAINT fk_payment_recent__payment;
ALTER TABLE control.payment_recent ADD CONSTRAINT fk_payment_recent__payment FOREIGN KEY (pay_ts, pdm_id) REFERENCES control.payment (pay_ts, pdm_id) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE;

-- Indexes
DROP INDEX control.fki_payment__pdm;
CREATE INDEX IF NOT EXISTS fki_payment__pdm ON control.payment USING btree (pdm_id ASC NULLS LAST);

DROP INDEX control.fki_payment__rtp;
CREATE INDEX IF NOT EXISTS fki_payment__rtp ON control.payment USING btree (rtp_session ASC NULLS LAST);

Drop original table

DROP TABLE control.payment2;