Optimizing Large-Scale PostgreSQL Databases for Performance

Published:

Introduction

In environments where large datasets are common, such as a TB dataset, it is essential to optimize the database to ensure efficient performance. This article outlines strategies to enhance the performance of PostgreSQL databases, particularly when dealing with large volumes of data and complex dashboards.

Database Configuration

For the purpose of this guide, we consider a PostgreSQL Server database setup with the following specifications:

Size	Operating System	Disk Size
Standard_D32as_v5 (32 vCPUs, 128 GiB memory)	Linux	2 TB

Example Dashboard

Please refer to the Marketing Analytics Dashboard mentioned in the article Optimizing Database Performance with Materialized Views and Pre-Aggregated Tables for further information. In this article, we will provide examples based on the dashboard.

Performance Enhancement Strategies

Indexing:

PostgreSQL offers several index types, including B-tree, Hash, GiST, GIN, and BRIN. Selecting the appropriate index type based on the specific use case can significantly improve query performance.

We have used below indexing in this case

-- Index for the "date" column
CREATE INDEX marketing_details_date_idx ON public.marketing_details USING btree (date);

-- Index for the combination of "destinationurl" and "channel_name"
CREATE INDEX marketing_details_destinationurl_channel_name_idx ON public.marketing_details USING btree (destinationurl, channel_name);

Example Table Structure:

CREATE TABLE IF NOT EXISTS public.marketing_details
(
    id SERIAL,
    date TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT (CURRENT_DATE - ((floor(((random() * (10)::double precision) * (365)::double precision)) || ' days'::TEXT))::INTERVAL),
    destinationurl TEXT COLLATE pg_catalog."default",
    adcost INTEGER DEFAULT 0,
    visitors INTEGER,
    visitors_after_landing_page INTEGER,
    sessions INTEGER,
    impressions INTEGER,
    clicks INTEGER,
    revenue INTEGER,
    expense INTEGER,
    channel_id INTEGER,
    channel_name TEXT COLLATE pg_catalog."default",
    product_used INTEGER,
    leads INTEGER,
    wons INTEGER,
    PRIMARY KEY (date, id) -- Include the 'date' column in the primary key
) PARTITION BY RANGE (date);

Table Partitioning:

Partitioning a table into smaller segments can lead to more efficient queries by minimizing the data scanned.

Sample Partition Creation:

CREATE TABLE marketing_details_2004 PARTITION OF marketing_details FOR VALUES FROM ('2004-01-01') TO ('2005-01-01');

Performance Metrics

The average loading times for various widgets in the dashboard are as follows:

Widget	Loading Time (seconds)
Grid	61
HeatMap	58
NumberCard	24
PieChart	58
Combobox	24
LineChart	58

Conclusion

By implementing indexing and partitioning strategies, you can significantly improve the performance of PostgreSQL databases handling large datasets. This leads to faster data access, efficient aggregation, and an overall better user experience with data-intensive applications.

Additional References

Best Practices for Optimizing Data Models and Dashboards for Server Performance
Connecting Materialized Views in Bold BI - MSSQL, PostgreSQL, Oracle and Google Big Query
PostgreSQL Materialized Views

Was this article useful?

Dislike

Written by Sangavi Eswaramoorthi

Updated:

Comments (0)