Optimizing Large-Scale PostgreSQL Databases for Performance
Introduction
In environments where large datasets are common, such as a TB dataset, it is essential to optimize the database to ensure efficient performance. This article outlines strategies to enhance the performance of PostgreSQL databases, particularly when dealing with large volumes of data and complex dashboards.
Database Configuration
For the purpose of this guide, we consider a PostgreSQL Server database setup with the following specifications:
Size | Operating System | Disk Size |
---|---|---|
Standard_D32as_v5 (32 vCPUs, 128 GiB memory) | Linux | 2 TB |
Example Dashboard
Please refer to the Marketing Analytics Dashboard mentioned in the article Optimizing Database Performance with Materialized Views and Pre-Aggregated Tables for further information. In this article, we will provide examples based on the dashboard.
Performance Enhancement Strategies
Indexing:
PostgreSQL offers several index types, including B-tree, Hash, GiST, GIN, and BRIN. Selecting the appropriate index type based on the specific use case can significantly improve query performance.
We have used below indexing in this case
-- Index for the "date" column
CREATE INDEX marketing_details_date_idx ON public.marketing_details USING btree (date);
-- Index for the combination of "destinationurl" and "channel_name"
CREATE INDEX marketing_details_destinationurl_channel_name_idx ON public.marketing_details USING btree (destinationurl, channel_name);
Example Table Structure:
CREATE TABLE IF NOT EXISTS public.marketing_details
(
id SERIAL,
date TIMESTAMP WITHOUT TIME ZONE NOT NULL DEFAULT (CURRENT_DATE - ((floor(((random() * (10)::double precision) * (365)::double precision)) || ' days'::TEXT))::INTERVAL),
destinationurl TEXT COLLATE pg_catalog."default",
adcost INTEGER DEFAULT 0,
visitors INTEGER,
visitors_after_landing_page INTEGER,
sessions INTEGER,
impressions INTEGER,
clicks INTEGER,
revenue INTEGER,
expense INTEGER,
channel_id INTEGER,
channel_name TEXT COLLATE pg_catalog."default",
product_used INTEGER,
leads INTEGER,
wons INTEGER,
PRIMARY KEY (date, id) -- Include the 'date' column in the primary key
) PARTITION BY RANGE (date);
Table Partitioning:
Partitioning a table into smaller segments can lead to more efficient queries by minimizing the data scanned.
Sample Partition Creation:
CREATE TABLE marketing_details_2004 PARTITION OF marketing_details FOR VALUES FROM ('2004-01-01') TO ('2005-01-01');
Performance Metrics
The average loading times for various widgets in the dashboard are as follows:
Widget | Loading Time (seconds) |
---|---|
Grid | 61 |
HeatMap | 58 |
NumberCard | 24 |
PieChart | 58 |
Combobox | 24 |
LineChart | 58 |
Conclusion
By implementing indexing and partitioning strategies, you can significantly improve the performance of PostgreSQL databases handling large datasets. This leads to faster data access, efficient aggregation, and an overall better user experience with data-intensive applications.
Additional References
Best Practices for Optimizing Data Models and Dashboards for Server Performance
Connecting Materialized Views in Bold BI - MSSQL, PostgreSQL, Oracle and Google Big Query
PostgreSQL Materialized Views