Optimizing Database Performance with Materialized Views and Pre-Aggregated Tables
Introduction
In a database with millions of records, efficiently extracting and analyzing data for dashboards is crucial for optimal performance. Raw tables with large datasets may result in slow query performance and increased processing times. This knowledge base article explores the benefits of using summary data through views, materialized views, and pre-aggregated tables in PostgreSQL to enhance dashboard performance.
What is pre-aggregated or summary data?
Pre-aggregated or summary data refers to the practice of storing aggregated information, such as totals or averages, in a separate structure to enhance query performance. This approach reduces the need for extensive calculations during query execution, making data retrieval faster and more efficient.
The Advantages of Summarized Data for Improved Dashboard Performance
Business Intelligence (BI) dashboards play a crucial role in helping organizations make informed decisions based on data analysis. While raw data provides a comprehensive view of information, utilizing summarized data in BI dashboards offers significant performance advantages.
Faster Query Execution:
Summarized data reduces the volume of information that needs to be processed during queries. Aggregated data allows for quicker retrieval and rendering of results in BI dashboards. With reduced query execution times, users experience improved responsiveness and a more seamless interactive experience.
Enhanced Dashboard Load Times:
Loading large amounts of raw data into BI dashboards can result in slower performance, especially when dealing with extensive datasets. Summarized data, being a condensed representation, significantly reduces the load times of dashboards. This ensures that users can access critical information swiftly without delays.
Optimized Resource Utilization:
Utilizing summarized data helps in optimizing resource utilization, as less computing power is required for processing and rendering. This is particularly beneficial for organizations with limited hardware resources, as it allows them to efficiently manage and scale their BI infrastructure.
Improved Scalability:
Summarized data is more scalable than raw data, especially when dealing with increasing data volumes. BI dashboards built on summarized data can handle larger datasets without sacrificing performance. This scalability is crucial for organizations experiencing data growth over time.
Reduced Network Latency:
Transmitting large volumes of raw data over a network can lead to latency issues, affecting the overall performance of BI dashboards. Summarized data, being more compact, minimizes the data transfer requirements and subsequently reduces network latency. This is particularly important for remote users accessing dashboards over the internet.
Enhanced Visualization Responsiveness:
Visualization components in BI dashboards, such as charts and graphs, benefit from summarized data by rendering more quickly. Users can interact with visualizations in real-time, enabling a more dynamic and responsive analytical experience.
Efficient Aggregation and Roll-Up:
Summarized data often includes pre-aggregated metrics and key performance indicators (KPIs), simplifying the presentation of high-level insights. This allows users to focus on critical information without the need to manually aggregate raw data, streamlining the decision-making process.
Example Dashboard
The Marketing Analytics Dashboard discussed in the example offers a visualized overview of campaign performance and financial metrics. Leveraging PostgreSQL, this tool provides actionable insights for optimizing marketing strategies.
Dataset Overview:
Utilizing the marketing_details
table, it captures key metrics like ad costs, clicks, revenue, and channel-specific data. With columns such as destinationurl
and product_used
, it enables granular analysis of campaign effectiveness.
Derived Metrics for BI:
Metrics include Conversion Rate (Wons/Leads), Cost Per Click (Ad Cost/Clicks), and Average Revenue Per Visitor channel-wise. Visualizations track Total Visitors, Sessions, and Engagement Rates, aiding in data-driven decision-making for marketers.
This Marketing Analytics Dashboard, driven by PostgreSQL data, provides marketers with a comprehensive view of performance metrics. Through insightful visualizations and derived KPIs, it empowers users to optimize spending, enhance engagement, and make informed decisions for successful campaigns.
Disclaimer:
Please note that the data presented in this example is sample data created for a knowledge base article on using summary data. The metrics and values shown are for illustrative purposes only and may not accurately represent real-world data.
How to use summary data for creating Dashboard in Bold BI
To use summary data or pre-aggregated data in Bold BI, you can employ the following methods:
- Creating Materialized Views in PostgreSQL: Enhance performance by storing precomputed results using materialized views, ensuring faster data access in your Bold BI dashboards.
- Creating Pre-aggregated Tables in PostgreSQL: Optimize query performance by manually maintaining pre-aggregated tables, offering a direct approach to data optimization within Bold BI.
Materialized Views:
A materialized view is a database object in PostgreSQL that stores the result of a query physically. Unlike regular views, materialized views are precomputed and stored, allowing for faster query performance by avoiding the need to recompute the result each time the view is queried.
Syntax for Creating Materialized View in PostgreSQL
CREATE MATERIALIZED VIEW view_name AS
SELECT
-- Your SELECT statement here
FROM
-- Your source tables or views here
GROUP BY
-- Your GROUP BY clause here;
How to Refresh Materialized Views
To refresh a materialized view, you can use the following command:
REFRESH MATERIALIZED VIEW view_name;
Creating Materialized View for the Marketing Analytics Dashboard
CREATE MATERIALIZED VIEW Marketing_Details_SummaryMaterializedView AS
SELECT
TO_CHAR(date, 'Mon YYYY') AS month_year,
destinationurl,
channel_name,
COUNT(destinationurl) AS total_unique_destinations,
SUM(adcost) AS total_adcost,
AVG(adcost) AS average_adcost,
SUM(visitors) AS total_visitors,
SUM(visitors_after_landing_page) AS total_visitors_after_landing_page,
SUM(sessions) AS total_sessions,
SUM(impressions) AS total_impressions,
SUM(clicks) AS total_clicks,
SUM(revenue) AS total_revenue,
AVG(revenue) AS average_revenue,
SUM(expense) AS total_expense,
AVG(expense) AS average_expense,
COUNT(channel_id) AS total_unique_channel_ids,
COUNT(channel_name) AS total_unique_channels,
COUNT(product_used) AS total_unique_products_used,
SUM(leads) AS total_leads,
AVG(leads) AS average_leads,
SUM(wons) AS total_wons,
AVG(wons) AS average_wons,
(SUM(leads)::float / SUM(visitors)) AS conversion_rate,
(SUM(clicks)::float / SUM(impressions)) AS click_through_rate,
(SUM(adcost)::float / SUM(clicks)) AS cost_per_click,
(SUM(revenue)::float / SUM(expense)) AS return_on_investment,
(SUM(sessions)::float / SUM(visitors)) AS average_session_duration,
(SUM(wons)::float / SUM(leads)) AS lead_conversion_rate,
(SUM(revenue)::float / SUM(visitors)) AS average_revenue_per_visitor,
(SUM(clicks + leads + wons)::float / SUM(impressions)) AS channel_engagement_rate
FROM public.marketing_details
GROUP BY month_year, destinationurl, channel_name;
When to Use Materialized Views for Improving Dashboard Performance
Choose Materialized Views When:
Dynamic Queries: Users need flexibility with ad-hoc or dynamic queries.
Infrequent Data Changes: Data changes infrequently, allowing for less frequent view refreshes.
Complex Calculations: Involves complex calculations or aggregations best precomputed for performance.
Space Efficiency: Prioritize space efficiency as materialized views store only aggregated results.
Considerations in Using Materialized Views
Refresh Frequency: Consider the frequency of data updates. If your data changes frequently, you may need to refresh the materialized views more often.
Storage Impact: Materialized views occupy physical storage space. Ensure that you have sufficient storage capacity for the materialized views.
Query Complexity: Materialized views are beneficial for complex queries, but for simple and frequently changing data, they might not be the most efficient solution.
Pre-aggregated Tables
Pre-aggregated tables are another approach to optimize query performance. In this approach, you create tables with aggregated data beforehand, eliminating the need to perform aggregations at runtime.
Creating Pre-aggregated Tables for the Marketing Analytics Dashboard
CREATE VIEW Marketing_Details_SummaryView AS
SELECT
TO_CHAR(date, 'Mon YYYY') AS month_year,
destinationurl,
channel_name,
COUNT(destinationurl) AS total_unique_destinations,
SUM(adcost) AS total_adcost,
AVG(adcost) AS average_adcost,
SUM(visitors) AS total_visitors,
SUM(visitors_after_landing_page) AS total_visitors_after_landing_page,
SUM(sessions) AS total_sessions,
SUM(impressions) AS total_impressions,
SUM(clicks) AS total_clicks,
SUM(revenue) AS total_revenue,
AVG(revenue) AS average_revenue,
SUM(expense) AS total_expense,
AVG(expense) AS average_expense,
COUNT(channel_id) AS total_unique_channel_ids,
COUNT(channel_name) AS total_unique_channels,
COUNT(product_used) AS total_unique_products_used,
SUM(leads) AS total_leads,
AVG(leads) AS average_leads,
SUM(wons) AS total_wons,
AVG(wons) AS average_wons,
(SUM(leads)::float / SUM(visitors)) AS conversion_rate,
(SUM(clicks)::float / SUM(impressions)) AS click_through_rate,
(SUM(adcost)::float / SUM(clicks)) AS cost_per_click,
(SUM(revenue)::float / SUM(expense)) AS return_on_investment,
(SUM(sessions)::float / SUM(visitors)) AS average_session_duration,
(SUM(wons)::float / SUM(leads)) AS lead_conversion_rate,
(SUM(revenue)::float / SUM(visitors)) AS average_revenue_per_visitor,
(SUM(clicks + leads + wons)::float / SUM(impressions)) AS channel_engagement_rate
FROM public.marketing_details
GROUP BY month_year, destinationurl, channel_name;
Considerations in Using Pre-aggregated Tables
Storage Requirements: Pre-aggregated tables may consume more storage space, as they store redundant aggregated data.
Maintenance: Ensure that the pre-aggregated tables are kept up-to-date with the source data to avoid inconsistencies.
Query Patterns: This approach is most effective when there are specific and recurring query patterns that benefit from precomputed aggregations.
When to Use Pre-aggregated Tables for Improving Dashboard Performance
Predictable Queries: Dashboard has predefined metrics and predictable query patterns.
Read-heavy Workloads: Optimizing for read-heavy workloads is a priority.
Limited Data Changes: Data changes infrequently, and the cost of updating tables is acceptable.
Simplicity in Maintenance: Simplifying maintenance is a key consideration.
Performance comparison between raw data vs materialized views
Performance Comparison: Raw Tables vs. Materialized Views
We have constructed identical dashboards in two modes:
- Dashboard using raw data, with KPIs created through Bold BI expressions and summary types.
- Dashboard using pre-aggregated materialized views, where required KPIs are calculated within the materialized view queries.
Performance Metrics
Data Count (50 Million Records):
Query Execution Time (in seconds) - Widget Names:
Widget Name | Design Mode using Raw Data | Design Mode using Materialized Views |
---|---|---|
Select Period (Combo box) | 00:00:09.018 | 00:00:00.001 |
Select Channel (Combo box) | 00:00:03.246 | 00:00:00.001 |
Revenue vs Expenses (KPI Card) | 00:00:01.326 | 00:00:00.001 |
Conversion Rate (Number Card) | 00:00:04.195 | 00:00:00.001 |
Cost Per Click (Number Card) | 00:00:03.196 | 00:00:00.001 |
Total Visitors (Number Card) | 00:00:01.212 | 00:00:00.001 |
Total Sessions (Number Card) | 00:00:01.907 | 00:00:00.005 |
Average Leads (Number Card) | 00:00:01.873 | 00:00:00.002 |
Average Click Through Rate (Number Card) | 00:00:03.729 | 00:00:00.001 |
Avg. Revenue Per Visitor Channel-wise (Chart) | 00:00:04.212 | 00:00:00.003 |
Average Session Duration (Heat Map) | 00:00:05.144 | 00:00:00.001 |
Top 5 Channels Based on Engagement Rate (Proportion Chart) | 00:00:30.908 | 00:00:00.001 |
Summary Details (Grid) | 00:00:21.081 | 00:00:00.011 |
Summary
The use of materialized views significantly improves dashboard performance, as evidenced by reduced query execution times across all widgets. The materialized views’ pre-aggregated nature allows for faster retrieval of data, especially in scenarios where complex calculations and aggregations are involved. Considerations should be given to the trade-offs in terms of storage space and update frequency when opting for materialized views. The choice ultimately depends on specific use case requirements and performance priorities.