Optimizing database performance in Google BigQuery using Materialized views
Introduction
In a database with millions of records, efficiently extracting and analyzing data for dashboards is crucial for optimal performance. Raw tables with large datasets may result in slow query performance and increased processing times. This knowledge-based article explores the benefits of using materialized views in Google BigQuery to enhance dashboard performance.
What is a Materialized view?
In BigQuery, materialized views are precomputed views that periodically cache the results of a query for increased performance and efficiency. BigQuery leverages precomputed results from materialized views and whenever possible reads only changes from the base tables to compute up-to-date results. Materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
The advantages of Materialized view data for improved dashboard performance
Business Intelligence (BI) dashboards play a crucial role in helping organizations make informed decisions based on data analysis. While raw data provides a comprehensive view of information, utilizing summarized data in BI dashboards offers significant performance advantages. Queries that use materialized views are generally faster and consume fewer resources than queries that retrieve the same data only from the base tables. Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.
There are several advantages to using materialized views in Google BigQuery. Here are some of them:
Faster query response times:
Materialized views are precomputed and stored on disk, which means that they don’t waste time resolving the query or joining in the query that creates the materialized view. As a result, materialized views respond faster in comparison to a view.
Reduced computation cost:
Queries that use materialized views consume fewer resources than queries that retrieve the same data only from the base tables. This is because materialized views can be queried directly or can be used by the BigQuery optimizer to process queries to the base tables.
Zero maintenance:
Materialized views are precomputed in the background when the base tables change. Any incremental data changes from the base tables are automatically added to the materialized views, with no user action required.
Easy to refresh:
Materialized views are easy to refresh. Simply run the REFRESH MATERIALIZED VIEW command.
Automatic query rewriting:
If any part of a query against the base table can be resolved by querying the materialized view, then BigQuery reroutes the query to use the materialized view for better performance and efficiency.
Queries with high computation cost and small dataset results:
Materialized views can significantly improve the performance of workloads that have the characteristic of common and repeated queries.
Example Dashboard
The Stack Overflow Analytics Dashboard discussed in the example offers a visualized overview of a platform synonymous with collaborative programming and knowledge sharing, is vital for users seeking real-time information and trends in the developer community. By leveraging materialized views, we aim to provide a seamless and efficient way to retrieve and present data, offering users a dynamic and responsive interface for navigating through the wealth of information available on Stack Overflow.
Dataset overview:
Utilizing the bigquery-public-data.stackoverflow. stackoverflow_posts
table, it captures key metrics like Answers count, Comments Count and Score data.
This Stack Overflow Analytics Dashboard, powered by Google BigQuery, offers a robust and data-driven platform for developers and community managers to gain valuable insights into user interactions, question trends, and overall platform engagement. Through dynamic visualizations and key performance indicators (KPIs) derived from BigQuery’s powerful analytics capabilities, this dashboard equips users with the tools to enhance the platform’s performance, identify community patterns, and make informed decisions for continuous improvement. Whether tracking user activity, question response times, or community growth, this analytics dashboard provides a comprehensive and actionable view of Stack Overflow’s dynamics, contributing to a more effective and responsive developer community.
Disclaimer:
Please note that the data presented in this example is sample data created for a knowledge base article using summary data. The metrics and values shown are for illustrative purposes only and may not accurately represent real-world data.
How to use summary data for creating a dashboard in Bold BI
To use summary data in Bold BI, you can Create Materialized Views in Google BigQuery: Enhance performance by storing precomputed results using materialized views, ensuring faster data access in your Bold BI dashboards.
Materialized Views:
A materialized view is a database object in Google BigQuery that stores the result of a query physically. Unlike regular views, materialized views are precomputed and stored, allowing for faster query performance by avoiding the need to recompute the result each time the view is queried.
Syntax for creating a materialized view in GoogleBigQuery
CREATE MATERIALIZED VIEW PROJECT_ID.DATASET.MATERIALIZED_VIEW_NAME AS (
SELECT – You Select statement here
FROM – Your source tables
GROUP BY – Your Group By clause here
);
How to refresh materialized views
To refresh a materialized view, you can use the following command:
REFRESH MATERIALIZED VIEW view_name;
Creating materialized view for StackOverflow analytics dashboard
CREATE MATERIALIZED VIEW `sampletester-216811.dataset.stackoverflowMatview5MIL` AS
SELECT
FORMAT_DATE('%b %Y', creation_date) AS creation_month,
title,
body,
Count(id) as total_id,
AVG(accepted_answer_id) AS accepted_answer_id,
SUM(answer_count) AS total_answer_count,
SUM(comment_count) AS total_comment_count,
count(community_owned_date) AS community_owned_date,
count(creation_date) AS creation_date,
count(favorite_count) AS favorite_count,
count(last_activity_date) AS last_activity_date,
count(last_edit_date) AS last_edit_date,
MAX(last_editor_display_name) AS last_editor_display_name,
AVG(last_editor_user_id) AS last_editor_user_id,
MAX(owner_display_name) AS owner_display_name,
SUM(owner_user_id) AS owner_user_id,
SUM(parent_id) AS parent_id,
SUM(post_type_id) AS post_type_id,
SUM(score) AS total_score,
Max(tags) AS tags,
SUM(view_count) AS total_view_count
FROM
`sampletester-216811.dataset.stackoverflow5M`
Where answer_count > 0
GROUP BY title, body, creation_month
When to use Materialized Views for improving dashboard performance
Choose Materialized Views When:
Dynamic Queries:
Users need flexibility with ad-hoc or dynamic queries.
Infrequent data changes:
Data changes infrequently, allowing for less frequent view refreshes.
Complex calculations:
Involves complex calculations or aggregations best precomputed for performance.
Space Efficiency: Prioritize space efficiency as materialized views store only aggregated results.
Considerations in using Materialized Views
Refresh frequency:
Consider the frequency of data updates. If your data changes frequently, you may need to refresh the materialized views more often.
Storage Impact:
Materialized views occupy physical storage space. Ensure that you have sufficient storage capacity for the materialized views.
Query complexity:
Materialized views are beneficial for complex queries, but for simple and frequently changing data, they might not be the most efficient solution.
Performance comparison: Raw tables vs. Materialized views
We have constructed identical dashboards in two modes:
- Dashboard using raw data, with KPIs created through Bold BI expressions and summary types.
- Dashboard using materialized views, where required KPIs are calculated within the materialized view queries.
Performance Metrics
Data Count is 5 Million Records
Query execution time (in Format - HH:mm:ss.sss) against respective widget names.
Widget name | Design Mode using Raw data | Design Mode using Materialized data |
---|---|---|
Post by year (Pie chart) | 00:00:03.309 | 00:00:01.042 |
User Growth Rate ( KPI Card) | 00:00:01.943 | 00:00:01.525 |
Post by month (Chart) | 00:00:03.311 | 00:00:01.104 |
Post Growth Rate (KPI Card) | 00:00:02.990 | 00:00:01.580 |
Post per user (KPI Card) | 00:00:01.446 | 00:00:01.066 |
Votes (Number Card) | 00:00:03.763 | 00:00:01.556 |
Posts (Number Card) | 00:00:03.243 | 00:00:01.557 |
Comments (Number Card) | 00:00:03.309 | 00:00:01.514 |
Post Types by month (Chart) | 00:00:03.311 | 00:00:01.066 |
Avg. score (Number Card) | 00:00:02.457 | 00:00:01.536 |
Views by month (Heat Map) | 00:00:03.428 | 00:00:01.078 |
Top 20 row (Grid) | 00:00:03.444 | 00:00:01.261 |
Summary
The use of materialized views significantly improves dashboard performance, as evidenced by reduced query execution times across all widgets. The materialized view nature allows for faster retrieval of data, especially in scenarios where complex calculations and aggregations are involved. Considerations should be given to the trade-offs in terms of storage space and update frequency when opting for materialized views. The choice ultimately depends on specific use case requirements and performance priorities.