Optimizing Database Performance with Indexed Views in SQL Server
Introduction
In a database with millions of records, efficiently extracting and analyzing data for dashboards is crucial for optimal performance. Raw tables with large datasets may result in slow query performance and increased processing times. This knowledge base article explores the benefits of using summary data through pre-aggregated views and indexed views in SQL Server to enhance dashboard performance.
What is pre-aggregated or summary data?
Pre-aggregated or summary data refers to the practice of storing aggregated information, such as totals or averages, in a separate structure to enhance query performance. This approach reduces the need for extensive calculations during query execution, making data retrieval faster and more efficient.
The Advantages of Summarized Data for Improved Dashboard Performance
Business Intelligence (BI) dashboards play a crucial role in helping organizations make informed decisions based on data analysis. While raw data provides a comprehensive view of information, utilizing summarized data in BI dashboards offers significant performance advantages.
Faster Query Execution:
Summarized data reduces the volume of information that needs to be processed during queries. Aggregated data allows for quicker retrieval and rendering of results in BI dashboards. With reduced query execution times, users experience improved responsiveness and a more seamless interactive experience.
Enhanced Dashboard Load Times:
Loading large amounts of raw data into BI dashboards can result in slower performance, especially when dealing with extensive datasets. Summarized data, being a condensed representation, significantly reduces the load times of dashboards. This ensures that users can access critical information swiftly without delays.
Optimized Resource Utilization:
Utilizing summarized data helps in optimizing resource utilization, as less computing power is required for processing and rendering. This is particularly beneficial for organizations with limited hardware resources, as it allows them to efficiently manage and scale their BI infrastructure.
Improved Scalability:
Summarized data is more scalable than raw data, especially when dealing with increasing data volumes. BI dashboards built on summarized data can handle larger datasets without sacrificing performance. This scalability is crucial for organizations experiencing data growth over time.
Reduced Network Latency:
Transmitting large volumes of raw data over a network can lead to latency issues, affecting the overall performance of BI dashboards. Summarized data, being more compact, minimizes the data transfer requirements and subsequently reduces network latency. This is particularly important for remote users accessing dashboards over the internet.
Enhanced Visualization Responsiveness:
Visualization components in BI dashboards, such as charts and graphs, benefit from summarized data by rendering more quickly. Users can interact with visualizations in real-time, enabling a more dynamic and responsive analytical experience.
Efficient Aggregation and Roll-Up:
Summarized data often includes pre-aggregated metrics and key performance indicators (KPIs), simplifying the presentation of high-level insights. This allows users to focus on critical information without the need to manually aggregate raw data, streamlining the decision-making process.
How to use indexed view data for creating Dashboard in Bold BI
To use indexed views in Bold BI, you can employ the following methods:
- Creating Views in SQL Server: Create view using WITH SCHEMABINDING option which binds the view to the schema of the underlying tables.
- Creating Clustered Index on the View in SQL Server: This materializes the view.
Creating Views:
Syntax for Creating View in SQL Server
CREATE VIEW <view name>
WITH SCHEMABINDING AS
SELECT
-- Your SELECT statement here
FROM
-- Your source tables or views here
GROUP BY -- Your GROUP BY clause here;
SCHEMABINDING keywords prevent tables used in the views to make any such modifications that can affect the view’s definition. When this keyword is used in the view it binds the view to the schema of the underlying tables. If you want to modify the table definition which may affect the view, you may have to drop the view first and then change the table definition.
CREATE UNIQUE CLUSTERED INDEX <index name>
ON <view name> (<view columns>);
This statement materializes the view, making it have a physical existence in the database.
How to Refresh Views
:The data is automatically refreshed when data changes in the underlying tables. Once a view is indexed, the data is kept in sync with the base tables by incorporating the changes to the view in the query plan which updates the base table. So, a query which adds a row to a table referenced by an indexed view will have additional operators added to the end of the plan to maintain the view. Therefore, you should only create an indexed view against the tables that have in-frequent data updates.
When executing UPDATE, DELETE or INSERT operations (Data Manipulation Language, or DML) on a table referenced by a large number of indexed views, or fewer but very complex indexed views, those referenced indexed views will have to be updated as well. As a result, DML query performance can degrade significantly, or in some cases, a query plan cannot even be produced. In such scenarios, test your DML queries before production use, analyze the query plan and tune/simplify the DML statement.
Considerations in Using Views
Pros:
- The view definition can reference one or more tables in the same database.
- Once the unique clustered index is created, additional non-clustered indexes can be created against the view.
- You can update the data in the underlying tables – including inserts, updates, deletes, and even truncates.
Cons:
- The view definition can’t reference other views, or tables in other databases.
- You can’t modify the underlying tables and columns. The view is created with the WITH SCHEMABINDING option.
- Below are some of the major limitations mentioned that a SELECT statement in the view must not contain the following:
- DISTINCT
- TOP
- OUTER JOIN
- HAVING
- COUNT(*)
- MIN() or MAX()
- Subqueries
- UNION or UNION ALL
- ROW_NUMBER(), RANK(), DENSE_RANK()
- Non-deterministic functions (e.g., NEWID(), RAND())
Refer this document for additional requirements in creating views, https://learn.microsoft.com/en-us/sql/relational-databases/views/create-indexed-views?view=sql-server-ver16#additional-requirements
Example Dashboard
The Marketing Analytics Dashboard discussed in the example offers a visualized overview of campaign performance and financial metrics. Leveraging PostgreSQL, this tool provides actionable insights for optimizing marketing strategies.
Dataset Overview:
Utilizing the marketing details table, it captures key metrics like ad costs, clicks, revenue, and channel-specific data. With columns such as destinationurl and product_used, it enables granular analysis of campaign effectiveness.
Derived Metrics for BI:
Metrics include Conversion Rate (Wons/Leads), Cost Per Click (Ad Cost/Clicks), and Average Revenue Per Visitor channel-wise. Visualizations track Total Visitors, Sessions, and Engagement Rates, aiding in data-driven decision-making for marketers.
This Marketing Analytics Dashboard, driven by PostgreSQL data, provides marketers with a comprehensive view of performance metrics. Through insightful visualizations and derived KPIs, it empowers users to optimize spending, enhance engagement, and make informed decisions for successful campaigns.
Please note that the data presented in this example is sample data created for a knowledge base article on using summary data. The metrics and values shown are for illustrative purposes only and may not accurately represent real-world data.
Creating View for the Marketing Analytics Dashboard
CREATE VIEW [dbo].[MarketingAnalytics_Indexed]
WITH SCHEMABINDING AS
SELECT
COUNT_BIG(*) AS cnt,
DATEPART(MONTH, [date]) AS [month],
DATEPART(YEAR, [date]) AS [year],
[destinationurl] as destinationurl,
[channel_name] as channel_name,
COUNT_BIG([destinationurl]) AS total_unique_destinations,
SUM(ISNULL(([adcost]), 0)) AS total_adcost,
SUM(ISNULL(CONVERT(BIGINT, [visitors]), 0)) AS total_visitors,
SUM(ISNULL([visitors_after_landing_page], 0)) AS total_visitors_after_landing_page,
SUM(ISNULL(CONVERT(BIGINT, [sessions]), 0)) AS total_sessions,
SUM(ISNULL(CONVERT(BIGINT, [impressions]), 0)) AS total_impressions,
SUM(ISNULL(CONVERT(BIGINT, [clicks]), 0)) AS total_clicks,
SUM(ISNULL(CONVERT(BIGINT, [revenue]), 0)) AS total_revenue,
SUM(ISNULL([expense], 0))AS total_expense,
COUNT_BIG(channel_id) AS total_unique_channel_ids,
COUNT_BIG(channel_name) AS total_unique_channels,
COUNT_BIG(product_used) AS total_unique_products_used,
SUM(ISNULL(CONVERT(BIGINT, [leads]), 0)) AS total_leads,
SUM(ISNULL(CONVERT(BIGINT, [wons]), 0)) AS total_wons
FROM [dbo].[MarketingAnalytics]
GROUP BY DATEPART(MONTH, [date]), DATEPART(YEAR, [date]), [destinationurl], [channel_name]
CREATE UNIQUE CLUSTERED INDEX IX_VMarketingAnalytics
ON MarketingAnalytics_Indexed ([month], [year],destinationurl, channel_name );
Performance comparison between raw data vs indexed views
We have constructed identical dashboards in two modes:
- Dashboard using raw data, with KPIs created through Bold BI expressions and summary types.
- Dashboard using indexed views, where required KPIs are calculated partially within the view queries and rest using Bold BI expressions and summary types.
Data Count (10 million Records):
Widget Name | Raw View (Query execution) | Indexed View (Query execution) |
---|---|---|
Select Period | 00:00:58 | 00:00:00.003 |
Select Channel | 00:01:24 | 00:00:00.009 |
Revenue VS Expenses | 00:00:52 | 00:00:00.001 |
Conversion Rate | 00:00:52 | 00:00:00.003 |
Cost per click | 00:00:31 | 00:00:00.001 |
Sum Of visitors | 00:00:25 | 00:00:00.003 |
Average Revenue Per Visitor Channel wise | 00:01:33 | 00:00:01 |
Top 5 Channel Based on Engagement Rate | 00:02:31 | 00:00:00.005 |
Average Session Duration | 00:01:56 | 00:00:00.009 |
Avg. leads | 00:00:42 | 00:00:01 |
click_through_rate | 00:00:42 | 00:00:00.006 |
Sum Of sessions | 00:00:51 | 00:00:00.009 |
Summary Details | 00:03:36 | 00:00:00.012 |
Conclusion
Indexed views significantly improves dashboard performance, as evidenced by reduced query execution times across all widgets. The indexed views’ created as pre-aggregated table allows for faster retrieval of data, especially in scenarios where complex calculations and aggregations are involved.