Articles in this section
Category / Section

Handling Large Datasets from MongoDB in Bold BI

Published:

When working with large collections in MongoDB, particularly those with over 10 million records, it is crucial to manage data efficiently to ensure smooth operation within Bold BI dashboards. Here are some recommendations to optimize performance and avoid timeouts when querying large datasets.

Optimizing Data Extraction

To minimize the load on the system and prevent timeouts during data extraction, consider the following strategies:

1. Creating Views with Required Data

MongoDB allows the creation of views that represent a subset of data from one or more collections. Views can be used to filter and aggregate data, which can significantly reduce the volume of data being transferred to Bold BI.

Sample data:

image.png

Steps to create views:

  1. Click on the table, navigate to the Aggregation tab, then click on the save drop-down, and click on “create view”.

image.png

  1. Click on the created views, go to the aggregation tab, and click on the “Add Stage” button.

image.png

  1. Next, choose the correct pipeline to create the views. Here, $match is used to filter the document stream to allow only matching documents to pass unmodified into the next pipeline stage. $match uses standard MongoDB queries. For each input document, it outputs either one document (a match) or zero documents (no match). we used to pass the status as “shipped” here.

Output:

image.png

Reference: Create MongoDB Views

2. Using On-Demand Materialized Views

Materialized views store the result of a query and can be refreshed on-demand to include the latest data. This approach is beneficial for frequently accessed data, as it avoids the need to perform complex calculations or aggregations on every query.

Example:
MongoDB does not support materialized views directly as a built-in feature, but you can achieve similar functionality using aggregation pipelines and scheduled updates.

Create an Aggregation Pipeline
Suppose you want to create a materialized view that stores the total quantity and revenue for each product:

    db.orders.aggregate([
        {
            $group: {
                _id: "$product",
                totalQuantity: { $sum: "$quantity" },
                totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
            }
        },
        {
            $out: "materializedProductStats"
        }
    ])

This aggregation pipeline groups the orders by product, calculates the total quantity and revenue, and stores the result in a new collection named materializedProductStats.

In the Atlas, you do not need to write the full query; just choose the pipeline you need and run the stage you added. Similarly, as stated in creating views, navigate to the aggregation and add the stage.

image.png

The $out operator will save the results of the pipeline to a specified location, such as a collection, S3, or Atlas. If the collection already exists, it will be replaced, and the name of the output collection will be provided.

image.png

Output:

image.png

Reference: MongoDB Materialized Views

3. Filtering Data During Querying

When using Bold ETL, you can directly query MongoDB collections to filter and retrieve only the necessary data. This reduces the amount of data that needs to be processed and transferred to Bold BI.

Reference: Extracting Data from MongoDB Using Query in Bold ETL

Managing Dashboard Performance

To enhance the performance of dashboards that have already loaded large collections, consider the following:

1. Reducing Data Count During Extraction

Implement the data extraction optimization techniques mentioned above to reduce the initial data load.

2. Configuring Data Filters

Bold BI allows you to configure data filters to further reduce the data count. By applying filters, you can limit the data to only what is required for analysis, thus improving dashboard responsiveness and avoiding query timeouts.

Reference: Configuring Data Filters in Bold BI

Connecting in Bold BI:

  1. Establish a connection to the MongoDB data source using the necessary credentials.

image.png

  1. Materialized view name - OrdersView

image.png

By implementing these recommendations, you can significantly improve the performance of Bold BI when handling large datasets from MongoDB sources.

Additional References:

Was this article useful?
Like
Dislike
Help us improve this page
Please provide feedback or comments
GR
Written by Gayathri Ravichandran
Updated
Comments (0)
Please  to leave a comment
Access denied
Access denied