Working with pipeline using custom query in Data Hub

Published:

Data Hub allows the creation of pipelines using custom queries to obtain aggregated or filtered data from a large database, rather than extracting all of the data.

How to create pipeline using custom query in Data Hub

Steps on how to create the pipeline using a custom query:

Click Data Hub.
Click ‘+’ icon to Add Pipeline and provide the name for the pipeline. Save the pipeline.

Select the newly created pipeline and choose the SQL / MySQL / Oracle / PostgreSQL connector. Double-click or click on the Add Template option to add the template.
Add Query under Properties and include the query. Please note that when using Query in the template, the Select parameter is used to create the table for storing the query result instead of extracting the data from the name given in it.

Sample YAML Template

version: 1.0.1
encrypt_credentials: false
direct_target_import: false
union_all_tables: true
add_dbname_column: false
direct_load_to_destination: true
plugins:
 extractors:
   - name: MSSQL
     connectorname: MSSQL
     schemaname: dbo
     config:
       host: localhost
       port: 1433
       username: sa
       database: Retail
       password: Syncfusion@123
       drivername: mssql+pyodbc
       driver: ODBC+Driver+17+for+SQL+Server
     properties:
       query: SELECT AVG(unit_cost) AS average_unit_cost FROM [Retail].[dbo].[Product_Details];
     metadata: 
     select:
     - aggregated_table

Click Save and choose the desired destination to save the pipeline.

Now the pipeline will be saved and started automatically.
Click on Logs to see if the run is completed and a data source is created in Bold BI/Bold Reports.
An equivalent live data source will be created by Data Hub using the datastore details in Bold BI.
Click Edit DataSource Option to view the created tables.
To configure interval-based scheduling, click on the schedules tab and click on the schedule icon and configure it.