Working with pipeline using custom query in Data Hub
Data Hub allows the creation of pipelines using custom queries to obtain aggregated or filtered data from a large database, rather than extracting all of the data.
How to create pipeline using custom query in Data Hub
Steps on how to create the pipeline using a custom query:
- Click Data Hub.
- Click ‘+’ icon to Add Pipeline and provide the name for the pipeline. Save the pipeline.
-
Select the newly created pipeline and choose the SQL / MySQL / Oracle / PostgreSQL connector. Double-click or click on the Add Template option to add the template.
-
Add Query under Properties and include the query. Please note that when using Query in the template, the Select parameter is used to create the table for storing the query result instead of extracting the data from the name given in it.
Sample YAML Template
version: 1.0.1
encrypt_credentials: false
direct_target_import: false
union_all_tables: true
add_dbname_column: false
direct_load_to_destination: true
plugins:
extractors:
- name: MSSQL
connectorname: MSSQL
schemaname: dbo
config:
host:
port: 1433
username: sa
database: Retail
password:
drivername: mssql+pyodbc
driver: ODBC+Driver+17+for+SQL+Server
properties:
query: SELECT AVG(unit_cost) AS average_unit_cost FROM [Retail].[dbo].[Product_Details];
metadata:
select:
- aggregated_table
- Click Save and choose the desired destination to save the pipeline.
For using multiple line query, use folded block scalar parameter (>) as below,
Sample YAML Template
version: 1
encrypt_credentials: true
union_all_tables: true
add_dbname_column: false
use_snake_casing: true
plugins:
extractors:
- name: MSSQL
connectorname: MSSQL
schemaname: dbo
config:
host:
username: sa
password:
database: AdventureWorks2017
drivername: mssql+pyodbc
driver: ODBC+Driver+17+for+SQL+Server
properties:
query: >
SELECT
soh.SalesOrderID,
soh.OrderDate,
sod.OrderQty,
p.Name AS ProductName,
p.ListPrice,
sod.UnitPrice
FROM
Sales.SalesOrderHeader AS soh
JOIN
Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
JOIN
Production.Product AS p ON sod.ProductID = p.ProductID
WHERE
soh.OrderDate BETWEEN '2013-01-01' AND '2013-12-31'
metadata:
select:
- QueryResult
IgnoreSourcetable: false
isDropTable: true
direct_target_import: false
direct_load_to_destination: true
isDeleteandUpdate: false
-
Now the pipeline will be saved and started automatically.
-
Click on Logs to see if the run is completed and a data source is created in Bold BI/Bold Reports.
-
An equivalent live data source will be created by Data Hub using the datastore details in Bold BI.
-
Click Edit DataSource Option to view the created tables.
-
To configure interval-based scheduling, click on the schedules tab and click on the schedule icon and configure it.
Limitations:
- Support only select queries.