Category / Section
How to Merge Data from Multiple Elasticsearch Indices Using Bold Data Hub
Published:
Overview
Elasticsearch does not natively support joins, making it challenging to work with data distributed across multiple indices. Bold Data Hub provides an effective solution by enabling users to extract, transform, and merge data from multiple Elasticsearch indices into a unified destination database.
Steps to Join Data from Two ElasticSearch Indices in Bold Data Hub & Bold BI
- Extract Data from Elasticsearch Indices:
- Configure Bold Data Hub to connect to your Elasticsearch instance.
- Use the interface to select specific indices and define the data fields you wish to extract.
version: 1.0.1
encrypt_credentials: true
union_all_tables: true
add_dbname_column: false
use_snake_casing: true
plugins:
extractors:
- name: es
connectorname: ElasticSearch
schemaname:
config:
host:
port:
username:
password:
index_name:
page_size: 10000
properties:
metadata:
select:
- name: es2
connectorname: ElasticSearch
schemaname:
config:
host:
port:
username:
password:
index_name:
page_size: 10000
properties:
metadata:
select:
IgnoreSourcetable: false
isDropTable: true
direct_target_import: false
direct_load_to_destination: true
isDeleteandUpdate: false
set_not_null_as_primary_key: false
- Save the pipeline.
-
Join the Tables from Elasticsearch Indices using Transformation:
- Now the data will be extracted and moved to transform layer in nested format.
- As we show the data as nested format in transform layer, we cannot join the data in Data Hub. Unnest the data using below query to view the columns extracted,
Note: Change the schema(elasticextract) and Table name (departments).
Select * unnest("_source") from "elasticextract"."departments"
- Create a Transform table and write the join query using above unnest query to join the tables and remove duplicate columns.
SELECT d.*,
e.emp_id,
e.emp_name
FROM (
SELECT
unnest("_source") from "elasticextract"."departments"
) d
JOIN (
SELECT
unnest("_source")
FROM elasticextract.employees
) e
ON d.department_id = e.department_id
- Now Save and Transform. The data will be moved to the destination.
- Creating a Pipeline in Bold Data Hub automatically creates a Data Source in Bold BI. The Bold BI Data Source is a live data source to the destination database used in Bold Data Hub.
- Now use the data source created in the dashboard.