Articles in this section
Category / Section

How to Merge Data from Multiple Elasticsearch Indices Using Bold Data Hub

Published:

Overview

Elasticsearch does not natively support joins, making it challenging to work with data distributed across multiple indices. Bold Data Hub provides an effective solution by enabling users to extract, transform, and merge data from multiple Elasticsearch indices into a unified destination database.

Steps to Join Data from Two ElasticSearch Indices in Bold Data Hub & Bold BI

  1. Extract Data from Elasticsearch Indices:
    • Configure Bold Data Hub to connect to your Elasticsearch instance.
    • Use the interface to select specific indices and define the data fields you wish to extract.
version: 1.0.1
encrypt_credentials: true
union_all_tables: true
add_dbname_column: false
use_snake_casing: true
plugins:
 extractors:
 - name: es
   connectorname: ElasticSearch
   schemaname: 
   config:
     host: 
     port: 
     username: 
     password: 
     index_name: 
     page_size: 10000
   properties: 
   metadata: 
   select: 
 - name: es2
   connectorname: ElasticSearch
   schemaname: 
   config:
     host: 
     port: 
     username: 
     password: 
     index_name: 
     page_size: 10000
   properties: 
   metadata: 
   select: 
IgnoreSourcetable: false
isDropTable: true
direct_target_import: false
direct_load_to_destination: true
isDeleteandUpdate: false
set_not_null_as_primary_key: false 
  • Save the pipeline.

image.png

  1. Join the Tables from Elasticsearch Indices using Transformation:

    • Now the data will be extracted and moved to transform layer in nested format.
    • As we show the data as nested format in transform layer, we cannot join the data in Data Hub. Unnest the data using below query to view the columns extracted,

Note: Change the schema(elasticextract) and Table name (departments).

Select * unnest("_source") from "elasticextract"."departments" 
  • Create a Transform table and write the join query using above unnest query to join the tables and remove duplicate columns.
SELECT  d.*,
   e.emp_id,
   e.emp_name
FROM (
   SELECT
unnest("_source") from "elasticextract"."departments"
) d
JOIN (
   SELECT 
unnest("_source")
   FROM elasticextract.employees
) e
ON d.department_id = e.department_id

image.png

  • Now Save and Transform. The data will be moved to the destination.
  • Creating a Pipeline in Bold Data Hub automatically creates a Data Source in Bold BI. The Bold BI Data Source is a live data source to the destination database used in Bold Data Hub.

image.png

  1. Now use the data source created in the dashboard.

image.png

Was this article useful?
Like
Dislike
Help us improve this page
Please provide feedback or comments
PG
Written by Pathivada Ganesh Pathivada Venki Naidu
Updated:
Comments (0)
Access denied
Access denied