Connect PDF Files Using the Python Connector in Bold Data Hub

Adding PDF File Type Support in Filesystem Connector

Overview

This article outlines the steps for connecting and utilizing PDF files within Bold Data Hub using a Python connector.

Steps to Connect PDF Files in Bold Data Hub

Data Extraction Workflow

To extract data from PDF files, the script takes the following inputs:

Inputs
  • Delimiter: A character or string used to divide the data into rows. For example, a colon (:) can be used to separate different pieces of information within the PDF.

  • Schema (Optional): A predefined structure that defines how the data is divided into separate columns. If no schema is provided, all of the data is placed in a single column.

Example

If invoice data is extracted using a colon as the delimiter, the data might appear as follows:

InvoiceID: 12345
Date: 2023-10-01
Amount: 250.00

In this example, the delimiter (:) separates the different fields, while the schema can define how these fields are organized into columns.
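
The effect of the delimiter and the optional schema can be sketched in plain Python. This is only an illustration under stated assumptions, not the script that Bold Data Hub ships: the function name `parse_pdf_text`, the fallback column name `value`, and the schema `["Field", "Value"]` are hypothetical, and the sketch assumes each line of already-extracted text is one record. Extracting the text from the PDF itself is not shown.

```python
def parse_pdf_text(text, delimiter=":", schema=None):
    """Turn extracted text into a list of row dictionaries.

    Hypothetical helper: assumes one record per line of text.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if schema:
            # Split into at most len(schema) pieces so that extra
            # delimiters stay inside the last column.
            parts = [p.strip() for p in line.split(delimiter, len(schema) - 1)]
            # Pad short lines so every row has every column.
            parts += [""] * (len(schema) - len(parts))
            rows.append(dict(zip(schema, parts)))
        else:
            # No schema: keep the whole line in a single column.
            rows.append({"value": line})
    return rows


sample = "InvoiceID: 12345\nDate: 2023-10-01\nAmount: 250.00"

with_schema = parse_pdf_text(sample, delimiter=":", schema=["Field", "Value"])
without_schema = parse_pdf_text(sample, delimiter=":")
```

With the hypothetical two-column schema, the first row becomes `{"Field": "InvoiceID", "Value": "12345"}`. Without a schema, each line is kept whole in the single `value` column.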

A custom script is used to transfer data from the PDF. Modify the delimiter and the schema in the script to match your file, then follow the steps below.

  1. Create a New Pipeline

    Begin by creating a new pipeline in the Data Hub interface.

    sshot-1.png

  2. Choose PythonScript as Connector

    Select PythonScript as your connector and click on the “Add Template” button.

    sshot-2.png

  3. Upload the YAML File

    Use the “Upload File” button to upload your Python file.

    sshot-3.png

  4. Select and Upload the File

    Click the “Choose File” button to select the file from your local system, then click the “Upload” button.

    sshot-4.png

  5. Copy the Filepath

    After uploading, use the “Copy” button to copy the filepath of the uploaded YAML file.

    sshot-5.png

  6. Save and Schedule the Project

    Save your project and set up a schedule for it.

    sshot-6.png

  7. Check the Logs

    Navigate to the logs tab to check for any updates or errors.

    sshot-7.png

  8. Use the Data Source in Bold BI

    The data source created in Bold Data Hub can now be used in Bold BI.

    sshot-8.png

Written by Sangavi Eswaramoorthi