Connecting Amazon Redshift as destination using Bold Data Hub

Introduction

This guide walks you through the steps to connect Amazon Redshift as a destination using Bold Data Hub.

Prerequisites

Credentials: Ensure that your Amazon Redshift credentials are correct and that the user has the permissions a destination needs, such as creating, dropping, and loading tables in the target database.

Steps

Step 1: Open Bold Data Hub

Launch the Bold Data Hub application.

Step 2: Set up the Amazon Redshift Credentials

Navigate to the Settings section in Bold Data Hub.
Click on the settings icon.

Step 3: Select Connection Type

Choose the connection type as New.

Step 4: Configure Amazon Redshift

Select the server type as Amazon Redshift.
Fill in your Amazon Redshift credentials as follows:
- Datastore Name: Enter a meaningful name; this is how the Amazon Redshift credentials will be stored in Bold Data Hub.
- Server: Enter the server name.
  Example: redshift-cluster-1706.cldsfy4.us-east-1.redshift.amazonaws.com
- Username: Enter your Amazon Redshift username.
- Password: Enter your Amazon Redshift password.
- Database: Enter your Amazon Redshift database name.
- Staging: Optionally configure Redshift with Staging support for faster data access.
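Before saving, it can help to sanity-check the Server value. The following is a minimal sketch, assuming the usual provisioned-cluster endpoint shape of cluster identifier, unique identifier, region, then redshift.amazonaws.com. The helper name parse_redshift_host is hypothetical and is not part of Bold Data Hub. It also strips a trailing ":port/database" suffix, on the assumption that the Server field expects only the host name.

```python
import re

# Assumed endpoint shape: <cluster-id>.<unique-id>.<region>.redshift.amazonaws.com
ENDPOINT_RE = re.compile(
    r"^(?P<cluster>[a-z0-9-]+)\.(?P<uid>[a-z0-9]+)\."
    r"(?P<region>[a-z]{2}(?:-[a-z]+)+-\d)\.redshift\.amazonaws\.com$"
)


def parse_redshift_host(server: str) -> dict:
    """Split a Redshift endpoint host into cluster id and region (hypothetical helper)."""
    host = server.strip().lower()
    # Drop a trailing "/database" and ":port" if the full endpoint was pasted.
    host = host.split("/")[0].split(":")[0]
    match = ENDPOINT_RE.match(host)
    if match is None:
        raise ValueError(f"not a recognizable Redshift endpoint: {server!r}")
    return {"host": host, "cluster": match["cluster"], "region": match["region"]}


info = parse_redshift_host(
    "redshift-cluster-1706.cldsfy4.us-east-1.redshift.amazonaws.com"
)
print(info["cluster"], info["region"])
```

The region extracted here should normally match the Region you select for the S3 bucket when staging is enabled, since loading from a bucket in the same region as the cluster avoids cross-region transfers.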

Connecting Redshift using S3 Bucket

Fill in your Amazon S3 credentials as follows:

Bucket Name: Enter the bucket name.
Key Name: Enter the key name (folder name).
Role: Specify the IAM role that grants Redshift permission to access the S3 bucket. The role must allow Redshift to perform COPY operations, which typically requires:
- s3:GetObject to read the files in the specified S3 bucket.
- sts:AssumeRole, granted in the role's trust policy, so that the Redshift cluster can assume the role.
Secret Access Key: Enter the secret access key for Amazon Redshift.
Access Key ID: Enter the access key ID for Amazon Redshift.
Region: Select the region of the S3 bucket from the list of available regions.
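To see how these fields relate to a Redshift COPY operation, here is a minimal sketch that assembles a COPY statement from them. It is an illustration only: the statement Bold Data Hub actually issues is not documented here, the CSV format is an assumption, and the function name, table name, bucket name, and role ARN are all hypothetical. The values are interpolated without escaping, so this is not suitable for untrusted input.

```python
def build_copy_statement(table: str, bucket: str, key: str,
                         role_arn: str, region: str) -> str:
    """Assemble a Redshift COPY statement from the staging fields (illustrative only)."""
    prefix = key.strip("/")
    # The key name acts as a folder-like prefix inside the bucket.
    source = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
    return (
        f"COPY {table}\n"
        f"FROM '{source}'\n"
        f"IAM_ROLE '{role_arn}'\n"
        f"REGION '{region}'\n"
        f"FORMAT AS CSV;"  # assumed file format
    )


statement = build_copy_statement(
    table="public.orders",                # hypothetical target table
    bucket="my-staging-bucket",           # hypothetical bucket name
    key="datahub/",                       # hypothetical key (folder) name
    role_arn="arn:aws:iam::123456789012:role/RedshiftS3Access",  # placeholder ARN
    region="us-east-1",
)
print(statement)
```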

How to Assign the Role

If a role hasn’t been created yet for Redshift to access S3, follow these steps:

Create a new IAM role with the necessary S3 permissions (as described above).
Attach this role to your Redshift cluster through the Redshift Console under Cluster Properties -> Cluster Permissions -> Manage IAM roles.
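The two parts of such a role can be sketched as JSON policy documents. The example below is a minimal sketch, not a vetted production policy: the function names, bucket name, and key name are hypothetical, and the s3:ListBucket statement is an addition that is not listed above but is commonly needed when COPY reads from a key prefix. The trust policy is what lets the Redshift service assume the role. The permissions policy is what grants read access to the bucket.

```python
import json


def redshift_trust_policy() -> dict:
    """Trust policy that lets the Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "redshift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


def s3_read_policy(bucket: str, key: str) -> dict:
    """Permissions policy granting read access to objects under the key prefix."""
    prefix = key.strip("/")
    objects = f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:GetObject", "Resource": objects},
            # Assumption: listing the bucket is usually also required for COPY.
            {"Effect": "Allow", "Action": "s3:ListBucket",
             "Resource": f"arn:aws:s3:::{bucket}"},
        ],
    }


trust = redshift_trust_policy()
permissions = s3_read_policy("my-staging-bucket", "datahub")  # hypothetical names
print(json.dumps(trust, indent=2))
print(json.dumps(permissions, indent=2))
```

The printed documents can be pasted into the IAM console when creating the role: the first as the trust relationship, the second as an inline or managed permissions policy.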

Retrieve the IAM Role Name

To retrieve the IAM role name required for the COPY command, follow these steps:

Navigate to the IAM Console:
- Go to the AWS Management Console.
- In the search bar, type IAM and select IAM (Identity and Access Management) from the services list.
Find the Role:
- In the IAM dashboard, click on Roles in the left-hand menu.
- Look for the role that was created to grant Redshift access to the S3 bucket. This role should be associated with your Redshift cluster.
Copy the Role Name and ARN:
- In the IAM console, click on the role to open its details.
- You will find the Role name and the ARN (Amazon Resource Name) on the summary page of the role.
- The ARN format is:
  arn:aws:iam::account-id:role/role-name
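A copied ARN can be checked against this format before it is pasted into the Role field. The following is a minimal sketch, assuming a 12-digit account ID and an optional path before the role name. The helper name parse_role_arn and the sample ARN are hypothetical.

```python
import re

# arn:<partition>:iam::<12-digit account id>:role/<optional path/><role name>
ROLE_ARN_RE = re.compile(
    r"^arn:(?P<partition>aws|aws-cn|aws-us-gov):iam::(?P<account>\d{12})"
    r":role/(?P<path>[\w+=,.@/-]+)$",
    re.ASCII,
)


def parse_role_arn(arn: str) -> dict:
    """Extract the account ID and role name from an IAM role ARN (hypothetical helper)."""
    match = ROLE_ARN_RE.match(arn.strip())
    if match is None:
        raise ValueError(f"not a valid IAM role ARN: {arn!r}")
    # The role name is the last segment; anything before it is the role path.
    name = match["path"].rsplit("/", 1)[-1]
    return {"account": match["account"], "name": name}


role = parse_role_arn("arn:aws:iam::123456789012:role/RedshiftS3Access")
print(role["account"], role["name"])
```

Note that the literal placeholder string arn:aws:iam::account-id:role/role-name is rejected by this check, because account-id is not a 12-digit number.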

Step 5: Save the Credentials

Click Save to store the credentials.

Step 6: Add Pipeline

Click on Add Pipeline, give the pipeline a meaningful name, and click the tick icon or press Enter.

Note: In Bold BI, the data source will be created under the given pipeline name.

Step 7: Add Template Details

Click the pipeline, and then choose the connector from which you want to extract data.
Click the add template button and add the details in the template.

Step 8: Save Template Details

Click the Save button. Choose the data store name from the dropdown and click Yes. After the template is saved, it is validated and the pipeline starts.

Step 9: Check Logs

Navigate to the logs page to verify whether the data source has been created in Bold BI.

Note: When you run the same pipeline again, a new property called isDropTable is added to the YAML template. By default, it is set to true.
- If isDropTable is set to true, the existing table will be dropped and recreated before the data is moved, ensuring that no duplicate rows are added.
- If isDropTable is set to false, the data will be moved again without deleting or modifying the existing data.
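The effect of the two settings on row counts can be illustrated with a small simulation. This is a sketch only: it uses Python's built-in sqlite3 module as a stand-in for Redshift, and the run_pipeline function, the orders table, and the sample rows are hypothetical. It does not reproduce how Bold Data Hub moves data internally.

```python
import sqlite3


def run_pipeline(conn: sqlite3.Connection, rows: list, is_drop_table: bool = True) -> int:
    """Load rows into the orders table and return the resulting row count."""
    if is_drop_table:
        # Mirrors isDropTable: true, where the table is dropped and recreated.
        conn.execute("DROP TABLE IF EXISTS orders")
    conn.execute("CREATE TABLE IF NOT EXISTS orders (id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


rows = [(1, "pen"), (2, "ink")]
conn = sqlite3.connect(":memory:")

first = run_pipeline(conn, rows, is_drop_table=True)          # initial load
rerun_drop = run_pipeline(conn, rows, is_drop_table=True)     # table rebuilt
rerun_append = run_pipeline(conn, rows, is_drop_table=False)  # rows appended
print(first, rerun_drop, rerun_append)  # prints: 2 2 4
```

With the property set to true, the rerun leaves the row count unchanged. With it set to false, the same source rows are loaded on top of the existing ones, so the count doubles.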