Extracting Data from MongoDB Using Query in Bold ETL
This article explains how to query a MongoDB database, convert the results into a pandas DataFrame, and load the data through the Bold ETL pipeline. All of this is done with a Python script.
Requirements:
Ensure you have PyMongo installed on your system. PyMongo is a Python distribution containing tools for working with MongoDB, and it is required to run the Python script. It will be pre-installed in future versions of “Bold BI”.
Installation commands for different environments are provided below:
Environment | Commands |
---|---|
Windows | C:\BoldServices\Python39\Scripts\pip.exe install pymongo |
Linux | pip install pymongo |
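To confirm that the installation succeeded, a quick check like the one below can be run with the same Python interpreter that the pip command belongs to. This is a minimal sketch: the helper name `is_installed` is hypothetical and is not part of PyMongo or Bold ETL.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if this interpreter can locate the given top-level package."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("pymongo"):
    print("PyMongo is available.")
else:
    print("PyMongo is missing; run the pip command for your environment.")
```

If the check reports that PyMongo is missing, the package was most likely installed into a different Python environment.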
Python Script for Querying MongoDB:
Below is a sample of Python code for establishing a connection and querying data from MongoDB.
import pandas as pd
import pymongo
from datetime import datetime

# Connect to MongoDB.
clientcon = pymongo.MongoClient("mongodb://user:password@host:port")
# Access the database and collection.
databasename = clientcon["yourdatabasename"]
collectionname = databasename["yourcollectionname"]
# Specify the query. PyMongo expects datetime objects; ISODate() exists only in the MongoDB shell.
query = {"createdAt": {"$gte": datetime(2020, 3, 1), "$lt": datetime(2021, 3, 31)}}
mydoc = collectionname.find(query)  # replace with your query
# Execute the query and process the results.
data = list(mydoc)
for doc in data:
    # Convert the ObjectId to a string so it fits into a regular table column.
    doc['_id'] = str(doc['_id'])
# Convert the list of dictionaries into a DataFrame.
df = pd.DataFrame(data)
# Load the DataFrame through the Bold ETL pipeline.
pipeline.run(df, table_name="yourtablename")
- mongodb://user:password@host:port - Replace with your connection string.
- yourdatabasename - Replace with your database name.
- yourcollectionname - Replace with your collection name.
- createdAt - Replace with the name of the field you want to filter on.
- yourtablename - Replace with the desired table name in your destination database.
- query - Replace with your own query.
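Because the query is a plain Python dictionary, date bounds are passed as `datetime` objects rather than with the `ISODate(...)` helper of the MongoDB shell. The following is a minimal sketch of how such a filter could be assembled, assuming the collection stores dates in a `createdAt` field. The helper `build_date_range_query` is hypothetical and is not part of PyMongo or Bold ETL.

```python
from datetime import datetime


def build_date_range_query(field: str, start: datetime, end: datetime) -> dict:
    """Build a filter that matches documents where start <= field < end."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {field: {"$gte": start, "$lt": end}}


# Assumed field name and date range; adjust both to your own collection.
query = build_date_range_query("createdAt", datetime(2020, 3, 1), datetime(2021, 3, 31))
print(query)
```

The resulting dictionary can be passed to `find()` in place of a hand-written query.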
To execute the Python script on the Bold ETL platform, follow the steps provided in:
Python DataFrame into Bold BI