How to access the Amazon S3 files from Athena data source

Published:

Accessing Amazon S3 Files in Amazon Athena

Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL.
Athena is serverless, so there is no infrastructure to set up or manage and you can start analyzing your data immediately.
You don’t even need to load your data into Athena or have complex ETL processes. Athena works directly with data stored in S3.

Access Permission for Amazon S3:

If the users need to create tables and work with underlying data, they must have access to the Amazon S3 location of the data.
This access is in addition to the allowed actions for Athena that you define in IAM identity-based polices.
You can grant access to Amazon S3 locations using identity-based policies, bucket resource policies, or both.

Creating Table in Amazon Athena:

Athena reads all data stored in s3://bucketname/folder/'.
Athena does not modify your data in Amazon S3.
When you create a database and table in Athena, you are simply describing the schema and the location where the table data are located in Amazon S3 for read-time querying.
Database and table, therefore, have a slightly different meaning than they do for traditional relational database systems because the data isn’t stored along with the schema definition for the database and table.
Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to setup or manage any servers.

Create table query:

Syntax:

CREATE EXTERNAL TABLE test_table(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION s3://bucketname/folder/

Example for table creation for a csv file:

CREATE EXTERNAL TABLE flight_delays_csv (
year INT,
quarter INT,
month INT,
dayofmonth INT,
dayofweek INT,
flightdate STRING,
uniquecarrier STRING,
airlineid INT,
carrier STRING,
tailnum STRING,
flightnum STRING )
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://athena-examples-myregion/flight/csv/';

All tables are considered external:
If you use CREATE TABLE without the EXTERNAL keyword, Athena issues an error; only tables with the EXTERNAL keyword can be created. We recommend that you always use the EXTERNAL keyword. When you drop a table in Athena, only the table metadata is removed; the data remains in Amazon S3.
We also have the option to create the table using Athena’s table wizard (accessible from the Catalog Manager).

Then we can specify the format:

And the name and data type of each column:

We can also set up the partition model:

We can query the table in Amazon Athena using SELECT Statements.

Considerations:

1.) Files treated as hidden – Athena treats sources files that start with an underscore (_) or a dot (.) as hidden. To work around this limitation, rename the files.
2.) When creating schemas for data on S3, the positional order is important. For example, if you have a source file with ID, DATE and RESPONSE columns, then your schema should reflect that structure.

Additional Reference:

https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html
https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/

Accessing_Amazon_S3_Files_in_Amazon_Athena.docx

Was this article useful?

Dislike

Written by Siranjeevi Murugan

Updated:

Comments (0)

How to access the Amazon S3 files from Athena data source

Accessing Amazon S3 Files in Amazon Athena

Access Permission for Amazon S3:

Creating Table in Amazon Athena:

Create table query:

Access denied