Articles in this section
Category / Section

How to access the Amazon S3 files from Athena data source

Published:

Accessing Amazon S3 Files in Amazon Athena

  • Amazon Athena is an interactive query service that makes it easy to analyze data directly from Amazon S3 using standard SQL.
  • Athena is serverless, so there is no infrastructure to set up or manage and you can start analyzing your data immediately.
  • You don’t even need to load your data into Athena or have complex ETL processes. Athena works directly with data stored in S3.

Access Permission for Amazon S3:

  • If the users need to create tables and work with underlying data, they must have access to the Amazon S3 location of the data.
  • This access is in addition to the allowed actions for Athena that you define in IAM identity-based polices.
  • You can grant access to Amazon S3 locations using identity-based policies, bucket resource policies, or both.

Creating Table in Amazon Athena:

  • Athena reads all data stored in s3://bucketname/folder/'.
  • Athena does not modify your data in Amazon S3.
  • When you create a database and table in Athena, you are simply describing the schema and the location where the table data are located in Amazon S3 for read-time querying.
  • Database and table, therefore, have a slightly different meaning than they do for traditional relational database systems because the data isn’t stored along with the schema definition for the database and table.
  • Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to setup or manage any servers.

Create table query:

Syntax:

CREATE EXTERNAL TABLE test_table(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION s3://bucketname/folder/

Example for table creation for a csv file:

CREATE EXTERNAL TABLE flight_delays_csv (
year INT,
quarter INT,
month INT,
dayofmonth INT,
dayofweek INT,
flightdate STRING,
uniquecarrier STRING,
airlineid INT,
carrier STRING,
tailnum STRING,
flightnum STRING )
PARTITIONED BY (year STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION 's3://athena-examples-myregion/flight/csv/';

All tables are considered external:
If you use CREATE TABLE without the EXTERNAL keyword, Athena issues an error; only tables with the EXTERNAL keyword can be created. We recommend that you always use the EXTERNAL keyword. When you drop a table in Athena, only the table metadata is removed; the data remains in Amazon S3.
We also have the option to create the table using Athena’s table wizard (accessible from the Catalog Manager).

am-1.png

Then we can specify the format:
am-2.png

And the name and data type of each column:
am-3.png

We can also set up the partition model:
am-4.png

We can query the table in Amazon Athena using SELECT Statements.

Considerations:

1.) Files treated as hidden – Athena treats sources files that start with an underscore (_) or a dot (.) as hidden. To work around this limitation, rename the files.
2.) When creating schemas for data on S3, the positional order is important. For example, if you have a source file with ID, DATE and RESPONSE columns, then your schema should reflect that structure.

Additional Reference:

https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html
https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/

Accessing_Amazon_S3_Files_in_Amazon_Athena.docx
Was this article useful?
Like
Dislike
Help us improve this page
Please provide feedback or comments
SM
Written by Siranjeevi Murugan
Updated
Comments (0)
Please  to leave a comment
Access denied
Access denied