Tips and tricks for working with AWS Athena

Written on 2020 May 27

Imagine you have a big dataset, for example 10 billion rows of raw CSV, in the range of 500GB. You could load it into RDS, but then you pay the cost up front: a big RDS instance costs money even when it just sits idle and you are not querying it.

If that is your use case, it is a great fit for Athena. You load your data into S3 and it magically becomes queryable through a SQL interface, and you pay for the queries you run rather than for an idle instance.
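
To make the "load into S3, then query with SQL" step concrete, here is a minimal sketch that builds the table definition Athena needs before it can read the files. The table name, column list, and bucket are hypothetical, and the sketch assumes plain comma-separated files with no header row and no quoted fields; quoted CSV would likely need a different SerDe.

```python
def build_create_table_ddl(table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for plain CSV files in S3.

    columns is a list of (name, sql_type) pairs. s3_location should be a
    folder-style prefix ending in "/".
    """
    column_lines = ",\n".join(f"  `{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{s3_location}'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = build_create_table_ddl(
    "events",
    [("user_id", "bigint"), ("event_name", "string")],
    "s3://my-example-bucket/events/",
)
print(ddl)
```

The printed statement can be pasted into the Athena query editor. After it runs, an ordinary SELECT against the table reads the files under that prefix.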

Partition data
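
One common way to partition is the Hive-style layout, where the partition value is part of the S3 key (for example dt=2020-05-27/). The sketch below builds such a key. The partition column name dt, the prefix, and the file name are assumptions chosen for illustration, and the table would also need a matching PARTITIONED BY clause.

```python
from datetime import date


def partition_key(prefix, day, filename):
    """Build a Hive-style S3 key such as events/dt=2020-05-27/part-0000.csv.gz."""
    return f"{prefix.strip('/')}/dt={day.isoformat()}/{filename}"


# Hypothetical prefix and file name.
key = partition_key("events", date(2020, 5, 27), "part-0000.csv.gz")
print(key)  # events/dt=2020-05-27/part-0000.csv.gz
```

With the data laid out this way and the partitions registered on the table, a query filtered with WHERE dt = '2020-05-27' only has to read the files under that one prefix instead of the whole dataset.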

Split

Parallel processing

Gzip
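
The three steps above (split, parallel processing, gzip) can be sketched together. Athena bills by the amount of data scanned and can read gzip files directly, and a single gzip file cannot be read in parallel, so many medium-sized compressed files tend to work better than one huge one. This is a rough sketch, not a tuned pipeline: the function names, chunk size, and worker count are made up, and it assumes a headerless CSV with no line breaks inside quoted fields.

```python
import gzip
import itertools
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def split_csv(source, out_dir, lines_per_chunk):
    """Split a headerless CSV into numbered chunk files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(source, "rb") as src:
        for index in itertools.count():
            # Read the next batch of lines; an empty batch means end of file.
            lines = list(itertools.islice(src, lines_per_chunk))
            if not lines:
                break
            chunk = out_dir / f"part-{index:04d}.csv"
            chunk.write_bytes(b"".join(lines))
            chunks.append(chunk)
    return chunks


def gzip_file(path):
    """Compress one file to <name>.gz and delete the uncompressed copy."""
    path = Path(path)
    target = path.with_name(path.name + ".gz")
    with open(path, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    path.unlink()
    return target


def split_and_gzip(source, out_dir, lines_per_chunk=1_000_000, workers=4):
    """Split the source file, then gzip the chunks concurrently."""
    chunks = split_csv(source, out_dir, lines_per_chunk)
    # Threads keep the sketch simple; a process pool is an alternative.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_file, chunks))
```

Calling split_and_gzip("data.csv", "out") on a hypothetical data.csv would leave out/part-0000.csv.gz, out/part-0001.csv.gz and so on, ready to be uploaded.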

Upload to S3
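
A minimal sketch of the upload step, assuming the compressed files sit in a local folder whose sub-folders already mirror the layout wanted in S3. The function names, bucket, and prefix are hypothetical. The upload itself uses boto3's upload_file, which is a third-party dependency and needs AWS credentials configured, so it is imported only inside the function that uses it.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix):
    """Map every .gz file under local_dir to the S3 key it should get."""
    local_dir = Path(local_dir)
    clean_prefix = prefix.strip("/")
    return [
        (path, f"{clean_prefix}/{path.relative_to(local_dir).as_posix()}")
        for path in sorted(local_dir.rglob("*.gz"))
    ]


def upload_all(local_dir, bucket, prefix):
    """Upload the planned files one by one (requires boto3 and credentials)."""
    import boto3  # third-party; imported here so planning works without it

    s3 = boto3.client("s3")
    for path, key in plan_uploads(local_dir, prefix):
        s3.upload_file(str(path), bucket, key)
```

Keeping the key planning separate from the upload makes it easy to print the plan and check the resulting layout before any data is transferred.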
