Imagine you have a big dataset, for example 10 billion rows of raw CSV, somewhere around 500 GB. You could load it into RDS, but you pay the cost up front: a big RDS instance is billed even when it sits idle and you do nothing with it.
If that is your use case, it's a great fit for Athena. You load your data into S3 and it magically becomes queryable through a SQL interface.
Partition data
Split
Parallel processing
Gzip
Upload to S3
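
The steps above (partition, split, process in parallel, gzip, upload) could be sketched roughly as follows. This is a minimal illustration, not a production pipeline. The function names, the `events` prefix, the `dt=` partition column, and the tiny chunk size are all assumptions made up for the example. The `upload` argument is a hypothetical callable that takes a key and the compressed bytes, so the sketch runs without AWS credentials.

```python
import csv
import gzip
import io
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def split_rows(rows, chunk_size):
    """Yield successive lists of at most chunk_size rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def gzip_csv(rows):
    """Serialize rows to CSV text and return the gzip-compressed bytes."""
    text = io.StringIO()
    csv.writer(text).writerows(rows)
    return gzip.compress(text.getvalue().encode("utf-8"))


def partition_key(prefix, partition, index):
    """Build a Hive-style key, e.g. events/dt=2024-01-01/part-00000.csv.gz."""
    return f"{prefix}/dt={partition}/part-{index:05d}.csv.gz"


def prepare_and_upload(rows, partition, upload, prefix="events",
                       chunk_size=2, workers=4):
    """Split rows into chunks, then gzip and upload each chunk in parallel.

    `upload(key, body)` is a hypothetical callable supplied by the caller.
    Returns the list of keys written, in chunk order.
    """
    def work(item):
        index, chunk = item
        key = partition_key(prefix, partition, index)
        upload(key, gzip_csv(chunk))
        return key

    # Note: Executor.map submits every chunk up front, so a real 500 GB
    # job would need to bound how many chunks are in flight at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, enumerate(split_rows(rows, chunk_size))))


if __name__ == "__main__":
    # In-memory stand-in for a bucket, just to show the call shape.
    fake_bucket = {}
    rows = [["1", "a"], ["2", "b"], ["3", "c"]]
    for key in prepare_and_upload(rows, "2024-01-01", fake_bucket.__setitem__):
        print(key, len(fake_bucket[key]), "bytes")
```

With boto3, the `upload` callable could wrap `s3.put_object(Bucket=..., Key=key, Body=body)`, where the bucket name is yours to fill in. The `dt=.../` layout is used here because Athena can treat Hive-style key prefixes as partitions, and it can read gzip-compressed CSV directly.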