Download the dataset from Redshift

The right way to download data from Redshift is to connect to the database using psql and use the UNLOAD command to write the results of an SQL query to S3. The following command exports all the tweets to the s3://aml.packt/data/veggies/results/ location, using an IAM role that has write access to that bucket:

unload ('select * from tweets') to 's3://aml.packt/data/veggies/results/' iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole';
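The statement can also be sent to the cluster non-interactively with psql's -c option. The following is a minimal sketch, not the book's own procedure: the function names build_unload_sql and run_unload are hypothetical, and the endpoint, database, and user in the commented usage are placeholders. Port 5439 is Redshift's default and may differ on your cluster.

```shell
# Hypothetical helper: print an UNLOAD statement for a query, an S3 prefix, and an IAM role.
# Assumes the query itself contains no single quotes (they would need escaping).
build_unload_sql() {
  printf "unload ('%s') to '%s' iam_role '%s';\n" "$1" "$2" "$3"
}

# Hypothetical helper: send a SQL statement to Redshift through psql.
# Arguments: host, database, user, SQL text. psql prompts for the password
# unless PGPASSWORD or a ~/.pgpass entry is set.
run_unload() {
  psql -h "$1" -p 5439 -d "$2" -U "$3" -c "$4"
}

# Example usage (placeholder endpoint, database, and user):
# sql=$(build_unload_sql 'select * from tweets' \
#   's3://aml.packt/data/veggies/results/' \
#   'arn:aws:iam::0123456789012:role/MyRedshiftRole')
# run_unload my-cluster.example.us-east-1.redshift.amazonaws.com mydb myuser "$sql"
```

Keeping the statement builder separate from the psql call makes it easy to print and inspect the SQL before running it against the cluster.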

We can then download the files and aggregate them:

# Download
$ aws s3 cp s3://aml.packt/data/veggies/results/0000_part_00 data/
$ aws s3 cp s3://aml.packt/data/veggies/results/0001_part_00 data/
# Combine
$ cp data/0000_part_00 data/veggie_tweets.tmp
$ cat data/0001_part_00 >> data/veggie_tweets.tmp
...
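When the export produces many part files, copying and appending them one by one gets tedious. As a rough sketch, the two steps could be wrapped in small functions. The names download_parts and combine_parts are hypothetical, and the sketch assumes that every part file name contains _part_, as in the listing above, and that the output file name does not.

```shell
# Hypothetical helper: copy every object under an S3 prefix into a local directory.
# Requires the AWS CLI and credentials that can read the bucket.
download_parts() {
  aws s3 cp "$1" "$2" --recursive
}

# Hypothetical helper: concatenate all part files in a directory into one file.
# The shell expands the glob in sorted order, so 0000_part_00 comes before 0001_part_00.
combine_parts() {
  src_dir="$1"
  out_file="$2"
  : > "$out_file"                      # create or truncate the output file
  for part in "$src_dir"/*_part_*; do
    [ -f "$part" ] || continue         # skip the unexpanded pattern if nothing matches
    cat "$part" >> "$out_file"
  done
}

# Example usage:
# download_parts s3://aml.packt/data/veggies/results/ data/
# combine_parts data data/veggie_tweets.tmp
```

Because the output file is truncated first, rerunning combine_parts does not append the same rows twice.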
