Amazon AWS S3

S3, short for Simple Storage Service, is Amazon's storage-as-a-service offering. It stores data redundantly, which makes the storage reliable. Consumers are charged according to the amount of data they store on S3. Downloading data from S3 is also charged, but uploading data and transferring it between AWS properties are free of charge. This makes it very attractive to run EMR (Elastic MapReduce) on AWS with the data stored on S3.
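The charging model described above can be sketched as a small calculation. This is only an illustration: the function name and both per-GB rates are hypothetical placeholders, not actual AWS prices, and real bills include items not covered here.

```python
def estimate_monthly_charge(stored_gb, downloaded_gb, uploaded_gb,
                            storage_rate_per_gb, download_rate_per_gb):
    """Rough monthly charge under the model described in the text.

    Both rates are caller-supplied placeholders (hypothetical values),
    not real AWS pricing.
    """
    storage_charge = stored_gb * storage_rate_per_gb        # pay for what is stored
    download_charge = downloaded_gb * download_rate_per_gb  # downloads are charged
    upload_charge = 0.0 * uploaded_gb                       # uploads are free
    return storage_charge + download_charge + upload_charge


# Hypothetical rates, chosen only to make the arithmetic easy to follow.
charge = estimate_monthly_charge(
    stored_gb=100, downloaded_gb=10, uploaded_gb=500,
    storage_rate_per_gb=0.02, download_rate_per_gb=0.09,
)
print(f"Estimated monthly charge: {charge:.2f}")
```

Note that the 500 GB uploaded in this example contributes nothing to the total, which is the property that makes S3 a convenient place to load data for EMR jobs.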

S3 can be used as the input and output data store for MapReduce jobs. The intermediate files can be stored on local disks or the HDFS of the EMR cluster. This also allows easy sharing of input and results among different people in the organization without fearing data ...
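As a minimal sketch of what using S3 as the input and output store can look like, the helper below assembles the arguments of a Hadoop Streaming job whose input and output locations are `s3://` URIs. The helper name, the bucket name, and the prefixes are hypothetical; `-input`, `-output`, `-mapper`, and `-reducer` are standard Hadoop Streaming options, and the `hadoop-streaming` launcher is assumed to be available on the cluster. The default mapper and reducer (`/bin/cat` and `/usr/bin/wc`) are trivial stand-ins.

```python
def build_streaming_args(bucket, input_prefix, output_prefix,
                         mapper="/bin/cat", reducer="/usr/bin/wc"):
    """Assemble Hadoop Streaming arguments that read from and write to S3.

    The bucket and prefixes are illustrative; intermediate files are not
    configured here and stay on the cluster (local disks or HDFS).
    """
    input_uri = f"s3://{bucket}/{input_prefix.strip('/')}/"
    output_uri = f"s3://{bucket}/{output_prefix.strip('/')}/"
    return [
        "hadoop-streaming",
        "-input", input_uri,     # job input is read directly from S3
        "-output", output_uri,   # final results are written back to S3
        "-mapper", mapper,
        "-reducer", reducer,
    ]


# Hypothetical bucket and prefixes, for illustration only.
args = build_streaming_args("example-bucket", "logs/input", "logs/output")
print(" ".join(args))
```

Because the results land under a shared S3 prefix rather than on the cluster, they remain available to other users after the EMR cluster is terminated.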
