4. Getting Data into Hadoop

You can have data without information, but you cannot have information without data.

Daniel Keys Moran

In This Chapter:

The data lake concept is presented as a new data processing paradigm.

Basic methods for importing CSV data into HDFS and Hive tables are presented.

Additional methods for using Spark to import data into Hive tables or directly for a Spark job are presented.

Apache Sqoop is introduced as a tool for ...
