Video description
The standard for large-scale data processing, Hadoop makes your data truly accessible. This course offers an in-depth tour of the Hadoop ecosystem, providing detailed instruction on setting up and running a Hadoop cluster and on batch processing data with Pig, Hive's SQL dialect, and MapReduce, along with everything else you need to parse, access, and analyze your data.
Table of contents
- Introduction
- Hadoop Basics
- HDFS And MapReduce
- Hadoop Run Modes And Job Types
- Hadoop Software Requirements And Recommendations
- Hadoop in the Cloud - Amazon Web Services
- Lab - Installing Hadoop From CDH With Cloudera Manager - Part 1
- Lab - Installing Hadoop From CDH With Cloudera Manager - Part 2
- Lab - Installing Hadoop From CDH With Cloudera Manager - Part 3
- Lab - Installing Hadoop From CDH With Cloudera Manager - Part 4
- Introduction To Hive And Pig Interface
- Installing Cloudera Quickstart VM
- Hadoop Distributed File System (HDFS)
- MapReduce
- Logging And Debugging
- Hive, Pig, And Impala
- Data Import And Export
- Conclusion
- Introduction
- Core Hadoop Components
- YARN: Components And Architecture
- Scheduling, Running And Monitoring Applications In YARN
- Conclusion
- Introduction
- Connecting To Hive
- Creating Tables And Loading Data
- Manipulating Tables With HiveQL
- Views And Partitions
- Functions And Using Transform
- Hive Execution Engines
- Conclusion
- A Distributed Computing Environment
- Computing with Hadoop
- How a MapReduce Job Works
- Mappers and Reducers in Detail
- Working with Hadoop via the Command Line: Starting HDFS and YARN
- Working with Hadoop via the Command Line: Loading Data into HDFS
- Working with Hadoop via the Command Line: Running a MapReduce Job
- How To Use Our GitHub Goodies
- Working in Python with Hadoop Streaming
- Common MapReduce Tasks
- Spark on Hadoop 2
- Creating a Spark Application with Python
- The Hadoop Ecosystem
- Working with Data on Hive
- Towards Last Mile Computing
- Decomposing Large Data Sets to a Computational Space
- Linear Regressions
- Summarizing Documents with TF-IDF
- Classification of Text
- Parallel Canopy Clustering
- Computing Recommendations via Linear Log-Likelihoods
- Introduction to Clickstream Case Study
- Requirements
- Data Modeling
- Data Ingest
- Data Processing Engines - Part 1
- Data Processing Engines - Part 2
- Data Processing Patterns
- Orchestration
- Putting It All Together
- Demo
- Q
Product information
- Title: Hadoop, 2nd Edition
- Author(s):
- Release date: November 2015
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781491952177