Hadoop, 2nd Edition

Video description

The standard for large-scale data processing, Hadoop makes your data truly accessible. This course offers an in-depth tour of the Hadoop ecosystem, with detailed instruction on setting up and running a Hadoop cluster; batch processing data with Pig, Hive's SQL dialect, and MapReduce; and everything else you need to parse, access, and analyze your data.

Table of contents

  1. Introduction
    1. What Is Big Data?
    2. About The Author
    3. Historical Approaches
    4. Big Data In The Modern World
    5. The Hadoop Approach
    6. Hadoop Hardware Requirements
    7. Hadoop Core Vs. Ecosystem
    8. Hadoopable Problems
    9. Hadoop Support Companies
  2. Hadoop Basics
    1. HDFS And MapReduce
    2. Hadoop Run Modes And Job Types
    3. Hadoop Software Requirements And Recommendations
    4. Hadoop in the Cloud - Amazon Web Services
    5. Lab - Installing Hadoop From CDH With Cloudera Manager - Part 1
    6. Lab - Installing Hadoop From CDH With Cloudera Manager - Part 2
    7. Lab - Installing Hadoop From CDH With Cloudera Manager - Part 3
    8. Lab - Installing Hadoop From CDH With Cloudera Manager - Part 4
    9. Introduction To Hive And Pig Interface
    10. Installing Cloudera Quickstart VM
  3. Hadoop Distributed File System (HDFS)
    1. HDFS Architecture
    2. HDFS File Write Walkthrough
    3. Secondary Name Node
    4. Lab - Using HDFS - Part 1
    5. Lab - Using HDFS - Part 2
    6. HA And Federation Basics
    7. HDFS Access Controls
  4. MapReduce
    1. MapReduce Explained
    2. MapReduce Architecture
    3. MapReduce Code Walkthrough - Part 1
    4. MapReduce Code Walkthrough - Part 2
    5. MapReduce Job Walkthrough
    6. Rack Awareness
    7. Advanced MapReduce - Partitioners, Combiners, Comparators And More
    8. Partitioner Code Walkthrough
    9. Java Concerns
  5. Logging And Debugging
    1. Debugging Basics
    2. Benchmarking With Teragen And Terasort
  6. Hive, Pig, And Impala
    1. Comparing Hive, Pig And Impala
    2. Hive Basics
    3. Hive Patterns And Anti-Patterns
    4. Lab - Hive Basic Usage
    5. Pig Basics
    6. Pig Patterns And Anti-Patterns
    7. Lab - Pig Basic Usage
    8. Impala Fundamentals
  7. Data Import And Export
    1. Import And Export Options
    2. Flume Introduction
    3. Lab - Using Flume
    4. HDFS Interaction Tools
    5. Sqoop Introduction
    6. Lab - Using Sqoop
    7. Oozie Introduction
  8. Conclusion
    1. Wrap-Up
  9. Introduction
    1. Course Agenda And Instructor
  10. Core Hadoop Components
    1. Basic Overview Of Hadoop Core Components: HDFS
    2. Hadoop Core Components Overview
    3. What Is Map/Reduce?
  11. YARN: Components And Architecture
    1. Pre-YARN Architecture
    2. YARN Architecture And Daemons
  12. Scheduling, Running And Monitoring Applications In YARN
    1. Running Jobs In YARN
    2. YARN Parameters
    3. YARN Cluster Resource Allocation
    4. Failure Handling
    5. YARN Logs
    6. Hands On With YARN
  13. Conclusion
    1. Summary
  14. Introduction
    1. What Is Apache Hive And Who Uses It?
    2. About The Author
    3. What You Should Expect From This Video
  15. Connecting To Hive
    1. Hive CLI
    2. Beeline
    3. HUE
    4. JDBC
  16. Creating Tables And Loading Data
    1. Creating A Table
    2. Loading Data
    3. Hive Record Structure
    4. Hive Data Types
  17. Manipulating Tables With HiveQL
    1. Select Statement - Part 1
    2. Select Statement - Part 2
    3. Inserting Data Into A Hive Table Using HiveQL
    4. Creating A Table Using HiveQL
  18. Views And Partitions
    1. Creating And Using Views
    2. Creating And Using Partitions
  19. Functions And Using Transform
    1. Built In Functions
    2. User Defined Functions
    3. Transforming Data With Custom Scripts
  20. Hive Execution Engines
    1. Map Reduce
    2. Tez
    3. Spark
  21. Conclusion
    1. Wrap Up
    2. Overview of the Video Course
  22. A Distributed Computing Environment
    1. The Motivation for Hadoop
    2. A Brief History of Hadoop
    3. Understanding the Hadoop Architecture
    4. Setting Up A Pseudo-Distributed Environment
    5. The Distributed File System (HDFS)
    6. Distributed Computing with MapReduce
    7. Word Count - the "Hello, World" of Hadoop!
  23. Computing with Hadoop
    1. How a MapReduce Job Works
    2. Mappers and Reducers in Detail
    3. Working with Hadoop via the Command Line: Starting HDFS and YARN
    4. Working with Hadoop via the Command Line: Loading Data into HDFS
    5. Working with Hadoop via the Command Line: Running a MapReduce Job
    6. How To Use Our GitHub Goodies
    7. Working in Python with Hadoop Streaming
    8. Common MapReduce Tasks
    9. Spark on Hadoop 2
    10. Creating a Spark Application with Python
  24. The Hadoop Ecosystem
    1. The Hadoop Ecosystem
    2. Data Warehousing with Hadoop
    3. Higher Order Data Flows
    4. Other Notable Projects
  25. Working with Data on Hive
    1. Introduction to Hive
    2. Interacting with Data via the Hive Console
    3. Creating Databases, Tables, and Schemas for Hive
    4. Loading Data into Hive from HDFS
    5. Querying Data and Performing Aggregations With Hive
  26. Towards Last Mile Computing
    1. Decomposing Large Data Sets to a Computational Space
    2. Linear Regressions
    3. Summarizing Documents with TF-IDF
    4. Classification of Text
    5. Parallel Canopy Clustering
    6. Computing Recommendations via Linear Log-Likelihoods
    7. Introduction to Clickstream Case Study
    8. Requirements
    9. Data Modeling
    10. Data Ingest
    11. Data Processing Engines - Part 1
    12. Data Processing Engines - Part 2
    13. Data Processing Patterns
    14. Orchestration
    15. Putting It All Together
    16. Demo
    17. Q

Product information

  • Title: Hadoop, 2nd Edition
  • Author(s): Ben Lorica
  • Release date: November 2015
  • Publisher(s): O'Reilly Media, Inc.
  • ISBN: 9781491952177