Analytic Data Storage in Hadoop

Video Description

In this Analytic Data Storage in Hadoop training course, expert author Ryan Blue will teach you about typical storage and ingest patterns in Hadoop. This course is designed for users who are already familiar with Hadoop.

You will start by learning how to create a dataset, load sample data, and query the dataset. From there, Ryan will teach you about partitioning, file formats, and Avro. This video tutorial also covers Parquet, bulk data drops, and database snapshots and mirroring. Finally, you will learn about event stream processing, including how to build a test pipeline and move it to production.

Once you have completed this computer-based training course, you will have a solid understanding of typical storage and ingest patterns in Hadoop. Working files are included, allowing you to follow along with the author throughout the lessons.

Table of Contents

  1. Introduction
    1. Introduction And What To Expect 00:03:27
    2. About The Author 00:04:43
    3. Introduction To The Movielens Dataset 00:01:41
    4. How To Access Your Working Files 00:01:15
  2. Getting Started With Hadoop
    1. Your First Hadoop Dataset - CSV To SQL Query 00:06:45
    2. Describing Your Data With A Schema 00:10:06
    3. Creating The Dataset 00:10:20
    4. Loading Sample Data 00:09:56
    5. Querying A Dataset 00:05:58
  3. Partitioning
    1. Introduction to Partitioning 00:07:25
    2. Testing Partition Strategies 00:17:13
    3. Partitioning Patterns 00:13:17
  4. Formats
    1. Introduction To File Formats 00:14:02
    2. Hadoop File Formats - Why Splitability Matters 00:05:48
    3. Hadoop File Formats Review 00:13:44
  5. Avro
    1. Avro 00:03:46
    2. Avro File Format 00:07:32
    3. Avro Schemas 00:10:01
    4. Avro Object Models 00:11:05
    5. Avro Tools 00:07:44
  6. Parquet
    1. Parquet 00:14:57
    2. Parquet File Formats 00:12:01
    3. Parquet Object Models 00:05:15
    4. Parquet Tools 00:10:24
  7. Bulk Data Drops
    1. Application Pattern Overview - Bulk Data Drops 00:08:34
    2. Apache Nifi - Part 1 00:07:35
    3. Apache Nifi - Part 2 00:06:39
    4. Using Apache Nifi 00:05:13
    5. Preparing The Datasets 00:11:24
    6. Building The Data Flow - Part 1 00:15:10
    7. Building The Data Flow - Part 2 00:14:07
  8. Database Snapshots And Mirroring
    1. Application Pattern - Database Snapshots And Mirroring 00:06:14
    2. Introduction To Apache Sqoop 00:08:51
    3. Table Snapshots 00:10:57
    4. Incremental Mirroring - Part 1 00:08:11
    5. Incremental Mirroring - Part 2 00:10:55
    6. Moving To Production 00:04:18
  9. Event Stream Processing
    1. Application Pattern - Event Streams 00:07:54
    2. Introduction To Apache Flume 00:06:51
    3. Building A Pipeline - Part 1 00:09:18
    4. Building A Pipeline - Part 2 00:14:51
    5. Moving to Production 00:14:14
  10. Conclusion
    1. Wrap Up 00:00:51