Learning Path: Understanding Tool Integration for Big Data Architecture

Video Description

In this Learning Path, you’ll learn how to integrate Hadoop components to implement big data solutions for a variety of use cases, including clickstream analytics, time series problems, transferring data between Hadoop and relational databases, and applications in the finance sector.

Table of Contents

  1. Introduction to Clickstream Case Study 00:11:19
  2. Requirements 00:08:04
  3. Data Modeling 00:14:55
  4. Data Ingest 00:16:16
  5. Data Processing Engines - Part 1 00:16:23
  6. Data Processing Engines - Part 2 00:10:59
  7. Data Processing Patterns 00:09:32
  8. Orchestration 00:14:34
  9. Putting It All Together 00:03:08
  10. Demo 00:21:47
  11. Q&A 00:24:35
  12. Introduction
    1. Introduction to Time Series Problems 00:09:58
  13. Kafka
    1. Kafka Architecture and Deployment 00:11:33
    2. Kafka Usage 00:03:42
  14. Spark
    1. Introduction to Spark 00:15:43
    2. Spark Architecture 00:12:02
  15. Spark Streaming
    1. Spark Streaming: Windows & Slides 00:08:35
    2. Spark Streaming: Ingestion Sources & Using Kafka 00:08:32
    3. Spark Streaming: Operations on the Stream 00:01:30
  16. Cassandra
    1. Introduction to Cassandra 00:08:56
    2. Cassandra Basic Architecture 00:11:59
    3. Replication, High Availability and Multi Datacenter 00:14:06
    4. Cassandra Weather Website Example 00:11:46
    5. Cassandra Query Language (CQL) 00:18:00
    6. Cassandra Partitions & Clustering 00:08:22
    7. Cassandra Read and Write Path 00:12:17
    8. Working with Cassandra 00:06:32
    9. Cassandra Drivers and Access Patterns 00:10:37
  17. Spark and Cassandra
    1. Spark and Cassandra Architecture 00:12:00
    2. Analyzing Cassandra Data & Spark SQL 00:12:12
    3. Spark and Cassandra DataStax Enterprise 00:04:31
  18. Real World Use Cases
    1. Real World Use Cases: Streaming Problems 00:17:11
    2. Real World Use Cases: In-place Analytic Problems 00:10:58
  19. Introduction
    1. Course Introduction 00:04:21
    2. About The Author 00:04:14
    3. What Is Big Data 00:11:07
    4. Historical Approaches 00:07:04
    5. Modern-Day Approach 00:12:42
    6. What Is Hadoop 00:11:05
    7. Hadoop Core Vs Ecosystem 00:05:03
    8. Hadoopable Problems 00:06:37
    9. How To Access Your Working Files 00:01:15
  20. Hadoop Basics
    1. HDFS And Yarn 00:08:14
    2. Hive And Pig Interface Introduction 00:05:59
    3. Introduction To Spark 00:04:37
    4. Hadoop In The Cloud (Amazon Web Services Intro) 00:08:49
    5. Installing Hadoop Into EMR Part - 1 00:15:31
    6. Installing Hadoop Into EMR Part - 2 00:15:34
    7. Installing Cloudera Quickstart VM 00:11:01
    8. Web GUIs 00:11:06
  21. Hadoop Distributed Filesystem (HDFS)
    1. HDFS Architecture 00:10:05
    2. HDFS File Write Walkthrough 00:17:57
    3. Secondary Name Node 00:06:38
    4. Basic HDFS Commands 00:09:23
    5. Using HDFS Commands Part - 1 00:07:34
    6. Using HDFS Commands Part - 2 00:09:27
    7. HA And Federation Basics 00:12:48
    8. HDFS Access Controls (Or Lack Thereof) 00:09:34
  22. Yarn
    1. Yarn Purpose 00:06:16
    2. Yarn Architecture 00:07:25
    3. Yarn With Spark 00:06:44
  23. MapReduce
    1. MapReduce Explained 00:11:52
    2. MapReduce Architecture 00:07:36
    3. MapReduce Code Walkthrough 00:11:59
    4. MapReduce Details Walkthrough 00:04:45
    5. Running MapReduce Job 00:08:59
  24. HDFS Data Import And Export
    1. Import/Export Options 00:11:12
    2. Flume Introduction 00:10:53
    3. Using Flume 00:13:43
    4. Sqoop Introduction 00:09:25
    5. Using Sqoop 00:17:01
    6. HDFS Interaction Tools 00:06:01
    7. Oozie Introduction 00:10:17
  25. Spark Basics
    1. Spark Value Propositions 00:08:30
    2. Spark Run Modes (Yarn, Standalone, Mesos) 00:07:33
    3. RDDs And Dataframes 00:17:24
    4. Hands On Spark Part - 1 00:08:12
    5. Hands On Spark Part - 2 00:10:38
    6. Running Spark Part - 1 00:09:58
    7. Running Spark Part - 2 00:13:55
    8. Optimizing And Debugging Spark 00:18:17
    9. Spark Libraries Overview 00:09:05
  26. Spark Built-In Libraries
    1. Spark SQL 00:09:01
    2. Spark SQL Usage 00:12:02
    3. MLlib Basics 00:15:30
    4. Common MLlib Usage Part - 1 00:15:02
    5. Common MLlib Usage Part - 2 00:08:23
    6. Spark Streaming 00:12:43
    7. GraphX 00:09:58
  27. Hive And Pig
    1. Hive Vs Pig 00:09:53
    2. Hive Basics 00:11:53
    3. Analysis With Hive 00:10:54
    4. Pig Basics 00:14:38
    5. ETL And Analytics With Pig 00:20:16
  28. Hadoop In The Cloud
    1. Hadoop/Cloud Use Cases 00:05:16
    2. Elastic MapReduce (EMR) 00:12:47
  29. Ecosystem
    1. HBase Basics 00:11:16
    2. Enterprise Integration 00:10:39
  30. Wrap Up
    1. Wrap Up 00:03:41
  31. Introduction to Sqoop
    1. Introduction 00:03:45
    2. About The Author 00:00:48
    3. Use Case #1: ELT 00:05:32
    4. Use Case #2: ETL From DWH 00:03:04
    5. Use Case #3: Data Analysis 00:03:38
    6. Use Case #4: Data Archival 00:02:02
    7. Use Case #5: Move Reports To Hadoop 00:05:26
    8. Use Case #6: Data Consolidation 00:02:54
  32. Importing Data To Hadoop From A Relational Database
    1. Command Line Basics: Importing Data Using Sqoop 00:09:13
    2. Importing Data With Column Filters, Row Filters, And Free Text Queries 00:06:12
    3. Parallel Imports 00:04:33
    4. Import Data Directory To HIVE Tables 00:07:25
    5. Incremental Data Import Overview 00:06:00
    6. Incremental Data Import And Using Sqoop Stored Jobs 00:11:05
  33. Sqoop Hands-On: Exporting Data From Hadoop To A Relational Database
    1. Exporting Data Back To A Relational Database Using Sqoop 00:05:42
    2. Exporting Data From Hadoop Back To RDBMS 00:01:57
  34. Advanced Topics
    1. Introduction to Sqoop2 Server 00:04:18
  35. Course Summary
    1. Wrap Up 00:04:07
    2. Continuous curation of event data for a customer event hub - Arvind Prabhakar (StreamSets) 00:40:27
    3. Big data governance - Steven Totman (Cloudera), Mark Donsky (Cloudera), Kristi Cunningham (Capital One), Ben Harden (CapTech Consulting) 00:42:12
    4. Preventing a big data security breach - Sam Heywood (Cloudera), Nick Curcuru (MasterCard Advisors), Ritu Kama (Intel) 00:39:23
    5. Hive + Amazon EMR + S3 = Elastic big data SQL analytics processing in the cloud, a real-world case study - Jaipaul Agonus (FINRA) 00:42:52