From 0 to 1: Hive for Processing Big Data

Video Description

End-to-End Hive: HQL, Partitioning, Bucketing, UDFs, Windowing, Optimization, Map Joins, Indexes

About This Video

  • Analytical Processing: Joins, Subqueries, Views, Table Generating Functions, Explode, Lateral View, Windowing and more
  • Tuning Hive for better functionality: Partitioning, Bucketing, Join Optimizations, Map-Side Joins, Indexes; writing custom user-defined functions in Java (UDF, UDAF, GenericUDF, GenericUDTF); custom functions in Python; implementation of MapReduce for Select, Group By and Join

In Detail

Hive is like a new friend with an old face (SQL). This course is an end-to-end, practical guide to using Hive for Big Data processing. Let's parse that. A new friend with an old face: Hive helps you leverage the power of distributed computing and Hadoop for analytical processing. Its interface is like an old friend: the very SQL-like HiveQL. This course will fill in all the gaps between SQL and what you need to use Hive. End-to-end: whether you are an analyst who wants to process data or an engineer who needs to build custom functionality or optimize performance, everything you'll need is right here. New to SQL? No need to look elsewhere: the course has a primer on all the basic SQL constructs. Practical: everything is taught using real-life examples, working queries and code.

Table of Contents

  1. Chapter 1 : You, Us & This Course
    1. You, Us & This Course 00:02:03
  2. Chapter 2 : Introducing Hive
    1. Hive: An Open-Source Data Warehouse 00:12:59
    2. Hive and Hadoop 00:09:19
    3. Hive vs Traditional Relational DBMS 00:13:52
    4. HiveQL and SQL 00:07:21
  3. Chapter 3 : Hadoop and Hive Install
    1. Hadoop Install Modes 00:08:33
    2. Hadoop Install Step 1: Standalone Mode 00:15:47
    3. Hadoop Install Step 2: Pseudo-Distributed Mode 00:11:45
    4. Hive install 00:12:05
    5. Code-Along: Getting started 00:06:25
  4. Chapter 4 : Hadoop and HDFS Overview
    1. What is Hadoop? 00:07:25
    2. HDFS or the Hadoop Distributed File System 00:11:01
  5. Chapter 5 : Hive Basics
    1. Primitive Datatypes 00:17:08
    2. Collections: Arrays and Maps 00:09:29
    3. Structs and Unions 00:05:58
    4. Create Table 00:13:15
    5. Insert Into Table 00:12:05
    6. Insert into Table 2 00:06:51
    7. Alter Table 00:07:22
    8. HDFS 00:09:25
    9. HDFS CLI - Interacting with HDFS 00:10:59
    10. Code-Along: Create Table 00:09:54
    11. Code-Along: Hive CLI 00:03:07
  6. Chapter 6 : Built-in Functions
    1. Three types of Hive functions 00:06:46
    2. The Case-When statement, the Size function, the Cast function 00:10:10
    3. The Explode function 00:13:07
    4. Code-Along: Hive Built-in functions 00:04:28
  7. Chapter 7 : Sub-Queries
    1. Quirky Sub-Queries 00:07:14
    2. More on subqueries: Exists and In 00:15:14
    3. Inserting via subqueries 00:05:23
    4. Code-Along: Use Subqueries to work with Collection Datatypes 00:05:57
    5. Views 00:12:18
  8. Chapter 8 : Partitioning
    1. Indices 00:06:41
    2. Partitioning Introduced 00:06:37
    3. The Rationale for Partitioning 00:06:16
    4. How Tables are partitioned 00:09:53
    5. Using Partitioned Tables 00:05:27
    6. Dynamic Partitioning: Inserting data into partitioned tables 00:12:44
    7. Code-Along: Partitioning 00:04:04
  9. Chapter 9 : Bucketing
    1. Introducing Bucketing 00:11:57
    2. The Advantages of Bucketing 00:04:55
    3. How Tables are bucketed 00:12:37
    4. Using Bucketed Tables 00:07:22
    5. Sampling 00:11:13
  10. Chapter 10 : Windowing
    1. Windowing Introduced 00:12:59
    2. Windowing - A Simple Example: Cumulative Sum 00:09:39
    3. Windowing - A More Involved Example: Partitioning 00:11:55
    4. Windowing - Special Aggregation Functions 00:15:08
  11. Chapter 11 : Understanding MapReduce
    1. The basic philosophy underlying MapReduce 00:08:50
    2. MapReduce - Visualized and Explained 00:09:04
    3. MapReduce - Digging a little deeper at every step 00:10:21
  12. Chapter 12 : MapReduce logic for queries: Behind the scenes
    1. MapReduce Overview: Basic Select-From-Where 00:11:34
    2. MapReduce Overview: Group-By and Having 00:09:12
    3. MapReduce Overview: Joins 00:14:17
  13. Chapter 13 : Join Optimizations in Hive
    1. Improving Join performance with tables of different sizes 00:13:12
    2. The Where clause in Joins 00:04:53
    3. The Left Semi Join 00:12:11
    4. Map Side Joins: The Inner Join 00:09:42
    5. Map Side Joins: The Left, Right and Full Outer Joins 00:11:36
    6. Map Side Joins: The Bucketed Map Join and the Sorted Merge Join 00:07:52
  14. Chapter 14 : Custom Functions in Python
    1. Custom functions in Python 00:10:40
    2. Code-Along: Custom Function in Python 00:05:45
  15. Chapter 15 : Custom functions in Java
    1. Introducing UDFs - you're not limited by what Hive offers 00:04:38
    2. The Simple UDF: The standard function for primitive types 00:07:04
    3. The Simple UDF: Java implementation for replacetext() 00:08:35
    4. Generic UDFs, the Object Inspector and DeferredObjects 00:13:51
    5. The Generic UDF: Java implementation for containsstring() 00:09:11
    6. The UDAF: Custom aggregate functions can get pretty complex 00:14:09
    7. The UDAF: Java implementation for max() 00:09:21
    8. The UDAF: Java implementation for Standard Deviation 00:10:48
    9. The Generic UDTF: Custom table generating functions 00:07:38
    10. The Generic UDTF: Java implementation for namesplit() 00:10:21
  16. Chapter 16 : SQL Primer - Select Statements
    1. Select Statements 00:11:47
    2. Select Statements 2 00:14:12
    3. Operator Functions 00:06:55
  17. Chapter 17 : SQL Primer - Group By, Order by and Having
    1. Aggregation Operators Introduced 00:18:16
    2. The Group by Clause 00:17:20
    3. More Group by Examples 00:19:47
    4. Order by 00:16:15
    5. Having 00:19:52
  18. Chapter 18 : SQL Primer - Joins
    1. Introduction to SQL Joins 00:09:54
    2. Cross Joins and Cartesian Joins 00:17:03
    3. Inner Joins 00:19:53
    4. Left Outer Joins 00:15:31
    5. Right, Full Outer Joins, Natural Joins, Self Joins 00:16:08
  19. Chapter 19 : Appendix
    1. [For Linux/Mac OS Shell Newbies] Path and other Environment Variables 00:08:26
    2. Setting up a Virtual Linux Instance - For Windows Users 00:15:59