Taming Big Data with MapReduce and Hadoop - Hands On!

Video Description

Master the art of processing Big Data using Hadoop and MapReduce with the help of real-world examples

About This Video

  • Understand how MapReduce can be used to process and analyze large data sets

  • Frame your complex data analysis problems as multi-stage MapReduce jobs

  • Over 10 real-world examples to help you learn the concepts of Hadoop and MapReduce for Big Data processing

In Detail

    Big Data processing has been attracting a lot of attention lately, as organizations have to deal with large amounts of data on a daily basis. Processing such data and extracting actionable insights from it is a major challenge; that's where Hadoop and MapReduce come to the rescue. This course teaches you how to use MapReduce for Big Data processing, with lots of practical examples and use cases. You will start by understanding the Hadoop ecosystem and the basics of MapReduce. You will then see how MapReduce can be used to process different types of data, whether that means analyzing movie ratings or your social network data. You will also learn how to run MapReduce jobs on Hadoop clusters using Amazon Elastic MapReduce. The course wraps up with an overview of other Hadoop-based technologies such as Hive, Pig, and the in-demand Apache Spark.

    Table of Contents

    1. Chapter 1: Introduction and Getting Started
      1. Introduction 00:03:22
      2. Getting Started – Run Your First MapReduce Program! 00:07:19
    2. Chapter 2: Understanding MapReduce
      1. MapReduce Basic Concepts 00:13:26
      2. A Walkthrough of Rating Histogram Code 00:10:39
      3. Understanding How MapReduce Scales / Distributed Computing 00:03:00
      4. Average Friends by Age Example – Part 1 00:03:05
      5. Average Friends by Age Example – Part 2 00:07:14
      6. Minimum Temperature by Location Example 00:09:40
      7. Maximum Temperature by Location Example 00:03:23
      8. Word Frequency in a Book Example 00:05:26
      9. Making the Word Frequency Mapper Better with Regular Expressions 00:03:16
      10. Sorting the Word Frequency Results Using Multi-Stage MapReduce Jobs 00:08:18
      11. Activity: Design a Mapper and Reducer for Total Spent by Customer 00:02:55
      12. Activity: Write Code for Total Spent by Customer 00:03:57
      13. Compare Your Code to Mine – Sort Results by Amount Spent 00:05:39
      14. Compare Your Code to Mine for Sorted Results 00:03:49
      15. Combiners 00:07:27
    3. Chapter 3: Advanced MapReduce Examples
      1. Example – Most Popular Movie 00:07:23
      2. Including Ancillary Lookup Data in the Example 00:08:01
      3. Example – Most Popular Superhero Part 1 00:04:22
      4. Example – Most Popular Superhero Part 2 00:06:31
      5. Example: Degrees of Separation – Concepts 00:12:28
      6. Degrees of Separation – Preprocessing the Data 00:05:15
      7. Degrees of Separation – Code Walkthrough 00:06:34
      8. Degrees of Separation – Running and Analyzing the Results 00:05:41
      9. Example – Similar Movies Based on Ratings: Concepts 00:07:25
      10. Similar Movies – Code Walkthrough 00:07:17
      11. Similar Movies – Running and Analyzing the Results 00:06:37
      12. Learning Activity – Improving Our Movie Similarities MapReduce Job 00:03:58
    4. Chapter 4: Using Hadoop and Elastic MapReduce
      1. Fundamental Concepts of Hadoop 00:06:00
      2. The Hadoop Distributed File System (HDFS) 00:03:10
      3. Apache YARN 00:04:20
      4. Hadoop Streaming – How Hadoop Runs Your Python Code 00:03:37
      5. Setting Up Your Amazon Elastic MapReduce Account 00:06:49
      6. Linking Your EMR Account with MRJob 00:03:40
      7. Exercise – Run Movie Recommendations on Elastic MapReduce 00:04:35
      8. Analyze the Results of Your EMR Job 00:04:59
    5. Chapter 5: Advanced Hadoop and EMR
      1. Distributed Computing Fundamentals 00:04:33
      2. Activity – Running Movie Similarities on Four Machines 00:04:28
      3. Analyzing the Results of the Four-Machine Job 00:05:44
      4. Troubleshooting Hadoop Jobs with EMR and MRJob – Part 1 00:04:01
      5. Troubleshooting Hadoop Jobs – Part 2 00:10:28
      6. Analyzing One Million Movie Ratings across 16 Machines – Part 1 00:06:09
      7. Analyzing One Million Movie Ratings across 16 Machines – Part 2 00:08:03
    6. Chapter 6: Other Hadoop Technologies
      1. Introducing Apache Hive 00:06:17
      2. Introducing Apache Pig 00:03:26
      3. Apache Spark – Concepts 00:09:37
      4. Spark Example – Part 1 00:11:15
      5. Spark Example – Part 2 00:03:22
      6. Congratulations 00:00:41