Learning Path: Big Data Analytics

Video Description

Understanding Big Data Using Hadoop and Spark

In Detail

Massive amounts of data are being generated every day, everywhere. As a result, many organizations are focusing on big data processing. In this course, we’ll help you understand how Hadoop, as an ecosystem, helps us store, process, and analyze data. We then move on to developing large-scale distributed data processing applications using Apache Spark 2.

Prerequisites: Data scientists or big data architects interested in combining the data processing power of Hadoop and Apache Spark should have prior knowledge of these technologies.

Resources (code downloads and errata):

  • Learning Hadoop 2

  • Apache Spark 2 for Beginners

Path Products

This path navigates across the following products (in sequential order):

  • Learning Hadoop 2 (1h 30m)

  • Apache Spark 2 for Beginners (5h 38m)

Table of Contents

    1. Chapter 1: Learning Hadoop 2
      1. The Course Overview 00:01:52
      2. Overview of HDFS and YARN 00:07:25
      3. Overview of Sqoop and Flume 00:03:18
      4. Overview of MapReduce 00:03:39
      5. Overview of Pig 00:03:04
      6. Overview of Hive 00:06:34
      7. Downloading and Installing Hadoop 00:02:53
      8. Exploring Hue 00:05:24
      9. Manual Import 00:04:33
      10. Importing from Databases Using Sqoop 00:06:27
      11. Using Flume to Import Streaming Data 00:05:08
      12. Coding "Word Count" in MapReduce 00:05:55
      13. Coding "Word Count" in Pig 00:02:30
      14. Performing Common ETL Functions in Pig 00:08:48
      15. Using User-defined Functions in Pig 00:05:58
      16. Importing Data from HDFS into Hive 00:04:57
      17. Importing Data Directly from a Database 00:02:23
      18. Performing Basic Queries in Hive 00:06:59
      19. Putting It All Together 00:02:16
    2. Chapter 2: Apache Spark 2 for Beginners
      1. The Course Overview 00:04:30
      2. An Overview of Apache Hadoop 00:05:50
      3. Understanding Apache Spark 00:05:14
      4. Installing Spark on Your Machines 00:13:49
      5. Functional Programming with Spark and Understanding Spark RDD 00:08:45
      6. Data Transformations and Actions with RDDs 00:05:22
      7. Monitoring with Spark 00:04:02
      8. The Basics of Programming with Spark 00:20:30
      9. Creating RDDs from Files and Understanding the Spark Library Stack 00:06:39
      10. Understanding the Structure of Data and the Need for Spark SQL 00:09:39
      11. Anatomy of Spark SQL 00:05:09
      12. DataFrame Programming 00:12:01
      13. Understanding Aggregations and Multi-Datasource Joining with SparkSQL 00:08:33
      14. Introducing Datasets and Understanding Data Catalogs 00:07:53
      15. The Need for Spark and the Basics of the R Language 00:08:09
      16. DataFrames in R and Spark 00:02:57
      17. Spark DataFrame Programming with R 00:04:43
      18. Understanding Aggregations and Multi-Datasource Joins in SparkR 00:04:12
      19. Charting and Plotting Libraries and Setting Up a Dataset 00:04:00
      20. Charts, Plots, and Histograms 00:05:36
      21. Bar Chart and Pie Chart 00:07:46
      22. Scatter Plot and Line Graph 00:04:53
      23. Data Stream Processing and Micro Batch Data Processing 00:08:36
      24. A Log Event Processor 00:16:22
      25. Windowed Data Processing and More Processing Options 00:07:27
      26. Kafka Stream Processing 00:10:44
      27. Spark Streaming Jobs in Production 00:09:09
      28. Understanding Machine Learning and the Need for Spark 00:06:22
      29. Wine Quality Prediction and Model Persistence 00:10:44
      30. Wine Classification 00:05:58
      31. Spam Filtering 00:07:08
      32. Feature Algorithms and Finding Synonyms 00:06:54
      33. Understanding Graphs with Their Usage 00:04:35
      34. The Spark GraphX Library 00:10:09
      35. Graph Processing and Graph Structure Processing 00:09:45
      36. Tennis Tournament Analysis 00:05:34
      37. Applying PageRank Algorithm 00:03:30
      38. Connected Component Algorithm 00:04:39
      39. Understanding GraphFrames and Its Queries 00:09:31
      40. Lambda Architecture 00:04:47
      41. Micro Blogging with Lambda Architecture 00:07:13
      42. Implementing Lambda Architecture and Working with Spark Applications 00:08:19
      43. Coding Style, Setting Up the Source Code, and Understanding Data Ingestion 00:09:09
      44. Generating Purposed Views and Queries 00:05:53
      45. Understanding Custom Data Processes 00:06:12