Advanced Analytics and Real-Time Data Processing in Apache Spark

Video Description

Implement high-velocity streaming for real-time data processing, perform machine learning and graph analysis with Spark MLlib, GraphX, and SparkR, and explore analytical use cases on Apache Spark.

About This Video

  • Leverage the power of Apache Spark to perform efficient data processing and analytics on your data in real time
  • Process and analyze streams of data with ease and perform machine learning efficiently
  • A comprehensive tutorial to help you get the most out of the trending Big Data framework for all your data processing needs

In Detail

This comprehensive tutorial will acquaint you with all the aspects of real-time analytics with Apache Spark, one of the trending Big Data processing frameworks on the market today. It will show you how to leverage the features of various components of the Spark framework to efficiently process, analyze, and visualize your data.

You will learn how to implement high-velocity streaming operations so that you can run efficient analytics on your real-time data. You’ll learn about Spark Streaming and build real-world stream processing applications that address the problems such systems need to solve. You’ll solve problems using machine learning techniques and find out about the tools available in the MLlib toolkit. You’ll also find out how to leverage graphs to solve real-world problems.

Toward the end of this video, you’ll also see some useful machine learning algorithms with the help of Spark MLlib and integrate Spark with R. We’ll also make sure you’re confident and prepared for graph processing, as you’ll learn more about the GraphX API. By the end, you’ll be well-versed in the aspects of real-time analytics and able to implement them with Apache Spark.

Table of Contents

  1. Chapter 1 : Spark Streaming
    1. The Course Overview 00:02:58
    2. Introducing Spark Streaming 00:04:18
    3. Streaming Context 00:03:39
    4. Processing Streaming Data 00:02:40
    5. Use Cases 00:03:13
    6. Spark Streaming Word Count Hands-On 00:05:45
    7. Spark Streaming – Understanding Master URL 00:05:26
    8. Integrating Spark Streaming with Apache Kafka 00:05:26
    9. mapWithState Operation 00:06:56
    10. Transform and Window Operation 00:02:52
    11. Join and Output Operations 00:02:41
    12. Output Operations – Saving Results to Kafka Sink 00:02:56
  2. Chapter 2 : Advanced Streaming and Use Cases
    1. Handling Time in High Velocity Streams 00:04:59
    2. Connecting External Systems That Work with an At-Least-Once Guarantee – Deduplication 00:05:52
    3. Building a Streaming Application – Handling Events That Are Not in Order 00:06:02
    4. Filtering Bots from Stream of Page View Events 00:06:54
  3. Chapter 3 : Spark MLlib and ML Pipelines
    1. Introducing Machine Learning with Spark 00:05:54
    2. Feature Extraction and Transformation 00:01:13
    3. Transforming Text into Vector of Numbers – ML Bag-of-Words Technique 00:04:44
    4. Logistic Regression 00:06:53
    5. Model Evaluation 00:02:43
    6. Clustering 00:02:42
    7. Gaussian Mixture Model 00:05:11
    8. Principal Component Analysis and Distributing the Singular Value Decomposition (SVD) 00:03:16
    9. Collaborative Filtering – Building Recommendation Engine 00:07:33
  4. Chapter 4 : Spark GraphX
    1. Introducing Spark GraphX – How to Represent a Graph? 00:03:04
    2. Limitations of Graph-Parallel System – Why Spark GraphX? 00:02:54
    3. Importing GraphX 00:01:46
    4. Create a Graph Using GraphX and Property Graph 00:05:03
    5. List of Operators 00:03:48
    6. Perform Graph Operations Using GraphX 00:04:08
    7. Triplet View 00:03:28
  5. Chapter 5 : Performing Spark GraphX Operations
    1. Perform Subgraph Operations 00:04:28
    2. Neighbourhood Aggregations – Collecting Neighbours 00:03:42
    3. Counting Degree of Vertex 00:04:21
    4. Caching and Uncaching 00:03:57
    5. GraphBuilder 00:02:53
    6. Vertex and Edge RDD 00:03:32
    7. Structural Operators – Connected Components 00:03:04
  6. Chapter 6 : SparkR
    1. Introduction to SparkR and How It’s Used 00:04:14
    2. Setting Up from RStudio 00:01:57
    3. Creating Spark DataFrames from Data Sources 00:03:32
    4. SparkDataFrames Operations – Grouping, Aggregation 00:02:46
    5. Run a Given Function on a Large Dataset Using dapply or dapplyCollect 00:04:14
    6. Running a Function on a Large Dataset Grouped by Input Column(s) Using gapply or gapplyCollect 00:04:06
    7. Run Local R Functions Distributed Using spark.lapply 00:02:05
    8. Running SQL Queries from SparkR 00:03:02
  7. Chapter 7 : Analytical Use Cases
    1. PageRank Using Spark GraphX 00:06:54
    2. Sending a Real-Time Notification When a User Wants to Buy a Product on the E-Commerce Site 00:08:46