Solving 10 Hadoop'able Problems

Video Description

Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop.

About This Video

  • Learn how to tackle big data projects with the Hadoop ecosystem, in a nutshell.
  • Implement practical code to solve common business and technical problems.
  • Get hands-on solutions to perplexing, real-world big data problems.

In Detail

The Apache Hadoop ecosystem is a popular and powerful tool for solving big data problems. With so many competing tools to process data, many users want to know which particular problems are well suited to Hadoop, and how to implement solutions to them.

To know which types of problems are Hadoop-able, it helps to start with a basic understanding of the core components of Hadoop. You will learn about the ecosystem designed to run on top of Hadoop as well as software that is deployed alongside it. These tools provide the building blocks for data processing applications. This course covers the core parts of the Hadoop ecosystem, giving you a broad understanding and getting you up and running fast. Next, it describes a number of common problems that Hadoop is able to solve, presented as case-study projects. The material is divided into sections by project, each serving as a specific use case for solving big data problems.

By the end of this course, you will have been exposed to a wide variety of Hadoop software and examples of how it is used to solve common big data problems.

Table of Contents

  1. Chapter 1 : Core Components
    1. The Course Overview 00:02:53
    2. Hadoop Distributed File System (HDFS) 00:06:59
    3. Distributed Compute Capability YARN 00:04:47
  2. Chapter 2 : Downstream Ecosystem
    1. Apache Hive for ETL and SQL Like 00:07:23
    2. Message Queuing and Data Ingestion Kafka 00:03:51
    3. NoSQL Datastores – Hadoop HBase, Accumulo 00:05:32
    4. Machine Learning – Spark and Spark MLlib 00:06:41
    5. Stream Processing – Spark Streaming 00:04:42
  3. Chapter 3 : Financial, Trade, and Time Series Applications – Trade Surveillance
    1. Processing Payment Data from an Event Stream 00:04:50
    2. Advanced Aggregations Using Streaming API – PaymentAnalyzer 00:04:29
    3. Storing Time Series Data in HBase 00:06:58
  4. Chapter 4 : AdTech – Ad Targeting
    1. Detecting BOT Traffic Using Spark Streaming 00:06:08
    2. Make Web Log Data Queryable – Hive Sink 00:06:49
    3. Investigating Customers Data in Hive 00:04:19
  5. Chapter 5 : Business/Point of Sale – Transaction Analysis
    1. Trending Supply Chain – Finding Top Seller Item in a Streaming Way 00:08:02
    2. Enriching Top Sellers with Additional Information 00:05:17
  6. Chapter 6 : Customer Churn Analysis
    1. Analyzing Customer Churn (Quantitative) Using DataFrame Queries 00:05:36
    2. Analyzing Customer Churn (Amounts) Using DataFrame Queries 00:04:56
  7. Chapter 7 : Internet of Things
    1. Storing Low Granularity Structured Sensor Data in HBase 00:08:42
    2. Consuming Sensor Data Stored in HBase – Scan and Count 00:03:51
    3. Building Summaries on Data Streaming from Devices 00:06:35
  8. Chapter 8 : Scientific and High Performance Computing
    1. Introducing Spark GraphX – How to Represent a Graph? 00:02:13
    2. Perform Graph Operations Using GraphX 00:03:57
    3. Counting Degree of Vertices 00:03:20
    4. Neighborhood Aggregations – Collecting Neighbors 00:03:46
    5. Structural Operators – Connected Components 00:02:09
    6. Page Rank Using Spark GraphX 00:04:59
  9. Chapter 9 : Security Concerns Intrusion Detection – Threat Analysis
    1. Anomaly Detection 00:02:16
    2. Analyzing Web Logs for Suspicious Activity and Loading into Spark 00:02:12
    3. Implementing Clustering – Choosing Number of Clusters 00:04:00
    4. Detecting Anomalies in Network Traffic 00:04:12
  10. Chapter 10 : Text Analysis
    1. Analyzing Post for an Author 00:03:23
    2. Extracting Information from Unstructured Text 00:01:02
    3. Extracting Information Via Spark DataFrame 00:03:37
    4. Sentiment Analysis of Posts Using Logistic Regression 00:03:37
    5. Finding an Author of a Post 00:02:23
  11. Chapter 11 : Data Warehouse/Data Lake/Data Sandbox
    1. Downloading and Setting Cloudera Sandbox 00:03:50
    2. Finding What Products Users Want to Buy Using Cloudera Sandbox Toolkit 00:11:52
  12. Chapter 12 : Personalization
    1. Using Movies History to Suggest Interesting Content 00:02:34
    2. Testing and Experimenting with Recommendation Engine 00:08:00