Working with Big Data LiveLessons (Video Training): Infrastructure, Algorithms, and Visualizations

Video Description

Working with Big Data: Infrastructure, Algorithms, and Visualizations LiveLessons presents a high-level overview of big data and shows how to use key tools to solve your data challenges. This introduction to the three areas of big data includes:

  • Infrastructure - how to store and process big data
  • Algorithms - how to integrate algorithms into your big data stack and an introduction to classification
  • Visualizations - an introduction to creating visualizations in JavaScript using D3.js

The goal is not to be exhaustive but to provide a higher-level view of how all the pieces of a big data architecture work together.

About the Author:

Paul Dix is the author of “Service-Oriented Design with Ruby and Rails.” He is a frequent speaker at conferences and user groups including Web 2.0, RubyConf, RailsConf, The Gotham Ruby Conference, and Scotland on Rails. Paul is the founder and organizer of the NYC Machine Learning Meetup, which has over 2,900 members. In the past he has worked at startups and at larger companies like Google, Microsoft, and McAfee. Currently, Paul is a co-founder at Errplane, a cloud-based service for monitoring and alerting on application performance and metrics. He lives in New York City.

Table of Contents

  1. Introduction
    1. Introduction to Working with Big Data LiveLessons 00:03:17
    2. What is Big Data? 00:05:26
  2. Lesson 1: Unstructured Storage and Hadoop
    1. Learning objectives 00:00:49
    2. 1.1 Set up a basic Hadoop installation 00:16:14
    3. 1.2 Write data into the Hadoop file system 00:07:41
    4. 1.3 Write a Hadoop streaming job to process text files 00:17:55
  3. Lesson 2: Structured Storage and Cassandra
    1. Learning objectives 00:01:00
    2. 2.1 Set up a basic Cassandra installation 00:10:16
    3. 2.2 Create a Cassandra schema for storing data 00:17:04
    4. 2.3 Store and retrieve data from Cassandra using the Ruby library 00:07:39
    5. 2.4 Write data into Cassandra from a Hadoop streaming job 00:20:14
    6. 2.5 Use the Hadoop reduce phase to parallelize writes 00:15:09
  4. Lesson 3: Real Time Processing and Messaging
    1. Learning objectives 00:01:07
    2. 3.1 Set up the Kafka messaging system 00:08:02
    3. 3.2 Publish and consume data from Kafka in Ruby 00:11:05
    4. 3.3 Aggregate log files into Hadoop using Kafka and a Ruby consumer 00:13:55
    5. 3.4 Create horizontally scalable message consumers 00:11:35
    6. 3.5 Sample messages using Kafka’s partitioning 00:10:47
    7. 3.6 Create redundant message consumers for high availability 00:27:50
  5. Lesson 4: Working with Machine Learning Algorithms
    1. Learning objectives 00:00:57
    2. 4.1 Grasp the concepts of machine learning and implement the k-nearest neighbors algorithm 00:25:47
    3. 4.2 Understand the basics of distance metrics and implement Euclidean distance and cosine similarity 00:26:44
    4. 4.3 Transform raw data into a matrix and convert a text document into the vector space model 00:22:42
    5. 4.4 Use k-nearest neighbors to make predictions 00:18:41
    6. 4.5 Improve execution time by reducing the search space 00:11:08
  6. Lesson 5: Experimentation and Running Algorithms in Production
    1. Learning objectives 00:00:58
    2. 5.1 Use cross validation to test a predictive model 00:17:37
    3. 5.2 Integrate a trained model into production 00:09:06
    4. 5.3 Version a model and track feedback data 00:03:35
    5. 5.4 Write a test harness to compare versioned models 00:09:22
    6. 5.5 Test new predicted models in production 00:02:41
  7. Lesson 6: Basic Visualizations
    1. Learning objectives 00:00:53
    2. 6.1 Prepare raw data for use in visualizations 00:13:10
    2. 6.2 Use core functions of the D3 JavaScript visualization toolkit 00:13:17
    3. 6.3 Use D3 to create a bar chart 00:07:56
    5. 6.4 Use D3 to create a time series 00:15:29