Building Hadoop Clusters

Video Description

Deploy multi-node Hadoop clusters to harness the Cloud for storage and large-scale data processing

About This Video

  • Familiarize yourself with Hadoop and its services, and how to configure them

  • Deploy compute instances and set up a three-node Hadoop cluster on Amazon

  • Set up a Linux installation optimized for Hadoop

In Detail

    Hadoop is an Apache top-level project that allows the distributed processing of large data sets across clusters of computers using simple programming models. It allows you to deliver a highly available service on top of a cluster of computers, each of which may be prone to failures. While Big Data and Hadoop have seen a massive surge in popularity over the last few years, many companies still struggle with trying to set up their own computing clusters.

    This video series will turn you from a faltering first-timer into a Hadoop pro through clear, concise descriptions that are easy to follow.

    We'll begin this course with an overview of Amazon's cloud service and its use. We'll then deploy Linux compute instances and you'll see how to connect your client machine to Linux hosts and configure your systems to run Hadoop. Finally, you'll install Hadoop, download data, and examine how to run a query.

    This video series will go beyond just Hadoop; it will cover everything you need to get your own clusters up and running. You will learn how to make network configuration changes as well as modify Linux services. After you've installed Hadoop, we'll go over installing HUE, Hadoop's UI. Using HUE, you will learn how to download data to your Hadoop clusters, move it to HDFS, and finally query that data with Hive.

    Learn everything you need to deploy Hadoop clusters to the Cloud through these videos. You'll grasp all you need to know about handling large data sets over multiple nodes.

    Table of Contents

    1. Chapter 1 : Deploying Cloud Instances for Hadoop 2.0
      1. Introduction to the Cloud and Hadoop 00:04:44
      2. Deploying a Linux Amazon Machine Image 00:05:44
      3. Setting Up Amazon Instances 00:04:13
    2. Chapter 2 : Setting Up Network and Security Settings
      1. Network and Security Settings Overview 00:04:54
      2. Identifying and Allocating Security Groups 00:05:50
      3. Configuration of Private Keys in a Windows Environment 00:05:40
    3. Chapter 3 : Connecting to Cloud Instances
      1. Overview of the Connectivity Options for Windows to the Amazon Cloud 00:04:53
      2. Installing and Using Putty for Connectivity to Windows Clients 00:04:47
      3. Transferring Files to Linux Nodes with PSCP 00:04:18
    4. Chapter 4 : Setting Up Network Connectivity and Access for Hadoop Clusters
      1. Defining the Hadoop Cluster 00:06:13
      2. Setting Up Password-less SSH on the Head Node 00:08:19
      3. Gathering Network Details and Setting Up the HOSTS File 00:08:26
    5. Chapter 5 : Setting Up Configuration Settings across Hadoop Clusters
      1. Setting Up Linux Software Repositories 00:05:11
      2. Using the Parallel Shell Utility (pdsh) 00:07:27
      3. Prepping for Hadoop Installation 00:08:59
    6. Chapter 6 : Creating a Hadoop Cluster
      1. Building a Hadoop Cluster 00:06:54
      2. Installing Hadoop 2 – Part 1 00:05:28
      3. Installing Hadoop 2 – Part 2 00:07:09
    7. Chapter 7 : Loading and Navigating the Hadoop File System (HDFS)
      1. Understanding the Hadoop File System 00:06:22
      2. Loading and Navigating the Hadoop File System 00:07:37
      3. Ambari Server and Dashboard 00:07:01
    8. Chapter 8 : Hadoop Tools and Processing Files
      1. Hadoop Tools and Processing Files 00:10:16
      2. Installing HUE 00:07:37
      3. Using HUE 00:06:04