Chapter 1. Getting Started with Hadoop 2.X

This chapter covers the following topics:

  • Installing a single-node Hadoop cluster
  • Installing a multi-node Hadoop cluster
  • Adding new nodes to existing Hadoop clusters
  • Executing the balancer command for uniform data distribution
  • Entering and exiting safe mode in a Hadoop cluster
  • Decommissioning DataNodes
  • Performing benchmarking on a Hadoop cluster

Introduction

Hadoop has been the primary platform for many people who deal with big data problems; it sits at the heart of the big data ecosystem. Hadoop's origins trace back to 2003 and 2004, when Google published research papers on the Google File System (GFS) and MapReduce. Hadoop was built around the core ideas of these papers, and that is how it took its shape. With the ...
