1. Introduction to Hadoop and Its Environment

Welcome to the fascinating subject of managing Apache Hadoop! Hadoop is the leading platform for processing massive data sets, usually referred to as big data. An open-source project, Hadoop was introduced around 2005, and over the past few years it has become the de facto standard for processing large amounts of data, using parallel processing algorithms and simple data processing models that underlie a highly efficient and reliable computing architecture. Hadoop is exciting and powerful, and it’s a great time indeed to be a Hadoop administrator!

Hadoop was clearly designed with the challenges of big data in mind. Companies desperately want to make sense of the overwhelmingly ...

