INTRODUCTION

Hadoop is an open source project available under the Apache License 2.0. It manages and stores very large data sets across a distributed cluster of servers. One of its most beneficial features is fault tolerance, which allows big data applications to keep operating properly when part of the cluster fails. Another benefit is scalability: a Hadoop cluster can grow from a single server to many servers, each contributing local computation and storage.
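To make the fault-tolerance point concrete, the following is a minimal sketch (not taken from the book's sample code) of a Java client writing a file to HDFS. HDFS splits the file into blocks and replicates each block across multiple DataNodes (three copies by default), so the data survives the loss of individual machines. The NameNode URI and file path shown here are placeholder assumptions, not values from the book.

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // fs.defaultFS would normally come from core-site.xml on the cluster;
            // "hdfs://namenode:8020" is only an illustrative placeholder.
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            // Write a small file; HDFS replicates its blocks across DataNodes,
            // which is what provides the fault tolerance described above.
            try (FileSystem fs = FileSystem.get(conf);
                 FSDataOutputStream out = fs.create(new Path("/tmp/hello.txt"))) {
                out.write("Hello, Hadoop\n".getBytes(StandardCharsets.UTF_8));
            }
        }
    }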

WHO IS THIS BOOK FOR?

This book is for anyone who uses Hadoop for data-related work, or who wants to rethink how to obtain meaningful information from their data stores. This includes big data solution architects, Linux system and big data engineers, big data platform engineers, Java programmers, and database administrators.

If you have an interest in learning more about Hadoop and how to extract specific elements for further analysis or review, then this book is for you.

WHAT YOU NEED TO USE THIS BOOK

You should have development experience, understand the basics of Hadoop, and be interested in applying it in real-world settings.

The source code for the samples is available for download at www.wrox.com/go/professionalhadoop or https://github.com/backstopmedia/hadoopbook.

HOW THIS BOOK IS STRUCTURED

This book is organized into eight chapters.
