Securing Hadoop

Book Description

Implement robust end-to-end security for your Hadoop ecosystem

  • Master the key concepts behind Hadoop security as well as how to secure a Hadoop-based Big Data ecosystem

  • Understand and deploy authentication, authorization, and data encryption in a Hadoop-based Big Data platform

  • Administer the auditing and security event monitoring system

In Detail

Securing Big Data is one of the biggest concerns for enterprises today. How do we protect sensitive information in a Hadoop ecosystem? How can we integrate Hadoop security with existing enterprise security systems? What are the challenges in securing Hadoop and its ecosystem? These questions must be answered to manage Big Data effectively. Hadoop, together with Kerberos, provides security features that support Big Data management while keeping data secure.

This book is a practitioner’s guide to securing a Hadoop-based Big Data platform. It provides a step-by-step approach to implementing end-to-end security, along with a solid foundation in the Hadoop and Kerberos security models.

This practical, hands-on guide examines the challenges of securing sensitive data in a Hadoop-based Big Data platform and covers the Security Reference Architecture for securing Big Data. It takes you through the internals of the Hadoop and Kerberos security models and provides detailed implementation steps for securing Hadoop. You will learn how the Hadoop security model is implemented, how to integrate enterprise security systems with Hadoop security, and how to manage and control user access to a Hadoop ecosystem seamlessly. You will also learn how to implement audit logging and security incident monitoring within a Big Data platform.

Table of Contents

  1. Securing Hadoop
    1. Table of Contents
    2. Securing Hadoop
    3. Credits
    4. About the Author
    5. About the Reviewers
    6. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    7. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Errata
        2. Piracy
        3. Questions
    8. 1. Hadoop Security Overview
      1. Why do we need to secure Hadoop?
      2. Challenges for securing the Hadoop ecosystem
      3. Key security considerations
        1. Reference architecture for Big Data security
      4. Summary
    9. 2. Hadoop Security Design
      1. What is Kerberos?
        1. Key Kerberos terminologies
        2. How Kerberos works?
        3. Kerberos advantages
      2. The Hadoop default security model without Kerberos
      3. Hadoop Kerberos security implementation
        1. User-level access controls
        2. Service-level access controls
        3. User and service authentication
        4. Delegation Token
        5. Job Token
        6. Block Access Token
      4. Summary
    10. 3. Setting Up a Secured Hadoop Cluster
      1. Prerequisites
      2. Setting up Kerberos
        1. Installing the Key Distribution Center
          1. Configuring the Key Distribution Center
          2. Establishing the KDC database
          3. Setting up the administrator principal for KDC
          4. Starting the Kerberos daemons
          5. Setting up the first Kerberos administrator
          6. Adding the user or service principals
          7. Configuring LDAP as the Kerberos database
          8. Supporting AES-256 encryption for a Kerberos ticket
      3. Configuring Hadoop with Kerberos authentication
        1. Setting up the Kerberos client on all the Hadoop nodes
        2. Setting up Hadoop service principals
          1. Creating a keytab file for the Hadoop services
          2. Distributing the keytab file for all the slaves
          3. Setting up Hadoop configuration files
          4. HDFS-related configurations
          5. MRV1-related configurations
          6. MRV2-related configurations
          7. Setting up secured DataNode
          8. Setting up the TaskController class
      4. Configuring users for Hadoop
      5. Automation of a secured Hadoop deployment
      6. Summary
    11. 4. Securing the Hadoop Ecosystem
      1. Configuring Kerberos for Hadoop ecosystem components
        1. Securing Hive
          1. Securing Hive using Sentry
        2. Securing Oozie
        3. Securing Flume
          1. Securing Flume sources
          2. Securing Hadoop sink
          3. Securing a Flume channel
        4. Securing HBase
        5. Securing Sqoop
        6. Securing Pig
      2. Best practices for securing the Hadoop ecosystem components
      3. Summary
    12. 5. Integrating Hadoop with Enterprise Security Systems
      1. Integrating Enterprise Identity Management systems
        1. Configuring EIM integration with Hadoop
        2. Integrating Active-Directory-based EIM with the Hadoop ecosystem
      2. Accessing a secured Hadoop cluster from an enterprise network
        1. HttpFS
        2. HUE
        3. Knox Gateway Server
      3. Summary
    13. 6. Securing Sensitive Data in Hadoop
      1. Securing sensitive data in Hadoop
        1. Approach for securing insights in Hadoop
          1. Securing data in motion
          2. Securing data at rest
          3. Implementing data encryption in Hadoop
      2. Summary
    14. 7. Security Event and Audit Logging in Hadoop
      1. Security Incident and Event Monitoring in a Hadoop Cluster
        1. The Security Incident and Event Monitoring (SIEM) system
      2. Setting up audit logging in a secured Hadoop cluster
        1. Configuring Hadoop audit logs
      3. Summary
    15. A. Solutions Available for Securing Hadoop
      1. Hadoop distribution with enhanced security support
      2. Automation of a secured Hadoop cluster deployment
        1. Cloudera Manager
        2. Zettaset
      3. Different Hadoop data encryption options
        1. Dataguise for Hadoop
        2. Gazzang zNcrypt
        3. eCryptfs for Hadoop
      4. Securing the Hadoop ecosystem with Project Rhino
      5. Mapping of security technologies with the reference architecture
        1. Infrastructure security
        2. OS and filesystem security
        3. Application security
        4. Network perimeter security
        5. Data masking and encryption
        6. Authentication and authorization
        7. Audit logging, security policies, and procedures
        8. Security Incident and Event Monitoring
    16. Index