Instant Apache Hive Essentials How-to

Book Description

Leverage your knowledge of SQL to easily write distributed data processing applications on Hadoop using Apache Hive

  • Learn something new in an Instant! A short, fast, focused guide delivering immediate results

  • Learn to use SQL to write Hadoop jobs

  • Add Hive support for data in your own file formats

  • Understand how the Hive query processor works to optimize common queries

In Detail

Hadoop provides a robust framework for building distributed applications, but working directly with Hadoop requires writing a lot of code. Adding structure to data and using a higher-level language such as SQL makes working with Hadoop both easier and faster.

"Instant Apache Hive Essentials How-to" contains a series of practical recipes that introduce the power and flexibility of Hive. Starting with your first query, this book will provide step-by-step instructions and behind-the-scenes explanations for how to effectively write MapReduce jobs with SQL.

This book looks at how Hive transforms SQL statements into MapReduce jobs and demonstrates how you can extend Hive to support your own use cases. Its recipes will teach you how to leverage the scale of Hadoop while retaining the benefits of using a structured query language. You will explore how to structure your queries for better performance. You will extend Hive to understand your own file formats, simplifying the loading of data into the warehouse. Finally, you will add your own custom functions to Hive to support whatever use cases you may have.

"Instant Apache Hive Essentials How-to" is a quick introduction to adding Hive to your data toolkit. It is packed with high-level instructions for making Hive work, and it draws connections to the underlying Hadoop framework to explain how things happen.

Table of Contents

  1. Instant Apache Hive Essentials How-to
    1. Instant Apache Hive Essentials How-to
    2. Credits
    3. About the Author
    4. About the Reviewer
    5. www.PacktPub.com
      1. Support files, eBooks, discount offers and more
        1. Why Subscribe?
        2. Free Access for Packt account holders
    6. Preface
      1. What this book covers
      2. What you need for this book
      3. Who this book is for
      4. Conventions
      5. Reader feedback
      6. Customer support
        1. Downloading the example code
        2. Errata
        3. Piracy
        4. Questions
    7. 1. Instant Apache Hive Essentials How-to
      1. Tables and queries (Simple)
        1. How to do it...
        2. How it works...
        3. There's more...
      2. Understanding complex data types (Simple)
        1. How to do it...
        2. How it works...
        3. There's more...
      3. Using Hive non-interactively (Simple)
        1. Getting ready
        2. How to do it...
        3. How it works...
      4. Join optimizations (Medium)
        1. How to do it...
        2. How it works...
          1. Map join
          2. Bucketed tables
        3. There's more...
          1. Multiple joins
          2. Skew joins
          3. Multiple selects
          4. List bucketing and skew joins
      5. Setting the file format (Simple)
        1. How to do it...
        2. How it works...
      6. Writing a custom SerDe (Intermediate)
        1. How to do it...
        2. How it works...
          1. Object inspectors
          2. Initialization
          3. Serialization
          4. Deserialization
      7. Using static partitions (Intermediate)
        1. Getting ready
        2. How to do it...
        3. How it works...
        4. There's more...
          1. Loading data into partitioned internal tables
          2. Writing data into specific partitions from queries
      8. Using dynamic partitions (Intermediate)
        1. How to do it...
        2. How it works...
      9. Using functions (Simple)
        1. How to do it...
        2. How it works...
      10. Adding custom logic with streaming (Intermediate)
        1. Getting ready
        2. How to do it...
        3. How it works...
      11. Simple user-defined functions (Intermediate)
        1. Getting ready
        2. How to do it...
        3. How it works...
      12. Advanced user-defined functions (Advanced)
        1. How to do it...
        2. How it works...
          1. Initialization
          2. Evaluation
          3. The display string
      13. User-defined table-generating functions (Advanced)
        1. How to do it...
        2. How it works...
        3. Initialization
          1. Processing inputs
          2. Final output
      14. User-defined aggregation functions (Advanced)
        1. Getting ready
        2. How to do it...
        3. How it works...
          1. The resolver
          2. The modes of evaluation
          3. Initializing the evaluator
          4. Aggregation buffers
          5. The logic of aggregation
        4. There's more...