Glossary

Amazon Elastic MapReduce

Amazon’s EMR is a hosted Hadoop service on top of Amazon EC2 (Elastic Compute Cloud).

Avro

Avro is a new serialization format developed to address some of the common problems associated with evolving other serialization formats. Its benefits include rich data structures, a fast binary format, support for remote procedure calls, and built-in schema evolution.

Bash

The “Bourne-Again Shell” that is the default interactive command shell for Linux and Mac OS X systems.

S3 Bucket

The term for the top-level container you own and manage when using S3. A user may have many buckets, each analogous to the root of a physical hard drive.

Command-Line Interface

The command-line interface (CLI) can run “scripts” of Hive statements or allow the user to enter statements interactively.

Data Warehouse

A repository of structured data suitable for analysis, such as reporting and trend detection. Warehouses operate in batch mode or offline, as opposed to providing real-time responsiveness for online activity, like ecommerce.

Derby

A lightweight SQL database that can be embedded in Java applications. It runs in the same process and saves its data to local files. It is used as the default SQL data store for Hive’s metastore. See http://db.apache.org/derby/ for more information.

Dynamic Partitions

A HiveQL extension to SQL that allows you to insert query results into table partitions where you leave one or more partition column values unspecified and they are determined dynamically from the query results themselves. ...
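
As an illustration, a dynamic-partition insert might look like the following minimal sketch. The table names (staged_employees, employees) and column names are hypothetical and chosen only for this example; it assumes employees was created with PARTITIONED BY (country STRING, state STRING).

```sql
-- Hypothetical tables and columns, for illustration only.
-- Assumes: employees is partitioned by (country STRING, state STRING).

-- Enable dynamic partitioning. "nonstrict" mode lets every
-- partition column be determined dynamically.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- No values are given for country or state in the PARTITION clause.
-- Hive takes them from the last two columns of the SELECT, matched
-- by position (not by name), and creates partitions as needed.
INSERT OVERWRITE TABLE employees
PARTITION (country, state)
SELECT se.name, se.salary, se.cnty, se.st
FROM staged_employees se;
```

Static and dynamic partition columns can also be mixed, for example PARTITION (country = 'US', state), in which case the static columns must come before the dynamic ones.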
