Big Data Glossary

by Pete Warden

Released September 2011

Publisher(s): O'Reilly Media, Inc.

ISBN: 9781449314590

Book description

To help you navigate the large number of new data tools available, this guide describes 60 of the most recent innovations, from NoSQL databases and MapReduce approaches to machine learning and visualization tools. Descriptions are based on first-hand experience with these tools in a production environment.

This handy glossary also includes a chapter of key terms that help define many of these tool categories:

NoSQL Databases: Document-oriented databases using a key/value interface rather than SQL
MapReduce: Tools that support distributed computing on large datasets
Storage: Technologies for storing data in a distributed way
Servers: Ways to rent computing power on remote machines
Processing: Tools for extracting valuable information from large datasets
Natural Language Processing: Methods for extracting information from human-created text
Machine Learning: Tools that automatically perform data analyses based on the results of a one-off analysis
Visualization: Applications that present meaningful data graphically
Acquisition: Techniques for cleaning up messy public data sources
Serialization: Methods for converting a data structure or object state into a storable format

Table of contents

Preface
1. Terms
2. NoSQL Databases
    1. MongoDB
    2. CouchDB
    3. Cassandra
    4. Redis
    5. BigTable
    6. HBase
    7. Hypertable
    8. Voldemort
    9. Riak
    10. ZooKeeper
3. MapReduce
    1. Hadoop
    2. Hive
    3. Pig
    4. Cascading
    5. Cascalog
    6. mrjob
    7. Caffeine
    8. S4
    9. MapR
    10. Acunu
    11. Flume
    12. Kafka
    13. Azkaban
    14. Oozie
    15. Greenplum
4. Storage
    1. S3
    2. Hadoop Distributed File System
5. Servers
6. Processing
    1. R
    2. Yahoo! Pipes
    3. Mechanical Turk
    4. Solr/Lucene
    5. ElasticSearch
    6. Datameer
    7. BigSheets
    8. Tinkerpop
7. NLP
8. Machine Learning
9. Visualization
    1. Gephi
    2. GraphViz
    3. Processing
    4. Protovis
    5. Fusion Tables
    6. Tableau
10. Acquisition
11. Serialization
    1. JSON
    2. BSON
    3. Thrift
    4. Avro
    5. Protocol Buffers
About the Author
Copyright