
MongoDB: The Definitive Guide

Cover of MongoDB: The Definitive Guide by Michael Dirolf... Published by O'Reilly Media, Inc.

Chapter 1. Introduction

MongoDB is a powerful, flexible, and scalable data store. It combines the ability to scale out with many of the most useful features of relational databases, such as secondary indexes, range queries, and sorting. MongoDB is also rich in features of its own, including built-in support for MapReduce-style aggregation and geospatial indexes.

There is no point in creating a great technology if it’s impossible to work with, so a lot of effort has been put into making MongoDB easy to get started with and a pleasure to use. MongoDB has a developer-friendly data model, administrator-friendly configuration options, and natural-feeling language APIs presented by drivers and the database shell. MongoDB tries to get out of your way, letting you program instead of worrying about storing data.

A Rich Data Model

MongoDB is a document-oriented database, not a relational one. The primary reason for moving away from the relational model is to make scaling out easier, but there are some other advantages as well.

The basic idea is to replace the concept of a “row” with a more flexible model, the “document.” By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits very naturally into the way developers in modern object-oriented languages think about their data.
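
As a rough illustration of this idea, the sketch below models a blog post as a single record, using a plain Python dictionary as a stand-in for a document. The field names (title, author, tags, comments) are hypothetical and chosen only for the example, and no database is involved.

```python
# A plain Python dict standing in for a single document.
# All field names are hypothetical, chosen for illustration only.
post = {
    "title": "Scaling Out",
    "author": {"name": "Ada", "email": "ada@example.com"},  # embedded document
    "tags": ["databases", "scaling"],                        # array
    "comments": [                                            # array of embedded documents
        {"user": "bob", "text": "Nice post"},
        {"user": "eve", "text": "What about joins?"},
    ],
}


def comment_authors(document):
    """Return the users who commented, read straight from the one record."""
    return [comment["user"] for comment in document["comments"]]


print(comment_authors(post))
```

In a relational design, the author, tags, and comments would typically live in separate tables; here the whole hierarchy travels together as one record.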

MongoDB is also schema-free: a document’s keys are not predefined or fixed in any way. Without a schema to change, massive data migrations are usually unnecessary. New or missing keys can be dealt with at the application level, instead of forcing all data to have the same shape. This gives developers a lot of flexibility in how they work with evolving data models.
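
One way to picture handling new or missing keys at the application level is sketched below. It again uses plain Python dictionaries in place of stored documents; the `nickname` key and the `display_name` helper are assumptions made up for this example.

```python
# Two documents from the same hypothetical collection with different keys.
users = [
    {"name": "alice", "email": "alice@example.com"},                   # older shape
    {"name": "bob", "email": "bob@example.com", "nickname": "bobby"},  # newer shape
]


def display_name(user):
    """Prefer the newer 'nickname' key; fall back to 'name' when it is missing."""
    return user.get("nickname", user["name"])


names = [display_name(user) for user in users]
print(names)
```

The older document is never rewritten: the application simply tolerates the absent key, which is why a bulk migration is usually unnecessary.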

Easy Scaling

Data set sizes for applications are growing at an incredible pace. Advances in sensor technology, increases in available bandwidth, and the popularity of handheld devices that can be connected to the Internet have created an environment where even small-scale applications need to store more data than many databases were meant to handle. A terabyte of data, once an unheard-of amount of information, is now commonplace.

As the amount of data that developers need to store grows, developers face a difficult decision: how should they scale their databases? Scaling a database comes down to the choice between scaling up (getting a bigger machine) or scaling out (partitioning data across more machines). Scaling up is often the path of least resistance, but it has drawbacks: large machines are often very expensive, and eventually a physical limit is reached where a more powerful machine cannot be purchased at any cost. For the type of large web application that most people aspire to build, it is either impossible or not cost-effective to run on a single machine. Scaling out, by contrast, is both extensible and economical: to add storage space or increase performance, you can buy another commodity server and add it to your cluster.

MongoDB was designed from the beginning to scale out. Its document-oriented data model allows it to automatically split up data across multiple servers. It can balance data and load across a cluster, redistributing documents automatically. This allows developers to focus on programming the application, not scaling it. When they need more capacity, they can just add new machines to the cluster and let the database figure out how to organize everything.
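
To make the idea of splitting data across servers concrete, here is a toy sketch of range-based partitioning. It is only an illustration of the general technique, not MongoDB's actual implementation; the server names, split points, and the `username` key are all hypothetical.

```python
import bisect

# Hypothetical split points dividing the key space into three ranges:
#   key < "h"         -> server-a
#   "h" <= key < "p"  -> server-b
#   key >= "p"        -> server-c
SPLIT_POINTS = ["h", "p"]
SERVERS = ["server-a", "server-b", "server-c"]


def server_for(key):
    """Pick the server whose key range contains the given key."""
    return SERVERS[bisect.bisect_right(SPLIT_POINTS, key)]


def partition(documents, key_field):
    """Group documents by the server that would hold them."""
    placement = {name: [] for name in SERVERS}
    for doc in documents:
        placement[server_for(doc[key_field])].append(doc)
    return placement


docs = [{"username": name} for name in ["alice", "henry", "mallory", "zed"]]
placement = partition(docs, "username")
for server, held in placement.items():
    print(server, [doc["username"] for doc in held])
```

Adding capacity in this toy model would mean adding a split point and a server name; a real system also has to move existing documents to match, which is the work the database takes on for the developer.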

Tons of Features…

It’s difficult to quantify what a feature is: anything above and beyond what a relational database provides? Memcached? Other document-oriented databases? However, no matter what the baseline is, MongoDB has some really nice, unique tools that are not (all) present in any other solution.


Indexing

MongoDB supports generic secondary indexes, allowing a variety of fast queries, and provides unique, compound, and geospatial indexing capabilities as well.

Stored JavaScript

Instead of stored procedures, developers can store and use JavaScript functions and values on the server side.


Aggregation

MongoDB supports MapReduce and other aggregation tools.

Fixed-size collections

Capped collections are fixed in size and are useful for certain types of data, such as logs.

File storage

MongoDB supports an easy-to-use protocol for storing large files and file metadata.
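
The fixed-size collections entry above can be pictured with a loose analogy in plain Python. The sketch assumes that, once a fixed-size collection is full, the oldest entries are discarded to make room for new ones; it uses the standard library's bounded `deque` rather than any MongoDB API, and the capacity and log messages are invented for the example.

```python
from collections import deque

# A bounded deque as a stand-in for a fixed-size log collection.
# The capacity of 3 is an arbitrary value chosen for illustration.
log = deque(maxlen=3)

for i in range(5):
    log.append(f"event {i}")  # appending to a full deque drops the oldest entry

print(list(log))
```

Because the size never grows, storage for this kind of data stays predictable, which is what makes the approach a good fit for logs.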

Some features common to relational databases are not present in MongoDB, notably joins and complex multirow transactions. These are architectural decisions to allow for scalability, because both of those features are difficult to provide efficiently in a distributed system.

…Without Sacrificing Speed

Incredible performance is a major goal for MongoDB and has shaped many design decisions. MongoDB uses a binary wire protocol as the primary mode of interaction with the server (as opposed to a protocol with more overhead, like HTTP/REST). It adds dynamic padding to documents and preallocates data files to trade extra space usage for consistent performance. It uses memory-mapped files in the default storage engine, which pushes the responsibility for memory management to the operating system. It also features a dynamic query optimizer that “remembers” the fastest way to perform a query. In short, almost every aspect of MongoDB was designed to maintain high performance.

Although MongoDB is powerful and attempts to keep many features from relational systems, it is not intended to do everything that a relational database does. Whenever possible, the database server offloads processing and logic to the client side (handled either by the drivers or by a user’s application code). Maintaining this streamlined design is one of the reasons MongoDB can achieve such high performance.

Simple Administration

MongoDB tries to simplify database administration by making servers administer themselves as much as possible. Aside from starting the database server, very little administration is necessary. If a master server goes down, MongoDB can automatically fail over to a backup slave and promote the slave to a master. In a distributed environment, the cluster needs to be told only that a new node exists to automatically integrate and configure it.

MongoDB’s administration philosophy is that the server should handle as much of the configuration as possible automatically, allowing (but not requiring) users to tweak their setups if needed.

But Wait, That’s Not All…

Throughout the course of the book, we will take the time to note the reasoning or motivation behind particular decisions made in the development of MongoDB. Through those notes we hope to share the philosophy behind MongoDB. The best way to summarize the MongoDB project, however, is through its main focus—to create a full-featured data store that is scalable, flexible, and fast.
