Chapter 15. Choosing a Shard Key

The most important and difficult task when using sharding is choosing how your data will be distributed. To make intelligent choices about this, you have to understand how MongoDB distributes data. This chapter helps you make a good choice of shard key by covering:

  • How to decide among multiple possible shard keys

  • Shard keys for several use cases

  • What you can’t use as a shard key

  • Some alternative strategies if you want to customize how data is distributed

  • How to manually shard your data

This chapter assumes that you understand the basic components of sharding as covered in the previous chapters.

Taking Stock of Your Usage

When you shard a collection, you choose one or two fields to split up the data. The chosen field, or combination of fields, is called the shard key. Once you have more than a few shards, it’s almost impossible to change your shard key, so it is important to choose correctly (or at least notice any issues quickly).
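To make "splitting up the data" concrete, here is a minimal sketch of range-based partitioning on a shard key. It is a toy model, not MongoDB's implementation: the `username` field, the split points, and the chunk-to-shard assignment are all hypothetical values chosen for illustration. Each chunk covers a range of key values that includes its lower bound and excludes its upper bound.

```python
from bisect import bisect_right

# Hypothetical split points on a "username" shard key; they define four
# chunks: (-inf, "g"), ["g", "n"), ["n", "t"), ["t", +inf).
SPLIT_POINTS = ["g", "n", "t"]

# Hypothetical chunk-to-shard assignment (chunk index -> shard name).
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def chunk_for(key_value):
    """Return the index of the chunk whose range contains key_value."""
    return bisect_right(SPLIT_POINTS, key_value)


def shard_for(document, shard_key="username"):
    """Return the shard that would own this document in the toy model."""
    return CHUNK_OWNERS[chunk_for(document[shard_key])]


docs = [{"username": "alice"}, {"username": "greg"}, {"username": "zoe"}]
placement = {d["username"]: shard_for(d) for d in docs}
print(placement)
```

In this model, the value of the shard key field alone decides where a document lives, which is why the choice of field matters so much.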

To choose a good shard key, you need to understand your workload and how your shard key is going to distribute your application’s requests. This can be difficult to picture, so try to work out some examples or, even better, try it out on a backup data set with sample traffic. This section has lots of diagrams and explanations, but there is no substitute for trying it on your own data set.
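One way to work out an example before touching real data is to replay a sample of requests against each candidate key and count where they would land. The sketch below assumes three shards, a made-up batch of 300 inserts, and two hypothetical routing rules: a ranged key on `created_at` and a simplified hashed key on `user_id`. The hash-modulo routing is a stand-in for illustration only and is not how MongoDB places hashed chunks.

```python
from collections import Counter
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

# Hypothetical sample traffic: 300 inserts with increasing timestamps,
# spread over 50 users.
sample_writes = [
    {"created_at": 1_000 + i, "user_id": f"user{i % 50}"} for i in range(300)
]


def route_by_range(doc):
    """Ranged key on created_at; split points were chosen from older data."""
    split_points = [400, 800]  # assumption: all existing data is below 1000
    for shard, upper in zip(SHARDS, split_points):
        if doc["created_at"] < upper:
            return shard
    return SHARDS[-1]


def route_by_hash(doc):
    """Simplified hashed key on user_id: stable hash modulo shard count."""
    digest = hashlib.md5(doc["user_id"].encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


def requests_per_shard(requests, route):
    """Count how many requests each shard would receive under `route`."""
    counts = Counter({shard: 0 for shard in SHARDS})
    counts.update(route(doc) for doc in requests)
    return dict(counts)


print("range on created_at:", requests_per_shard(sample_writes, route_by_range))
print("hash on user_id:   ", requests_per_shard(sample_writes, route_by_hash))
```

With these assumed split points, every new insert has a `created_at` above the last split, so the ranged key sends all 300 writes to the final shard, while the hashed key spreads them according to the hash of each user. Swapping in a sample of your own requests and your own candidate keys gives a rough first picture, though it is no replacement for testing on a real copy of the data.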

For each collection that you’re planning to shard, start by answering the following questions:

  • How many shards are you planning to grow to? A three-shard cluster has a ...

