The most important and difficult task when using sharding is choosing how your data will be distributed. To make intelligent choices about this, you have to understand how MongoDB distributes data. This chapter helps you make a good choice of shard key by covering:
How to decide among multiple possible shard keys
Shard keys for several use cases
What you can’t use as a shard key
Some alternative strategies if you want to customize how data is distributed
How to manually shard your data
This chapter assumes that you understand the basic components of sharding as covered in the previous chapters.
When you shard a collection, you choose a field or two to use to split up the data. This field (or combination of fields) is called the shard key. Once you have more than a few shards, it's almost impossible to change your shard key, so it is important to choose correctly (or at least notice any issues quickly).
To choose a good shard key, you need to understand your workload and how your shard key is going to distribute your application’s requests. This can be difficult to picture, so try to work out some examples or, even better, try it out on a backup data set with sample traffic. This section has lots of diagrams and explanations, but there is no substitute for trying it on your own data set.
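One way to work out an example on paper before testing on real data is to tally how a sample of shard key values would fall across chunk ranges. The following is a minimal sketch, not MongoDB's own logic: it assumes range-based chunks with an inclusive lower bound and exclusive upper bound, and the split points, chunk-to-shard assignment, sample key values, and function names are all hypothetical.

```python
from bisect import bisect_right


def chunk_index(split_points, key):
    """Return the index of the chunk whose range contains key.

    split_points are the sorted boundaries between chunks, so N split
    points define N + 1 chunks: (-inf, s0), [s0, s1), ..., [sN-1, +inf).
    """
    return bisect_right(split_points, key)


def tally_by_shard(keys, split_points, chunk_to_shard):
    """Count how many of the sampled key values land on each shard."""
    counts = {shard: 0 for shard in chunk_to_shard}
    for key in keys:
        shard = chunk_to_shard[chunk_index(split_points, key)]
        counts[shard] += 1
    return counts


# Hypothetical layout: four chunks spread over three shards.
SPLIT_POINTS = [100, 200, 300]
CHUNK_TO_SHARD = ["shard0", "shard1", "shard2", "shard0"]

# Hypothetical traffic sample: the shard key value of each recent request.
sample_keys = [5, 42, 150, 151, 152, 250, 310, 320, 330, 340]

if __name__ == "__main__":
    for shard, count in sorted(
        tally_by_shard(sample_keys, SPLIT_POINTS, CHUNK_TO_SHARD).items()
    ):
        print(shard, count)
```

Replacing `sample_keys` with key values pulled from your own request logs gives a rough picture of which shards would absorb most of the traffic for a candidate key. In a real cluster the chunk boundaries and their placement change over time, so treat the result as an approximation.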
For each collection that you’re planning to shard, start by answering the following questions:
How many shards are you planning to grow to? A three-shard cluster has a ...