Chapter 11. Inside a Shard

In Chapter 2, we introduced the shard and described it as a low-level worker unit. But what exactly is a shard, and how does it work? In this chapter, we answer these questions:

  • Why is search near real-time?

  • Why are document CRUD (create-read-update-delete) operations real-time?

  • How does Elasticsearch ensure that the changes you make are durable, so that they won’t be lost if there is a power failure?

  • Why does deleting documents not free up space immediately?

  • What do the refresh, flush, and optimize APIs do, and when should you use them?

The easiest way to understand how a shard functions today is to start with a history lesson. We will look at the problems that needed to be solved in order to provide a distributed durable data store with near real-time search and analytics.

Making Text Searchable

The first challenge that had to be solved was how to make text searchable. Traditional databases store a single value per field, but this is insufficient for full-text search. Every word in a text field needs to be searchable, which means that the database needs to be able to index multiple values—words, ...
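To make the idea of indexing multiple values per field concrete, here is a minimal sketch in Python. It is an illustration only, not Elasticsearch's actual implementation: the function name build_word_index, the sample documents, and the naive lowercase-and-split-on-whitespace tokenization are all assumptions made for this example.

```python
from collections import defaultdict


def build_word_index(docs):
    """Map each lowercased word to the set of document IDs that contain it.

    Hypothetical helper: tokenization here is a plain whitespace split,
    which is far simpler than real full-text analysis.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


# Made-up sample documents, keyed by document ID.
docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
}

index = build_word_index(docs)

# Every word of the text field can now be looked up on its own.
print(sorted(index["quick"]))  # [1, 2]
print(sorted(index["fox"]))    # [1]
```

Each text field contributes several entries to the index, one per word, which is what allows a search for a single word to find every document that contains it.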
