Chapter 5. elasticsearch

Most of the technology stacks we use are not yet designed for the cloud. On Ubuntu (Linux), every so often you have to jump through hoops to get your root volume checked. Replication in most databases, synchronous or asynchronous, is notoriously hard to get right. And once a system is running, scaling up its resources takes an extreme amount of courage.

Most of these problems make your systems less resilient and reliable, because you just can’t be flexible with resources.

elasticsearch is the first infrastructural component that gets it right. Its developers really understand what it takes to operate datastores (it is much more than search). That is why it is the first example we’ll talk about.

Introduction

“It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Apache Lucene.”

The operational unit of elasticsearch is a cluster, not a server. Note that this is already different from many other datastore technologies. You can have a cluster of one, for development or test, or for transient data. But in production you will want to have at least two nodes most of the time.

elasticsearch holds JSON data in indexes. These indexes are broken up into shards. If you have a cluster with multiple nodes, shards are distributed in such a way that the cluster can survive losing a node. You can manipulate almost everything in elasticsearch, so changing the sharding is not too difficult.
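
To make the idea of spreading shards concrete, here is a toy model in Python. It is only a sketch: the round-robin placement, the node names, and the `allocate` and `survives_loss_of` helpers are our own assumptions for illustration, not elasticsearch's actual allocation logic. It shows that once every shard has a copy on a second node, any single node can disappear without a shard going missing.

```python
from collections import defaultdict


def allocate(num_shards, nodes, replicas=1):
    """Toy round-robin placement (an assumption, not elasticsearch's allocator).

    Copy c of shard s is placed on node (s + c) % len(nodes), so the
    copies of one shard always end up on different nodes.
    """
    if replicas >= len(nodes):
        raise ValueError("need more nodes than replicas to keep copies apart")
    placement = defaultdict(set)  # node name -> set of shard ids it holds
    for shard in range(num_shards):
        for copy in range(replicas + 1):  # copy 0 plays the role of the primary
            node = nodes[(shard + copy) % len(nodes)]
            placement[node].add(shard)
    return placement


def survives_loss_of(placement, lost_node, num_shards):
    """Return True if every shard still has a copy once lost_node is gone."""
    remaining = set()
    for node, shards in placement.items():
        if node != lost_node:
            remaining |= shards
    return remaining == set(range(num_shards))


# Hypothetical three-node cluster holding five shards, one replica each.
nodes = ["node-a", "node-b", "node-c"]
placement = allocate(num_shards=5, nodes=nodes, replicas=1)
all_ok = all(survives_loss_of(placement, node, 5) for node in nodes)
print(all_ok)
```

In the same toy model, setting `replicas=0` makes the check fail as soon as a node holding a shard is lost, which is why a single copy of the data is only reasonable for development, test, or transient data.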

To add a document to an index (the index is created if it doesn’t exist yet):

$ curl -XPOST 'http://elasticsearch.heystaq.com:9200/heystaq/snapshot/?pretty=true' ...
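
The same kind of request can be prepared from Python with nothing but the standard library. This is a sketch under assumptions: the host and the `heystaq/snapshot` path are taken from the command above, but the document body is a made-up placeholder (the original payload is elided), and `build_index_request` is a hypothetical helper of ours. The request is only built here, not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, document):
    """Build (but do not send) a POST request that adds a document to an index.

    The path layout mirrors the curl command above: /<index>/<type>/?pretty=true
    """
    url = f"{base_url}/{index}/{doc_type}/?pretty=true"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        # Assumption: declaring the body as JSON is accepted by the server.
        headers={"Content-Type": "application/json"},
    )


# Placeholder document: the real payload is not shown in the command above.
example_document = {"note": "hypothetical example document"}

request = build_index_request(
    "http://elasticsearch.heystaq.com:9200", "heystaq", "snapshot", example_document
)
print(request.get_method(), request.full_url)

# To actually send it against a reachable cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```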
