Setting the analyzer

Analyzers constitute an important part of indexing. To understand what analyzers do, let's consider three documents:

  • Document1 (tokens): { This , is , easy }
  • Document2 (tokens): { This , is , fast }
  • Document3 (tokens): { This , is , easy , and , fast }

Here, terms such as "This", "is", and "and" are not relevant keywords. The chances of someone searching for such words are very low, because these words contribute nothing to the facts or context of the document. Hence, it's safe to skip these words while indexing; in other words, these words should not be made searchable.

With these words removed, the tokens for each document would be as follows:

  • Document1 (tokens): { easy }
  • Document2 (tokens): { fast }
  • Document3 (tokens): { easy , fast }
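The filtering step described above can be sketched in a few lines of Python. This is not how Elasticsearch implements analysis. It is a toy model that assumes a whitespace tokenizer, a lowercase step, and a small hand-picked stop-word set (STOP_WORDS and analyze are hypothetical names chosen for illustration). It also builds a tiny inverted index to show that stop words never become searchable terms.

```python
from collections import defaultdict

# Hypothetical stop-word set for this example only; real analyzers
# ship with much larger, language-specific lists.
STOP_WORDS = {"this", "is", "and"}


def analyze(text: str) -> list[str]:
    """Toy analyzer: tokenize on whitespace, lowercase, drop stop words."""
    tokens = text.split()                                # whitespace tokenizer
    lowered = [t.lower() for t in tokens]                # lowercase step
    return [t for t in lowered if t not in STOP_WORDS]   # stop-word filter


documents = {
    "Document1": "This is easy",
    "Document2": "This is fast",
    "Document3": "This is easy and fast",
}

# Toy inverted index: each surviving term maps to the documents containing it.
index: dict[str, set[str]] = defaultdict(set)
for name, text in documents.items():
    for term in analyze(text):
        index[term].add(name)

for name, text in documents.items():
    print(name, analyze(text))

print(sorted(index["easy"]))
print("is" in index)
```

Running it prints `['easy']`, `['fast']`, and `['easy', 'fast']` for the three documents, matching the filtered token lists above. The lookup for "easy" returns `['Document1', 'Document3']`, and "is" is absent from the index, so a search for it matches nothing.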

Words such as the, or, as ...
