Analyzer

We have already learned about the inverted index. Before Elasticsearch stores a document in an inverted index, it transforms the document's text into terms. This transformation is known as analysis, and it is required for index search queries to return the expected results.
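To make the relationship between analysis and the inverted index concrete, here is a minimal sketch in plain Python. It does not use Elasticsearch; the function names `simple_analyze` and `build_inverted_index` and the sample documents are assumptions made up for illustration, and the analysis step is reduced to lowercasing and splitting on white space.

```python
from collections import defaultdict


def simple_analyze(text):
    # Hypothetical stand-in for analysis: lowercase, then split on white space.
    return text.lower().split()


def build_inverted_index(docs):
    # Map each term to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in simple_analyze(text):
            index[term].add(doc_id)
    return index


# Illustrative documents, not taken from the book.
docs = {1: "Quick brown fox", 2: "The quick dog"}
index = build_inverted_index(docs)

print(sorted(index["quick"]))  # [1, 2]
print(sorted(index["fox"]))    # [1]
```

Because both documents pass through the same analysis step, a search for "quick" finds document 1 even though its original text says "Quick". A query term that is not analyzed the same way would miss it.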

Often, we also need to apply some kind of transformation before a document is sent to the Elasticsearch index. We may need to change the text to lowercase, strip any HTML tags from the document, remove white space between words, tokenize the fields based on delimiters, and so on.
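The transformations listed above can be sketched as a small pipeline in plain Python. This is only an illustration of the idea, not Elasticsearch's implementation: the function names, the regular expressions, and the default delimiter set are all assumptions chosen for the example.

```python
import html
import re


def strip_html(text):
    # Replace tags with a space, then decode entities such as &amp;.
    return html.unescape(re.sub(r"<[^>]+>", " ", text))


def tokenize(text, delimiters=r"[\s,;]+"):
    # Split on runs of white space, commas, or semicolons; drop empty pieces.
    return [token for token in re.split(delimiters, text) if token]


def lowercase(tokens):
    return [token.lower() for token in tokens]


def analyze(text):
    # Apply the steps in order: strip HTML, tokenize, lowercase.
    return lowercase(tokenize(strip_html(text)))


print(analyze("<p>Hello,  <b>Big Data</b> World</p>"))
# ['hello', 'big', 'data', 'world']
```

Keeping each step as a separate function mirrors the way the transformations are described: each one can be swapped or reordered without touching the others.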

Elasticsearch offers the following built-in analyzers:

  • Standard analyzer: This is the default analyzer. It uses the standard tokenizer to divide text. It normalizes tokens, lowercases tokens, and also removes ...
