Obtaining a common analyzer

Lucene provides a set of default analyzers in the lucene-analyzers-common module. Let's take a look at them in detail.

Getting ready

The following are five common analyzers Lucene provides in the lucene-analyzers-common module:

  • WhitespaceAnalyzer: Splits text at whitespace, just as the name indicates. In fact, this is the only thing this analyzer does.
  • SimpleAnalyzer: Splits text at non-letter characters and lowercases resulting tokens.
  • StopAnalyzer: Splits text at non-letter characters, lowercases resulting tokens, and removes stopwords. This analyzer is useful for plain text content and is not ideal if the content contains words with special characters, such as product model numbers. This analyzer comes with a default set ...
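
To make the differences between these three analyzers concrete, here is a minimal sketch in plain Java that imitates the behavior described above. It does not use Lucene's classes and is not how Lucene implements them. The class name AnalyzerSketch, its method names, and the three-word stopword list are all hypothetical and chosen only for illustration; in particular, the stopword list is not Lucene's default set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {

    // Hypothetical stopword list for illustration only; NOT Lucene's default set.
    static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "is", "a"));

    // WhitespaceAnalyzer-like behavior: split on runs of whitespace, nothing else.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // SimpleAnalyzer-like behavior: split at non-letter characters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("\\P{L}+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // StopAnalyzer-like behavior: same as simple(), then drop stopwords.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : simple(text)) {
            if (!STOPWORDS.contains(t)) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The XL-200 is a Fast printer";
        System.out.println(whitespace(text)); // [The, XL-200, is, a, Fast, printer]
        System.out.println(simple(text));     // [the, xl, is, a, fast, printer]
        System.out.println(stop(text));       // [xl, fast, printer]
    }
}
```

Note how the model number XL-200 survives only under whitespace splitting. Splitting at non-letter characters reduces it to xl, which is why the letter-based analyzers are a poor fit for content such as product model numbers.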
