Chapter 2. Analyzing Your Text

In this chapter, we will cover the following recipes:

  • Obtaining a common analyzer
  • Obtaining a TokenStream
  • Obtaining TokenAttribute values
  • Using PositionIncrementAttribute
  • Using PerFieldAnalyzerWrapper
  • Defining custom TokenFilters
  • Defining custom analyzers
  • Defining custom tokenizers
  • Defining custom attributes

Introduction

Before we begin, let's review Lucene's analysis process. In the previous chapter, we learned about the components involved in creating and searching an index with IndexWriter and IndexSearcher. We also looked at the analyzer and how it's used to tokenize and cleanse data, and at Lucene's internal index structure, the inverted index, which enables high-performance lookup. Finally, we touched on Term and how it's used in querying.
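To make the review concrete, here is a minimal sketch in plain Java of the two ideas recapped above: analysis (tokenizing and cleansing text) and an inverted index (a map from each term to the documents that contain it). It is a toy illustration and deliberately does not use Lucene; the class name ToyInvertedIndex, its methods, and the stop-word list are all assumptions made for this example, and Lucene's actual analyzers and index format are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy illustration only: this is NOT how Lucene implements analysis or its index.
public class ToyInvertedIndex {
    // Hypothetical stop-word list chosen for this example.
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "of", "and");

    // Inverted index: term -> sorted set of IDs of the documents containing it.
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // "Analyze" text: split on non-alphanumeric characters, lowercase, drop stop words.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("[^\\p{Alnum}]+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Index a document by recording its ID under every term it produces.
    public void addDocument(int docId, String text) {
        for (String term : analyze(text)) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look up a single term; the query passes through the same analysis as the documents.
    public Set<Integer> search(String queryTerm) {
        List<String> terms = analyze(queryTerm);
        if (terms.isEmpty()) {
            return Set.of();
        }
        return postings.getOrDefault(terms.get(0), new TreeSet<>());
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.addDocument(0, "The Lucene Cookbook");
        index.addDocument(1, "Analyzing text with Lucene");

        System.out.println(analyze("The Lucene Cookbook")); // [lucene, cookbook]
        System.out.println(index.search("LUCENE"));         // [0, 1]
        System.out.println(index.search("text"));           // [1]
    }
}
```

Note that the query term is run through the same analyze step as the indexed text, which is why searching for "LUCENE" matches documents that were indexed with "Lucene". Applying consistent analysis at index time and query time is the design point this sketch is meant to highlight.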
