Chapter 2. Analyzing Your Text

In this chapter, we will cover the following recipes:

Obtaining a common analyzer
Obtaining a TokenStream
Obtaining TokenAttribute values
Using PositionIncrementAttribute
Using PerFieldAnalyzerWrapper
Defining custom TokenFilters
Defining custom analyzers
Defining custom tokenizers
Defining custom attributes

Introduction

Before we begin, let's review Lucene's analysis process. In the previous chapter, we learned about the components involved in creating and searching an index using IndexWriter and IndexSearcher. We also looked at the analyzer and how it's leveraged to tokenize and cleanse data, and at Lucene's internal index structure, the inverted index, which enables high-performance lookup. Finally, we touched on Term and how it's used in querying.
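To make the review concrete, here is a minimal sketch in plain Java of the idea behind analysis and an inverted index. It does not use Lucene's API and is not how Lucene is implemented. The class name InvertedIndexSketch, the methods analyze and buildIndex, and the stop-word list are all hypothetical, chosen only for illustration. The sketch lowercases text, splits it into tokens, drops a few stop words, and maps each remaining term to the IDs of the documents that contain it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {

    // Hypothetical stop-word list, for illustration only (not Lucene's list).
    private static final Set<String> STOP_WORDS = Set.of("a", "an", "the", "is", "of");

    // "Analysis": lowercase the text, split on non-alphanumeric characters,
    // and discard empty tokens and stop words.
    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    // Inverted index: each term maps to the sorted set of document IDs
    // (positions in the input list) whose analyzed text contains it.
    public static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : analyze(docs.get(docId))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index =
                buildIndex(List.of("The quick brown fox", "A lazy fox is sleeping"));
        // Looking up a term is a single map access rather than a scan of every document.
        System.out.println(index.getOrDefault("fox", Set.of()));
    }
}
```

Because the query text would pass through the same analyze step as the indexed text, a search for "Fox" and a search for "fox" would resolve to the same term. That symmetry is the reason the choice of analyzer matters at both index time and query time.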

Lucene 4 Cookbook by Edwood Ng, Vineeth Mohan