Chapter 3. Analyzing Your Text Data

In this chapter, we will cover the following topics:

  • Using the enumeration type
  • Removing HTML tags during indexing
  • Storing data outside of Solr index
  • Using synonyms
  • Stemming different languages
  • Using nonaggressive stemmers
  • Using the n-gram approach to do performant trailing wildcard searches
  • Using position increment to divide sentences
  • Using patterns to replace tokens

Introduction

The process of data indexing can be divided into several parts, and data analysis is one of the most crucial steps in data preparation. Analysis defines how your text data will be divided into terms and what type it will be. In Solr, data parsing behavior is defined by types. A type's behavior can be defined in the context of the indexing ...
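To make the idea of dividing text into terms concrete, here is a minimal conceptual sketch in Python. It is not Solr's implementation: the `analyze` function and its two steps (a regex-based tokenizer followed by a lowercasing filter) are assumptions chosen only to illustrate how an analysis chain might turn raw text into terms.

```python
import re


def analyze(text):
    """Toy analysis chain (hypothetical, not Solr code).

    Step 1 (tokenizer): split the text into runs of word characters.
    Step 2 (filter): lowercase every token so matching ignores case.
    """
    tokens = re.findall(r"\w+", text)
    return [token.lower() for token in tokens]


if __name__ == "__main__":
    # The punctuation is dropped and each remaining word becomes a term.
    print(analyze("Solr Cookbook, Third Edition"))
```

In a real Solr setup, a comparable chain would be declared as part of a type rather than written in application code; the sketch only shows the general shape of the transformation from text to terms.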
