Advanced topics with Solr

So far, we have dealt with various kinds of data and their types. Most enterprise search use cases can be addressed with the techniques we have covered. In this section, we will go through some advanced topics for analyzing your data with Solr. We will also explore integration with NLP tools to make the incoming data more meaningful and useful.

Deduplication

Deduplication in Apache Solr is about preventing duplicate documents from entering Apache Solr's storage. Apache Solr can prevent these duplicates at the document level as well as the field level. This is a new feature of the Apache Solr 4.x release. Duplicates are avoided by means of hashing techniques. Apache Solr supports native de-duplication ...
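The hashing idea can be illustrated with a small, self-contained sketch. This is not Solr's API or its implementation; it only shows the general technique of computing a signature from selected fields and using it as the storage key, so that a document with the same signature replaces the earlier one instead of being stored twice. The names `signature` and `DedupIndex`, the choice of SHA-256, and the sample field names are all assumptions made for illustration.

```python
import hashlib


def signature(doc, fields):
    """Hash the chosen fields of a document into a hex digest.

    Field names are sorted so the result does not depend on the
    order in which the caller lists them.
    """
    h = hashlib.sha256()  # assumption: any stable hash function would do
    for name in sorted(fields):
        value = str(doc.get(name, ""))
        # Separate name and value with a NUL byte so that
        # ("ab", "c") and ("a", "bc") do not hash to the same input.
        h.update(name.encode("utf-8"))
        h.update(b"\x00")
        h.update(value.encode("utf-8"))
        h.update(b"\x00")
    return h.hexdigest()


class DedupIndex:
    """Hypothetical in-memory store keyed by document signature."""

    def __init__(self, fields):
        self.fields = list(fields)
        self.docs = {}

    def add(self, doc):
        """Store doc; return True if it was new, False if it replaced a duplicate."""
        sig = signature(doc, self.fields)
        is_new = sig not in self.docs
        self.docs[sig] = doc  # a duplicate overwrites the earlier copy
        return is_new


index = DedupIndex(["title", "body"])
first = index.add({"id": "1", "title": "Solr", "body": "Scaling search"})
second = index.add({"id": "2", "title": "Solr", "body": "Scaling search"})
third = index.add({"id": "3", "title": "Solr", "body": "Something else"})
```

In this sketch, the second document differs from the first only in its `id`, which is not among the signature fields, so it is treated as a duplicate and overwrites the first. Which fields feed the signature decides what counts as a duplicate, so that choice matters more than the choice of hash function.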
