Apache Solr 4 Cookbook by Rafał Kuć

Detecting and omitting duplicate documents

Imagine that your data contains duplicates because it comes from different sources. For example, you have books that come from different suppliers, but you only want to keep a single document for each book name. You could use the field collapsing feature at query time, but that affects query performance, which we would like to avoid. This recipe shows you how to use the Solr deduplication functionality instead.

How to do it...

  1. We start with a simple index structure. The following should be placed in the fields section of your schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
    <field name="name" type="text" indexed="true" stored="true" ...
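
For orientation, Solr's deduplication is typically configured in solrconfig.xml as an update request processor chain built around SignatureUpdateProcessorFactory. The fragment below is a minimal sketch under assumptions, not necessarily the configuration this recipe goes on to use: the chain name dedupe is an arbitrary choice, and writing the signature into the id field (computed from the name field only) is assumed here because the recipe wants one document per book name.

```xml
<!-- Sketch only: the chain name "dedupe" and the choice of fields are assumptions. -->
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- Field that receives the computed signature; assumed to be the unique key here. -->
    <str name="signatureField">id</str>
    <!-- Replace any already indexed document that has the same signature. -->
    <bool name="overwriteDupes">true</bool>
    <!-- Fields the signature is computed from; assumed to be the book name only. -->
    <str name="fields">name</str>
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With a setup like this, any id sent with a document is replaced by the computed signature, so two books with the same name end up with the same id and the later one overwrites the earlier one. The chain takes effect only when an update request selects it, for example through the update.chain request parameter.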
