Cover by Rafal Kuc'

Safari, the world’s most comprehensive technology and business learning platform.

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required

O'Reilly logo

Detecting and omitting duplicate documents

Imagine your data consists of duplicates because they come from different sources. For example, you have books that come from different suppliers, but you are only interested in a single book with the same name. Of course you could use the field collapsing feature during the query, but that affects query performance and we would like to avoid that. This recipe will show you how to use the Solr deduplication functionality.

How to do it...

  1. We start with the simple index structure. This should be placed in the fields section of your schema.xml file:
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> <field name="name" type="text" indexed="true" stored="true" ...

Find the exact information you need to solve a problem on the fly, or go deeper to master the technologies and skills you need to succeed

Start Free Trial

No credit card required