Configuring Apache Tika in Solr

Let's go ahead and create a new core called tika-example in our Solr instance. To make things easier, you can copy the core from the Chapter 6 folder of the ZIP file that comes with this book. After creating the core, we'll need to configure solrconfig.xml.
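If you copy the core folder into Solr's home directory, you still have to register it with the running instance. One way is the CoreAdmin API's CREATE action. The Python sketch below only builds the request URL. It assumes Solr listens on the default localhost:8983. It also assumes the copied folder is named tika-example and contains a conf/ directory but no core.properties file. If core.properties is already present, restarting Solr lets core discovery pick the core up instead. The helper name core_create_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: Solr runs locally on the default port 8983.
SOLR_BASE = "http://localhost:8983/solr"


def core_create_url(name: str, instance_dir: str, base: str = SOLR_BASE) -> str:
    """Build a CoreAdmin CREATE request URL for a core whose folder already exists.

    instance_dir is resolved by Solr relative to its home directory and should
    contain the conf/ folder (solrconfig.xml, schema) copied from the book's ZIP.
    """
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return f"{base}/admin/cores?{urlencode(params)}"


if __name__ == "__main__":
    # Print the URL instead of sending it, so the sketch runs without Solr.
    print(core_create_url("tika-example", "tika-example"))
```

To register the core against a running instance, pass the printed URL to urllib.request.urlopen or open it in a browser.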

In solrconfig.xml, we need to add the extraction libraries from the contrib/extraction/lib folder of the Solr installation directory, as well as the solr-cell library from the dist folder:

<lib dir="${solr.install.dir:../../..}/contrib/extraction/lib" regex=".*\.jar"/>
<lib dir="${solr.install.dir:../../..}/dist/" regex="solr-cell-\d.*\.jar"/>
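Each `<lib>` entry has two parts. The dir attribute uses Solr's `${property:default}` substitution, so `${solr.install.dir:../../..}` falls back to `../../..` when the solr.install.dir property is not set. Relative paths are resolved against the core's instance directory. The regex attribute then selects which files in that directory are loaded. The Python sketch below imitates both steps so you can sanity-check a configuration offline. It assumes that Solr requires the pattern to match the whole file name. The helpers resolve_props and matching_jars are hypothetical, and the jar names in the example listing are illustrative only.

```python
import re

# Matches property references such as ${solr.install.dir:../../..};
# the part after the first colon is the default used when the property is unset.
_PROP = re.compile(r"\$\{([^:}]+)(?::([^}]*))?\}")


def resolve_props(value: str, props: dict) -> str:
    """Substitute ${name:default} references from props, falling back to the default."""
    def repl(m):
        name, default = m.group(1), m.group(2)
        if name in props:
            return props[name]
        if default is not None:
            return default
        raise KeyError(f"no value or default for property {name!r}")
    return _PROP.sub(repl, value)


def matching_jars(file_names, regex: str):
    """Return the file names that the <lib regex="..."> pattern matches in full."""
    pattern = re.compile(regex)
    return [n for n in file_names if pattern.fullmatch(n)]


if __name__ == "__main__":
    print(resolve_props("${solr.install.dir:../../..}/dist/", {}))  # ../../../dist/
    # Hypothetical directory listing, for illustration only.
    dist = ["solr-cell-8.11.2.jar", "solr-core-8.11.2.jar", "solr-cell-8.11.2.pom"]
    print(matching_jars(dist, r"solr-cell-\d.*\.jar"))  # ['solr-cell-8.11.2.jar']
```

The `\d` after `solr-cell-` keeps other artifacts in dist, such as solr-core, out of the match. The trailing `\.jar` skips non-jar files.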

We can then configure ExtractingRequestHandler in solrconfig.xml:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler"> ...
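After the handler is registered, you can send a file to /update/extract as the raw request body with a matching Content-Type header. The literal.id parameter sets the document's id field. The commit=true parameter makes the document searchable immediately. The Python sketch below prepares such a request without sending it. It assumes the core is reachable at localhost:8983/solr/tika-example, the schema's uniqueKey field is id, and the rest of the handler configuration (elided above) leaves these parameters at their defaults. The helper name build_extract_request is hypothetical.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: host, port and core name of the tika-example core.
SOLR_CORE_URL = "http://localhost:8983/solr/tika-example"


def build_extract_request(data: bytes, content_type: str, doc_id: str,
                          commit: bool = True,
                          base: str = SOLR_CORE_URL) -> urllib.request.Request:
    """Prepare a POST of a raw document to the /update/extract handler.

    literal.id supplies the uniqueKey value (assumed to be the "id" field);
    commit=true asks Solr to commit so the document is searchable right away.
    """
    params = {"literal.id": doc_id, "commit": "true" if commit else "false"}
    url = f"{base}/update/extract?{urlencode(params)}"
    return urllib.request.Request(
        url, data=data, method="POST", headers={"Content-Type": content_type}
    )


if __name__ == "__main__":
    req = build_extract_request(b"<html><body>Hello Tika</body></html>",
                                "text/html", "doc1")
    print(req.get_method(), req.full_url)
    # Against a running Solr, this would send the document for extraction:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status, resp.read().decode())
```

Sending the raw bytes keeps the example to the standard library. A multipart upload, as produced by curl -F, would work equally well.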
