UIMA integration with Solr

Solr can also be integrated with Apache UIMA (Unstructured Information Management Architecture), which lets us define a custom pipeline that adds metadata to documents as they are indexed.

Note

More information about Solr UIMA integration can be found at https://wiki.apache.org/solr/SolrUIMA.

In Solr, UIMA can be configured by following these steps:

  1. In solrconfig.xml, we can add the following libraries:
    <lib dir="../../contrib/uima/lib" />
    <lib dir="../../dist/" regex="solr-uima-\d.*\.jar" />
  2. After adding the libraries, we can add the following fields to schema.xml; they will hold the language, concept, and sentence metadata:
    <field name="language" type="string" indexed="true" stored="true" required="false"/>
    <field name="concept" ...
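
The regex in the second <lib> directive of step 1 decides which JAR files from the dist directory are loaded. As a rough illustration only, the Python sketch below applies the same pattern to a handful of file names. It assumes that Solr tests the pattern against the whole file name rather than a substring; the helper matching_jars and the file names (including the version numbers) are hypothetical and not part of Solr.

```python
import re

# Pattern copied verbatim from the <lib> directive in step 1.
LIB_REGEX = r"solr-uima-\d.*\.jar"


def matching_jars(file_names, pattern=LIB_REGEX):
    """Return the names that the pattern matches in full (assumed Solr behaviour)."""
    compiled = re.compile(pattern)
    return [name for name in file_names if compiled.fullmatch(name)]


# Hypothetical contents of a dist/ directory; the version numbers are made up.
candidates = [
    "solr-uima-4.10.4.jar",      # versioned UIMA contrib JAR
    "solr-core-4.10.4.jar",      # a different Solr artifact
    "solr-uima-4.10.4.jar.bak",  # does not end in .jar
    "solr-uima-SNAPSHOT.jar",    # no digit right after "solr-uima-"
]

print(matching_jars(candidates))
```

Under that assumption only the first name is selected. The \d after "solr-uima-" requires a version number to follow directly, so unrelated artifacts and unversioned builds are skipped.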
