Chapter 12. Text Indexing and Lookup

Besides the “basic” indexing capabilities covered in Chapter 11, eXist also has a full-text index based on the Apache Lucene text search engine library. Lucene lets eXist offer search features such as proximity searches (words near each other), fuzzy searches (words that resemble other words), Boolean operators, and more. Full-text indexes let you do much more with your content than you can with straight XPath expressions.
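
To make these search types concrete, here is a minimal sketch of how they might look in XQuery using eXist's ft:query function with Lucene's default query syntax. It assumes a Lucene index has already been defined on tei:p elements; the collection path and the search terms are assumptions chosen for illustration, not taken from the book's example code.

```xquery
xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Assumed location of the indexed documents; adjust to your own setup. :)
let $paragraphs := collection("/db/apps/exist-book/indexing/data")//tei:p

(: Boolean operators: both terms must occur in the paragraph. :)
let $boolean := $paragraphs[ft:query(., "king AND castle")]

(: Fuzzy search: matches terms that resemble "castel", such as "castle". :)
let $fuzzy := $paragraphs[ft:query(., "castel~")]

(: Proximity search: the two words occur within five positions of each other. :)
let $near := $paragraphs[ft:query(., '"king castle"~5')]

return
    <results>
        <boolean hits="{count($boolean)}"/>
        <fuzzy hits="{count($fuzzy)}"/>
        <near hits="{count($near)}"/>
    </results>
```

Without a matching Lucene index on the queried elements, ft:query has nothing to search and these expressions will not return the hits you expect, so the index configuration always comes first.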

If your application needs to search based on human input, such as searching documentation, full-text indexes can really help. But things get even better: on top of the full-text index searches, eXist offers “keyword in context,” or KWIC, functionality. This makes it extremely easy to display search results in context, showing each match within its surrounding text. KWIC is covered in “Using Keywords in Context.”
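
As a rough preview of what KWIC output involves, the following sketch feeds full-text hits to the kwic:summarize function from eXist's KWIC module. The collection path, the search term, and the width of 40 characters are assumptions for illustration only.

```xquery
xquery version "3.1";

import module namespace kwic = "http://exist-db.org/xquery/kwic";

declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Assumed data location and search term; both are illustrative. :)
let $hits := collection("/db/apps/exist-book/indexing/data")//tei:p[ft:query(., "castle")]

return
    <ul>{
        for $hit in $hits
        (: Show the best-scoring matches first. :)
        order by ft:score($hit) descending
        return
            (: Each match is returned with about 40 characters of context on either side. :)
            <li>{ kwic:summarize($hit, <config width="40"/>) }</li>
    }</ul>
```

The function returns markup in which the matched term and its surrounding text are wrapped in separate elements, so the match can be highlighted with a stylesheet.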

Full-Text Index and KWIC Example

The examples for this book contain a simple full-text search example. It uses the full-text index to search some ancient Encyclopedia Britannica entries. Important components of the example are:

  • The index definition in /db/system/config/db/apps/exist-book/indexing/data/collection.xconf defines a full-text index on tei:p elements:

    <collection xmlns="http://exist-db.org/collection-config/1.0">
      <index xmlns:tei="http://www.tei-c.org/ns/1.0">
        
        <!-- other indexes -->
        
        <lucene>
          <text qname="tei:p"/>
        </lucene>
      </index>
    </collection>
  • An extremely simple HTML form that allows you to ...
