Part 2. Applied Lucene

Lucene itself is just a JAR; the real fun and power come from what you build around it. Part 2 explores ways to leverage Lucene. Projects commonly demand full-text search of Microsoft Office, PDF, HTML, XML, and other document formats. “Extracting text with Tika” (chapter 7) shows how to index these document types into Lucene. So many extensions have been developed for Lucene that we dedicate two chapters to them: “Essential Lucene Extensions” (chapter 8) and “Further Lucene extensions” (chapter 9). Although Java is the primary language used with Lucene, the index format is language neutral. “Using Lucene from other programming languages” (chapter 10) explores Lucene usage from languages ...
