Chapter 14. Working with Text 2: Searching

Chapter 13 discussed one common approach to working with text: defining a structure in a language called regular expressions and checking input against that structure. One application of this approach is finding the portions of text in a collection that meet precisely defined criteria.

A closely related and very common problem is finding the documents in a large collection that meet a less precisely defined requirement, for example, finding all Web pages that discuss JavaServer Pages or all e-mails from John Smith. This is called a text search, or free text search, to reinforce the idea that the desired text is “free” to appear anywhere in the documents.

Text searches could be tackled by the ...
