
Natural Language Processing with Java and LingPipe Cookbook by Krishna Dayanidhi, Breck Baldwin


Paragraph detection

The typical structure that contains a set of sentences is a paragraph. A paragraph can be set off explicitly with markup, such as the <p> element in HTML, or with two or more newlines, which is how paragraphs are usually rendered. We are in the part of NLP where no hard-and-fast rules apply, so we apologize for the hedging. We will handle some common examples in this chapter and leave it to you to generalize.
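The plain-text convention can be illustrated with a minimal sketch. It assumes that a paragraph break is two or more line breaks, and that the blank lines in between may hold spaces or tabs. The Java class below splits a document into paragraph spans. It records character offsets into the original string instead of copying text, so any later processing can be mapped back to the source. The names `ParagraphSplitter`, `paragraphs` and `Span` are hypothetical and are not LingPipe API. HTML `<p>` markup is not handled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphSplitter {

    /** A paragraph as [start, end) character offsets into the original text. */
    public static final class Span {
        public final int start;
        public final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: a paragraph break is two or more line breaks; the blank
    // lines in between may contain spaces or tabs.
    private static final Pattern BREAK =
            Pattern.compile("\\r?\\n[ \\t]*(?:\\r?\\n[ \\t]*)+");

    /** Returns paragraph spans; the text itself is never copied or modified. */
    public static List<Span> paragraphs(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = BREAK.matcher(text);
        int start = 0;
        while (m.find()) {
            addTrimmed(text, start, m.start(), spans);
            start = m.end();
        }
        addTrimmed(text, start, text.length(), spans); // final paragraph
        return spans;
    }

    // Drops surrounding whitespace while keeping offsets relative to the original text.
    private static void addTrimmed(String text, int start, int end, List<Span> spans) {
        while (start < end && Character.isWhitespace(text.charAt(start))) start++;
        while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
        if (start < end) spans.add(new Span(start, end));
    }

    public static void main(String[] args) {
        String text = "First para.\nStill first.\n\nSecond para.\n\n\n  Third.";
        for (Span s : paragraphs(text)) {
            System.out.println(s.start + "-" + s.end + ": " + text.substring(s.start, s.end));
        }
    }
}
```

For the sample string in `main`, the detected spans are 0-24, 26-38 and 43-49. A single newline stays inside a paragraph. Because each span is trimmed, the indentation before "Third." falls outside the last span.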

How to do it...

We have never set up an evaluation harness for paragraph detection, but one can be built in much the same way as for sentence detection. Instead, this recipe illustrates a simple paragraph-detection routine that does something very important: it maintains offsets into the original document while running embedded sentence detection. This ...
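The offset-keeping idea can be sketched in three steps. First, find the paragraph spans. Next, run a sentence detector on each paragraph substring. Finally, add the paragraph's start offset to every sentence boundary, so that all sentence offsets refer to the original document. This is an illustrative sketch and not the book's recipe. It swaps in the JDK's `java.text.BreakIterator` where the recipe would use LingPipe's sentence detection. The paragraph rule (two or more newlines) and the names `ParagraphSentences`, `detect` and `Sentence` are assumptions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParagraphSentences {

    /** A sentence: paragraph index plus [start, end) offsets into the ORIGINAL text. */
    public static final class Sentence {
        public final int paragraph;
        public final int start;
        public final int end;

        Sentence(int paragraph, int start, int end) {
            this.paragraph = paragraph;
            this.start = start;
            this.end = end;
        }
    }

    // Assumption: paragraphs are separated by two or more line breaks.
    private static final Pattern BREAK =
            Pattern.compile("\\r?\\n[ \\t]*(?:\\r?\\n[ \\t]*)+");

    public static List<Sentence> detect(String text) {
        List<Sentence> out = new ArrayList<>();
        Matcher m = BREAK.matcher(text);
        int paraStart = 0;
        int paraIndex = 0;
        boolean more = true;
        while (more) {
            more = m.find();
            int paraEnd = more ? m.start() : text.length();
            // Only count paragraphs that actually contain a sentence.
            if (addSentences(text, paraStart, paraEnd, paraIndex, out)) paraIndex++;
            if (more) paraStart = m.end();
        }
        return out;
    }

    // Sentence-splits one paragraph, then shifts each boundary by paraStart so the
    // offsets point into the original document rather than the paragraph substring.
    private static boolean addSentences(String text, int paraStart, int paraEnd,
                                        int paraIndex, List<Sentence> out) {
        // Stand-in for LingPipe's sentence detection: the JDK's rule-based iterator.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.US);
        it.setText(text.substring(paraStart, paraEnd));
        boolean found = false;
        int s = it.first();
        for (int e = it.next(); e != BreakIterator.DONE; s = e, e = it.next()) {
            int start = paraStart + s;
            int end = paraStart + e;
            // Trim whitespace the iterator leaves at sentence boundaries.
            while (start < end && Character.isWhitespace(text.charAt(start))) start++;
            while (end > start && Character.isWhitespace(text.charAt(end - 1))) end--;
            if (start < end) {
                out.add(new Sentence(paraIndex, start, end));
                found = true;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "It was late. Everyone left.\n\nThe next day was quiet.";
        for (Sentence s : detect(text)) {
            System.out.println("para " + s.paragraph + " [" + s.start + ", " + s.end + "): "
                    + text.substring(s.start, s.end));
        }
    }
}
```

Every offset points into the original string, so `text.substring(s.start, s.end)` recovers each sentence exactly, whichever paragraph it came from. A statistical sentence model could replace `BreakIterator` and the offset arithmetic would stay the same.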
