O'Reilly logo

Apache Spark for Data Science Cookbook by Padma Priya Chitturi

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Implementing openNLP - sentence detector over Spark

Partitioning text into sentences is called Sentence Boundary Disambiguation (SBD) or Sentence Detection. This process is useful for many downstream NLP tasks, which require analysis within sentences; for instance POS and phrase analysis. This Sentence Detection process is language dependent. Most search engines are not concerned with Sentence Detection. They are only interested in query's tokens and their respective positions. POS taggers and other NLP tasks that perform extraction of data will frequently process individual sentences. The detection of sentence boundaries will help separate phrases that might appear to span sentences.

Getting ready

To step through this recipe, you will need a running ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required