Chapter 6. NLP with Spark

In this chapter, we will see how to run NLP algorithms over Spark. You will learn the following recipes:

  • Installing NLTK on Linux
  • Installing Anaconda on Linux
  • Anaconda for cluster management
  • POS tagging with PySpark on an Anaconda cluster
  • Named Entity Recognition with IPython over Spark
  • Implementing OpenNLP - chunker over Spark
  • Implementing OpenNLP - sentence detector over Spark
  • Implementing Stanford NLP - lemmatization over Spark
  • Implementing sentiment analysis using Stanford NLP over Spark

Introduction

Natural language processing (NLP) is the study of how computers work with human language. It concerns applying computers to the nuances of different languages and building real-world applications using NLP techniques. NLP is analogous to teaching a language to ...

