Summary

In this chapter, we have studied, implemented, and evaluated common algorithms that are used in natural language processing. We have preprocessed a corpus of documents using feature transformers and generated feature vectors from the resulting processed corpus using feature extractors. We have also applied these common NLP algorithms to machine learning. We trained and tested a sentiment analysis model that we used to predict the underlying sentiment of tweets so that organizations may improve their product and service offerings. In Chapter 8, Real-Time Machine Learning Using Apache Spark, we will extend our sentiment analysis model to operate in real time using Spark Streaming and Apache Kafka.

In the next chapter, we will take a ...

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.