In the case study for this chapter, we will extend the sentiment analysis model that we developed in Chapter 6, Natural Language Processing Using Apache Spark, to operate in real time. In Chapter 6, Natural Language Processing Using Apache Spark, we trained a decision tree classifier to predict and classify the underlying sentiment of tweets based on a training dataset of historic tweets about airlines. In this chapter, we will apply this trained decision tree classifier to real-time tweets in order to predict their sentiment and identify negative tweets so that airlines may act on them as soon as possible.
Our end-to-end stream processing pipeline can therefore be extended, as illustrated in Figure ...