We are now ready to develop our Kafka consumer application. It uses the Spark Structured Streaming engine to apply our trained decision tree classifier to the stream of real-time tweets and so deliver real-time sentiment analysis!
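As a rough orientation, the reading side of such a consumer might be sketched as follows. This is a minimal sketch rather than our actual code file: the helper names `kafka_source_options` and `read_tweet_stream`, the broker address, and the topic name are hypothetical. It also assumes that a `SparkSession` has already been created with the Kafka source package for Structured Streaming available on its classpath.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the options for Spark's Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # host:port of the broker(s)
        "subscribe": topic,                            # topic carrying the tweets
        "startingOffsets": "latest",                   # only consume newly arriving messages
    }


def read_tweet_stream(spark, options):
    """Return a streaming DataFrame with one string column, `text`, per Kafka message.

    `spark` is assumed to be an existing SparkSession.
    """
    return (
        spark.readStream
        .format("kafka")
        .options(**options)
        .load()
        # Kafka delivers the payload as bytes, so cast it to a string
        .selectExpr("CAST(value AS STRING) AS text")
    )


# Assumed values for illustration only
options = kafka_source_options("localhost:9092", "twitter")
```

Calling `read_tweet_stream(spark, options)` would then yield the streaming DataFrame to which the preprocessing, vectorization, and classification stages are applied.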
Our Spark Structured Streaming-based Kafka consumer application performs the following steps (numbered to correspond to the numbered comments in our Python code file):
- First, we import the configuration from our config.py file. We also import the Python functions containing the logic for our preprocessing and vectorization ...