Improving the performance of the Spark job

In the previous recipe, we wrote a simple Spark job that filters out invalid geolocations and pushes the valid geolocations into a Kafka topic. In this recipe, we will see how we can improve the performance of our Spark job.

How to do it...

There are several ways in which you can improve the performance of your Spark job. Spark exposes many configuration properties that can be tuned to achieve the desired performance. For example, based on the amount of data that your topic receives, you could change the batch duration of your stream. Also, deploying your Spark job on a Mesos or YARN cluster opens up further opportunities for performance improvement. In fact, running your Spark job in local standalone ...
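
As a rough illustration, the following sketch shows how a couple of such settings might be applied to the streaming job from the previous recipe. The property values, the five-second batch interval, and the class name TunedGeoLocationJob are assumptions chosen for illustration only; tune them for your own topic volume and cluster, and pass the actual master (local, Mesos, or YARN) through spark-submit.

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class TunedGeoLocationJob {

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf()
                .setAppName("geolocation-stream")
                // Let the receiver rate adapt to how fast batches are processed
                .set("spark.streaming.backpressure.enabled", "true")
                // Cap the ingest rate per Kafka partition (records per second);
                // 1000 is an example value, not a recommendation
                .set("spark.streaming.kafka.maxRatePerPartition", "1000");

        // Pick the batch interval based on how much data the topic receives;
        // five seconds here is only an example
        JavaStreamingContext streamingContext =
                new JavaStreamingContext(conf, Durations.seconds(5));

        // ... build the Kafka DStream and the geolocation filtering logic
        // from the previous recipe here ...

        streamingContext.start();
        streamingContext.awaitTermination();
    }
}

A shorter batch interval lowers end-to-end latency but increases scheduling overhead, while a longer interval does the opposite, so the right value depends on the throughput of your Kafka topic.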
