Real-Time Big Data Analytics by Shilpi Saxena, Sumit Gupta

Performance tuning and best practices

In this section, we will discuss various strategies for optimizing the performance of our Spark jobs. We will also discuss a few best practices with respect to Spark and Spark SQL.

Performance tuning is a subjective and open-ended topic. The very first step is to answer the question, "Do we really need to tune the performance of our jobs?" Before we answer this question, we need to consider the following aspects:

  • Are our jobs meeting SLAs specified by the business?

    If yes, then there is no need for performance tuning.

  • What do we want to achieve and is it realistic?

    For example, expecting all Spark jobs (irrespective of data size or computations performed) to be completed in milliseconds is unrealistic. ...
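The first question above, whether a job meets its SLA, can be answered by measuring before tuning anything. The following is a minimal sketch in plain Python, not code from the book: `meets_sla`, `example_job`, and the 5-second threshold are hypothetical names and values chosen for illustration, and `example_job` merely stands in for whatever Spark action we would actually time.

```python
import time
from typing import Callable, Tuple


def meets_sla(job: Callable[[], object], sla_seconds: float) -> Tuple[bool, float]:
    """Run `job` once and report whether its wall-clock time is within the SLA.

    Returns a pair (within_sla, elapsed_seconds).
    """
    start = time.perf_counter()
    job()
    elapsed = time.perf_counter() - start
    return elapsed <= sla_seconds, elapsed


def example_job() -> int:
    # Hypothetical stand-in for a real Spark action; replace with the
    # call that triggers the job we want to measure.
    return sum(range(100_000))


if __name__ == "__main__":
    # Assumed SLA of 5 seconds, for illustration only.
    within_sla, elapsed = meets_sla(example_job, sla_seconds=5.0)
    status = "within" if within_sla else "outside"
    print(f"Job took {elapsed:.3f}s, {status} the SLA")
```

A single run is only a rough indicator. In practice we would probably repeat the measurement several times, on representative data sizes, before deciding whether tuning is needed.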
