Chapter 9. Performance and monitoring

This chapter covers

  • Monitoring Spark applications
  • Performance-related configuration options
  • Tuning your application for maximum performance
  • Using graph partitioning to boost large-scale processing

Most of the examples we’ve looked at so far have been small-scale: they would run on one machine and complete their processing without requiring a large amount of computing resources. But one of the key reasons to use Apache Spark is its distributed processing model. Spark’s ability to spread data and computation across a cluster of many machines is what lets it run the kinds of processing we’ve discussed on large datasets.

Once you have a cluster with plenty of resources ...
