Stage

If the Spark job and, hence, the action that resulted in the launching of that job, involves the shuffling of data (that is, the redistribution of data), then that job is broken down into stages. A new stage begins when network communication is required between the worker nodes. An individual stage is therefore defined as a collection of tasks processed by an individual executor with no dependency on other executors.

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.