Local working directories

Each node in a Spark cluster (in our case, just the single node) generates log files as well as local working files, such as those produced when shuffling and serializing RDD data. The following commands create dedicated local directories in which to store these working outputs. You may edit the paths to suit your preferences; they will be referenced again in later configuration files:

> mkdir -p /data/spark/local/data
> mkdir -p /data/spark/local/logs
> mkdir -p /data/spark/local/pid
> mkdir -p /data/spark/local/worker
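To illustrate how these directories are typically wired into Spark, the standard environment variables in `conf/spark-env.sh` can point at them. This is a sketch, assuming the paths created above; the exact mapping used later in the book may differ:

```shell
# conf/spark-env.sh (sketch) -- map the local directories created above
# to Spark's standard environment variables.

# Scratch space for shuffle and serialized RDD data
export SPARK_LOCAL_DIRS=/data/spark/local/data

# Where daemon log files are written
export SPARK_LOG_DIR=/data/spark/local/logs

# Where daemon PID files are stored
export SPARK_PID_DIR=/data/spark/local/pid

# Working directory for worker processes (application scratch space)
export SPARK_WORKER_DIR=/data/spark/local/worker
```

These variables are read by the scripts in Spark's `sbin/` directory when the master and worker daemons are started.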
