CHAPTER 5

Scheduling and Workflow

When you’re working with big data in a distributed, parallel processing environment like Hadoop, job scheduling and workflow management are vital for efficient operation. Schedulers enable you to share cluster resources at the job level within Hadoop; in the first half of this chapter, I use practical examples to guide you in installing, configuring, and using the Fair and Capacity schedulers for Hadoop V1 and V2. At a higher level, workflow tools enable you to manage the relationships between jobs. For instance, a workflow might include jobs that source, clean, process, and output a data set. Each job in such a workflow typically depends on the successful completion of the one before it, and a workflow tool lets you define, run, and monitor that chain of dependencies as a single unit.
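To make the scheduler half of that concrete, here is a minimal sketch of enabling the Fair Scheduler on Hadoop V2 (YARN). The property and class names are standard YARN configuration; the queue names (etl, adhoc) and the resource figures are hypothetical placeholders, not values taken from this chapter. First, select the Fair Scheduler in yarn-site.xml:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>

Then define the queues and their shares in the allocation file, fair-scheduler.xml:

    <?xml version="1.0"?>
    <allocations>
      <!-- Hypothetical production queue: guaranteed minimum resources, double weight -->
      <queue name="etl">
        <minResources>10000 mb,10 vcores</minResources>
        <weight>2.0</weight>
      </queue>
      <!-- Hypothetical ad hoc queue: shares the remaining capacity fairly -->
      <queue name="adhoc">
        <weight>1.0</weight>
      </queue>
    </allocations>

Under contention, jobs submitted to the etl queue now receive twice the share of the adhoc queue, and never fall below the stated minimum. The Capacity Scheduler is configured analogously through capacity-scheduler.xml, and Hadoop V1 uses its own mapred-site.xml properties; the chapter covers each case in turn.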
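On the workflow side, this excerpt does not name a specific tool, but Apache Oozie is a common choice in the Hadoop toolset. Purely as an illustration, the source-clean-process pipeline described above might be expressed as chained actions in an Oozie workflow.xml; the action names and shell scripts here are invented for the sketch, and the jobTracker and nameNode properties would be supplied in the accompanying job.properties file.

    <workflow-app name="etl-pipeline" xmlns="uri:oozie:workflow:0.4">
      <start to="source"/>
      <!-- Each action runs only after its predecessor reports ok -->
      <action name="source">
        <shell xmlns="uri:oozie:shell-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <exec>source.sh</exec>
          <file>source.sh</file>
        </shell>
        <ok to="clean"/>
        <error to="fail"/>
      </action>
      <action name="clean">
        <shell xmlns="uri:oozie:shell-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <exec>clean.sh</exec>
          <file>clean.sh</file>
        </shell>
        <ok to="process"/>
        <error to="fail"/>
      </action>
      <action name="process">
        <shell xmlns="uri:oozie:shell-action:0.2">
          <job-tracker>${jobTracker}</job-tracker>
          <name-node>${nameNode}</name-node>
          <exec>process.sh</exec>
          <file>process.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
      </action>
      <kill name="fail">
        <message>Pipeline failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
      </kill>
      <end name="end"/>
    </workflow-app>

The ok and error transitions are what carry the job-to-job relationships: if clean fails, process never runs, and the workflow stops at the kill node.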
