Scaling

You have data and you have a running Hadoop cluster; now you get more of the former and need more of the latter. We have said repeatedly that Hadoop is an easily scalable system. So let us add some new capacity.

Adding capacity to a local Hadoop cluster

By this point, you should feel pretty underwhelmed at the idea of adding another node to a running cluster. All through Chapter 6, When Things Break, we repeatedly killed and restarted nodes. Adding a new node is really no different; all you need to do is perform the following steps:

  1. Install Hadoop on the host.
  2. Set the environment variables shown in Chapter 2, Getting Up and Running.
  3. Copy the configuration files into the conf directory on the installation.
  4. Add the host's DNS name or ...
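As a rough illustration of steps 2 and 3 only, the following Python sketch builds the environment variables for a new node and copies an existing set of configuration files into the conf directory of a fresh installation. The variable names (JAVA_HOME, HADOOP_HOME), the helper names, and all paths are assumptions made for this example; use the values given in Chapter 2 and the layout of your own cluster.

```python
import shutil
import tempfile
from pathlib import Path


def node_environment(hadoop_home, java_home):
    """Return the environment variables for a new node.

    The variable names are assumptions for this sketch; check them
    against the ones your installation actually uses.
    """
    return {
        "JAVA_HOME": str(java_home),
        "HADOOP_HOME": str(hadoop_home),
        # Directory you would prepend to PATH so the hadoop scripts are found.
        "PATH_PREFIX": str(Path(hadoop_home) / "bin"),
    }


def copy_configuration(source_conf, hadoop_home):
    """Copy every regular file from source_conf into <hadoop_home>/conf.

    Returns the sorted list of file names that were copied.
    Subdirectories of source_conf are skipped.
    """
    target = Path(hadoop_home) / "conf"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for item in sorted(Path(source_conf).iterdir()):
        if item.is_file():
            shutil.copy2(item, target / item.name)
            copied.append(item.name)
    return copied


def demo():
    """Run the two helpers against throwaway directories."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for the conf directory of an existing node.
        existing_conf = Path(tmp) / "existing_conf"
        existing_conf.mkdir()
        (existing_conf / "core-site.xml").write_text("<configuration/>")

        # Hypothetical stand-in for the new node's installation directory.
        new_home = Path(tmp) / "new_node" / "hadoop"
        print(copy_configuration(existing_conf, new_home))
        print(node_environment(new_home, "/usr/lib/jvm/java"))


demo()
```

On a real cluster the copy would normally go over the network (for example with scp or rsync) rather than between local directories; the local copy here just keeps the sketch self-contained.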
