In this chapter, we will install, configure, and deploy a local analytical development environment. We will do so by provisioning a self-contained single-node cluster that allows us to do the following:
- Prototype and develop machine learning models and pipelines in Python
- Demonstrate the functionality and usage of Apache Spark's machine learning library, MLlib, via the Spark Python API (PySpark)
- Develop and test machine learning models on a single-node cluster using small sample datasets, and then scale up to multi-node clusters that process much larger datasets, with little or no change to the code
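To illustrate the last point, here is a minimal sketch of how the same PySpark code might target either a single-node or a multi-node cluster by changing only the master URL. The environment variable name `SPARK_MASTER_URL`, the helper functions `resolve_master` and `build_session`, and the application name are hypothetical and chosen purely for illustration. They are not part of Spark itself.

```python
import os


def resolve_master(default="local[*]"):
    """Return the Spark master URL to connect to.

    SPARK_MASTER_URL is a hypothetical environment variable used only in
    this sketch. If it is unset, we fall back to "local[*]", which runs
    Spark locally using as many worker threads as there are logical cores.
    """
    return os.environ.get("SPARK_MASTER_URL", default)


def build_session(app_name="mllib-prototype"):
    """Create (or reuse) a SparkSession pointed at the resolved master."""
    # Imported lazily so that this module can be loaded even on a machine
    # where PySpark has not been installed yet.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .master(resolve_master())
        .appName(app_name)
        .getOrCreate()
    )
```

On our single-node cluster we would leave the variable unset, so that `build_session()` runs Spark in local mode. On a multi-node standalone cluster we would instead set it to a URL of the form `spark://<host>:7077`, and the modelling code that uses the session would stay the same.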
Our single-node cluster will host the following technologies:
- Operating system: CentOS Linux 7 (https://www.centos.org/download/) ...