To get the most out of this book

Although this book aims to explain everything from first principles, it would be advantageous (though not strictly required) to have a basic knowledge of mathematical notation and some programming experience in a language that can be used for data transformation, such as SQL, Base SAS, R, or Python. A good website for beginners to learn SQL and Python is https://www.w3schools.com.

It is assumed that you have access to a physical or virtual machine provisioned with the CentOS Linux 7 (or Red Hat Linux) operating system. If you do not, Chapter 2, Setting Up a Local Development Environment, describes the various options available to provision a CentOS 7 virtual machine (VM), including via cloud-computing platforms ...
