Distributed systems

Horizontal scaling allows organizations to become much more cost-efficient once data and processing requirements grow beyond a certain point. But simply adding more machines to a cluster is of little value by itself. What we also need are systems that can take advantage of horizontal scalability and that work seamlessly across multiple machines, irrespective of whether the cluster contains one machine or 10,000 machines.
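To make the idea concrete, here is a minimal sketch in plain Python (not Spark code) of a computation whose result does not depend on how many machines share the work. The function name `word_count`, the round-robin partitioning, and the sample records are all assumptions made for illustration; the "nodes" are simulated as in-memory lists rather than real machines.

```python
from collections import Counter


def word_count(records, num_nodes):
    """Count words by splitting records across num_nodes simulated nodes."""
    # Partition step: record i is assigned to node i % num_nodes (round-robin).
    partitions = [records[i::num_nodes] for i in range(num_nodes)]

    # Each simulated node counts the words in its own partition independently.
    partials = [
        Counter(word for line in partition for word in line.split())
        for partition in partitions
    ]

    # Merge step: combine the per-node partial counts into a single result.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data, used only for this illustration.
records = ["spark scales out", "spark runs on a cluster", "a cluster scales"]

# The answer is the same whether the work runs on one node or on four.
assert word_count(records, 1) == word_count(records, 4)
```

The caller only chooses `num_nodes`; the partition and merge steps stay the same, which is the property a real distributed system provides across physical machines.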

Distributed systems do precisely that: they work seamlessly across a cluster of machines and automatically handle the addition of resources to, or their removal from, that cluster. Distributed systems can be broken down into the following types:

  • Distributed filesystems
  • Distributed databases ...
