Introduction to distributed versus parallel computing

Distributed computing is a subfield of computer science that studies distributed systems and models in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.
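To make the message-passing idea concrete, here is a minimal sketch that runs on one machine. It assumes that two threads and two in-memory queues can stand in for networked components and their communication channels; the names `worker`, `run_demo`, `inbox`, and `outbox` are hypothetical and chosen only for illustration. In a real distributed system the queues would be replaced by network communication.

```python
import queue
import threading


def worker(inbox, outbox):
    """Hypothetical component: receives numbers and replies with their squares."""
    while True:
        message = inbox.get()
        if message is None:  # sentinel message: no more work will arrive
            break
        outbox.put(message * message)


def run_demo(values):
    """Send each value to the worker as a message and collect the replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    component = threading.Thread(target=worker, args=(inbox, outbox))
    component.start()
    for value in values:
        inbox.put(value)  # coordinate only by passing messages
    inbox.put(None)       # tell the worker to stop
    component.join()
    return [outbox.get() for _ in values]


if __name__ == "__main__":
    print(run_demo([1, 2, 3]))
```

The two sides share no variables: everything the worker knows arrives as a message, which is the property the definition above emphasizes.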

A related term is worth discussing: parallel computing. Parallel computing is more closely associated with multi-threading, that is, with making full use of a single CPU, while distributed computing follows a divide-and-conquer approach: subtasks are executed on different machines, and their results are then merged.
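The split, execute, and merge pattern described above can be sketched in a few lines. This is only an illustration on a single machine, assuming a thread pool from Python's standard library as a stand-in for the workers; in a distributed setting each chunk would instead be sent to a different machine. The function names `split`, `subtask`, and `sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split(numbers, n_chunks):
    """Divide: cut the input into at most n_chunks roughly equal pieces."""
    size = max(1, -(-len(numbers) // n_chunks))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def subtask(chunk):
    """Conquer: the work done independently on each piece."""
    return sum(x * x for x in chunk)


def sum_of_squares(numbers, n_chunks=4):
    """Run the subtasks concurrently, then merge the partial results."""
    chunks = split(numbers, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        partial_sums = list(pool.map(subtask, chunks))
    return sum(partial_sums)  # merge step


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

Because each subtask depends only on its own chunk, the same structure works whether the workers are threads, processes, or separate machines; only the mechanism for dispatching chunks and collecting results changes.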

Now that we have entered the so-called big data era, the distinction seems to be blurring. In fact, ...
