Chapter 7. Analyzing Co-Occurrence Networks with GraphX

It’s a small world. It keeps recrossing itself.

David Mitchell

Data scientists come in all shapes and sizes and from a remarkably diverse set of academic backgrounds. While many have some training in disciplines like computer science, mathematics, and physics, others have studied neuroscience, sociology, and political science. Although these fields study different things (e.g., brains, people, political institutions) and have not traditionally required students to learn how to program, they all share two important characteristics that have made them fertile training grounds for data scientists.

First, all of these fields are interested in understanding relationships between entities, whether between neurons, individuals, or countries, and how these relationships affect the observed behavior of the entities. Second, the explosion of digital data over the past decade has given researchers access to vast quantities of information about these relationships and required that they develop new skills in order to acquire and manage these data sets.

As these researchers began to collaborate with each other and with computer scientists, they also discovered that many of the techniques they were using to analyze relationships could be applied to problems across domains, and the field of network science was born. Network science applies tools from graph theory, the mathematical discipline that studies the properties of pairwise ...
