Time for action – representing the graph

Let's define a textual representation of the graph that we'll use in the following examples.

Create the following as graph.txt:

12,3,40C
21,4
31,5,6
41,2
53,6
63,5
76

What just happened?

We defined a file structure that will represent our graph, based somewhat on the adjacency list approach. We assumed that each node has a unique ID and the file structure has four fields, as follows:

  • The node ID
  • A comma-separated list of neighbors
  • The distance from the start node
  • The node status

In the initial representation, only the starting node has values for the third and fourth columns: its distance from itself is 0 and its status is "C", which we'll explain later.

Our graph is directional—more formally referred to as a directed ...

Get Hadoop: Data Processing and Modelling now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.