Building the graph
Now that we've imported our data, let's build our graph. To do this, we need to define the structure of our vertices and edges. At the time of writing, GraphFrames requires a specific naming convention for vertices and edges:
- The column representing the vertices must be named `id`. In our case, the vertices of our flight data are the airports, so we need to rename the IATA airport code column to `id` in our `airports` DataFrame.
- The columns representing the edges must be named `src` (source) and `dst` (destination). For our flight data, the edges are the flights, so `src` and `dst` are the origin and destination columns from the `departureDelays_geo` DataFrame.
To simplify the edges for our graph, we will create ...
Get Learning PySpark now with the O’Reilly learning platform.