OCCASIONALLY I GET THE QUESTION: “HOW DO YOU ACTUALLY WORK?” OR “HOW DO YOU COME UP WITH THIS stuff?” As an answer, I want to take you on a tour through a new data set. I will use gnuplot, which is my preferred tool for this kind of interactive data analysis—you will see why. And I will share my observations and thoughts as we go along.
The data set is a classic: the CO2 measurements above Mauna Loa on Hawaii. The inspiration for this section comes from Cleveland’s Elements of Graphical Analysis, but the approach is entirely mine.
First question: what’s in the data set? I see that the first column represents the date (month and year) while the second contains the measured CO2 concentration in parts per million. Here are the first few lines:
Jan-1959 315.42 Feb-1959 316.32 Mar-1959 316.49 Apr-1959 317.56 ...
The measurements are regularly spaced (in fact, monthly), so I don’t need to parse the date in the first column; I simply plot the second column by itself. (In the figure, I have added tick labels on the horizontal axis for clarity, but I am omitting the commands required here—they are not essential.)
Figure 6-1. The first look at the data: plot “data”u1wl
plot "data" u 2 w l
The plot shows a rather regular short-term variation overlaid on a nonlinear upward trend. (See Figure 6-1.)
The coordinate system is not convenient ...