
Occasionally I get the question: “How do you actually work?” or
“How do you come up with this stuff?” As an answer, I want to take
you on a tour through a new data set. I will use gnuplot, which is
my preferred tool for this kind of interactive data analysis; you
will see why. And I will share my observations and thoughts as we
go along.

The data set is a classic: the CO₂
measurements above Mauna Loa on Hawaii. The inspiration for this
section comes from Cleveland’s *Elements of Graphical
Analysis*,^{[12]} but the approach is entirely mine.

First question: what’s in the data set? I see that the first
column represents the date (month and year) while the second contains
the measured CO₂ concentration in parts per
million. Here are the first few lines:

Jan-1959 315.42
Feb-1959 316.32
Mar-1959 316.49
Apr-1959 317.56
...

The measurements are regularly spaced (in fact, monthly), so I don’t need to parse the date in the first column; I simply plot the second column by itself. (In the figure, I have added tick labels on the horizontal axis for clarity, but I am omitting the commands required here—they are not essential.)

Figure 6-1. The first look at the data: *plot "data" u 2 w l*

plot "data" u 2 w l
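The command above relies on gnuplot's abbreviated keywords, so it may help to see it spelled out. The sketch below also shows one possible way to put year labels on the horizontal axis. These are not the author's omitted commands, only an illustration. It assumes the file is named `data` as above, that it has no missing months, and that the first record is January 1959; the choice of which years to label is arbitrary.

```gnuplot
# The same plot with the abbreviations written out:
#   u -> using, w l -> with lines
plot "data" using 2 with lines

# Illustrative tick labels (an assumption, not the book's commands).
# When only one column is given, gnuplot uses the row index
# (starting at 0) as the x value, so January of year Y falls at
# x = 12*(Y - 1959), provided no months are missing.
set xtics ("1960" 12, "1965" 72, "1970" 132, "1975" 192, "1980" 252)
replot
```

Ticks that lie beyond the end of the data are simply not drawn, so the list can be longer than the file requires.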

The plot shows a rather regular short-term variation overlaid on a nonlinear upward trend. (See Figure 6-1.)

The coordinate system is not convenient ...