- Data Analysis with Open Source Tools
- Dedication
- A Note Regarding Supplemental Files
- Preface
- 1. Introduction
- I. Graphics: Looking at Data
- II. Analytics: Modeling Data
- III. Computation: Mining Data
- IV. Applications: Using Data
- A. Programming Environments for Scientific Computation and Data Analysis
- B. Results from Calculus
- C. Working with Data
- D. About the Author
- Index
- About the Author
- Colophon
- Copyright

**Most data analysis involves a good deal of data manipulation and numerical computation.** Of course, we use computers for these tasks, and hence we also need appropriate software.

This appendix is intended to give a brief survey of several popular software systems suitable for the kind of data analysis discussed in the rest of the book. I am mostly interested in open source software, although I also mention some of the most important commercial players.

The emphasis here is on *programming environments* for scientific applications (*i.e.*, libraries or packages intended for general data manipulation and computation), because being able to work with data easily and conveniently is a fundamental capability for all data analysis efforts. On the other hand, I do not include programs intended exclusively for graphing data: not because visualization is not important (it is), but because the choice of plotting or visualization software is less fundamental.

In many ways, our choice of a data manipulation environment determines what problems we can solve; it certainly determines which problems we consider to be “easy” problems. For data analysis, the hard problem that we should be grappling with is always the data and what it is trying to tell us—the mechanics of handling it should be sufficiently convenient that we don’t even think about them.

Properties I look for in a tool or programming environment include: ...