
When dealing with some of the more computationally intensive data analysis or mining algorithms, you may encounter an unexpected obstacle: *the brick wall*. Programs or algorithms that seemed to work just fine turn out not to work once in production. And I don’t mean that they run more slowly than expected. I mean they do not work at all!

Of course, performance and scalability problems are familiar to most enterprise developers. However, the kinds of problems that arise in data-centric or computationally intensive applications are different, and most enterprise programmers (and, in fact, most computer science graduates) are badly prepared for them.

Let’s try an example: Table 15-1 shows the time required to perform 10 matrix multiplications for square matrices of various sizes. (The details of matrix multiplication don’t concern us here; suffice it to say that it’s the basic operation in almost all problems involving matrices and is at the heart of operator decomposition problems, including the principal component analysis introduced in Chapter 14.)

Table 15-1. Time required to perform 10 matrix multiplications for square matrices of different sizes

| Size | Time [seconds] |
|---|---|
| 100 | 0.00 |
| 200 | 0.06 |
| 500 | 2.12 |
| 1,000 | 22.44 |
| 2,000 | 176.22 |
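A measurement of this kind can be sketched in a few lines of Python. This is only an illustration, not the program that produced Table 15-1: the function names `matmul` and `time_matmul`, the pure-Python list-of-rows representation, and the small matrix sizes are all assumptions made here, and the timings it prints will differ from those in the table.

```python
import random
import time


def matmul(a, b):
    """Naive O(n^3) product of two square matrices stored as lists of rows."""
    # Transpose b once so the inner sum pairs a row of a with a column of b.
    b_columns = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_columns]
        for row in a
    ]


def time_matmul(n, repeats=10, seed=0):
    """Return the seconds needed for `repeats` products of two n x n matrices."""
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(n)] for _ in range(n)]
    b = [[rng.random() for _ in range(n)] for _ in range(n)]
    start = time.perf_counter()
    for _ in range(repeats):
        matmul(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are kept small (an assumption) so the pure-Python loop finishes quickly.
    for n in (20, 40, 80):
        print(f"{n:>6} | {time_matmul(n):.2f}")
```

Each doubling of the matrix size multiplies the number of scalar multiplications by eight, so the measured time should grow at least that fast as `n` increases.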

Would you agree that the data in Table 15-1 does not look too threatening? For a 2,000 × 2,000 matrix, the time required is a shade under three minutes. How long might it take to perform the same operation for a 10,000 × 10,000 matrix? Five, ...