When dealing with some of the more computationally intensive data analysis or mining algorithms, you may encounter an unexpected obstacle: the brick wall. Programs or algorithms that seemed to work just fine turn out not to work once in production. And I don’t mean that they run more slowly than expected. I mean they do not work at all!
Of course, performance and scalability problems are familiar to most enterprise developers. However, the kinds of problems that arise in data-centric or computationally intensive applications are different, and most enterprise programmers (and, in fact, most computer science graduates) are badly prepared for them.
Let’s try an example: Table 15-1 shows the time required to perform 10 matrix multiplications for square matrices of various sizes. (The details of matrix multiplication don’t concern us here; suffice it to say that it’s the basic operation in almost all problems involving matrices and is at the heart of operator decomposition problems, including the principal component analysis introduced in Chapter 14.)
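If you want to try a measurement of this kind yourself, the following is a minimal sketch in Python. It is not the program used to produce the numbers in the table: the text does not say how those were obtained, so the naive row-by-column algorithm, the helper names (`random_matrix`, `matmul`, `time_multiplications`), and the small matrix sizes are all assumptions made purely for illustration.

```python
import random
import time


def random_matrix(n, seed=0):
    """Return an n x n matrix (list of lists) of pseudo-random floats."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Multiply two square matrices with the textbook row-by-column method."""
    # Transposing b turns its columns into tuples, so each entry of the
    # product is the dot product of one row of a with one column of b.
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]


def time_multiplications(n, repeats=10):
    """Return the wall-clock seconds needed for `repeats` products of n x n matrices."""
    a = random_matrix(n, seed=1)
    b = random_matrix(n, seed=2)
    start = time.perf_counter()
    for _ in range(repeats):
        matmul(a, b)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical sizes, kept small because pure Python is slow.
    for n in (20, 40, 80):
        print(n, round(time_multiplications(n), 3))
```

The absolute times depend entirely on the machine and on the implementation, so they will not match the table. What is worth watching is how quickly the time grows each time the matrix size doubles.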
Table 15-1. Time required to perform 10 matrix multiplications for square matrices of different sizes
Would you agree that the data in Table 15-1 does not look too threatening? For a 2,000 × 2,000 matrix, the time required is a shade under three minutes. How long might it take to perform the same operation for a 10,000 × 10,000 matrix? Five, ...