In This Chapter
Understanding how Scikit-learn works with classes
Using sparse matrices and the hashing trick
Testing performances and memory consumption
Saving time with multicore algorithms
If you’ve gone through the previous chapters, by this point you’ve dealt with all the basic data loading and manipulation methods offered by Python. Now it’s time to start using some more complex instruments for data wrangling (or munging) and for machine learning. The final step of most data science projects is to build a data tool able to automatically summarize, predict, and recommend directly from your data.
Before taking that final step, you still have to massage your data by enforcing transformations that are even more radical. That’s the data wrangling or data munging part, where sophisticated transformations are followed by visual and statistical explorations, and then again by further transformations. In the following sections, you learn how to handle huge streams of text, explore the basic characteristics of a dataset, optimize the speed ...