- Machine Learning for Hackers

Whenever you work with data, it’s helpful to imagine breaking up
your analysis into two completely separate parts: exploration and
confirmation. The distinction between exploratory data analysis and
confirmatory data analysis comes down to us from the famous John Tukey,^{[6]} who emphasized the importance of designing simple tools
for practical data analysis. In Tukey’s mind, the exploratory steps in
data analysis involve using summary tables and basic visualizations to
search for hidden patterns in your data. In this chapter, we describe
some of the basic tools that R provides for summarizing your data
numerically, and then we teach you how to make sense of the results.
After that, we show you some of the tools that exist in R for
visualizing your data, and at the same time, we give you a whirlwind
tour of the basic visual patterns that you should keep an eye out for in
any visualization.
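
As a rough preview of the two kinds of tools mentioned above, the following minimal sketch computes numeric summaries of one column and then draws a basic plot of it. The `heights` vector is hypothetical data invented for illustration, and base R's `summary`, `mean`, `median`, `range`, and `hist` are used here only as one plausible starting point, not necessarily the functions the chapter goes on to use.

```r
# Hypothetical sample: heights in inches, invented for illustration only.
heights <- c(61, 64, 65, 66, 67, 68, 68, 70, 72, 75)

# summary() reports the minimum, quartiles, median, mean, and maximum.
height.summary <- summary(heights)
print(height.summary)

# The same quantities can also be computed one at a time.
height.mean <- mean(heights)
height.median <- median(heights)
height.range <- range(heights)

# A histogram is one of the simplest visual checks on a single column.
hist(heights, main = "Hypothetical heights", xlab = "Height (inches)")
```

Numeric summaries compress a column into a handful of values, while the histogram shows the overall shape those values cannot convey on their own, which is why the two are usually read side by side.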

But before you start searching through your first data set, we should warn you about a real danger that’s present whenever you explore data: you’re likely to find patterns that aren’t really there. The human mind is designed to find patterns in the world and will do so even when those patterns are just quirks of chance. You don’t need a degree in statistics to know that we human beings will easily find shapes in clouds after looking at them for only a few seconds. And plenty of people have convinced themselves that they’ve discovered hidden messages ...