Chapter 5. Data Preparation

Having laid solid foundations for understanding the two basic linear models for regression and classification, we devote this chapter to the data that feeds the model. In the following pages, we describe what can routinely be done to prepare the data in the best way and how to deal with more challenging situations, such as when data is missing or outliers are present.

Real-world experiments produce real data, which, in contrast to synthetic or simulated data, is often very varied. Real data is also quite messy, and it frequently proves wrong in ways that are obvious and in others that are, initially, quite subtle. As a data practitioner, you will almost never find your data already prepared in the ...
