4. Data Assembly

4.1 Introduction

By now, you should be able to load data into Pandas and create some basic visualizations. This part of the book focuses on data cleaning tasks. We begin by assembling a data set for analysis, which means combining multiple data sets into one.

Concept Map

1. Prior knowledge

a. loading data

b. subsetting data

c. functions and class methods

Objectives

This chapter will cover:

1. Tidy data

2. Concatenating data

3. Merging data sets
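As a preview of objectives 2 and 3, the sketch below shows the two basic ways of combining tables in Pandas: concatenation stacks tables that share the same columns, and merging joins tables on a shared key column. The tables, column names (`id`, `value`, `label`), and values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Two hypothetical tables with the same columns (made-up data)
df1 = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
df2 = pd.DataFrame({"id": [3, 4], "value": ["c", "d"]})

# Concatenation: stack the rows of df2 below the rows of df1.
# ignore_index=True renumbers the resulting row index 0..n-1.
stacked = pd.concat([df1, df2], ignore_index=True)

# A hypothetical lookup table that shares the "id" key column
labels = pd.DataFrame({"id": [1, 3], "label": ["x", "y"]})

# Merging: join on the shared "id" column.
# how="inner" keeps only the ids that appear in both tables.
merged = stacked.merge(labels, on="id", how="inner")

print(stacked)
print(merged)
```

Here `stacked` has four rows, while `merged` keeps only the two rows whose `id` also appears in `labels`, with the `label` column added alongside `value`.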

4.2 Tidy Data

Hadley Wickham,¹ one of the more prominent members of the R community, talks about the idea of tidy data. In fact, he has written a paper about this concept in the Journal of Statistical Software.² Tidy data is a framework to structure data sets so they can be easily analyzed. ...
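In Wickham's framework, each variable forms a column, each observation forms a row, and each type of observational unit forms a table. A common violation is storing values of a variable (such as years) in the column headers. The sketch below, using made-up data and hypothetical column names, shows how Pandas' `melt` method can reshape such a "wide" table into a tidy "long" one; it is only an illustration, not the book's own example.

```python
import pandas as pd

# A hypothetical "wide" table: the year values live in the column
# headers instead of in a column of their own (made-up numbers)
wide = pd.DataFrame({
    "country": ["A", "B"],
    "2000": [10, 30],
    "2001": [20, 40],
})

# melt() reshapes wide -> long: "country" stays as an identifier,
# the former headers become values of a new "year" column, and the
# cell values go into a "cases" column (one observation per row)
tidy = wide.melt(id_vars="country", var_name="year", value_name="cases")

print(tidy)
```

After melting, each row holds one (country, year, cases) observation, so filtering or grouping by year no longer requires knowing which columns happen to hold years.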

From Pandas for Everyone: Python Data Analysis, First Edition (O'Reilly).