Much of the programming work in data analysis and modeling is spent on data preparation: loading, cleaning, transforming, and rearranging. Sometimes the way that data is stored in files or databases is not the way you need it for a data processing application. Many people choose to do ad hoc processing of data from one form to another using a general purpose programming, like Python, Perl, R, or Java, or UNIX text processing tools like sed or awk. Fortunately, pandas along with the Python standard library provide you with a high-level, flexible, and high-performance set of core manipulations and algorithms to enable you to wrangle data into the right form without much trouble.
If you identify a type of data manipulation that isn’t anywhere in this book or elsewhere in the pandas library, feel free to suggest it on the mailing list or GitHub site. Indeed, much of the design and implementation of pandas has been driven by the needs of real world applications.
Data contained in pandas objects can be combined together in a number of built-in ways:
pandas.merge connects rows in
DataFrames based on one or more keys. This will be familiar to users
of SQL or other relational databases, as it implements database
or stacks together objects along an axis.
combine_first instance method enables splicing together overlapping data to fill in missing values in one object with ...