Converting categories to numbers in Pandas for a speed boost

When your data contains text categories, you can often speed up processing considerably by converting them to Pandas categoricals. A categorical stores each distinct label once and represents every row as a small integer code, which allows us to take full advantage of Pandas' fast C code. Good candidates are columns with few distinct values relative to the number of rows, such as stock symbols, gender, experiment outcomes, states, and, in this recipe, a customer loyalty level.
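As a minimal sketch of that encoding, the snippet below converts a text column to a categorical and inspects the labels, the integer codes, and the memory footprint. The `loyalty` series and its bronze/silver/gold levels are hypothetical, made up for illustration, and are not the recipe's own data.

```python
import pandas as pd

# Hypothetical data: the column name and the three levels are illustrative only.
loyalty = pd.Series(
    ["bronze", "silver", "gold", "silver", "bronze"] * 20000,
    name="loyalty",
)

# Convert the text column to a categorical.
as_category = loyalty.astype("category")

# Each distinct label is stored once (sorted by default)...
categories = list(as_category.cat.categories)
# ...and every row holds a small integer code pointing at its label.
codes = as_category.cat.codes

# Compare the memory footprint of the text and categorical versions.
text_bytes = loyalty.memory_usage(deep=True)
cat_bytes = as_category.memory_usage(deep=True)

print(categories)             # ['bronze', 'gold', 'silver']
print(codes.head().tolist())  # [0, 2, 1, 2, 0]
print(text_bytes, cat_bytes)
```

The exact byte counts depend on the Pandas version and string storage, but with only three distinct labels across 100,000 rows the categorical version should be much smaller than the text version.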

Getting ready

Import Pandas, and create a new DataFrame to work with.

import pandas as pd
import numpy as np

lc = pd.DataFrame({
    'people': ["cole o'brien", "lise heidenreich", "zilpha skiles", "damion wisozk"],
    'age': [24, 35, 46, 57],
    'ssn': ['6439', '689 24 9939', '306-05-2792', '992245832'],
    'birth_date': ['2/15/54', ...
