Estimating the income bracket

We will build a classifier to estimate the income bracket of a person based on 14 attributes. The possible output classes are greater than 50K or less than or equal to 50K. There is a slight twist in this dataset: each datapoint is a mixture of numbers and strings. The numerical values carry meaningful information (their magnitude and order), so we cannot simply pass every column through a label encoder. We need to design a system that can deal with numerical and non-numerical data at the same time. We will use the census income dataset available at https://archive.ics.uci.edu/ml/datasets/Census+Income.
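One way to handle the mixed columns, sketched here in plain Python as an assumption rather than the code in income.py: give each string column its own mapping from category to integer (the role a label encoder plays), and convert numeric columns to integers unchanged. The helper names `fit_column_encoders` and `encode_rows` and the three-column toy rows (age, workclass, hours-per-week) are hypothetical, and the sketch assumes the fields have already been split on commas and stripped of surrounding whitespace.

```python
def fit_column_encoders(rows):
    """Build one category -> integer mapping for every non-numeric column."""
    encoders = {}
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        if all(v.lstrip("-").isdigit() for v in values):
            continue  # numeric column: its values are kept as numbers
        # Sorted categories give a deterministic mapping.
        encoders[col] = {label: idx for idx, label in enumerate(sorted(set(values)))}
    return encoders


def encode_rows(rows, encoders):
    """Replace strings with their integer codes; parse numeric fields as int."""
    return [
        [encoders[col][value] if col in encoders else int(value)
         for col, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy rows: age, workclass, hours-per-week.
rows = [
    ["39", "State-gov", "40"],
    ["50", "Self-emp-not-inc", "13"],
    ["38", "Private", "40"],
]
encoders = fit_column_encoders(rows)
print(encode_rows(rows, encoders))  # [[39, 2, 40], [50, 1, 13], [38, 0, 40]]
```

Sorting the categories keeps the mapping deterministic, which matches how scikit-learn's `LabelEncoder` orders its `classes_`. The same `encoders` dictionary has to be reused when encoding a new datapoint for prediction, otherwise the integer codes will not line up with the ones seen during training.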

How to do it…

We will use the income.py file already provided to you as a reference. We will use a Naive Bayes classifier to achieve this. Let's import a couple ...
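To show what a Naive Bayes classifier does with the encoded rows, here is a minimal from-scratch Gaussian Naive Bayes using only the standard library. It is an illustrative sketch, not the code in income.py; in practice a library implementation such as scikit-learn's `GaussianNB` is the more likely choice. The functions `fit_gaussian_nb` and `predict_one` and the toy training rows (age, encoded workclass, hours-per-week) are hypothetical. Treating label-encoded categories as Gaussian features is a simplification.

```python
import math
from collections import defaultdict


def _mean_var(column):
    """Population mean and variance of a list of numbers."""
    mean = sum(column) / len(column)
    return mean, sum((v - mean) ** 2 for v in column) / len(column)


def fit_gaussian_nb(X, y, var_smoothing=1e-9):
    """Estimate class log-priors and per-class, per-feature Gaussian parameters."""
    n_features = len(X[0])
    # Add a small fraction of the largest feature variance to every variance
    # so that a feature that is constant within a class cannot cause division by zero.
    largest_var = max(_mean_var([row[j] for row in X])[1] for j in range(n_features))
    epsilon = var_smoothing * largest_var or var_smoothing
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)
    model = {}
    for label, rows in by_class.items():
        params = [_mean_var([row[j] for row in rows]) for j in range(n_features)]
        model[label] = (
            math.log(len(rows) / len(X)),      # log prior P(class)
            [m for m, _ in params],            # per-feature means
            [v + epsilon for _, v in params],  # smoothed per-feature variances
        )
    return model


def predict_one(model, x):
    """Return the class with the highest log posterior (up to a shared constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        # Naive assumption: features are independent given the class,
        # so the per-feature Gaussian log-likelihoods simply add up.
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Hypothetical encoded training rows: age, workclass code, hours-per-week.
X_train = [[25, 0, 20], [30, 0, 25], [28, 0, 22], [50, 1, 50], [55, 1, 60], [48, 1, 45]]
y_train = ["<=50K", "<=50K", "<=50K", ">50K", ">50K", ">50K"]

model = fit_gaussian_nb(X_train, y_train)
print(predict_one(model, [52, 1, 55]))  # >50K
print(predict_one(model, [26, 0, 21]))  # <=50K
```

The variance smoothing follows the same idea as the `var_smoothing` parameter of scikit-learn's `GaussianNB`, which adds a fraction of the largest feature variance to each per-class variance.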

Python: Real World Machine Learning by Prateek Joshi, John Hearty, Bastiaan Sjardin, Luca Massaron, Alberto Boschetti