Estimating the income bracket
We will build a classifier to estimate the income bracket of a person based on 14 attributes. The possible output classes are higher than 50K or lower than or equal to 50K. There is a slight twist in this dataset in the sense that each datapoint is a mixture of numbers and strings. Numerical data is valuable, and we cannot use a label encoder in these situations. We need to design a system that can deal with numerical and non-numerical data at the same time. We will use the census income dataset available at https://archive.ics.uci.edu/ml/datasets/Census+Income.
How to do it…
- We will use the
income.py
file already provided to you as a reference. We will use a Naive Bayes classifier to achieve this. Let's import a couple ...
Get Python: Real World Machine Learning now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.