Train a little, learn a little – active learning
Active learning is a superpower for quickly developing classifiers. It has saved many a project in the real world. The idea is very simple and can be broken down as follows:
- Assemble a pool of raw data that is way bigger than you can annotate manually.
- Annotate an embarrassingly small amount of the raw data.
- Train the classifier on the embarrassingly small amount of training data.
- Run the trained classifier on all the data.
- Put the classifier output into a .csv file, ranked by the confidence of the best category.
- Correct another embarrassingly small amount of data, starting with the most confident classifications.
- Evaluate the performance.
- Repeat the process until the performance is acceptable, or you run out of ...
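The steps above can be sketched as a single loop. The Java below is a minimal, self-contained illustration under stated assumptions, not a definitive implementation: `ToyClassifier` (a small bag-of-words Naive Bayes with add-one smoothing) is a hypothetical stand-in for whatever classifier you actually use, and the `oracle` map is a hypothetical stand-in for the human who annotates and corrects the output. The names `run`, `Row`, and the CSV column names are likewise illustrative. Each round retrains on everything annotated so far, classifies all the data, writes the ranked CSV, evaluates, and only then corrects the next small batch, most confident first, so the loop can stop as soon as the performance is acceptable.

```java
import java.util.*;

// Sketch of the "train a little, learn a little" loop. HYPOTHETICAL stand-ins:
// ToyClassifier (bag-of-words Naive Bayes) replaces a real classifier, and the
// `oracle` map replaces the human who annotates and corrects the output.
public class ActiveLearningSketch {
    record Row(String text, String label, double confidence) {}
    // Toy multinomial Naive Bayes with add-one smoothing; not production quality.
    static class ToyClassifier {
        final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
        final Map<String, Integer> wordTotals = new HashMap<>();
        final Map<String, Integer> docCounts = new HashMap<>();
        final Set<String> vocab = new HashSet<>();
        int totalDocs = 0;
        void train(String text, String label) {
            totalDocs++;
            docCounts.merge(label, 1, Integer::sum);
            for (String w : text.toLowerCase().split("\\s+")) {
                vocab.add(w);
                wordCounts.computeIfAbsent(label, k -> new HashMap<>()).merge(w, 1, Integer::sum);
                wordTotals.merge(label, 1, Integer::sum);
            }
        }
        // Returns the best category and its posterior probability (the confidence).
        Row classify(String text) {
            Map<String, Double> logScores = new HashMap<>();
            for (String label : docCounts.keySet()) {
                double s = Math.log(docCounts.get(label) / (double) totalDocs);
                Map<String, Integer> counts = wordCounts.getOrDefault(label, Map.of());
                int total = wordTotals.getOrDefault(label, 0);
                for (String w : text.toLowerCase().split("\\s+")) {
                    s += Math.log((counts.getOrDefault(w, 0) + 1.0) / (total + vocab.size()));
                }
                logScores.put(label, s);
            }
            // Softmax over the log scores gives probabilities that sum to 1.
            double max = Collections.max(logScores.values()), z = 0;
            for (double s : logScores.values()) z += Math.exp(s - max);
            String best = null; double bestP = -1;
            for (Map.Entry<String, Double> e : logScores.entrySet()) {
                double p = Math.exp(e.getValue() - max) / z;
                if (p > bestP) { bestP = p; best = e.getKey(); }
            }
            return new Row(text, best, bestP);
        }
    }

    // Runs the loop and returns the final accuracy. Accuracy is checked against the
    // oracle on the whole pool, which only works because this toy has gold labels.
    static double run(List<String> raw, Map<String, String> oracle,
                      int batch, double targetAccuracy, StringBuilder csv) {
        if (batch < 1 || raw.isEmpty()) throw new IllegalArgumentException("need batch >= 1 and data");
        Map<String, String> annotated = new LinkedHashMap<>();
        // Annotate an embarrassingly small seed set.
        for (String text : raw.subList(0, Math.min(batch, raw.size()))) annotated.put(text, oracle.get(text));
        while (true) {
            ToyClassifier clf = new ToyClassifier();
            annotated.forEach(clf::train);
            // Run the classifier on all the data, ranked by confidence of the best category.
            List<Row> rows = new ArrayList<>();
            for (String text : raw) rows.add(clf.classify(text));
            rows.sort((a, b) -> Double.compare(b.confidence(), a.confidence()));
            csv.setLength(0);
            csv.append("text,category,confidence\n");
            for (Row r : rows) {
                csv.append('"').append(r.text().replace("\"", "\"\"")).append("\",")
                   .append(r.label()).append(',')
                   .append(String.format(Locale.ROOT, "%.4f", r.confidence())).append('\n');
            }
            // Evaluate the performance; stop if it is acceptable.
            int correct = 0;
            for (Row r : rows) if (r.label().equals(oracle.get(r.text()))) correct++;
            double accuracy = correct / (double) rows.size();
            if (accuracy >= targetAccuracy) return accuracy;
            // Correct the next small batch, starting with the most confident classifications.
            int added = 0;
            for (Row r : rows) {
                if (added == batch) break;
                if (annotated.putIfAbsent(r.text(), oracle.get(r.text())) == null) added++;
            }
            if (added == 0) return accuracy; // ran out of unannotated data
        }
    }

    public static void main(String[] args) {
        // Hypothetical toy data; the oracle plays the human annotator.
        Map<String, String> oracle = new LinkedHashMap<>();
        oracle.put("great movie loved it", "pos");
        oracle.put("terrible plot hated it", "neg");
        oracle.put("loved the acting great fun", "pos");
        oracle.put("hated the ending terrible", "neg");
        oracle.put("great fun loved every minute", "pos");
        oracle.put("boring and terrible", "neg");
        StringBuilder csv = new StringBuilder();
        double acc = run(new ArrayList<>(oracle.keySet()), oracle, 2, 0.9, csv);
        System.out.println("accuracy: " + acc);
        System.out.print(csv);
    }
}
```

Measuring accuracy against gold labels for the whole pool is a shortcut that only a toy allows; on a real project you would evaluate on the data annotated so far, for example with cross-validation. Starting corrections with the most confident classifications keeps each correction pass fast, because most of those rows only need to be confirmed rather than relabeled.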