How it works...

The Bag-of-Words model works in two phases. In the training phase, one collects local image descriptors for training images (img0 and img1, in our case) and clusters them into vocabulary. In the second phase, local descriptors found in the input image are compared with all vocabulary words, alongside a list of how often each word appeared (for example, was selected as the closest one) within the image, for example, the frequencies vector, which forms the global image descriptor.

The following is the expected output:

Get OpenCV 3 Computer Vision with Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.