Learning OpenCV, 2nd Edition

Histograms and Matching

In the course of analyzing images, objects, and video information, we frequently want to represent what we

are looking at as a histogram. Histograms can be used to represent such diverse things as the color

distribution of an object, an edge gradient template of an object [Freeman95], and the distribution of

probabilities representing our current hypothesis about an object’s location. Figure 7-1 shows the use of

histograms for rapid gesture recognition. Edge gradients were collected from “up,” “right,” “left,” “stop,”

and “OK” hand gestures. A webcam was then set up to watch a person who used these gestures to control

web videos. In each frame, color interest regions were detected from the incoming video; then edge

gradient directions were computed around these interest regions, and these directions were collected into

orientation bins within a histogram. The histograms were then matched against the gesture models to

recognize the gesture. The vertical bars in Figure 7-1 show the match levels of the different gestures. The

gray horizontal line represents the threshold for acceptance of the “winning” vertical bar corresponding to a

gesture model.
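
The pipeline just described (bin gradient directions, then match against stored models and apply an acceptance threshold) can be sketched as follows. This is a minimal illustration, not the code behind Figure 7-1: the function names, the choice of 8 orientation bins, the use of histogram intersection as the match score, and the threshold of 0.7 are all assumptions made for the example.

```python
import math

def orientation_histogram(gradients, num_bins=8):
    """Bin the directions of (gx, gy) gradient vectors into num_bins bins.

    Returns a histogram normalized to sum to 1 (or all zeros if there
    were no nonzero gradients).
    """
    hist = [0.0] * num_bins
    for gx, gy in gradients:
        if gx == 0 and gy == 0:
            continue  # no edge here, so there is no direction to record
        angle = math.atan2(gy, gx) % (2 * math.pi)  # direction in [0, 2*pi)
        index = int(angle / (2 * math.pi) * num_bins) % num_bins
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def recognize(hist, models, threshold=0.7):
    """Return (best model name or None, best score).

    The name is None when even the best match falls below the
    (assumed) acceptance threshold.
    """
    best_name, best_score = max(
        ((name, intersection(hist, model)) for name, model in models.items()),
        key=lambda pair: pair[1],
    )
    return (best_name if best_score >= threshold else None), best_score

# Hypothetical gesture models built from idealized gradient directions.
models = {
    "left": orientation_histogram([(-1, 0)] * 10),
    "up": orientation_histogram([(0, 1)] * 10),
}
observed = orientation_histogram([(-1, 0)] * 9 + [(0, 1)])
name, score = recognize(observed, models)
```

Here the observed histogram shares 90 percent of its mass with the hypothetical "left" model, so that model wins and clears the threshold. An observation that matches no model well returns None instead of a name.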

Histograms find uses in many computer vision applications. Histograms are used to detect scene transitions

in videos by marking when the edge and color statistics markedly change from frame to frame. They are

used to identify interest points in images by assigning each interest point a “tag” consisting of histograms

of nearby features. Histograms of edges, colors, corners, and so on form a general feature type that is

passed to classifiers for object recognition. Sequences of color or edge histograms are used to identify

whether videos have been copied on the web, to find where scenes change in a movie, to retrieve images from

massive databases, and so on. Histograms are one of the classic tools of computer vision.
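
As a rough sketch of the scene-transition idea, one can compare the intensity histogram of each frame with that of the previous frame and flag a cut when they differ sharply. The frame representation (a flat list of 8-bit intensities), the L1 distance, and the threshold of 0.5 are assumptions chosen for illustration. A practical detector would likely combine edge and color statistics, as the text describes.

```python
def normalized_histogram(values, num_bins=8, max_value=256):
    """Histogram of integer values in [0, max_value), normalized to sum to 1."""
    hist = [0] * num_bins
    for v in values:
        hist[v * num_bins // max_value] += 1
    n = len(values) or 1  # avoid dividing by zero for an empty frame
    return [count / n for count in hist]

def l1_distance(h1, h2):
    """Sum of absolute bin differences; ranges from 0 (same) to 2 (disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def scene_cuts(frames, threshold=0.5):
    """Return indices of frames whose histogram differs markedly from the previous one."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        hist = normalized_histogram(frame)
        if previous is not None and l1_distance(previous, hist) > threshold:
            cuts.append(index)
        previous = hist
    return cuts

# Two dark frames followed by two bright frames (toy data).
dark_a = [10, 20, 30, 25]
dark_b = [12, 22, 28, 26]
bright_a = [200, 210, 220, 250]
bright_b = [205, 215, 225, 251]
cuts = scene_cuts([dark_a, dark_b, bright_a, bright_b])
```

With this toy data the only large histogram change is between the second and third frames, so a single cut is reported at index 2.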

Histograms are simply collected counts of the underlying data organized into a set of predefined bins. They

can be populated by counts of features computed from the data, such as gradient magnitudes and directions,

color, or just about any other characteristic. In any case, they are used to obtain a statistical picture of the

underlying distribution of data. The histogram usually has fewer dimensions than the source data. Figure 7-2

depicts a typical situation. The figure shows a two-dimensional distribution of points (upper-left); we

impose a grid (upper-right) and count the data points in each grid cell, yielding a one-dimensional

histogram (lower-right). Because the raw data points can represent just about anything, the histogram is a

handy way of representing whatever it is that you have learned from your image.
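
The counting step can be sketched in a few lines. This example assumes the grid cells are equal-width vertical strips along the x axis, which is one way a two-dimensional point cloud reduces to a one-dimensional histogram; the function name and the sample points are made up for illustration.

```python
def histogram_1d(points, x_min, x_max, num_bins):
    """Count 2-D points per equal-width strip along x over [x_min, x_max).

    The y coordinate is ignored, so the result has one dimension fewer
    than the source data. Points outside the x range are skipped.
    """
    width = (x_max - x_min) / num_bins
    counts = [0] * num_bins
    for x, _y in points:
        if x_min <= x < x_max:
            index = int((x - x_min) / width)
            counts[min(index, num_bins - 1)] += 1  # guard against rounding
    return counts

points = [(0.5, 1.0), (1.5, 2.0), (1.7, 0.3), (3.2, 1.1),
          (3.9, 2.5), (3.5, 0.1), (9.0, 0.0)]
counts = histogram_1d(points, 0.0, 4.0, 4)  # [1, 2, 0, 3]
```

The point at x = 9.0 lies outside the grid and is not counted, which is why the counts sum to six rather than seven.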

Figure 7-1: Local histograms of gradient orientations are used to find the hand and its gesture; here the

“winning” gesture (longest vertical bar) is a correct recognition of “L” (move left)

Histograms that represent continuous distributions do so by quantizing the points into each grid cell. This

is where problems can arise, as shown in Figure 7-3. If the grid is too wide (upper-left), then the output is

too coarse and we lose the structure of the distribution. If the grid is too narrow (upper-right), then there is

not enough averaging to represent the distribution accurately and we get small, “spiky” cells.
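
The effect of bin width is easy to reproduce with a small experiment. The sketch below bins the same made-up samples, which form two clusters, at three resolutions; the sample values and bin counts are assumptions chosen only to make the effect visible.

```python
def histogram(values, low, high, num_bins):
    """Count values falling in num_bins equal-width bins over [low, high)."""
    counts = [0] * num_bins
    for v in values:
        if low <= v < high:
            index = int((v - low) * num_bins / (high - low))
            counts[min(index, num_bins - 1)] += 1  # guard against rounding
    return counts

# Sixteen samples forming two clusters, one near 1.7 and one near 7.0.
samples = [1.1, 1.3, 1.4, 1.6, 1.8, 2.0, 2.1, 2.3,
           6.2, 6.5, 6.7, 6.9, 7.0, 7.2, 7.4, 7.9]

too_wide = histogram(samples, 0.0, 10.0, 1)      # [16]: the two clusters vanish
moderate = histogram(samples, 0.0, 10.0, 5)      # [5, 3, 0, 8, 0]: both clusters visible
too_narrow = histogram(samples, 0.0, 10.0, 100)  # mostly empty bins with isolated spikes
```

With a single bin, all structure is lost. With 100 bins, at least 84 bins are necessarily empty and the occupied ones hold only a sample or two each, so the counts say little about the underlying density.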

Figure 7-2: Typical histogram example: starting with a cloud of points (upper-left), a counting grid is

imposed (upper-right) that yields a one-dimensional histogram of point counts (lower-right)

OpenCV has a data type for representing histograms. The histogram data structure is capable of

representing histograms in one or many dimensions, and it contains all the data necessary to track bins of

both uniform and non-uniform sizes. And, as you might expect, it comes equipped with a variety of useful

functions that allow us to easily perform common operations on our histograms.
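
To make the distinction between uniform and non-uniform bins concrete, here is a small standard-library sketch in which a one-dimensional histogram is described by its list of bin edges. This is not OpenCV's data structure or API. The helper names are hypothetical and the edge values are arbitrary.

```python
from bisect import bisect_right

def uniform_edges(low, high, num_bins):
    """Edges of num_bins equal-width bins covering [low, high)."""
    step = (high - low) / num_bins
    return [low + i * step for i in range(num_bins)] + [high]

def bin_counts(values, edges):
    """Count values per bin, where bin i covers [edges[i], edges[i + 1]).

    The edges must be sorted; they may be uniformly or non-uniformly spaced.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v < edges[-1]:
            counts[bisect_right(edges, v) - 1] += 1
    return counts

values = [0, 5, 10, 50, 100, 200, 255]

uniform = bin_counts(values, uniform_edges(0, 256, 4))  # [4, 1, 0, 2]
non_uniform = bin_counts(values, [0, 8, 32, 128, 256])  # [2, 1, 2, 2]
```

Non-uniform edges allow finer resolution where the data is dense (here, the low values) without increasing the total number of bins.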

Note: Quantization causes the same problems for histograms representing information that falls naturally into discrete groups when the histogram

uses fewer bins than the natural description would suggest or require. An example of this is representing 8-bit intensity

values in a 10-bin histogram: each bin would then combine the points associated with approximately 25 different

intensities, (erroneously) treating them all as equivalent.
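
The arithmetic behind the 10-bin example can be checked directly. The sketch below assumes the common mapping of intensity v to bin v * 10 // 256; other binning rules would shift the boundaries slightly but not the overall picture.

```python
NUM_BINS = 10
NUM_LEVELS = 256  # distinct 8-bit intensity values

# Bin index assigned to every possible 8-bit intensity.
bin_of = [v * NUM_BINS // NUM_LEVELS for v in range(NUM_LEVELS)]

# Number of distinct intensities that land in each bin.
sizes = [bin_of.count(b) for b in range(NUM_BINS)]
# sizes == [26, 26, 25, 26, 25, 26, 26, 25, 26, 25]

# Intensities 77 and 102 differ by 25 levels yet share bin 3,
# while 102 and 103 differ by one level and fall in different bins.
same_bin = bin_of[77] == bin_of[102]
split = bin_of[102] != bin_of[103]
```

Because 256 / 10 = 25.6, each bin absorbs either 25 or 26 intensity levels, and where the boundaries fall is an artifact of the binning rather than of the data.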
