Support-vector machines (SVMs) were introduced in Chapter 9, and are probably the most sophisticated classification method covered by this book. SVMs take datasets with numerical inputs and try to predict which category they fall into. You might, for example, want to decide positions for a basketball team from a list of people's heights and running speeds. To simplify, consider just two possibilities—front-court positions in which tall players are required, and back-court positions where you need the faster movers.
An SVM builds a predictive model by finding the dividing line between the two categories. If you plot a set of values for height versus speed and the best position for each person, you get a graph like the one shown in Figure 12-7. Front-court players are shown as Xs and back-court players are shown as Os. Also shown on the graph are a few lines that separate the data into the two categories.
Figure 12-7. Plot of basketball players and dividing lines
A support-vector machine finds the line that separates the data most cleanly, which means that it is the greatest possible distance from points near the dividing line. In Figure 12-7, although all the different lines separate the data, the one that does this best is the one labeled "Best." The only points necessary to determine where the line should be are the points closest to it, and these are known as ...