Consider what would happen if you tried to use the linear classifier on a dataset similar to the one in Figure 9-7.
Figure 9-7. A class encircling another class
Where would the average points be for each class? They would both be in exactly the same place! Even though it's clear to you and me that anything inside the circle is an X and everything outside the circle is an O, the linear classifier decides by which class average a new point is closer to, so when the two averages coincide it has no way to distinguish these two classes.
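To make the problem concrete, here is a minimal sketch using a small hypothetical dataset: four Xs inside a circle of radius 1 and four Os on a ring of radius 2, both centred on the origin and placed symmetrically. The helper names (`average`, `nearest_average`) are illustrative only, and the nearest-average rule is written with plain squared distances rather than any particular library's code.

```python
# Hypothetical toy data: Xs inside a circle of radius 1, Os on a ring of radius 2,
# both centred on the origin and placed symmetrically.
inner = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]   # class X
outer = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]   # class O

def average(points):
    """Return the mean point of a list of (x, y) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def sqdist(a, b):
    """Squared Euclidean distance between two (x, y) points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def nearest_average(point, avg_x, avg_o):
    """Classify by whichever class average is closer; return None on a tie."""
    dx, do = sqdist(point, avg_x), sqdist(point, avg_o)
    if dx == do:
        return None
    return 'X' if dx < do else 'O'

avg_x, avg_o = average(inner), average(outer)
print(avg_x, avg_o)                                # both averages are the origin
print(nearest_average((0.1, 0.1), avg_x, avg_o))   # None: no basis for a decision
```

Because both averages sit at the origin, every point is equally far from them, so the rule can never prefer one class over the other.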
But consider what happens if you square every x and y value first. A point that was at (−1,2) would now be at (1,4), a point that was at (0.5,1) would now be at (0.25,1), and so on. The new plot would look like Figure 9-8.
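The transformation itself is a one-line mapping. This small sketch (the function name `square_point` is hypothetical) reproduces the two worked points from the text:

```python
def square_point(point):
    """Map (x, y) to (x**2, y**2), the transformation described above."""
    x, y = point
    return (x ** 2, y ** 2)

print(square_point((-1, 2)))    # (1, 4)
print(square_point((0.5, 1)))   # (0.25, 1)
```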
Figure 9-8. Moving the points into a different space
All the Xs have moved into the corner near the origin, and all the Os lie outside that corner. It's now very easy to divide the Xs and Os with a straight line, and any time a new piece of data has to be classified, you can just square its x and y values and see on which side of the line it falls.
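The classify-by-squaring procedure can be sketched as follows, again on the hypothetical toy data from the circle example. As an assumption, the sketch uses a nearest-average rule written as a dot product: a point is labelled X when its offset from the midpoint of the two class averages points toward the X average. It is not meant to reproduce any specific implementation of the linear classifier.

```python
# Same hypothetical toy data: Xs inside radius 1, Os on a ring of radius 2.
inner = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]   # class X
outer = [(2.0, 0.0), (-2.0, 0.0), (0.0, 2.0), (0.0, -2.0)]   # class O

def square_point(point):
    """Map (x, y) to (x**2, y**2)."""
    x, y = point
    return (x ** 2, y ** 2)

def average(points):
    """Return the mean point of a list of (x, y) tuples."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

# "Train": average each class after squaring the coordinates.
avg_x = average([square_point(p) for p in inner])   # (0.125, 0.125)
avg_o = average([square_point(p) for p in outer])   # (2.0, 2.0)
midpoint = ((avg_x[0] + avg_o[0]) / 2, (avg_x[1] + avg_o[1]) / 2)
direction = (avg_x[0] - avg_o[0], avg_x[1] - avg_o[1])

def classify(point):
    """Square the new point, then report which side of the dividing line it is on.

    A positive dot product means the point is closer to the X average;
    points exactly on the line are assigned to O in this sketch.
    """
    p = square_point(point)
    offset = (p[0] - midpoint[0], p[1] - midpoint[1])
    return 'X' if dot(offset, direction) > 0 else 'O'

print(classify((0.3, 0.2)))    # X
print(classify((1.8, -0.5)))   # O
```

With these averages the dividing line in the squared space is u + v = 2.125, which in the original coordinates is the circle x² + y² = 2.125 (radius about 1.46). A straight line after the transformation becomes a curved boundary before it.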
This example shows that by transforming the points first, it's possible to create a new dataset that can be divided with a straight line. However, this example was chosen precisely because it can be transformed very easily; in real problems, the transformation will likely be a lot ...