As with regression, there are classification problems for which linear methods don’t work well. This section describes some machine learning algorithms for classification problems.

One of the simplest techniques for classification problems
is *k* nearest neighbors. Here’s how the algorithm
works:

1. The analyst specifies a “training” data set.
2. To predict the class of a new value, the algorithm looks for the *k* observations in the training set that are closest to the new value.
3. The prediction for the new value is the class of the “majority” of the *k* nearest neighbors.
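To make these steps concrete, here is a minimal sketch of the idea in base R, with a hypothetical two-feature training set (this is illustrative only; in practice you would use the `knn` function described below):

```r
# Hypothetical training set: two numeric features and a class label
train <- data.frame(x = c(1, 1.2, 5, 5.1), y = c(1, 0.9, 5, 5.2))
cl    <- factor(c("a", "a", "b", "b"))

# predict_knn is a hypothetical helper illustrating the algorithm
predict_knn <- function(new_point, train, cl, k = 3) {
  # Step 2: Euclidean distance from the new value to each training observation
  d <- sqrt((train$x - new_point[1])^2 + (train$y - new_point[2])^2)
  # Classes of the k closest observations
  nearest <- cl[order(d)[1:k]]
  # Step 3: majority vote among the k nearest neighbors
  names(which.max(table(nearest)))
}

predict_knn(c(5, 5), train, cl, k = 3)
# "b": two of the three nearest neighbors belong to class "b"
```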

To use *k* nearest neighbors in R, use the function `knn` in the `class` package:

    library(class)
    knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)

Here is a description of the arguments to the `knn` function.

Argument | Description | Default |
---|---|---|
`train` | A matrix or data frame containing the training data. | |
`test` | A matrix or data frame containing the test data. | |
`cl` | A factor specifying the classification of the observations in the training set. | |
`k` | A numeric value specifying the number of neighbors to consider. | `1` |
`l` | When `k > 0`, specifies the minimum vote for a decision. (If there aren’t enough votes, the value `doubt` is returned.) | `0` |
`prob` | If `prob=TRUE`, then the proportion of votes for the winning class is returned as the attribute `prob`. | `FALSE` |
`use.all` | Controls the handling of ties when selecting nearest neighbors. If `use.all=TRUE`, then all distances equal to the *k*th largest are included. If `use.all=FALSE`, then a random selection of distances equal to the *k*th largest is chosen to use exactly *k* neighbors. | `TRUE` |
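As a quick usage example, here is one way to apply `knn` to R’s built-in `iris` data, splitting it into training and test sets (the split and the choice of `k = 3` are arbitrary, for illustration):

```r
library(class)

# Split the built-in iris data into training and test sets
set.seed(42)
idx   <- sample(nrow(iris), 100)
train <- iris[idx,  1:4]   # the four numeric measurements
test  <- iris[-idx, 1:4]
cl    <- iris$Species[idx] # classes of the training observations

# Predict the species of each test observation from its 3 nearest neighbors
pred <- knn(train, test, cl, k = 3, prob = TRUE)
head(pred)                        # predicted species for the first test rows
mean(pred == iris$Species[-idx])  # fraction classified correctly
```

Because `prob = TRUE`, the proportion of neighbor votes for each winning class is available via `attr(pred, "prob")`.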
