7

Advanced Analytical Theory and Methods: Classification

Key Concepts

Classification learning

Naïve Bayes

Decision tree

ROC curve

Confusion matrix

In addition to analytical methods such as clustering (Chapter 4, “Advanced Analytical Theory and Methods: Clustering”), association rule learning (Chapter 5, “Advanced Analytical Theory and Methods: Association Rules”), and modeling techniques like regression (Chapter 6, “Advanced Analytical Theory and Methods: Regression”), classification is another fundamental learning method that appears in data mining applications. In classification learning, a classifier is presented with a set of examples that are already classified and, from these examples, learns to assign unseen examples to classes. In other words, the primary task performed by classifiers is to assign class labels to new observations. Logistic regression, covered in the previous chapter, is one of the popular classification methods. The set of labels for a classifier is predetermined, unlike in clustering, which discovers structure without a training set and optionally allows the data scientist to create and assign labels to the clusters.
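To make the learn-then-assign workflow concrete, here is a minimal sketch in Python. It is not one of the methods this chapter covers: it uses a simple nearest-centroid classifier, chosen only because it is short. The function names `fit_centroids` and `predict` and the toy data are hypothetical. The classifier learns one average feature vector per class from prelabeled examples. It then assigns a new observation the label of the closest centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def fit_centroids(examples):
    """Learn one centroid (mean feature vector) per class label.

    `examples` is a list of (features, label) pairs, i.e., observations
    that are already classified (hypothetical input format).
    """
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        # Accumulate feature values per class to compute the mean later
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, features):
    """Assign the class label whose centroid is closest to `features`."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical toy training set: two numeric attributes, two class labels
training = [
    ((1.0, 1.2), "low"),
    ((0.8, 1.0), "low"),
    ((4.0, 4.5), "high"),
    ((4.2, 3.9), "high"),
]
centroids = fit_centroids(training)
print(predict(centroids, (0.9, 1.1)))  # prints "low"
print(predict(centroids, (3.8, 4.0)))  # prints "high"
```

Note that `predict` can only return a label that appeared in the training set. This mirrors the point above that a classifier's set of labels is predetermined, whereas clustering discovers groups without any labels.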

Most classification methods are supervised, in that they start with a training set of prelabeled observations and learn how the attributes of these observations are likely to contribute to the classification of future unlabeled observations. For example, existing marketing, sales, and customer demographic data can be used to develop ...
