Classification

Supervised learning problems can be further divided into two groups: classification and regression. In a classification problem, the output variable, such as y, is categorical: either a binary variable (0 or 1) or one of several categories. In the Titanic example, we have 1 for survived and 0 for not survived. In a regression problem, the output is a numeric, typically continuous, value, such as 2.5 or 0.234. In the previous chapter, we discussed the concept of distance between members of the same group and between groups.
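To make the difference between the two kinds of output concrete, here is a minimal sketch in Python. The lists `survived` and `measurement` are hypothetical toy data invented for illustration (they are not taken from the Titanic dataset), and the helpers `class_counts` and `mean` are assumed names, not part of any library.

```python
from collections import Counter

# Hypothetical toy targets, invented for illustration only.
survived = [1, 0, 0, 1, 0, 1]          # classification: 1 = survived, 0 = not survived
measurement = [2.5, 0.234, 1.75, 3.1]  # regression: any numeric value

def class_counts(labels):
    """Count how many observations fall into each category."""
    return dict(Counter(labels))

def mean(values):
    """Average of a numeric target; meaningful for regression outputs."""
    return sum(values) / len(values)

print(class_counts(survived))  # categories are summarized by counting
print(mean(measurement))       # numeric values are summarized by averaging
```

A categorical output is naturally summarized by counting its categories, while a numeric output is summarized by quantities such as its mean; this is one simple way to see why the two problems call for different models.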

The logic of classification is that the distance among members of the same group is shorter than the distance between different groups. Alternatively speaking, ...
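The distance idea can be sketched as a nearest-centroid rule: assign a new observation to the group whose center is closest. This is only one simple way to turn the idea into code, not necessarily the method the book goes on to use. The function names `centroid` and `classify`, the choice of Euclidean distance, and the toy points are all assumptions made for illustration.

```python
import math

def centroid(points):
    """Mean position of a group of points, computed coordinate by coordinate."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def classify(point, groups):
    """Return the label of the group whose centroid is nearest to `point`.

    Euclidean distance (math.dist, Python 3.8+) is an assumed choice here.
    """
    centroids = {label: centroid(members) for label, members in groups.items()}
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical toy data: two well-separated groups labeled 0 and 1.
groups = {
    0: [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5)],
    1: [(6.0, 6.0), (6.5, 7.0), (7.0, 6.5)],
}

print(classify((1.8, 1.2), groups))  # a point near group 0
print(classify((6.2, 6.8), groups))  # a point near group 1
```

Because the members of each toy group sit close together and the two groups are far apart, a point near either cluster is assigned to that cluster's label.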
