There's more...

The Wisconsin Breast Cancer dataset is widely used in the machine learning community. It has only a few attributes, and most of them are discrete values. This makes it easy to apply both classification algorithms and regression models to it.

More than 20 research papers and publications already cite this dataset. It is publicly available and very easy to use.

The dataset is multivariate, its attributes are integers, and it has only 10 attributes. This makes it a typical dataset for the classification and regression analysis covered in this chapter.
