Case study – predicting breast cancer

Let's now apply logistic regression to a very important real-world use case; predicting patients who may have breast cancer. Approximately 1 in 8 women are diagnosed with breast cancer during their lifetime (with the disease also affecting men), resulting in the premature deaths of hundreds of thousands of women annually across the world. In fact, it is projected that over 2 million new cases of breast cancer will have been reported worldwide by the end of 2018 alone. Various factors are known to increase the risk of breast cancer, including age, weight, family history, and previous diagnoses.

Using a dataset of quantitative predictors, along with a binary dependent variable indicating the presence or ...

Get Machine Learning with Apache Spark Quick Start Guide now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.