MLPs in Apache Spark

Let's return to our dataset and train an MLP in Apache Spark to recognize and classify letters of the English alphabet. If you open ocr-data/letter-recognition.data in any text editor (the file is available from both the GitHub repository accompanying this book and the UCI Machine Learning Repository), you will find 20,000 rows of data described by the following schema:

Column name   Data type   Description
lettr         String      English letter (one of 26 values, from A to Z)
x-box         Integer     Horizontal position of box
y-box         Integer     Vertical position of box
width         Integer     Width of box
high          Integer     Height of box
onpix         Integer     Total number of on pixels
x-bar         Integer     Mean x of on pixels in the ...
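Before training, each row has to be split into a label and a numeric feature vector. The following is a minimal sketch in plain Python, assuming the file is comma-separated with lettr as the first field and every remaining field an integer attribute. The names parse_letter_rows and LETTER_TO_INDEX, along with the sample rows, are hypothetical and only illustrate the idea. The letter is mapped to a class index from 0 to 25 because Spark's MultilayerPerceptronClassifier expects numeric labels in the range [0, numClasses).

```python
import csv
import io
import string

# Hypothetical alphabetical mapping from letter to class index (A -> 0, ..., Z -> 25).
LETTER_TO_INDEX = {letter: i for i, letter in enumerate(string.ascii_uppercase)}


def parse_letter_rows(text):
    """Parse comma-separated rows: a letter label followed by integer features.

    Assumes the letter-recognition.data layout, where `lettr` is the first
    field and all remaining fields are integer attributes.
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        letter, *raw_features = fields
        letter = letter.strip()
        if letter not in LETTER_TO_INDEX:
            raise ValueError(f"unexpected label: {letter!r}")
        features = [int(value) for value in raw_features]
        rows.append((letter, LETTER_TO_INDEX[letter], features))
    return rows


# Hypothetical rows for illustration only (not taken from the dataset,
# and shorter than real rows).
sample = "A,1,2,3,4,5\nZ,6,7,8,9,10\n"
for letter, label, features in parse_letter_rows(sample):
    print(letter, label, features)
```

Inside Spark, a comparable conversion is often done with StringIndexer from pyspark.ml.feature. Note that StringIndexer orders labels by frequency by default rather than alphabetically, so its indices may differ from the mapping used in this sketch.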
