Creating the dataset

In this chapter, we will take on the role of the bad guy. We want to create a program that can beat CAPTCHAs, allowing our comment spam program to advertise on someone's website. It should be noted that our CAPTCHAs will be a little easier than those used on the web today, and that spamming isn't a very nice thing to do.

Our CAPTCHAs will be individual English words of four letters only, as shown in the following image:

[Image: an example CAPTCHA showing a four-letter English word]

Our goal will be to create a program that can recover the word from images like this. To do this, we will use four steps:

  1. Break the image into individual letters.
  2. Classify each individual letter.
  3. Recombine the letters ...
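The steps visible above can be sketched in a few lines of plain Python. This is only an illustrative outline, not the chapter's implementation: it assumes the image is a small binary grid (a list of rows of 0s and 1s) and that letters are separated by at least one empty column. The names `split_into_letters`, `recover_word`, and `toy_classifier` are hypothetical, and the classifier is a stand-in that guesses a letter from its width alone.

```python
def split_into_letters(image):
    """Step 1: split a binary image into one sub-image per letter.

    Assumes `image` is a list of equal-length rows of 0/1 values and that
    letters are separated by at least one completely empty column.
    """
    if not image:
        return []
    width = len(image[0])
    # A column counts as "inked" if any pixel in it is non-zero.
    inked = [any(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                      # a new letter begins here
        elif not has_ink and start is not None:
            segments.append((start, x))    # the letter ended at column x
            start = None
    if start is not None:                  # letter touching the right edge
        segments.append((start, width))

    return [[row[a:b] for row in image] for a, b in segments]


def recover_word(image, classify_letter):
    """Steps 2 and 3: classify each letter, then join the predictions."""
    letters = split_into_letters(image)
    return "".join(classify_letter(letter) for letter in letters)


# Hypothetical stand-in classifier: picks a letter from the segment width.
# A real classifier would be trained on labelled letter images.
def toy_classifier(letter_image):
    return {1: "I", 2: "L", 3: "T"}[len(letter_image[0])]


toy_image = [
    [1, 1, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
]
word = recover_word(toy_image, toy_classifier)
```

Passing the classifier in as a function keeps the segmentation and recombination logic independent of whichever model is eventually trained for the letter-classification step.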
