Getting ready

Make sure you have pytesseract installed:

pip install pytesseract

You will also need to install tesseract-ocr, the OCR engine that pytesseract wraps. On Windows, there is an executable installer, which you can get here: https://github.com/tesseract-ocr/tesseract/wiki/4.0-with-LSTM#400-alpha-for-windows. On a Debian-based Linux system, you can use apt-get:

sudo apt-get install tesseract-ocr

On a Mac, the easiest way to install it is with brew:

brew install tesseract

The code for this recipe is in 04/10_perform_ocr.py.
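
Before running the recipe, it can help to confirm that both pieces installed above are visible to Python. The following is a minimal sketch and is not part of the recipe file. The helper name check_ocr_setup is hypothetical, and the check assumes the tesseract executable is expected to be on your PATH:

```python
import importlib.util
import shutil


def check_ocr_setup():
    """Report whether pytesseract and the tesseract binary can be found.

    Hypothetical helper for illustration; it only looks things up and
    does not run any OCR.
    """
    return {
        # True if the pytesseract package is importable in this environment
        "pytesseract_installed": importlib.util.find_spec("pytesseract") is not None,
        # Full path to the tesseract executable, or None if it is not on PATH
        "tesseract_path": shutil.which("tesseract"),
    }


if __name__ == "__main__":
    for name, value in check_ocr_setup().items():
        print(f"{name}: {value}")
```

If tesseract_path comes back as None on Windows, the installer may not have added the program to your PATH. In that case you can either add it yourself or point pytesseract at the executable by setting pytesseract.pytesseract.tesseract_cmd to its full path.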
