Performing OCR on an image with pytesseract

In this recipe, we will use the pytesseract library to extract text from an image. Tesseract is an open source OCR library sponsored by Google. The source is available here: https://github.com/tesseract-ocr/tesseract, and you can also find more information on the library there. pytesseract is a thin Python wrapper that provides a Pythonic API to the Tesseract executable.
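As a rough illustration of what the wrapper looks like in use, here is a minimal sketch. It assumes the Tesseract executable is installed and on the system path, and that the pytesseract and Pillow packages are available. The helper name extract_text and the file name used afterwards are hypothetical, chosen only for this example, and are not taken from the recipe itself.

```python
from pathlib import Path


def extract_text(image_path, lang="eng"):
    """Return the text Tesseract recognizes in the image at image_path.

    extract_text is an illustrative helper, not part of pytesseract.
    """
    path = Path(image_path)
    # Fail early with a clear error before touching the OCR stack.
    if not path.is_file():
        raise FileNotFoundError(f"No such image: {path}")

    # Imported lazily so the module still loads if the OCR packages are absent.
    from PIL import Image
    import pytesseract

    # Pillow opens the image; pytesseract passes it to the Tesseract
    # executable and returns the recognized text as a string.
    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

Calling extract_text("sample.png") on an existing image (sample.png is a placeholder name) should return whatever text Tesseract manages to recognize. The accuracy depends heavily on the image quality and on the language data installed alongside Tesseract.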
