How it works

Let's first walk through the Dockerfile to see what it told Docker to do during the build process. The first line is:

FROM python:3

This tells Docker that we want to build our container image from the python:3 image on Docker Hub, which is a prebuilt Linux image with Python 3 installed. The next line tells Docker that all subsequent instructions that work with paths (such as RUN and COPY) should operate relative to the /usr/src/app folder. Docker creates this folder if it does not already exist.

WORKDIR /usr/src/app

At this point in the build we have a base Python 3 installation in place. Next we need to install the libraries that our scraper uses, so the following line tells Docker to run pip to install them:

RUN pip install nameko BeautifulSoup4 nltk lxml
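The pip package names do not always match the module names you import. BeautifulSoup4, for example, is imported as bs4. A quick way to confirm that the RUN step worked is to try locating each module with the container's own Python interpreter. The following is a minimal sketch. The `check_dependencies` helper and the `REQUIRED` mapping are hypothetical and are not part of the recipe's code. The sketch assumes you run it inside a container started from the built image.

```python
import importlib.util

# Hypothetical mapping: pip package name -> top-level module it installs.
# Note that BeautifulSoup4 is imported as bs4, not under its package name.
REQUIRED = {
    "nameko": "nameko",
    "BeautifulSoup4": "bs4",
    "nltk": "nltk",
    "lxml": "lxml",
}


def check_dependencies(required=REQUIRED):
    """Return {pip_name: bool}, True when the module can be located (without importing it)."""
    return {
        pkg: importlib.util.find_spec(module) is not None
        for pkg, module in required.items()
    }


if __name__ == "__main__":
    for pkg, found in check_dependencies().items():
        print(f"{pkg:15} {'ok' if found else 'MISSING'}")
```

Using `find_spec` rather than a plain `import` only checks that each module is present. It avoids running any package initialization code, so the check stays fast and side-effect free.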

We also need to install the NLTK data files:
