Getting the data

We will programmatically download the data using urlretrieve, a function in Python's standard-library module urllib.request. The following is our download-from-internet piece:

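from pathlib import Path
import pandas as pd
import gzip
from urllib.request import urlretrieve
from tqdm import tqdm
import os
import numpy as np

class TqdmUpTo(tqdm):
    # Adapts tqdm to urlretrieve's reporthook(b, bsize, tsize) callback.
    def update_to(self, b=1, bsize=1, tsize=None):
        # b: blocks transferred so far; bsize: size of each block in bytes;
        # tsize: total size in bytes, if the server reports it.
        if tsize is not None:
            self.total = tsize
        # update() expects an increment, so subtract what is already counted (self.n).
        self.update(b * bsize - self.n)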

If you are using the fastai environment, all of these imports should already be available. The second block simply subclasses tqdm so that we can visualize the download progress. Let's now download the data using urlretrieve, as follows:

def get_data(url, filename):
    """ Download data if the filename does ...
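Because the get_data listing is cut off here, the following is only a minimal sketch of how such a helper might combine urlretrieve with a progress callback. It assumes the intent is "skip the download when the file already exists"; the name download_if_missing and the recording function are hypothetical, not the book's code. To keep it runnable with the standard library alone, it passes a plain function as reporthook instead of TqdmUpTo.update_to (both share the same (block_num, block_size, total_size) signature), and it uses a local file:// URL in place of a real dataset URL.

```python
import tempfile
from pathlib import Path
from urllib.request import urlretrieve


def download_if_missing(url, filename, reporthook=None):
    """Hypothetical helper: fetch url into filename unless it already exists.

    reporthook, if given, is passed straight to urlretrieve, which calls it
    as reporthook(block_num, block_size, total_size): once before the first
    read and again after each block. TqdmUpTo.update_to has this signature.
    Returns True if a download happened, False if the existing file was kept.
    """
    path = Path(filename)
    if path.exists():
        return False  # already downloaded: skip the network round trip
    path.parent.mkdir(parents=True, exist_ok=True)
    urlretrieve(url, str(path), reporthook=reporthook)
    return True


# Usage: a local file:// URL stands in for a real dataset URL so the
# sketch runs offline; with tqdm installed you would pass t.update_to
# from a TqdmUpTo instance instead of the recording function below.
calls = []


def record(block_num, block_size, total_size):
    # Store each progress callback so the reporthook protocol can be inspected.
    calls.append((block_num, block_size, total_size))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "remote.csv"
    src.write_bytes(b"x" * 20_000)
    dest = Path(tmp) / "data" / "local.csv"
    first = download_if_missing(src.as_uri(), dest, reporthook=record)
    second = download_if_missing(src.as_uri(), dest, reporthook=record)
    size_on_disk = dest.stat().st_size

print(first, second)              # True False
print(calls[0][0], calls[-1][2])  # 0 20000
```

The existence check is a deliberately simple cache: urlretrieve writes directly to the target name, so an interrupted download leaves a partial file that this check would then skip. Downloading to a temporary name and renaming it on success is one way to guard against that.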

Get Natural Language Processing with Python Quick Start Guide now with the O’Reilly learning platform.
