Synchronous web scraping

The synchronous scraper uses only Python standard libraries, such as urllib. It downloads the home pages of three popular sites, plus a fourth site whose response can be delayed to simulate a slow connection. It prints each page's size and the total running time.

Here's the code for the synchronous scraper located at src/extras/sync.py:

"""Synchronously download a list of webpages and time it""" from urllib.request import Request, urlopen from time import time sites = [ "http://news.ycombinator.com/", "https://www.yahoo.com/", "http://www.aliexpress.com/", "http://deelay.me/5000/http://deelay.me/", ] def find_size(url): req = Request(url) with urlopen(req) as response: page = response.read() return len(page) ...
