The initial code base

Let's now list the code that we'll optimize later, based on the earlier description.

The first piece is quite simple: a single-file script that takes care of scraping the data and saving it in JSON format, as we discussed earlier. The flow is straightforward and runs in the following order:

  1. It will query the list of questions page by page.
  2. For each page, it will gather the links to the questions.
  3. Then, for each link, it will gather the information listed earlier.
  4. It will move on to the next page and start over again.
  5. It will finally save all of the data into a JSON file.

The code is as follows:

from bs4 import BeautifulSoup
import requests
import json

SO_URL = "http://scifi.stackexchange.com"
QUESTION_LIST_URL = ...
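Because the listing is cut short here, the five steps can be illustrated with a minimal sketch. This is not the book's script. It assumes the page fetcher and the per-question parser are passed in as callables (`fetch` and `parse_question`, both hypothetical), that question links carry a CSS class named `question-hyperlink` (an assumption about the page markup), and that the list URL has a `{page}` placeholder. It also swaps in the standard library's `html.parser` for BeautifulSoup so that it runs without extra dependencies.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def scrape(fetch, base_url, list_path, max_pages, parse_question):
    """Run steps 1-4 of the flow.

    fetch(url) returns HTML text and parse_question(html) returns a dict;
    both are hypothetical helpers supplied by the caller.
    """
    results = []
    for page in range(1, max_pages + 1):
        # Step 1: query one page of the question list.
        listing = fetch(urljoin(base_url, list_path.format(page=page)))
        # Step 2: gather the question links found on that page.
        # "question-hyperlink" is an assumed class name, not a confirmed one.
        collector = LinkCollector("question-hyperlink")
        collector.feed(listing)
        # Step 3: visit each link and extract the question data.
        for href in collector.links:
            url = urljoin(base_url, href)
            record = parse_question(fetch(url))
            record["url"] = url
            results.append(record)
        # Step 4: the loop then moves on to the next page.
    return results


def save_json(data, path):
    # Step 5: write all of the collected data to a single JSON file.
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2)
```

In a real run, `fetch` could be something like `lambda url: requests.get(url).text`. Keeping it as a parameter makes it possible to exercise the flow with canned pages, and later to time the network and parsing steps separately.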
