The crawl starts by defining a crawl with one level of depth:
crawl_depth = 1
process = CrawlerProcess({
    'LOG_LEVEL': 'ERROR',
    'DEPTH_LIMIT': crawl_depth
})
process.crawl(WikipediaSpider)
spider = next(iter(process.crawlers)).spider
spider.max_items_per_page = 5
spider.max_crawl_depth = crawl_depth
process.start()

for pm in spider.linked_pages:
    print(pm.depth, pm.link, pm.child_link)
print("-"*80)
This information is similar to the previous recipe, but now we need to convert it into a model that NetworkX can use to build a graph. This starts with creating a NetworkX graph model:
import networkx as nx

g = nx.Graph()
A NetworkX graph consists of nodes and edges. From the data collected we must create a set of unique nodes (the pages) and the edges (the fact that a page references another page).
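As a rough sketch of that step (assuming each entry in spider.linked_pages exposes the link and child_link attributes printed in the crawl output above), the nodes and edges could be added to g like this:

# Sketch only: build the graph from the crawl results.
# Assumes `g` is the nx.Graph() created above, and that each `pm` in
# spider.linked_pages has .link (source page) and .child_link (referenced page).
for pm in spider.linked_pages:
    g.add_node(pm.link)        # add_node ignores duplicates, so nodes stay unique
    g.add_node(pm.child_link)
    g.add_edge(pm.link, pm.child_link)  # edge: this page references the child page

print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")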