Learning Scrapy by Dimitrios Kouzis-Loukas

A spider that crawls based on an Excel file

Most of the time you have one spider per source website, but there are cases where you want to scrape data from many websites and the only things that change between them are the XPath expressions you use. In these cases, having a spider for every site feels like overkill. Can you crawl them all with a single spider? The answer is yes.
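
To make the idea concrete, here is a minimal sketch of rule-driven extraction, written with only the Python standard library rather than Scrapy. Everything in it is an assumption made for illustration: the rule table layout (a `url` column plus one column per field), the XPath expressions, the example URLs, the in-memory `PAGES` dictionary standing in for downloaded pages, and the helper names `extract` and `crawl`. It is not the spider developed in this chapter; it only shows how one piece of code can serve several sites when the XPath expressions live in data.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical rule table: one row per site, one column per field.
# Column names and XPath expressions are invented for this sketch.
RULES_CSV = """url,name,price
http://a.example/item,.//h1,.//span[@class='price']
http://b.example/item,.//div[@id='title'],.//p[@id='cost']
"""

# Stand-in pages (well-formed markup) keyed by URL, so no network is needed.
PAGES = {
    "http://a.example/item":
        "<html><body><h1>Red bike</h1>"
        "<span class='price'>120</span></body></html>",
    "http://b.example/item":
        "<html><body><div id='title'>Blue kite</div>"
        "<p id='cost'>15</p></body></html>",
}


def extract(page_source, field_xpaths):
    """Apply one XPath expression per field to a single page."""
    root = ET.fromstring(page_source)
    item = {}
    for field, xpath in field_xpaths.items():
        node = root.find(xpath)  # first match, or None
        item[field] = node.text if node is not None else None
    return item


def crawl(rules_csv, pages):
    """Run the same extraction code for every row of the rule table."""
    items = []
    for row in csv.DictReader(io.StringIO(rules_csv)):
        url = row.pop("url")  # remaining columns are field -> XPath
        items.append(extract(pages[url], row))
    return items


items = crawl(RULES_CSV, PAGES)
for item in items:
    print(item)
```

Adding a third site in this sketch means adding one more row to the table, with no change to the code. In a Scrapy spider, the same loop would typically issue one request per row and apply the expressions through `response.xpath()` instead of `xml.etree.ElementTree`, which supports only a limited XPath subset and requires well-formed markup.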

Let's create a new project for this experiment, because the items we crawl are very different (in fact, we won't define any in this project). I assume that we are in the properties directory of ch05. Let's go one level up, as follows:

$ pwd
/root/book/ch05/properties
$ cd ..
$ pwd
/root/book/ch05

We can create a new project named generic and a spider named ...