O'Reilly logo

Learning Scrapy by Dimitrios Kouzis-Loukas

Stay ahead with the world's most comprehensive technology and business learning platform.

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, tutorials, and more.

Start Free Trial

No credit card required

Chapter 2. Understanding HTML and XPath

In order to extract information from web pages, you have to understand a little bit more about their structure. We will have a quick look at HTML, the tree representation of HTML, and XPath as a way of selecting information on web pages.

HTML, the DOM tree representation, and the XPath

Let's spend some time understanding the process that takes place from when a user types a URL on the browser (or more often, when he/she clicks on a link or a bookmark) until a page is displayed on the screen. From the perspective of this book, this process has four steps:

  • A URL is typed on the browser. The first part of the URL (the domain name, such as gumtree.com) is used to find the appropriate server on the web, and the ...

With Safari, you learn the way you learn best. Get unlimited access to videos, live online training, learning paths, books, interactive tutorials, and more.

Start Free Trial

No credit card required