How it works

XPath is a element of the XSLT (eXtensible Stylesheet Language Transformation) standard and provides the ability to select nodes in an XML document. HTML is a variant of XML, and hence XPath can work on on HTML document (although HTML can be improperly formed and mess up XPath parsing in those cases).

XPath itself is designed to model the structure of XML nodes, attributes, and properties. The syntax provides means of finding items in the XML that match the expression. This can include matching or logical comparison of any of the nodes, attributes, values, or text in the XML document.

XPath expressions can be combined to form very complex paths within the document. It is also possible to navigate the document based upon relative ...

Get Python Web Scraping Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.