How to parse websites and navigate the DOM using BeautifulSoup

When the browser displays a web page it builds a model of the content of the page in a representation known as the document object model (DOM). The DOM is a hierarchical representation of the page's entire content, as well as structural information, style information, scripts, and links to other content.

It is critical to understand this structure to be able to effectively scrape data from web pages. We will look at an example web page, its DOM, and examine how to navigate the DOM with Beautiful Soup.

Get Python Web Scraping Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.