Chapter 11. Scraping JavaScript

Client-side scripting languages are languages that are run in the browser itself, rather than on a web server. The success of a client-side language depends on your browser’s ability to interpret and execute the language correctly. (This is why it’s so easy to disable JavaScript in your browser.)

Partly because of the difficulty of getting every browser manufacturer to agree on a standard, there are far fewer client-side languages than there are server-side languages. This is a good thing when it comes to web scraping: the fewer languages there are to deal with, the better.

For the most part, you’ll frequently encounter only two languages online: ActionScript (which is used by Flash applications) and JavaScript. ActionScript is used far less frequently today than it was 10 years ago, and is often used to stream multimedia files, as a platform for online games, or to display intro pages for websites that haven’t gotten the hint that no one wants to watch an intro page. At any rate, because there isn’t much demand for scraping Flash pages, this chapter instead focuses on the client-side language that’s ubiquitous in modern web pages: JavaScript.

JavaScript is, by far, the most common and most well-supported client-side scripting language on the web today. It can be used to collect information for user tracking, submit forms without reloading the page, embed multimedia, and even power entire online games. Even deceptively simple-looking pages can often ...

Get Web Scraping with Python, 2nd Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.