HTML and screen scraping

Although more and more services offer their data through APIs, when a service doesn't, the only way to get its data programmatically is to download its web pages and parse the HTML source code. This technique is called screen scraping.
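As a rough illustration of the download-then-parse idea, here is a minimal sketch that uses only the Python standard library. The names `LinkExtractor`, `extract_links`, and `scrape_links` are hypothetical helpers invented for this example, and pulling out link targets is just one arbitrary choice of data to scrape; a real scraper would be tailored to the page in question.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html):
    """Parse an HTML string and return the link targets in document order."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


def scrape_links(url):
    """Download a page and parse it (assumes the URL serves HTML)."""
    with urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html)
```

Calling `extract_links('<a href="/about">About</a>')` returns a list containing `/about`; `scrape_links` does the same after fetching a live page, so its result depends entirely on what the server returns at that moment.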

Though it sounds simple enough in principle, screen scraping should be treated as a last resort. Unlike XML, where the syntax is strictly enforced and data structures are usually reasonably stable and sometimes even documented, web page source code is messy. It is also fluid: the markup can change without warning, in a way that completely breaks your script and forces you to rework the parsing logic from scratch.
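The contrast with XML can be seen in a small sketch, again using only the standard library. The `SLOPPY` fragment and the helpers `parses_as_xml`, `TextCollector`, and `collect_text` are made up for this illustration. The fragment omits the closing `</li>` tags, which browsers accept but a strict XML parser rejects.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Illustrative markup: list items are never closed, as often seen in the wild.
SLOPPY = "<ul><li>one<li>two</ul>"


def parses_as_xml(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the non-blank text between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def collect_text(markup):
    """Parse leniently as HTML and return the text chunks found."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.chunks
```

Here `parses_as_xml(SLOPPY)` is false because the tags are mismatched, while `collect_text(SLOPPY)` still recovers the two list entries. The lenient parser gets the data out, but nothing guarantees the page will keep the same shape tomorrow, which is the fragility described above.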

Still, it is sometimes ...
