Problem details

We often forget that websites are not just used by humans. A significant percentage of web traffic comes from other programs such as crawlers, bots, or scrapers. Sometimes, you will need to write such programs yourself to extract information from another website.

Generally, pages designed for human consumption are cumbersome for mechanical extraction. HTML pages have information surrounded by markup, requiring extensive cleanup. Sometimes, information will be scattered, needing extensive data collation and transformation.
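To make the cleanup cost concrete, here is a minimal sketch that pulls the visible text out of a small HTML fragment using only Python's standard library `html.parser` module. The `PAGE` fragment, the `TextExtractor` class, and the `extract_text` helper are hypothetical names invented for this illustration; they do not come from any real site or from the book.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for this illustration.
PAGE = """
<div class="post">
  <h2>Hello <em>World</em></h2>
  <p>Posted by <a href="/users/alice">alice</a></p>
</div>
"""


class TextExtractor(HTMLParser):
    """Collects the non-empty text nodes found between tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Text nodes include the whitespace used to indent the markup,
        # so strip each one and skip those that end up empty.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text fragments of an HTML string, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


if __name__ == "__main__":
    print(extract_text(PAGE))
```

Even in this tiny case the title arrives split into two fragments ("Hello" and "World") because of the inline `<em>` tag, and nothing in the output says which fragment is the title and which is the author. Stitching the pieces back together and labeling them is the kind of collation work described above.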

A machine interface would be ideal in such situations. Not only can you reduce the hassle of extracting information, but you can also enable the creation of mashups. The longevity of an application will be greatly ...
