Accessing Community Features

Amazon provides access to all of its community features through its web site. As more sites integrate closely with Amazon, though, demand is growing to tap into the community via code.

Accessing Through Web Services

The Web Services API (see Chapter 6) offers some access. When accessing an individual product’s information through the API, you can find the following community data:

  • The three latest reviews

  • ASINs of five related items

  • Three lists that contain the item

This is useful information, and developers are building tools that work with it in many creative ways. Compared with the volume of information available on Amazon’s site, though, the community data in the API is only a small window into the larger community. That leaves one route for integration-minded developers: screen scraping.

Accessing Through Screen Scraping

The term screen scraping refers to requesting a web page programmatically with a script and picking through the resulting HTML for the interesting data. Finding the data usually involves writing regular expressions, a pattern-matching syntax that can become complicated quickly. For example, here’s a regular expression that extracts a list of books from a purchase circle page [Hack #44]:

<td.*?<b><a.*?-/(.*?)/.*?>(.*?)</a></b>.*?by (.*?)<br>.*?</td>

You can see some HTML there, and the expressions are based on where the data is within the HTML ...
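
To make the pattern concrete, here is a minimal Python sketch that applies the expression above to a small HTML fragment. The fragment, the book titles, the authors, and the ID values are invented for illustration and are not taken from a real purchase circle page. The reading of the three captured groups as an ID from the link URL, a title, and an author is an inference from the pattern itself, and the `re.DOTALL` flag is an added assumption so that a table cell spanning several lines still matches.

```python
import re

# The expression from the text, unchanged. re.DOTALL is an assumption added
# here so that "." also matches newlines inside a multi-line table cell.
PATTERN = re.compile(
    r'<td.*?<b><a.*?-/(.*?)/.*?>(.*?)</a></b>.*?by (.*?)<br>.*?</td>',
    re.DOTALL,
)

# Hypothetical markup shaped to fit the pattern; not real Amazon HTML.
SAMPLE_HTML = """
<table>
<tr><td valign="top"><b><a href="/exec/obidos/tg/detail/-/0000000001/ref=x">Example Book One</a></b><br>
by A. Author<br>
</td></tr>
<tr><td valign="top"><b><a href="/exec/obidos/tg/detail/-/0000000002/ref=x">Example Book Two</a></b><br>
by B. Writer<br>
</td></tr>
</table>
"""


def extract_books(html):
    """Return a list of (id, title, author) tuples, one per matching cell."""
    return PATTERN.findall(html)


books = extract_books(SAMPLE_HTML)
for item_id, title, author in books:
    print(item_id, title, author)
```

Because the pattern is tied so closely to the surrounding tags, even a small change in the page's markup would likely cause it to stop matching, which is the main fragility of this approach.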
