Collecting Restaurant Reviews

This section concludes our study of microformats (and Thai food) by briefly introducing hReview. Yelp, a popular review site, implements hReview so that the ratings customers leave for restaurants are exposed as machine-readable markup. Example 2-9 demonstrates how to extract hReview information as Yelp implements it. The sample code includes a URL you might try; it points to a Thai restaurant you definitely don’t want to miss if you ever have the opportunity to visit.
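To make the markup side concrete, here is a minimal sketch of what an hReview extractor does. It is written for the Python 3 standard library rather than the Python 2 and BeautifulSoup stack used in Example 2-9, and it assumes well-formed XHTML. The helper names (`has_class`, `text_of`, `extract_hreviews`) and the sample markup are hypothetical; they are not taken from Yelp's pages or from the book's code. The sketch visits each element whose class list contains `hreview` and records a handful of hReview properties by class name. For an `abbr` element it uses the `title` attribute, following the microformats abbr design pattern.

```python
import xml.etree.ElementTree as ET

# hReview property class names this sketch looks for. They come from the
# hReview draft (http://microformats.org/wiki/hreview); real sites often
# use only a subset, and Yelp's markup may differ.
FIELDS = ("item", "summary", "rating", "reviewer", "dtreviewed", "description")


def has_class(el, name):
    """True if the element's space-separated class attribute contains name."""
    return name in (el.get("class") or "").split()


def text_of(el):
    """All text inside el, with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(el.itertext()).split())


def extract_hreviews(markup):
    """Return one dict of property -> value per class="hreview" element.

    Assumes well-formed XHTML. Real pages are frequently tag soup, which
    is why a lenient parser such as BeautifulSoup is used in practice.
    """
    root = ET.fromstring(markup.strip())
    reviews = []
    for review in root.iter():
        if not has_class(review, "hreview"):
            continue
        record = {}
        for el in review.iter():
            for field in FIELDS:
                # Keep the first match for each property
                if field in record or not has_class(el, field):
                    continue
                # abbr design pattern: the machine-readable value is in title
                if el.tag == "abbr" and el.get("title"):
                    record[field] = el.get("title")
                else:
                    record[field] = text_of(el)
        reviews.append(record)
    return reviews


# Hypothetical markup for illustration only; not copied from Yelp.
SAMPLE = """
<div>
  <div class="hreview">
    <span class="item"><span class="fn">Example Thai Kitchen</span></span>
    <span class="rating">5</span>
    <span class="reviewer">Jane D.</span>
    <abbr class="dtreviewed" title="2010-11-01">Nov 1, 2010</abbr>
    <p class="description">Best   pad thai in town.</p>
  </div>
</div>
"""

if __name__ == "__main__":
    for review in extract_hreviews(SAMPLE):
        print(review)
```

When run as a script, it prints a single dictionary for the sample. The `dtreviewed` value is taken from the `abbr` element's `title` (`2010-11-01`) rather than from its display text.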

Warning

Although the hReview spec is fairly stable, implementations vary and sometimes deviate from it arbitrarily. In particular, Example 2-9 does not parse the reviewer as an hCard because Yelp’s implementation did not mark the reviewer up as one.
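The hReview draft expects the reviewer to be an hCard, but some sites mark it up as a plain `reviewer` element. A tolerant parser can therefore accept both forms. The following is a minimal Python 3 standard-library sketch under that assumption. The `reviewer_name` helper and the sample markup are hypothetical. The function prefers the hCard's `fn` (formatted name) when one appears inside the `reviewer` element, and otherwise falls back to the element's plain text.

```python
import xml.etree.ElementTree as ET


def has_class(el, name):
    """True if the element's space-separated class attribute contains name."""
    return name in (el.get("class") or "").split()


def text_of(el):
    """All text inside el, with runs of whitespace collapsed to single spaces."""
    return " ".join("".join(el.itertext()).split())


def reviewer_name(review):
    """Return the reviewer's name from an hReview element, or None if absent.

    Hypothetical helper: handles both a proper hCard reviewer
    (class="reviewer vcard" containing an "fn") and a plain-text reviewer.
    """
    for el in review.iter():
        if not has_class(el, "reviewer"):
            continue
        # hCard form: the formatted name lives in an "fn" element
        for child in el.iter():
            if has_class(child, "fn"):
                return text_of(child)
        # Non-hCard form: use whatever text the reviewer element holds
        return text_of(el)
    return None


# Hypothetical markup showing both forms
as_hcard = ET.fromstring(
    '<div class="hreview"><span class="reviewer vcard">'
    '<span class="fn">Jane D.</span> (Elite)</span></div>'
)
as_text = ET.fromstring(
    '<div class="hreview"><span class="reviewer">Jane D.</span></div>'
)

if __name__ == "__main__":
    print(reviewer_name(as_hcard), reviewer_name(as_text))
```

Checking for the hCard form first keeps extra text, such as the "(Elite)" badge in the sample, out of the name when the site does supply an `fn`.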

Example 2-9. Parsing hReview data from Yelp restaurant reviews (microformats__yelp_hreview.py)

# -*- coding: utf-8 -*-

import sys
import re
import urllib2
import json
import HTMLParser
from BeautifulSoup import BeautifulSoup

# Pass in a URL that contains hReview info such as
# http://www.yelp.com/biz/bangkok-golden-fort-washington-2

url = sys.argv[1]

# Parse out some of the pertinent information for a Yelp review
# Unfortunately, the quality of hReview implementations varies
# widely so your mileage may vary. This code is *not* a spec
# parser by any stretch. See http://microformats.org/wiki/hreview

def parse_hreviews(url):
    try:
        page = urllib2.urlopen(url)
    except urllib2.URLError, e:
        print 'Failed to fetch ' + url
        raise e

    try:
        soup = BeautifulSoup(page)
    except ...
