Module: Fetching Latitude/Longitude Data from the Web

Credit: Will Ware

Given a list of cities, Example 11-1 fetches their latitudes and longitudes from one web site (http://www.astro.ch, a database used for astrology, of all things) and uses them to dynamically build a URL for another web site (http://pubweb.parc.xerox.com), which, in turn, creates a map highlighting the cities against the outlines of continents. Maybe someday a program will be clever enough to load the latitudes and longitudes as waypoints into your GPS receiver.

The code can be vastly improved in several ways. The main fragility of the recipe comes from relying on the exact format of the HTML page returned by the http://www.astro.com site, particularly in the rather clumsy for x in inf.readlines( ) loop in the findcity function. If this format ever changes, the recipe will break. You could change the recipe to use htmllib.HTMLParser instead, and be a tad more immune to modest format changes. This helps only a little, however. After all, HTML is meant for human viewers, not for automated parsing and extraction of information. A better approach would be to find a site serving similar information in XML (including, quite possibly, XHTML, the XML/HTML hybrid that combines the strengths of both of its parents) and parse the information with Python’s powerful XML tools (covered in Chapter 12).

However, despite this defect, this recipe still stands as an example of the kind of opportunity already afforded today by existing services on the Web, without having to wait for the emergence of commercialized web services.

Example 11-1. Fetching latitude/longitude data from the Web

import string, urllib, re, os, exceptions, webbrowser

JUST_THE_US = 0

class CityNotFound(exceptions.Exception): pass

def xerox_parc_url(marklist):
    """ Prepare a URL for the xerox.com map-drawing service, with marks
    at the latitudes and longitudes listed in list-of-pairs marklist. """
    avg_lat, avg_lon = max_lat, max_lon = marklist[0]
    marks = ["%f,%f" % marklist[0]]
    for lat, lon in marklist[1:]:
        marks.append(";%f,%f" % (lat, lon))
        avg_lat = avg_lat + lat
        avg_lon = avg_lon + lon
        if lat > max_lat: max_lat = lat
        if lon > max_lon: max_lon = lon
    avg_lat = avg_lat / len(marklist)
    avg_lon = avg_lon / len(marklist)
    if len(marklist) == 1:
        max_lat, max_lon = avg_lat + 1, avg_lon + 1
    diff = max(max_lat - avg_lat, max_lon - avg_lon)
    D = {'height': 4 * diff, 'width': 4 * diff,
         'lat': avg_lat, 'lon': avg_lon,
         'marks': ''.join(marks)}
    if JUST_THE_US:
        url = ("http://pubweb.parc.xerox.com/map/db=usa/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    else:
        url = ("http://pubweb.parc.xerox.com/map/color=1/ht=%(height)f" +
               "/wd=%(width)f/color=1/mark=%(marks)s/lat=%(lat)f/" +
               "lon=%(lon)f/") % D
    return url

def findcity(city, state):
    Please_click = re.compile("Please click")
    city_re = re.compile(city)
    state_re = re.compile(state)
    url = ("""http://www.astro.ch/cgi-bin/atlw3/aq.cgi?expr=%s&lang=e"""
           % (string.replace(city, " ", "+") + "%2C+" + state))
    lst = [ ]
    found_please_click = 0
    inf = urllib.FancyURLopener(  ).open(url)
    for x in inf.readlines(  ):
        x = x[:-1]
        if Please_click.search(x) != None:
            # Here is one assumption about unchanging structure
            found_please_click = 1
        if (city_re.search(x) != None and
            state_re.search(x) != None and
            found_please_click):
            # Pick apart the HTML pieces
            L = [ ]
            for y in string.split(x, '<'):
                L = L + string.split(y, '>')
            # Discard any pieces of zero length
            lst.append(filter(None, L))
    inf.close(  )
    try:
        # Here's a few more assumptions
        x = lst[0]
        lat, lon = x[6], x[10]
    except IndexError:
        raise CityNotFound("not found: %s, %s"%(city, state))
    def getdegrees(x, dividers):
        if string.count(x, dividers[0]):
            x = map(int, string.split(x, dividers[0]))
            return x[0] + (x[1] / 60.)
        elif string.count(x, dividers[1]):
            x = map(int, string.split(x, dividers[1]))
            return -(x[0] + (x[1] / 60.))
        else:
            raise CityNotFound("Bogus result (%s)" % x)
    return getdegrees(lat, "ns"), getdegrees(lon, "ew")

def showcities(citylist):
    marklist = [ ]
    for city, state in citylist:
        try:
            lat, lon = findcity(city, state)
            print ("%s, %s:" % (city, state)), lat, lon
            marklist.append((lat, lon))
        except CityNotFound, message:
            print "%s, %s: not in database? (%s)" % (city, state, message)
    url = xerox_parc_url(marklist)
    # Print URL
    # os.system('netscape "%s"' % url)
    webbrowser.open(url)

# Export a few lists for test purposes

citylist = (("Natick", "MA"),
            ("Rhinebeck", "NY"),
            ("New Haven", "CT"),
            ("King of Prussia", "PA"))

citylist1 = (("Mexico City", "Mexico"),
             ("Acapulco", "Mexico"),
             ("Abilene", "Texas"),
             ("Tulum", "Mexico"))

citylist2 = (("Munich", "Germany"),
             ("London", "England"),
             ("Madrid", "Spain"),
             ("Paris", "France"))

if _ _name_ _=='_ _main_ _':
    showcities(citylist1)

See Also

Documentation for the standard library module htmlllib in the Library Reference; information about the Xerox PARC map viewer is at http://www.parc.xerox.com/istl/projects/mapdocs/; AstroDienst hosts a worldwide server of latitude/longitude data (http://www.astro.com/cgi-bin/atlw3/aq.cgi).

Get Python Cookbook now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.