Scrape Google AdWords

Scrape the AdWords from a saved Google results page into a form suitable for importing into a spreadsheet or database.

Google’s AdWords, the text ads that appear to the right of the regular search results, are delivered on a cost-per-click basis, and purchasers of the AdWords are allowed to set a ceiling on the amount of money that they spend on their ad. Once an advertiser reaches that ceiling, the ad stops appearing, so even if you run a search for the same query word multiple times, you won’t necessarily get the same set of ads each time.

If you’re considering using Google AdWords to run ads, you might want to gather up and save the ads that are running for the query words that interest you. Google AdWords is not included in the functionality provided by the Google API, so you’re left to do a little scraping to get at that data.

Tip

Be sure to read “A Note on Spidering and Scraping” in Chapter 9 for some understanding of what scraping means.

This hack will let you scrape the AdWords from a saved Google results page and export them to a comma-separated (CSV) file, which you can then import into Excel or your favorite spreadsheet program.
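The CSV side of this can be illustrated with a minimal sketch. It is not the hack's own code: the csv_field and csv_line helpers and the sample ad records are hypothetical, made up here to show how fields containing commas or double quotes can be written so that a spreadsheet imports them cleanly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quote one field for CSV: wrap it in double quotes and double any
# embedded double quotes, so commas and quotes survive the import.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Join a list of fields into one CSV line (no trailing newline).
sub csv_line {
    return join ',', map { csv_field($_) } @_;
}

# Hypothetical ad records; the field names and values are invented
# for illustration and do not come from a real results page.
my @ads = (
    { title => 'Example Widgets',    url => 'www.example.com', text => 'Quality widgets, low prices' },
    { title => 'The "Best" Gadgets', url => 'www.example.org', text => 'Gadgets for everyone' },
);

# Header row first, then one row per ad.
print csv_line('Title', 'URL', 'Text'), "\n";
print csv_line(@{$_}{qw(title url text)}), "\n" for @ads;
```

Quoting every field is a deliberately simple choice: it costs a few extra characters per row but avoids having to decide which values need quoting.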

Tip

This hack requires a Perl module called HTML::TokeParser (http://search.cpan.org/search?query=htmL%3A%3Atokeparser&mode=all). You’ll need to install it before the hack will run.
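If you have not used HTML::TokeParser before, the following minimal sketch shows the general pattern of walking a document tag by tag. It is an illustration only, not the hack's code: the extract_links helper and the sample markup are hypothetical, and a real Google results page is structured differently.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Made-up markup for illustration; not the layout of a real results page.
my $html = <<'HTML';
<div class="ad"><a href="http://www.example.com/">Example Widgets</a></div>
<div class="ad"><a href="http://www.example.org/">Example Gadgets</a></div>
HTML

# Return a list of [href, link text] pairs, one per <a href="..."> tag.
sub extract_links {
    my ($doc) = @_;

    # Passing a scalar reference tells the parser to read from the string.
    my $parser = HTML::TokeParser->new(\$doc)
        or die "Could not create parser\n";

    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};          # attribute hash of the start tag
        next unless defined $href;
        my $text = $parser->get_trimmed_text('/a');   # text up to </a>
        push @links, [$href, $text];
    }
    return @links;
}

print "$_->[1] => $_->[0]\n" for extract_links($html);
```

To parse a saved page instead of a string, pass a filename to HTML::TokeParser->new rather than a scalar reference.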

The Code

Save this code to a text file named adwords.pl:

#!/usr/bin/perl
# usage: perl adwords.pl results.html
#
use strict;
use HTML::TokeParser;

die "I need at least one file: $!\n" unless @ARGV;

my @Ads;
...
