Link Scrape

linkScrape.rb

Scraping links from web pages has many uses, and, as with any problem, there are many ways to do it. In Chapter 2 we wrote a script to validate the links on a website. Because that script had to validate each link, it needed far more lines of code than a script that simply scrapes all of the links. We aren't going to build a web spider here, but I'll cover some of its basic components, the first of which is a link scraper.

The Code

 require 'mechanize'

 unless ARGV[0]
     puts "You must supply a website."
     puts "USAGE: ruby linkScrape.rb <url to scrape>"
     exit
 end

 agent = WWW::Mechanize.new
 agent.set_proxy('localhost',8080)
 ...
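
The listing above is cut off after the proxy setup, so the scraping step itself is not shown. As a rough illustration of what a link scraper does, here is a minimal sketch that uses only Ruby's standard library rather than Mechanize. The helper name `scrape_links`, the sample HTML, and the example.com URLs are all assumptions made for illustration, and the regular expression is only a crude stand-in for the real HTML parsing that Mechanize performs.

```ruby
require 'uri'

# Hypothetical helper (not from the book): collect the href values of <a>
# tags in an HTML string and resolve each one against base_url.
# The regex is a simplification; it only handles quoted href attributes.
def scrape_links(html, base_url)
  hrefs = html.scan(/<a\s[^>]*?href\s*=\s*["']([^"']+)["']/i).flatten
  links = hrefs.map do |href|
    begin
      URI.join(base_url, href).to_s # turn relative links into absolute ones
    rescue URI::Error
      nil # skip hrefs that cannot be parsed as URIs
    end
  end
  links.compact.uniq # drop failures and duplicate links
end

# Made-up sample page used only to exercise the helper.
sample = <<~HTML
  <a href="/about">About</a>
  <a class="x" href="http://example.org/">Other</a>
  <a href="/about">About again</a>
HTML

scrape_links(sample, 'http://example.com/index.html').each { |link| puts link }
```

Resolving each href against the page URL matters because many links on a page are relative; a spider that followed them unresolved would have nothing to request.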
