PHP Hacks by Jack D. Herrington

Hack #84. Spider Your Site

Use the HTTP_Client PEAR module to create a spider that walks all of the pages on your web site.

This hack shows how to write a spider in PHP that visits every page on your site. It is well suited to testing: after an update, you can quickly confirm that all of the PHP and HTML pages on your site still respond properly.

The Code

Save the code in Example 8-12 as spider.php.

Example 8-12. A simple spider

<?php
require_once 'HTTP/Client.php';
require_once 'HTTP/Request/Listener.php';

$baseurl = "http://localhost/phphacks/spider/test/index.html";
$pages = array();

add_urls( $baseurl );

while( ( $page = next_page() ) != null )
{
  add_urls( $page );
}

function next_page()
{
  global $pages;
  foreach( array_keys( $pages ) as $page )
  {
    if ( $pages[ $page ] == null )
      return $page;
  }
  return null;
}

function add_urls( $page )
{
  global $pages;

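  // microtime( true ) returns seconds as a float; without the argument it
  // returns a "msec sec" string that cannot be subtracted meaningfully.
  $start = microtime( true );
  $urls = get_urls( $page );
  $resptime = microtime( true ) - $start;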

  print "$page...\n";
  $pages[ $page ] = array( 'resptime' => floor( $resptime * 1000 ), 'url' => $page );

  foreach( $urls as $url )
  {
    if ( !array_key_exists( $url, $pages ) )
      $pages[ $url ] = null;
  }
}

function get_urls( $page )
{
  $base = preg_replace( "/\/([^\/]*?)$/", "/", $page );

  $client = new HTTP_Client();
  $client->get( $page );
  $resp = $client->currentResponse();
  $body = $resp['body'];

  $out = array();

  preg_match_all( "/(\<a.*?\>)/is", $body, $matches );
  foreach( $matches[0] as $match )
  {
    preg_match( "/href=(.*?)[\s|\>]/i", $match, $href ...
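
The spider's bookkeeping can be tried without a web server. The following is a minimal sketch, not the book's code. It reuses the $pages convention from the listing (a null value marks a URL that is known but not yet visited) and the $base expression from get_urls(). The HTTP fetch is replaced by a hypothetical in-memory link table, $fakeSite, and the helpers base_of() and crawl() are assumed names introduced only for illustration.

```php
<?php
// Hypothetical stand-in for a web site: page URL => hrefs found on that page.
$fakeSite = array(
  'http://localhost/test/index.html' => array( 'a.html', 'b.html' ),
  'http://localhost/test/a.html'     => array( 'index.html', 'b.html' ),
  'http://localhost/test/b.html'     => array(),
);

// Same expression as in get_urls(): replace the last path segment with "/".
function base_of( $page )
{
  return preg_replace( "/\/([^\/]*?)$/", "/", $page );
}

// Assumed helper: walks the link table and returns the URLs in visit order.
function crawl( $start, $site )
{
  $pages = array( $start => null );   // null = known but not yet visited
  $order = array();

  // array_search with strict comparison finds the first key whose value is null.
  while ( ( $page = array_search( null, $pages, true ) ) !== false )
  {
    $pages[ $page ] = true;           // mark as visited
    $order[] = $page;

    $links = isset( $site[ $page ] ) ? $site[ $page ] : array();
    foreach ( $links as $href )
    {
      $url = base_of( $page ) . $href;
      if ( !array_key_exists( $url, $pages ) )
        $pages[ $url ] = null;        // queue each new URL exactly once
    }
  }
  return $order;
}

print_r( crawl( 'http://localhost/test/index.html', $fakeSite ) );
```

Because a URL is only added when it is not already a key of $pages, pages that link back to each other (index.html and a.html here) are each visited once and the loop ends when no null entries remain.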
