Adding the Payload

The payload for this spider extends the library from Chapter 8, which downloads all the images found on a single web page. This time, however, we'll download the images referenced by every page the spider collects from the website. Listing 18-7 shows the code that adds the payload to the spider; you can append it directly to the end of the earlier spider script.

    # Add the payload to the simple spider
    // Include download and directory creation lib
    include("LIB_download_images.php");
    // Download images from pages referenced in $spider_array
    for($penetration_level=1; $penetration_level<=$MAX_PENETRATION; $penetration_level++)
        {
        for($xx=0; $xx<count($spider_array[$previous_level]); $xx++)
            {
            download_images_for_page($spider_array[$previous_level][$xx]);
            ...
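The internals of LIB_download_images.php aren't shown in this listing. As a rough illustration only, the sketch below shows one way a function like download_images_for_page() might locate the images on a page before fetching them. The helpers resolve_image_url(), extract_image_urls(), and pages_in_spider_array() are hypothetical and are not part of the book's library. The sketch assumes PHP 8 with the DOM extension enabled, and it only collects image URLs; it does not download files or create directories.

```php
<?php
// HYPOTHETICAL helpers, not the book's LIB_download_images.php.
// Assumes PHP 8 (str_starts_with) and the DOM extension.

// Turn an <img src> value into an absolute URL relative to the page it came from.
function resolve_image_url(string $src, string $page_url): ?string
{
    $src = trim($src);
    if ($src === '' || str_starts_with($src, 'data:')) {
        return null;                         // skip empty and inline (data:) images
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $src)) {
        return $src;                         // already absolute
    }
    $parts  = parse_url($page_url);
    $scheme = $parts['scheme'] ?? 'http';
    if (str_starts_with($src, '//')) {
        return $scheme . ':' . $src;         // protocol-relative
    }
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    $origin = $scheme . '://' . ($parts['host'] ?? '') . $port;
    if (str_starts_with($src, '/')) {
        return $origin . $src;               // root-relative
    }
    // Document-relative: append to the page's directory.
    // Note: "../" segments are NOT collapsed in this sketch.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $src;
}

// Return the de-duplicated absolute URLs of every <img> on one page.
function extract_image_urls(string $html, string $page_url): array
{
    if (trim($html) === '') {
        return [];                           // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);  // silence malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $url = resolve_image_url($img->getAttribute('src'), $page_url);
        if ($url !== null) {
            $urls[$url] = true;              // array keys de-duplicate URLs
        }
    }
    return array_keys($urls);
}

// Flatten every penetration level of a spider array into one list of
// unique page URLs, so a page linked from several levels is handled once.
function pages_in_spider_array(array $spider_array): array
{
    $pages = [];
    foreach ($spider_array as $links) {
        foreach ($links as $link) {
            $pages[$link] = true;
        }
    }
    return array_keys($pages);
}

// Example: HTML inlined here for illustration instead of fetched by the spider.
$html = '<img src="logo.png"><img src="/img/a.gif"><img src="logo.png">';
foreach (extract_image_urls($html, 'http://www.example.com/dir/page.html') as $url) {
    echo $url, PHP_EOL;
}
```

Keying the arrays by URL is a simple way to avoid fetching the same logo or button once for every page that uses it. The resolver ignores `<base>` tags and does not normalize `../` paths, so a production payload would need more complete address resolution.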
