2.7. Downloading All Files from a Site

Problem

You need to create a backup, mirror, or offline copy of your web site.

Solution

Use the Unix utility wget to mirror the files on the server to another location, either over HTTP with this command:

	wget --mirror http://yourwebsite.com

or by FTP:

	wget --mirror ftp://username:password@yourwebsite.com

Alternatively, you can use GUI-based utilities on your PC. Some choices are listed in the "See Also" section of this Recipe.
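The HTTP command above can be combined with a few other standard wget options when the goal is an offline copy that can be browsed locally. The following is a minimal sketch, not a definitive script: the function name mirror_site, the DRY_RUN switch, and the one-second wait are assumptions made for illustration. Note that --convert-links rewrites links in the saved pages, so leave it out if you need a byte-for-byte backup.

```shell
# Sketch: mirror a site over HTTP into a local directory.
# mirror_site and DRY_RUN are hypothetical names, not part of wget.
mirror_site() {
    site_url=$1    # e.g. http://yourwebsite.com
    dest_dir=$2    # local directory that receives the copy

    # --mirror            recursion plus timestamp checks
    # --page-requisites   also fetch the images and CSS each page needs
    # --convert-links     rewrite links so the copy works offline
    # --no-parent         never climb above the starting directory
    # --wait=1            pause one second between requests
    # --directory-prefix  save everything under dest_dir
    set -- wget --mirror --page-requisites --convert-links --no-parent \
        --wait=1 --directory-prefix="$dest_dir" "$site_url"

    # With DRY_RUN=1, print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Calling mirror_site http://yourwebsite.com "$HOME/site-backup" would start the download; setting DRY_RUN=1 first lets you check the command before it touches the network.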

Discussion

With wget, you can perform heroic feats of webmastering, whether it's copying a single file from one site to another, or an entire site to another server.

Warning

When spidering a site over HTTP, wget copies only the files it finds links to, so unused images and old web pages still lingering on the server are skipped. Over FTP, wget copies everything the FTP account can see.

Some scenarios where wget can be indispensable include:

Keeping frequently updated pages or images in sync on two sites

Say you want to display a real-time webcam image on your site, but don't want to (or can't) use an absolute URL to the site where the camera saves the image in the image tag's src attribute. (Perhaps the other site's server is slower or less reliable than yours, or outside linking to the image has been disabled, as described in Recipe 5.5.) With wget, you can specify the URL of the file, a local directory on your server where it should be copied, and the number of times to retry a flaky HTTP connection. Combined with cron (see Recipe 1.8), wget can perform ...
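The single-file case described above (a URL, a local directory, and a retry count) might be sketched as follows. The function name fetch_file, the DRY_RUN switch, and the default of five tries are assumptions for illustration; the wget options themselves are standard. The --timestamping option makes wget overwrite the local copy only when the remote file is newer, rather than saving numbered duplicates.

```shell
# Sketch: copy one remote file into a local directory, retrying on failure.
# fetch_file and DRY_RUN are hypothetical names, not part of wget.
fetch_file() {
    url=$1          # remote file, e.g. the webcam image
    dest_dir=$2     # local directory on your server
    tries=${3:-5}   # retry count; defaults to 5 if not given

    # --tries             how many times to retry a flaky connection
    # --timestamping      download only if the remote file is newer
    # --quiet             no output, which suits unattended runs
    # --directory-prefix  save the file under dest_dir
    set -- wget --tries="$tries" --timestamping --quiet \
        --directory-prefix="$dest_dir" "$url"

    # With DRY_RUN=1, print the command instead of running it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"
    else
        "$@"
    fi
}
```

Saved in a script that ends with a call to fetch_file, this could be scheduled from cron at whatever interval suits the image's update rate.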
