Non-Interactive Downloads Using wget

Manually saving individual pages from a browser works fine when you are only looking at a few. At some point you will want to automate the process, especially when you want to archive an entire site. wget is the perfect tool for automating these downloads, so I will spend a few pages describing how it can be used.

wget is a Unix command-line tool for the non-interactive download of web pages. You can download it from http://www.gnu.org/software/wget/ if your system does not already have it installed; a binary for Microsoft Windows is also available. It is a very flexible tool, with a host of options listed in its manual page.

Downloading a Single Page

Capturing a single web page with wget is straightforward. Give it a URL, with no other options, and it will download the page into the current working directory with the same filename as that on the web site:

    % wget http://www.oreilly.com/index.html
    --08:52:06--  http://www.oreilly.com/index.html
               => `index.html'
    Resolving www.oreilly.com... 208.201.239.36, 208.201.239.37
    Connecting to www.oreilly.com[208.201.239.36]:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 54,774 [text/html]

    100%[=====================================================>] 54,774       135.31K/s

    08:52:07 (134.96 KB/s) - `index.html' saved [54774/54774]

Using the -nv (non-verbose) option suppresses most of these status messages, and the -q (quiet) option silences the output completely.
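
For example, the same download can be run with reduced output or with none at all; these invocations simply apply the flags just described to the URL used above:

    % wget -nv http://www.oreilly.com/index.html
    % wget -q http://www.oreilly.com/index.html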

Saving the file with the same name can be a problem if a file called index.html already exists in the working directory. By default, wget does not overwrite the existing file; it saves the new copy under the same name with a numeric suffix, such as index.html.1.
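
If you would rather choose the name yourself, the -O option writes the download to the file you specify. A brief sketch (the output filename here is only an example):

    % wget -O oreilly-home.html http://www.oreilly.com/index.html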
