Name

wget

Synopsis

wget [options] URL

/usr/bin    stdin    stdout    - file    -- opt    --help    --version

The wget command hits a URL and downloads the information to a file or standard output. It’s great for capturing individual pages or entire web page hierarchies to arbitrary depth. For example, let’s capture the Yahoo home page:

$ wget http://www.yahoo.com
--23:19:51--  http://www.yahoo.com/
           => `index.html'
Resolving www.yahoo.com... done.
Connecting to www.yahoo.com[216.109.118.66]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                 ] 31,434       220.84K/s

23:19:51 (220.84 KB/s) - `index.html' saved [31434]

The page is saved to a file, index.html, in the current directory. wget can also resume a download that was interrupted partway through, say by a network failure: just run wget -c with the same URL and it picks up where it left off.
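For instance, if the transfer above had been cut short, rerunning the same request with -c resumes from the partial index.html instead of refetching the whole page (resuming requires the server to honor ranged requests):

$ wget -c http://www.yahoo.com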

Another similar command is curl, which writes to standard output by default; wget, by contrast, names its output files after the original remote files (index.html here).

$ curl http://www.yahoo.com > mypage.html

wget has over 70 options, so we’ll cover just a few important ones. (curl has a different set of options; see its manpage.)

Useful options

-i filename

Read URLs from the given file and retrieve them in turn (see the example following this list).

-O filename

Write all the captured HTML to the given file, one page appended after the other (see the example following this list).

-c

Continue mode: if a previous retrieval was interrupted, leaving only a partial file as a result, pick up where wget left off. ...
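As a quick sketch combining -i and -O (the file urls.txt and its contents are invented for this example), you could fetch a list of URLs and collect every page into a single file:

$ cat urls.txt
http://www.yahoo.com/
http://www.example.com/
$ wget -i urls.txt -O pages.html

Each page listed in urls.txt is retrieved in turn and appended to pages.html.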
