Name
wget
Synopsis
wget [options] URL
/usr/bin
stdin  stdout  - file  -- opt  --help  --version
The wget command hits a URL and downloads the information to a file or standard output. It's great for capturing individual pages or entire web page hierarchies to arbitrary depth. For example, let's capture the Yahoo home page:
$ wget http://www.yahoo.com
--23:19:51--  http://www.yahoo.com/
           => `index.html'
Resolving www.yahoo.com... done.
Connecting to www.yahoo.com[216.109.118.66]:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]

    [ <=>                                ] 31,434       220.84K/s

23:19:51 (220.84 KB/s) - `index.html' saved [31434]
which is saved to a file index.html in the current directory. wget has the added ability to resume a download if it gets interrupted in the middle, say, due to a network failure: just run wget -c with the same URL and it picks up where it left off.
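Both behaviors are easy to try. As a sketch (the URLs here are illustrative, not from the book), the standard -r and -l options handle recursive retrieval to a chosen depth, and -c resumes a partial download:

$ wget -r -l 2 http://www.example.com/docs/      # follow links up to 2 levels deep
$ wget -c http://www.example.com/big-file.iso    # resume an interrupted download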
Another similar command is curl, which writes to standard output by default (unlike wget, which duplicates the original page filenames by default).
$ curl http://www.yahoo.com > mypage.html
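wget can mimic this behavior by passing - as the output file, which sends the downloaded page to standard output (the redirect target mypage.html is just for illustration):

$ wget -O - http://www.yahoo.com > mypage.html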
wget has over 70 options, so we'll cover just a few important ones. (curl has a different set of options; see its manpage.)
Useful options

-i file     Read URLs from the given file and retrieve them in turn.

-O file     Write all the captured HTML to the given file, one page
            appended after the other.

-c          Continue mode: if a previous retrieval was interrupted,
            leaving only a partial file as a result, pick up where it
            left off.
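The first two options combine naturally. Assuming a hypothetical file urls.txt that lists one URL per line, something like this retrieves each page and appends them all to a single file:

$ wget -i urls.txt -O all-pages.html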