#61 Extracting URLs from a Web Page

A straightforward shell script application of lynx is to extract a list of the URLs found on a given web page, which can be quite helpful in a variety of situations.
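The heavy lifting comes from lynx's -dump mode, which renders a page as plain text and appends a numbered list of every link it found under a "References" heading. As a minimal sketch of that core pipeline (assuming lynx is installed, and using example.com purely as a placeholder URL; the book's script wraps this idea with the option handling shown below):

  # Emit every URL referenced by a page. lynx -dump appends a numbered
  # "References" section; sed isolates it, grep keeps the numbered
  # entries, and awk strips the leading numbers, leaving bare URLs.
  lynx -dump "http://example.com/" | \
    sed -n '/^References$/,$p' | \
    grep -E '^ *[0-9]+\.' | \
    awk '{print $2}'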

The Code

#!/bin/sh

# getlinks - Given a URL, returns all of its internal and
#   external links.

if [ $# -eq 0 ] ; then
  echo "Usage: $0 [-d|-i|-x] url" >&2
  echo "-d=domains only, -i=internal refs only, -x=external only" >&2
  exit 1
fi

if [ $# -gt 1 ] ; then
  case "$1" in
    -d) # domains only: field 3 of the URL is the hostname
        lastcmd="cut -d/ -f3 | sort | uniq"
        shift
        ;;
    -i) # internal refs only: keep links within the base domain,
        # then strip the base domain prefix from each
        basedomain="http://$(echo $2 | cut -d/ -f3)/"
        lastcmd="grep \"^$basedomain\" | sed \"s|$basedomain||g\" | sort | uniq"
        shift
        ;;
    -x) # external refs only: drop links within the base domain
        basedomain="http://$(echo $2 | cut -d/ -f3)/"
        lastcmd="grep -v \"^$basedomain\" | sort | uniq"
        shift
        ;;
...
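Once the script is saved as getlinks and made executable, it might be invoked as follows (illustrative commands only, with example.com standing in for a real site; actual output depends on the page):

  chmod +x getlinks
  ./getlinks http://www.example.com/       # all links on the page
  ./getlinks -d http://www.example.com/    # just the domains referenced
  ./getlinks -i http://www.example.com/    # internal references only
  ./getlinks -x http://www.example.com/    # external references only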
