Getting links from a URL with urllib2

In this script, we extract the links from a web page using urllib2 and HTMLParser. urllib2 downloads the page, and HTMLParser, a standard library module for parsing text formatted as HTML, calls a handler method for each tag it finds.

You can get more information at https://docs.python.org/2/library/htmlparser.html.
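
Before reading the full script, it can help to see what the parser passes to handle_starttag. The following is a minimal sketch, not part of the book's code: the TagLogger class and the sample HTML string are hypothetical, and it assumes Python 3, where the module is imported as html.parser instead of HTMLParser.

```python
# Python 3 import; in Python 2 this is "from HTMLParser import HTMLParser".
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Hypothetical helper that records each start tag and its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) tuples.
        self.seen.append((tag, attrs))


logger = TagLogger()
logger.feed('<p class="intro">See <a href="http://example.com" title="Ex">this</a></p>')

for tag, attrs in logger.seen:
    print(tag, attrs)
```

Each attribute is a (name, value) tuple, which is why the script compares a[0] with 'href' and reads the link from a[1].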

You can find the following code in the get_links_from_url.py file:

#!/usr/bin/python
import urllib2
from HTMLParser import HTMLParser

class myParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if (tag == "a"):
            for a in attrs:
                if (a[0] == 'href'):
                    link = a[1]
                    if (link.find('http') >= 0):
                        print(link)
                        newParse = myParser()
                        newParse.feed(link)

web =  raw_input("Enter url: ")
url = "http://"+web
request = urllib2.Request(url)
handle = urllib2.urlopen(request)
parser = myParser()
...
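
For readers on Python 3, where urllib2 and the HTMLParser module no longer exist under those names, the same idea can be sketched as follows. This is an illustrative variant under stated assumptions, not the book's get_links_from_url.py: the names LinkCollector, extract_links, and fetch_links are hypothetical, links are collected in a list instead of printed, and relative links are resolved with urljoin rather than skipped. The listing above feeds each link string to a new parser; a bare URL contains no tags, so this sketch leaves that step out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Hypothetical parser that gathers the href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                absolute = urljoin(self.base_url, value)
                # Keep only web links; this drops mailto:, javascript:, etc.
                if absolute.startswith(("http://", "https://")):
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the absolute http(s) links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def fetch_links(url):
    """Download a page and return its links (needs network access)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)


# Offline demonstration with a hypothetical page body.
sample = (
    '<a href="/about">About</a> '
    '<a href="http://example.org/x">X</a> '
    '<a href="mailto:me@example.com">Mail</a>'
)
print(extract_links(sample, "http://example.com/index.html"))
# → ['http://example.com/about', 'http://example.org/x']
```

To run it against a live site, we could call fetch_links("http://" + web) with the host name entered by the user, mirroring how the script builds its URL.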
