Hack #98. Pull the HTML Source Code from a Web Site

Integrate web data into your application.

"Use a Browser Inside Access" [Hack #97] shows you how to use the Microsoft Web Browser control to display a web page. This hack takes that functionality a step further and shows how to get to the source code. Being able to access the source code makes it possible to extract data from a web site.

Figure 10-8 shows a web site displayed in the browser control and a message box displaying the site's HTML.

Figure 10-8. Reading the HTML source from a web site

Tip

The Microsoft Web Browser control has an extensive programmatic model. Visit http://msdn.microsoft.com/library/default.asp?url=/workshop/browser/prog_browser_node_entry.asp for more information.

The HTML is returned with this line of code:

   MsgBox Me.WebBrowser1.Document.documentElement.innerHTML

The programmatic model for the web browser control follows the Document Object Model (DOM). Once the browser has loaded a page, documentElement and its child nodes become available. In this example, the innerHTML property of documentElement returns the markup inside the page's root html element as a string. Because the HTML is available as a string, you can pass it to any routine you want. For example, you can have a routine that looks for HTML tables from which to pull data or that searches the HTML for keywords.
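As a rough illustration of such a routine, the sketch below waits for the page to finish loading, searches the HTML for a keyword, and then prints the text of each table on the page. It assumes the control is named WebBrowser1, as in the line of code above. The procedure name ExamineLoadedPage and the keyword "price" are hypothetical, chosen only for this example, and the code has to live in the module of the form that hosts the control.

```vba
' A minimal sketch, not a definitive implementation.
' Assumptions: the control is named WebBrowser1 and has already been
' pointed at a page; ExamineLoadedPage and "price" are made-up names.
Private Sub ExamineLoadedPage()
    Const READYSTATE_COMPLETE As Long = 4   ' value the control reports when loading is done

    Dim doc As Object        ' late-bound HTML document
    Dim tbl As Object        ' one table element at a time
    Dim strHtml As String

    ' Wait until the browser control has finished loading the page;
    ' the DOM is not reliable before then.
    Do While Me.WebBrowser1.ReadyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set doc = Me.WebBrowser1.Document
    strHtml = doc.documentElement.innerHTML

    ' Keyword search: a case-insensitive scan of the raw HTML.
    If InStr(1, strHtml, "price", vbTextCompare) > 0 Then
        Debug.Print "Keyword found in page source."
    End If

    ' Table extraction: walk the DOM rather than parsing the string.
    For Each tbl In doc.getElementsByTagName("table")
        Debug.Print tbl.innerText
    Next tbl
End Sub
```

The output goes to the Immediate window with Debug.Print rather than to a message box, because MsgBox truncates long strings and most pages are far larger than it can show. For the tables, walking the DOM with getElementsByTagName is usually less fragile than searching the HTML string for tags by hand.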
