Working with Web-Based Data (HTTP)

Let’s say you want to acquire some data from the web site of one of your business partners. There are lots of ways to retrieve a web page:

  • “By hand,” that is, by pointing your web browser to the right location.

  • Using a scripting language such as Perl, which, incidentally, has lots of available gizmos and gadgets to interpret the data once you retrieve it.

  • Via a command-line utility such as GNU wget (one of my favorite utilities).

  • Using Oracle’s built-in package UTL_HTTP.

Since this is a book about PL/SQL, guess which method I’ll be discussing!

If you’re running Oracle Database 11g Release 2 or later, you will need to set up a network ACL to permit outbound connections to any desired remote hosts, as mentioned in the previous section.

Let’s start with a relatively simple means of coding the retrieval of a web page. This first method, which slices up the web page and puts the slices into an array, actually predates Oracle’s support of CLOBs.

Retrieve a Web Page in “Pieces”

One of the first procedures that Oracle ever released in the UTL_HTTP package retrieves a web page into consecutive elements of an associative array. Usage can be pretty simple:

DECLARE
   page_pieces UTL_HTTP.html_pieces; -- array of VARCHAR2(2000)
BEGIN
   page_pieces := UTL_HTTP.request_pieces(url => 'http://www.oreilly.com/');
END;

This format is not terribly fun to work with, because the 2,000-byte boundaries are unrelated to anything you would find on the text of the page. So if you have ...

Get Oracle PL/SQL Programming, 5th Edition now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.