Collecting Genesis Words in PHP

Here is a PHP script, getGenesisTags.php, which collects tags by counting the words that appear in the book of Genesis in the Bible. The data is retrieved from the copy of the book of Genesis at the Project Gutenberg web site. (This script is available at http://examples.oreilly.com/tagclouds/ .) Let's see what it does.

<?php
//
// Collect text from Genesis

function getTags()
{
   global $tags;

The script contains a single function, called getTags(). This function will be invoked from another script, makeTagCloud.php, which we will run later. The purpose of the getTags() function is to populate the global associative array called $tags.
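To make the idea of populating a global associative array concrete, here is a minimal sketch. It is not the code from getGenesisTags.php: the helper name countWordsIntoTags() is hypothetical, and the word => count layout of $tags is an assumption based on the description of the script as counting words.

```php
<?php
// Hypothetical helper, not part of getGenesisTags.php.
// Assumption: $tags maps each lowercased word to its number of occurrences.
function countWordsIntoTags($text)
{
    global $tags;
    $tags = array();

    // str_word_count() with format 1 returns an array of the words found.
    foreach (str_word_count(strtolower($text), 1) as $word) {
        if (!isset($tags[$word])) {
            $tags[$word] = 0;
        }
        $tags[$word]++;
    }
}

countWordsIntoTags("In the beginning God created the heaven and the earth.");
// $tags['the'] is now 3 and $tags['god'] is 1.
```

Because the function writes to a global, a calling script can read $tags directly once the function returns, which is the pattern the text describes for makeTagCloud.php.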

$url = 'http://www.gutenberg.org/dirs/etext05/bib0110.txt';

The previous line specifies the URL of the web page we are going to screen-scrape. This particular page contains the text of the book of Genesis. If you'd like to use some other text, go to the Project Gutenberg web site ( http://www.gutenberg.org/ ) to find what you want.

To see what this text looks like in its raw form, open the page we're grabbing in your browser:

    http://www.gutenberg.org/dirs/etext05/bib0110.txt

    // Simpler alternative (requires allow_url_fopen):
    // $txt = file_get_contents($url);
    $ch = curl_init();
    $timeout = 30; // seconds to wait while connecting; set to zero for no timeout
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the page as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $txt = curl_exec($ch);
    curl_close($ch);

The previous lines retrieve the Bible text from the Project Gutenberg web site. ...
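The listing above does not check whether the download succeeded. As a hedged sketch only, the same cURL calls could be wrapped in a helper that reports failure. The function name fetchText(), the null return value on failure, and the fallback to file_get_contents() are all assumptions for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper, not part of getGenesisTags.php.
// Returns the body of $url as a string, or null if the transfer fails.
function fetchText($url, $timeout = 30)
{
    if (!function_exists('curl_init')) {
        // cURL extension not loaded: fall back to the simpler call that
        // appears commented out in the original listing.
        $txt = @file_get_contents($url);
        return ($txt === false) ? null : $txt;
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);       // return the body as a string
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // seconds to wait while connecting
    $txt = curl_exec($ch);                              // false on failure

    if ($txt === false) {
        error_log('Fetch failed: ' . curl_error($ch));
        $txt = null;
    }
    curl_close($ch);
    return $txt;
}
```

With this helper, the fetch in getTags() could be written as $txt = fetchText($url); followed by a check for null before any further processing.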
