18.7. Processing Every Word in a File
Problem
You want to do something with every word in a file.
Solution
Read in each line with fgets(), separate the line into words, and process each word:
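    // $php_errormsg is set only when track_errors is enabled; it was removed
    // in PHP 8, where error_get_last() serves the same purpose.
    $fh = fopen('great-american-novel.txt','r') or die($php_errormsg);
    while (! feof($fh)) {
        if ($s = fgets($fh,1048576)) {
            // split the line on runs of whitespace, discarding empty pieces
            $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
            // process words
        }
    }
    fclose($fh) or die($php_errormsg);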
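To see what the splitting step produces, here is a minimal sketch that runs preg_split() on a single line. The sample string is made up for illustration; in the recipe, the line comes from fgets().

```php
<?php
// Hypothetical sample line with leading spaces, a tab, doubled spaces,
// and a trailing newline, similar to what fgets() might return.
$s = "  It was the best\tof times,  it was the worst of times.\n";

// Split on runs of whitespace. PREG_SPLIT_NO_EMPTY drops the empty strings
// that the leading and trailing whitespace would otherwise produce.
$words = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);

// The same split without the flag, for comparison.
$with_empty = preg_split('/\s+/', $s);

print count($words) . "\n";      // 12
print count($with_empty) . "\n"; // 14: an empty string at each end
print_r($words);
```

Punctuation stays attached to the neighboring word ("times," and "times."), because only whitespace counts as a delimiter here.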
Discussion
Here’s how to calculate average word length in a file:
    $word_count = $word_length = 0;

    if ($fh = fopen('great-american-novel.txt','r')) {
        while (! feof($fh)) {
            if ($s = fgets($fh,1048576)) {
                $words = preg_split('/\s+/',$s,-1,PREG_SPLIT_NO_EMPTY);
                foreach ($words as $word) {
                    $word_count++;
                    $word_length += strlen($word);
                }
            }
        }
        fclose($fh);
    }

    // avoid dividing by zero if the file is empty or could not be opened
    if ($word_count > 0) {
        printf("The average word length over %d words is %.02f characters.",
               $word_count, $word_length/$word_count);
    }
How you process every word depends on how “word” is defined. The code in this recipe uses the Perl-compatible regular-expression engine’s \s whitespace metacharacter, which includes space, tab, newline, carriage return, and formfeed. Section 2.6 breaks apart a line into words by splitting on a space, which is useful in that recipe because the words have to be rejoined with spaces. The Perl-compatible engine also has a word-boundary assertion (\b) that matches between a word character (alphanumeric or underscore) and a nonword character (anything else). Using \b instead of \s to delimit words most noticeably treats ...
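As a rough illustration of the two delimiters, the following sketch splits one sample string both ways. The string and the variable names are hypothetical, and the filtering of whitespace-only pieces after the \b split is an added convenience, not something the recipe prescribes.

```php
<?php
// Hypothetical sample text containing embedded punctuation.
$s = "It's 6 o'clock";

// Whitespace as the delimiter: punctuation stays inside each word.
$by_space = preg_split('/\s+/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Word boundaries as the delimiter: the string is cut wherever a word
// character (letter, digit, underscore) meets a nonword character, so the
// pieces also include the punctuation and the spaces themselves.
$pieces = preg_split('/\b/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Drop pieces that consist only of whitespace (an assumption made here
// so the two results are easier to compare).
$by_boundary = array_values(array_filter($pieces, function ($p) {
    return trim($p) !== '';
}));

print_r($by_space);
print_r($by_boundary);
```

Printing both arrays side by side shows how a token such as o'clock stays whole under \s but is cut at the apostrophe under \b.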