Chapter 1. Strings

Introduction

Strings in PHP are sequences of bytes, such as “We hold these truths to be self-evident” or “Once upon a time” or even “111211211.” When you read data from a file or output it to a web browser, your data is represented as strings.

PHP strings are binary-safe (i.e., they can contain null bytes) and can grow and shrink on demand. Their size is limited only by the amount of memory that is available to PHP.

Warning

Usually, PHP strings are ASCII strings. You must do extra work to handle non-ASCII data like UTF-8 or other multibyte character encodings (see Chapter 19).

Similar in form and behavior to Perl and the Unix shell, strings can be initialized in three ways: with single quotes, with double quotes, and with the “here document” (heredoc) format. With single-quoted strings, the only special characters you need to escape inside a string are the backslash and the single quote itself. This example shows four single-quoted strings:

print 'I have gone to the store.';
print 'I\'ve gone to the store.';
print 'Would you pay $1.75 for 8 ounces of tap water?';
print 'In double-quoted strings, newline is represented by \n';

It prints:

I have gone to the store.
I've gone to the store.
Would you pay $1.75 for 8 ounces of tap water?
In double-quoted strings, newline is represented by \n

Caution

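The preceding output is shown one statement per line for readability. Because none of these print statements emits a newline, the raw output actually runs together on one line. Even if you add newlines, a web browser shows all the sentences on the same line because HTML requires additional markup to insert line breaks.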

Because PHP doesn’t check for variable interpolation or almost any escape sequences in single-quoted strings, defining strings this way is straightforward and fast.

Double-quoted strings don’t recognize escaped single quotes, but they do recognize interpolated variables and the escape sequences shown in Table 1-1.

Table 1-1. Double-quoted string escape sequences
Escape sequence     Character
\n                  Newline (ASCII 10)
\r                  Carriage return (ASCII 13)
\t                  Tab (ASCII 9)
\\                  Backslash
\$                  Dollar sign
\"                  Double quote
\0 through \777     Octal value
\x0 through \xFF    Hex value

Example 1-1 shows some double-quoted strings.

Example 1-1. Double-quoted strings
print "I've gone to the store.";
print "The sauce cost \$10.25.";
$cost = '$10.25';
print "The sauce cost $cost.";
print "The sauce cost \$\061\060.\x32\x35.";

Example 1-1 prints:

I've gone to the store.
The sauce cost $10.25.
The sauce cost $10.25.
The sauce cost $10.25.

The last line of Example 1-1 prints the price of sauce correctly because the character 1 is ASCII code 49 decimal and 061 octal. Character 0 is ASCII 48 decimal and 060 octal; 2 is ASCII 50 decimal and 32 hex; and 5 is ASCII 53 decimal and 35 hex.
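
The codes above can be checked with ord(), decoct(), and dechex(). The following is a small sketch rather than one of the numbered examples, and the variable names are only illustrative:

```php
<?php
// Each escape in the double-quoted string stands for one byte.
$escaped = "\$\061\060.\x32\x35";
$plain   = '$10.25';

// ord() gives a byte's decimal code; decoct() and dechex() convert it.
printf("%s is %d decimal, %s octal\n", '1', ord('1'), decoct(ord('1')));
printf("%s is %d decimal, %s hex\n", '2', ord('2'), dechex(ord('2')));

// The escaped and the plain spelling produce the same bytes.
var_dump($escaped === $plain);
```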

Heredoc-specified strings recognize all the interpolations and escapes of double-quoted strings, but they don’t require double quotes to be escaped. Heredocs start with <<< and a token. That token (with no leading or trailing whitespace), followed by a semicolon to end the statement (if necessary), ends the heredoc. Example 1-2 shows how to define a heredoc.

Example 1-2. Defining a here document
print <<< END
It's funny when signs say things like:
   Original "Root" Beer
   "Free" Gift
   Shoes cleaned while "you" wait
or have other misquoted words.
END;

Example 1-2 prints:

It's funny when signs say things like:
   Original "Root" Beer
   "Free" Gift
   Shoes cleaned while "you" wait
or have other misquoted words.

Newlines, spacing, and quotes are all preserved in a heredoc. By convention, the end-of-string identifier is usually all caps, and it is case sensitive. Example 1-3 shows two more valid heredocs.

Example 1-3. More here documents
print <<< PARSLEY
It's easy to grow fresh:
Parsley
Chives
on your windowsill
PARSLEY;

print <<< DOGS
If you like pets, yell out:
DOGS AND CATS ARE GREAT!
DOGS;

Heredocs are especially useful for printing out HTML with interpolated variables because you don’t have to escape the double quotes that appear in the HTML elements. Example 1-4 uses a heredoc to print HTML.

Example 1-4. Printing HTML with a here document
if ($remaining_cards > 0) {
    $url = '/deal.php';
    $text = 'Deal More Cards';
} else {
    $url = '/new-game.php';
    $text = 'Start a New Game';
}
print <<< HTML
There are <b>$remaining_cards</b> left.
<p>
<a href="$url">$text</a>
HTML;

In Example 1-4, the semicolon needs to go after the end-of-string delimiter to tell PHP the statement is ended. In some cases, however, you shouldn’t use the semicolon. One of these cases is shown in Example 1-5, which uses a heredoc with the string concatenation operator.

Example 1-5. Concatenation with a here document
$html = <<< END
<div class="$divClass">
<ul class="$ulClass">
<li>
END
. $listItem . '</li></div>';

print $html;

Assuming some reasonable values for the $divClass, $ulClass, and $listItem variables, Example 1-5 prints:

<div class="class1">
<ul class="class2">
<li> The List Item </li></div>

In Example 1-5, the expression needs to continue on the next line, so you don’t use a semicolon. Note also that in order for PHP to recognize the end-of-string delimiter, the . string concatenation operator needs to go on a separate line from the end-of-string delimiter.

Nowdocs are similar to heredocs, but there is no variable interpolation. So, nowdocs are to heredocs as single-quoted strings are to double-quoted strings. They’re best when you have a block of non-PHP code, such as JavaScript, that you want to print as part of an HTML page or send to another program.

For example, if you’re using jQuery:

$js = <<<'__JS__'
$.ajax({
  'url': '/api/getStock',
  'data': {
    'ticker': 'LNKD'
  },
  'success': function( data ) {
    $( "#stock-price" ).html( "<strong>$" + data + "</strong>" );
  }
});
__JS__;

print $js;
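
To see the difference side by side, the following sketch defines the same text once as a heredoc and once as a nowdoc. The $price variable is an assumption made up for the illustration:

```php
<?php
$price = 42;

// Heredoc: $price is interpolated, as in a double-quoted string.
$heredoc = <<<END
Cost: $price
END;

// Nowdoc: the text is taken literally, as in a single-quoted string.
$nowdoc = <<<'END'
Cost: $price
END;

print "$heredoc\n$nowdoc\n";
```

The first line of output contains the number, and the second contains the literal text $price.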

Individual bytes in strings can be referenced with square brackets. The first byte in the string is at index 0. Example 1-6 grabs one byte from a string.

Example 1-6. Getting an individual byte in a string
$neighbor = 'Hilda';
print $neighbor[3];

Example 1-6 prints:

d
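
Square brackets index bytes, not characters, which matters as soon as a string holds multibyte data. The sketch below assumes UTF-8 and writes the bytes as escapes so that it does not depend on the encoding of the source file:

```php
<?php
// "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
$word = "\xC3\xA9t\xC3\xA9";       // the three-letter word "été"

print strlen($word) . "\n";        // strlen() counts bytes, not letters
print bin2hex($word[0]) . "\n";    // index 0 is only the first half of "é"
print $word[2] . "\n";             // the "t" sits at byte index 2, not 1
```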

Accessing Substrings

Problem

You want to know if a string contains a particular substring. For example, you want to find out if an email address contains a @.

Solution

Use strpos(), as in Example 1-7.

Example 1-7. Finding a substring with strpos( )
if (strpos($_POST['email'], '@') === false) {
    print 'There was no @ in the e-mail address!';
}

Discussion

The return value from strpos() is the first position in the string (the “haystack”) at which the substring (the “needle”) was found. If the needle wasn’t found at all in the haystack, strpos() returns false. If the needle is at the beginning of the haystack, strpos() returns 0 because position 0 represents the beginning of the string. To differentiate between return values of 0 and false, you must use the identity operator (===) or the not–identity operator (!==) instead of regular equals (==) or not-equals (!=). Example 1-7 compares the return value from strpos() to false using ===. This test only succeeds if strpos() returns false, not if it returns 0 or any other number.
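
The following sketch shows why the loose comparison is a trap. The address is a made-up value chosen so that the @ sits at position 0:

```php
<?php
$address = '@example.com';   // hypothetical input with @ at position 0

$pos = strpos($address, '@');

// Loose comparison treats position 0 like "not found".
$loose_says_missing  = ($pos == false);
// Strict comparison only matches a real false.
$strict_says_missing = ($pos === false);

var_dump($pos, $loose_says_missing, $strict_says_missing);
```

Here $pos is 0, so the loose test wrongly reports that the @ is missing, while the strict test does not.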

See Also

Documentation on strpos().

Extracting Substrings

Problem

You want to extract part of a string, starting at a particular place in the string. For example, you want the first eight characters of a username entered into a form.

Solution

Use substr() to select your substring, as in Example 1-8.

Example 1-8. Extracting a substring with substr( )
$substring = substr($string,$start,$length);
$username = substr($_GET['username'],0,8);

Discussion

If $start and $length are positive, substr() returns $length characters in the string, starting at $start. The first character in the string is at position 0. Example 1-9 has positive $start and $length.

Example 1-9. Using substr( ) with positive $start and $length
print substr('watch out for that tree',6,5);

Example 1-9 prints:

out f

If you leave out $length, substr() returns the string from $start to the end of the original string, as shown in Example 1-10.

Example 1-10. Using substr( ) with positive start and no length
print substr('watch out for that tree',17);

Example 1-10 prints:

t tree

If $start is bigger than the length of the string, substr() returns false (or, as of PHP 8.0, an empty string).

If $start plus $length goes past the end of the string, substr() returns all of the string from $start forward, as shown in Example 1-11.

Example 1-11. Using substr( ) with length past the end of the string
print substr('watch out for that tree',20,5);

Example 1-11 prints:

ree

If $start is negative, substr() counts back from the end of the string to determine where your substring starts, as shown in Example 1-12.

Example 1-12. Using substr( ) with negative start
print substr('watch out for that tree',-6);
print substr('watch out for that tree',-17,5);

Example 1-12 prints:

t tree
out f

With a negative $start value that goes past the beginning of the string (for example, if $start is −27 with a 20-character string), substr() behaves as if $start is 0.
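
As a quick check of that rule, this sketch reuses the 23-byte string from the earlier examples with a $start of -27. The variable names are only illustrative:

```php
<?php
$s = 'watch out for that tree';   // 23 bytes long

// -27 reaches past the start of the string, so it acts like 0.
$from_too_far_back = substr($s, -27, 5);
$from_zero         = substr($s, 0, 5);

print "$from_too_far_back\n$from_zero\n";
```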

If $length is negative, substr() counts back from the end of the string to determine where your substring ends, as shown in Example 1-13.

Example 1-13. Using substr( ) with negative length
print substr('watch out for that tree',15,-2);
print substr('watch out for that tree',-4,-1);

Example 1-13 prints:

hat tr
tre

See Also

Documentation on substr().

Replacing Substrings

Problem

You want to replace a substring with a different string. For example, you want to obscure all but the last four digits of a credit card number before printing it.

Solution

Use substr_replace(), as in Example 1-14.

Example 1-14. Replacing a substring with substr_replace( )
// Everything from position $start to the end of $old_string
// becomes $new_substring
$new_string = substr_replace($old_string,$new_substring,$start);

// $length characters, starting at position $start, become $new_substring
$new_string = substr_replace($old_string,$new_substring,$start,$length);

Discussion

Without the $length argument, substr_replace() replaces everything from $start to the end of the string. If $length is specified, only that many characters are replaced:

print substr_replace('My pet is a blue dog.','fish.',12);
print substr_replace('My pet is a blue dog.','green',12,4);
$credit_card = '4111 1111 1111 1111';
print substr_replace($credit_card,'xxxx ',0,strlen($credit_card)-4);

This prints:

My pet is a fish.
My pet is a green dog.
xxxx 1111

If $start is negative, the new substring is placed by counting $start characters from the end of $old_string, not from the beginning:

print substr_replace('My pet is a blue dog.','fish.',-9);
print substr_replace('My pet is a blue dog.','green',-9,4);

This prints:

My pet is a fish.
My pet is a green dog.

If $start and $length are 0, the new substring is inserted at the start of $old_string:

print substr_replace('My pet is a blue dog.','Title: ',0,0);

This prints:

Title: My pet is a blue dog.

The function substr_replace() is useful when you’ve got text that’s too big to display all at once, and you want to display some of the text with a link to the rest. Example 1-15 displays the first 25 characters of a message with an ellipsis after it as a link to a page that displays more text.

Example 1-15. Displaying long text with an ellipsis
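// $db is assumed to be an open PDO connection, as in Example 1-21.
// A placeholder keeps $id out of the SQL text.
$st = $db->prepare('SELECT id,message FROM messages WHERE id = ?');
$st->execute(array($id));
$ob = $st->fetch(PDO::FETCH_OBJ);
printf('<a href="more-text.php?id=%d">%s</a>',
       $ob->id, substr_replace($ob->message,' ...',25));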

The more-text.php page referenced in Example 1-15 can use the message ID passed in the query string to retrieve the full message and display it.
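
One detail to watch: when the message is shorter than 25 bytes, substr_replace() still appends the ellipsis at the end of the string. A minimal sketch of a guard follows; truncate_with_ellipsis() is a hypothetical helper, not a built-in function:

```php
<?php
// Hypothetical helper: shorten $text to $limit bytes and add an ellipsis,
// but leave text that already fits untouched.
function truncate_with_ellipsis($text, $limit = 25, $ellipsis = ' ...') {
    if (strlen($text) <= $limit) {
        return $text;
    }
    return substr_replace($text, $ellipsis, $limit);
}

$short = truncate_with_ellipsis('Short message');
$long  = truncate_with_ellipsis('This message is much longer than twenty-five bytes.');

print "$short\n$long\n";
```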

See Also

Documentation on substr_replace().

Processing a String One Byte at a Time

Problem

You need to process each byte in a string individually.

Solution

Loop through each byte in the string with for. Example 1-16 counts the vowels in a string.

Example 1-16. Processing each byte in a string
$string = "This weekend, I'm going shopping for a pet chicken.";
$vowels = 0;
for ($i = 0, $j = strlen($string); $i < $j; $i++) {
    if (strstr('aeiouAEIOU',$string[$i])) {
        $vowels++;
    }
}

Discussion

Processing a string a character at a time is an easy way to calculate the “Look and Say” sequence, as shown in Example 1-17.

Example 1-17. The Look and Say sequence
function lookandsay($s) {
    // initialize the return value to the empty string
    $r = '';
    // $m holds the character we're counting, initialize to the first
    // character in the string
    $m = $s[0];
    // $n is the number of $m's we've seen, initialize to 1
    $n = 1;
    for ($i = 1, $j = strlen($s); $i < $j; $i++) {
        // if this character is the same as the last one
        if ($s[$i] == $m) {
            // increment the count of this character
            $n++;
        } else {
            // otherwise, add the count and character to the return value
            $r .= $n.$m;
            // set the character we're looking for to the current one
            $m = $s[$i];
            // and reset the count to 1
            $n = 1;
        }
    }
    // return the built up string as well as the last count and character
    return $r.$n.$m;
}

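// start from the string '1' (not the integer 1, which can't be indexed)
// and print each element before computing the next one
for ($i = 0, $s = '1'; $i < 10; $i++) {
    print "$s\n";
    $s = lookandsay($s);
}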

Example 1-17 prints:

1
11
21
1211
111221
312211
13112221
1113213211
31131211131221
13211311123113112211

It’s called the “Look and Say” sequence because each element is what you get by looking at the previous element and saying what’s in it. For example, looking at the first element, 1, you say “one one.” So the second element is “11.” That’s two ones, so the third element is “21.” Similarly, that’s one two and one one, so the fourth element is “1211,” and so on.

See Also

Documentation on for; more about the “Look and Say” sequence.

Reversing a String by Word or Byte

Problem

You want to reverse the words or the bytes in a string.

Solution

Use strrev() to reverse by byte, as in Example 1-18.

Example 1-18. Reversing a string by byte
print strrev('This is not a palindrome.');

Example 1-18 prints:

.emordnilap a ton si sihT

To reverse by words, explode the string by word boundary, reverse the words, and then rejoin, as in Example 1-19.

Example 1-19. Reversing a string by word
$s = "Once upon a time there was a turtle.";
// break the string up into words
$words = explode(' ',$s);
// reverse the array of words
$words = array_reverse($words);
// rebuild the string
$s = implode(' ',$words);
print $s;

Example 1-19 prints:

turtle. a was there time a upon Once

Discussion

Reversing a string by words can also be done all in one line with the code in Example 1-20.

Example 1-20. Concisely reversing a string by word
$reversed_s = implode(' ',array_reverse(explode(' ',$s)));

See Also

Processing Every Word in a File discusses the implications of using something other than a space character as your word boundary; documentation on strrev() and array_reverse().

Generating a Random String

Problem

You want to generate a random string.

Solution

Use str_rand():

function str_rand($length = 32,
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
    if (!is_int($length) || $length < 0) {
        return false;
    }

    $characters_length = strlen($characters) - 1;
    $string = '';

    for ($i = $length; $i > 0; $i--) {
        $string .= $characters[mt_rand(0, $characters_length)];
    }

    return $string;
}

Discussion

PHP has native functions for generating random numbers, but nothing for random strings. By default, the str_rand() function returns a 32-character string constructed from letters and numbers.

Pass in an integer to change the length of the returned string. To use an alternative set of characters, pass them as a string as the second argument. For example, to get a 16-character string of Morse code dots and dashes:

print str_rand(16, '.-');

This prints something like:

.--..-.-.--.----
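
The mt_rand() function is not meant for security-sensitive values such as password-reset tokens. If that matters, one possible variant swaps in random_int(), which is available in PHP 7.0 and later. The name str_rand_secure() and the extra check for an empty character set are assumptions of this sketch:

```php
<?php
// Hypothetical variant of str_rand() that picks each index with
// random_int() instead of mt_rand().
function str_rand_secure($length = 32,
    $characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
    // Reject bad lengths and an empty character set.
    if (!is_int($length) || $length < 0 || $characters === '') {
        return false;
    }

    $max = strlen($characters) - 1;
    $string = '';
    for ($i = 0; $i < $length; $i++) {
        $string .= $characters[random_int(0, $max)];
    }

    return $string;
}

$token = str_rand_secure(16, '.-');
print $token . "\n";
```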

See Also

Generating Random Numbers Within a Range for generating random numbers.

Expanding and Compressing Tabs

Problem

You want to change spaces to tabs (or tabs to spaces) in a string while keeping text aligned with tab stops. For example, you want to display formatted text to users in a standardized way.

Solution

Use str_replace() to switch spaces to tabs or tabs to spaces, as shown in Example 1-21.

Example 1-21. Switching tabs and spaces
$rows = $db->query('SELECT message FROM messages WHERE id = 1');
$obj = $rows->fetch(PDO::FETCH_OBJ);

$tabbed = str_replace(' ' , "\t", $obj->message);
$spaced = str_replace("\t", ' ' , $obj->message);

print "With Tabs: <pre>$tabbed</pre>";
print "With Spaces: <pre>$spaced</pre>";

Using str_replace() for conversion, however, doesn’t respect tab stops. If you want tab stops every eight characters, a line beginning with a five-letter word and a tab should have that tab replaced with three spaces, not one. Use the tab_expand() function shown in Example 1-22 to turn tabs to spaces in a way that respects tab stops.

Example 1-22. tab_expand( )
function tab_expand($text) {
    while (strstr($text,"\t")) {
        $text = preg_replace_callback('/^([^\t\n]*)(\t+)/m',
                                      'tab_expand_helper', $text);
    }
    return $text;
}

function tab_expand_helper($matches) {
    $tab_stop = 8;

    return $matches[1] .
    str_repeat(' ',strlen($matches[2]) *
                       $tab_stop - (strlen($matches[1]) % $tab_stop));
}


$spaced = tab_expand($obj->message);

You can use the tab_unexpand() function shown in Example 1-23 to turn spaces back to tabs.

Example 1-23. tab_unexpand( )
function tab_unexpand($text) {
    $tab_stop = 8;
    $lines = explode("\n",$text);
    foreach ($lines as $i => $line) {
        // Expand any tabs to spaces
        $line = tab_expand($line);
        $chunks = str_split($line, $tab_stop);
        $chunkCount = count($chunks);
        // Scan all but the last chunk
        for ($j = 0; $j < $chunkCount - 1; $j++) {
            $chunks[$j] = preg_replace('/ {2,}$/',"\t",$chunks[$j]);
        }
        // If the last chunk is a tab-stop's worth of spaces
        // convert it to a tab; Otherwise, leave it alone
        if ($chunks[$chunkCount-1] == str_repeat(' ', $tab_stop)) {
            $chunks[$chunkCount-1] = "\t";
        }
        // Recombine the chunks
        $lines[$i] = implode('',$chunks);
    }
    // Recombine the lines
    return implode("\n",$lines);
}

$tabbed = tab_unexpand($obj->message);

Both functions take a string as an argument and return the string appropriately modified.

Discussion

Each function assumes tab stops are every eight spaces, but that can be modified by changing the setting of the $tab_stop variable.

The regular expression in tab_expand() matches both a group of tabs and all the text in a line before that group of tabs. It needs to match the text before the tabs because the length of that text affects how many spaces the tabs should be replaced with so that subsequent text is aligned with the next tab stop. The function doesn’t just replace each tab with eight spaces; it adjusts text after tabs to line up with tab stops.
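
The arithmetic is easier to see in a plain loop over one line. The sketch below is a different technique from the regular expression in tab_expand(), shown only to illustrate the calculation; expand_line() is a hypothetical name, and it handles a single line without newlines:

```php
<?php
// Hypothetical helper: expand the tabs in a single line with a loop.
function expand_line($line, $tab_stop = 8) {
    $out = '';
    for ($i = 0, $j = strlen($line); $i < $j; $i++) {
        if ($line[$i] === "\t") {
            // Pad from the current column up to the next tab stop.
            $out .= str_repeat(' ', $tab_stop - (strlen($out) % $tab_stop));
        } else {
            $out .= $line[$i];
        }
    }
    return $out;
}

// A five-letter word followed by a tab needs three spaces.
$one_tab  = expand_line("hello\tworld");
// Two tabs after one letter: seven spaces, then a full eight.
$two_tabs = expand_line("a\t\tb");

print "[$one_tab]\n[$two_tabs]\n";
```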

Similarly, tab_unexpand() doesn’t just look for eight consecutive spaces and then replace them with one tab character. It divides up each line into eight-character chunks and then substitutes ending whitespace in those chunks (at least two spaces) with tabs. This not only preserves text alignment with tab stops; it also saves space in the string.

See Also

Documentation on str_replace(), on preg_replace_callback(), and on str_split(). Using a PHP Function in a Regular Expression has more information on preg_replace_callback().

Controlling Case

Problem

You need to capitalize, lowercase, or otherwise modify the case of letters in a string. For example, you want to capitalize the initial letters of names but lowercase the rest.

Solution

Use ucfirst() or ucwords() to capitalize the first letter of one or more words, as shown in Example 1-24.

Example 1-24. Capitalizing letters
print ucfirst("how do you do today?");
print ucwords("the prince of wales");

Example 1-24 prints:

How do you do today?
The Prince Of Wales

Use strtolower() or strtoupper() to modify the case of entire strings, as in Example 1-25.

Example 1-25. Changing case of strings
print strtoupper("i'm not yelling!");
print strtolower('<A HREF="one.php">one</A>');

Example 1-25 prints:

I'M NOT YELLING!
<a href="one.php">one</a>

Discussion

Use ucfirst() to capitalize the first character in a string:

print ucfirst('monkey face');
print ucfirst('1 monkey face');

This prints:

Monkey face
1 monkey face

Note that the second phrase is not “1 Monkey face.”

Use ucwords() to capitalize the first character of each word in a string:

print ucwords('1 monkey face');
print ucwords("don't play zone defense against the philadelphia 76-ers");

This prints:

1 Monkey Face
Don't Play Zone Defense Against The Philadelphia 76-ers

As expected, ucwords() doesn’t capitalize the “t” in “don’t.” But it also doesn’t capitalize the “e” in “76-ers.” For ucwords(), a word is any sequence of nonwhitespace characters that follows one or more whitespace characters. Because both ' and - aren’t whitespace characters, ucwords() doesn’t consider the “t” in “don’t” or the “e” in “76-ers” to be word-starting characters.

Both ucfirst() and ucwords() don’t change the case of non–first letters:

print ucfirst('macWorld says I should get an iBook');
print ucwords('eTunaFish.com might buy itunaFish.Com!');

This prints:

MacWorld says I should get an iBook
ETunaFish.com Might Buy ItunaFish.Com!
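
To capitalize the initial letters of a name and lowercase the rest, lowercase the whole string first. The sketch below also uses the optional second argument to ucwords(), which lists the characters that separate words. The input value is made up for the illustration:

```php
<?php
$raw = 'jOHN smith-JONES';   // hypothetical form input

// Lowercase everything first, then capitalize each word.
$name = ucwords(strtolower($raw));

// With a delimiter list, the letter after a hyphen starts a word too.
$hyphen_aware = ucwords(strtolower($raw), ' -');

print "$name\n$hyphen_aware\n";
```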

The functions strtolower() and strtoupper() work on entire strings, not just individual characters. strtolower() changes all alphabetic characters to lowercase, and strtoupper() changes all alphabetic characters to uppercase:

print strtolower("I programmed the WOPR and the TRS-80.");
print strtoupper('"since feeling is first" is a poem by e. e. cummings.');

This prints:

i programmed the wopr and the trs-80.
"SINCE FEELING IS FIRST" IS A POEM BY E. E. CUMMINGS.

When determining upper- and lowercase, these functions respect your locale settings in PHP versions before 8.2. As of PHP 8.2, they convert only ASCII letters, regardless of locale.

See Also

For more information about locale settings, see Chapter 19; documentation on ucfirst(), ucwords(), strtolower(), and strtoupper().

Interpolating Functions and Expressions Within Strings

Problem

You want to include the results of executing a function or expression within a string.

Solution

Use the string concatenation operator (.), as shown in Example 1-26, when the value you want to include can’t be inside the string.

Example 1-26. String concatenation
print 'You have '.($_POST['boys'] + $_POST['girls']).' children.';
print "The word '$word' is ".strlen($word).' characters long.';
print 'You owe '.$amounts['payment'].' immediately.';
print "My circle's diameter is ".$circle->getDiameter().' inches.';

Discussion

You can put variables, object properties, and array elements (if the subscript is unquoted) directly in double-quoted strings:

print "I have $children children.";
print "You owe $amounts[payment] immediately.";
print "My circle's diameter is $circle->diameter inches.";

Interpolation with double-quoted strings places some limitations on the syntax of what can be interpolated. In the previous example, $amounts['payment'] had to be written as $amounts[payment] so it would be interpolated properly. Use curly braces around more complicated expressions to interpolate them into a string. For example:

print "I have {$children} children.";
print "You owe {$amounts['payment']} immediately.";
print "My circle's diameter is {$circle->getDiameter()} inches.";

Direct interpolation or using string concatenation also works with heredocs. Interpolating with string concatenation in heredocs can look a little strange because the closing heredoc delimiter and the string concatenation operator have to be on separate lines:

print <<< END
Right now, the time is
END
. strftime('%c') . <<< END
 but tomorrow it will be
END
. strftime('%c',time() + 86400);

Also, if you’re interpolating with heredocs, make sure to include appropriate spacing for the whole string to appear properly. In the previous example, “Right now, the time is” has to include a trailing space, and “but tomorrow it will be” has to include leading and trailing spaces.

See Also

For the syntax to interpolate variable variables (such as ${"amount_$i"}), see Creating a Dynamic Variable Name; documentation on the string concatenation operator.

Trimming Blanks from a String

Problem

You want to remove whitespace from the beginning or end of a string. For example, you want to clean up user input before validating it.

Solution

Use ltrim(), rtrim(), or trim(). The ltrim() function removes whitespace from the beginning of a string, rtrim() from the end of a string, and trim() from both the beginning and end of a string:

$zipcode = trim($_GET['zipcode']);
$no_linefeed = rtrim($_GET['text']);
$name = ltrim($_GET['name']);

Discussion

For these functions, whitespace is defined as the following characters: newline, carriage return, space, horizontal and vertical tab, and null.

Trimming whitespace off of strings saves storage space and can make for more precise display of formatted data or text within <pre> tags, for example. If you are doing comparisons with user input, you should trim the data first, so that someone who mistakenly enters 98052 followed by a few spaces as their zip code isn’t forced to fix an error that really isn’t one. Trimming before exact text comparisons also ensures that, for example, “salami\n” equals “salami.” It’s also a good idea to normalize string data by trimming it before storing it in a database.
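
A short sketch of the zip code case, using a made-up input value:

```php
<?php
$submitted = "98052   \n";   // hypothetical form input with trailing blanks

// Without trimming, the exact comparison fails.
$raw_matches = ($submitted === '98052');
// After trimming, it succeeds.
$trimmed_matches = (trim($submitted) === '98052');

var_dump($raw_matches, $trimmed_matches);
```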

The trim() functions can also remove user-specified characters from strings. Pass the characters you want to remove as a second argument. You can indicate a range of characters with two dots between the first and last characters in the range:

// Remove numerals and space from the beginning of the line
print ltrim('10 PRINT A$',' 0..9');
// Remove semicolon from the end of the line
print rtrim('SELECT * FROM turtles;',';');

This prints:

PRINT A$
SELECT * FROM turtles

PHP also provides chop() as an alias for rtrim(). However, you’re best off using rtrim() instead because PHP’s chop() behaves differently than Perl’s chop() (which is deprecated in favor of chomp(), anyway), and using it can confuse others when they read your code.

See Also

Documentation on trim(), ltrim(), and rtrim().

Generating Comma-Separated Data

Problem

You want to format data as comma-separated values (CSV) so that it can be imported by a spreadsheet or database.

Solution

Use the fputcsv() function to generate a CSV-formatted line from an array of data. Example 1-27 writes the data in $sales into a file.

Example 1-27. Generating comma-separated data
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

$filename = './sales.csv';
$fh = fopen($filename,'w') or die("Can't open $filename");
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh) or die("Can't close $filename");

Discussion

To print the CSV-formatted data instead of writing it to a file, use the special output stream php://output, as shown in Example 1-28.

Example 1-28. Printing comma-separated data
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

$fh = fopen('php://output','w');
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh);

To put the CSV-formatted data into a string instead of printing it or writing it to a file, combine the technique in Example 1-28 with output buffering, as shown in Example 1-29.

Example 1-29. Putting comma-separated data into a string
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33),
                array('Southeast','2005-01-01','2005-02-01',93.26),
                array('Southwest','2005-01-01','2005-02-01',945.21),
                array('All Regions','--','--',1597.34) );

ob_start();
$fh = fopen('php://output','w') or die("Can't open php://output");
foreach ($sales as $sales_line) {
    if (fputcsv($fh, $sales_line) === false) {
        die("Can't write CSV line");
    }
}
fclose($fh) or die("Can't close php://output");
$output = ob_get_contents();
ob_end_clean();
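
Another possible approach, not the one used in Example 1-29, is to write to the php://temp stream and read the result back, which avoids output buffering altogether. This is a sketch with a shortened $sales array:

```php
<?php
$sales = array( array('Northeast','2005-01-01','2005-02-01',12.54),
                array('Northwest','2005-01-01','2005-02-01',546.33) );

// php://temp is a read-write stream held in memory (it spills to a
// temporary file if it grows large).
$fh = fopen('php://temp','r+') or die("Can't open php://temp");
foreach ($sales as $sales_line) {
    // Delimiter, enclosure, and escape character are spelled out so the
    // call does not rely on defaults.
    if (fputcsv($fh, $sales_line, ',', '"', '\\') === false) {
        die("Can't write CSV line");
    }
}
// Go back to the start of the stream and read everything written so far.
rewind($fh);
$output = stream_get_contents($fh);
fclose($fh);

print $output;
```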

See Also

Documentation on fputcsv(); Buffering Output to the Browser has more information about output buffering.

Parsing Comma-Separated Data

Problem

You have data in comma-separated values (CSV) format—for example, a file exported from Excel or a database—and you want to extract the records and fields into a format you can manipulate in PHP.

Solution

If the CSV data is in a file (or available via a URL), open the file with fopen() and read in the data with fgetcsv(). Example 1-30 prints out CSV data in an HTML table.

Example 1-30. Reading CSV data from a file
$fp = fopen($filename,'r') or die("can't open file");
print "<table>\n";
while($csv_line = fgetcsv($fp)) {
    print '<tr>';
    for ($i = 0, $j = count($csv_line); $i < $j; $i++) {
        print '<td>'.htmlentities($csv_line[$i]).'</td>';
    }
    print "</tr>\n";
}
print "</table>\n";
fclose($fp) or die("can't close file");

Discussion

By default, fgetcsv() reads in an entire line of data. If your average line length is more than 8,192 bytes, your program may run faster if you specify an explicit line length instead of letting PHP figure it out. Do this by providing a second argument to fgetcsv() that is a value larger than the maximum length of a line in your CSV file. (Don’t forget to count the end-of-line whitespace.) If you pass a line length of 0, PHP will use the default behavior.

You can pass fgetcsv() an optional third argument, a delimiter to use instead of a comma (,). However, using a different delimiter somewhat defeats the purpose of CSV as an easy way to exchange tabular data.

Don’t be tempted to bypass fgetcsv() and just read a line in and explode() on the commas. CSV is more complicated than that so that it can deal with field values that have, for example, literal commas in them that should not be treated as field delimiters. Using fgetcsv() protects you and your code from subtle errors.
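
The following sketch shows the kind of error that explode() introduces. It uses str_getcsv(), which parses one CSV line that is already in a string; the sample line is made up:

```php
<?php
$line = '"Lewis, Sinclair",Elmer Gantry,1927';   // hypothetical CSV line

// Naive split: the comma inside the quoted name becomes a field break.
$naive = explode(',', $line);

// str_getcsv() honors the quotes around the first field.
$parsed = str_getcsv($line, ',', '"', '\\');

print count($naive) . " fields from explode()\n";
print count($parsed) . " fields from str_getcsv()\n";
print $parsed[0] . "\n";
```

The naive split yields four pieces, with the author's name broken in two, while the CSV-aware parse yields the intended three fields.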

See Also

Documentation on fgetcsv().

Generating Fixed-Width Field Data Records

Problem

You need to format data records such that each field takes up a set amount of characters.

Solution

Use pack() with a format string that specifies a sequence of space-padded strings. Example 1-31 transforms an array of data into fixed-width records.

Example 1-31. Generating fixed-width field data records
$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum', 1971),
                array('The Parsifal Mosaic','Robert Ludlum', 1982) );

foreach ($books as $book) {
    print pack('A25A15A4', $book[0], $book[1], $book[2]) . "\n";
}

Discussion

The format string A25A15A4 tells pack() to transform its subsequent arguments into a 25-character space-padded string, a 15-character space-padded string, and a 4-character space-padded string. For space-padded fields in fixed-width records, pack() provides a concise solution.
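
With the A format code, pack() pads values that are too short and cuts off values that are too long, so every record comes out the same width. A small sketch, with illustrative variable names:

```php
<?php
// A shorter value is padded with spaces to the field width.
$padded = pack('A5', 'ab');
// A longer value is cut off at the field width.
$cut = pack('A5', 'abcdefgh');

// The format string from Example 1-31 yields 25 + 15 + 4 bytes.
$record = pack('A25A15A4', 'Elmer Gantry', 'Sinclair Lewis', 1927);

print "[$padded]\n";
print "[$cut]\n";
print strlen($record) . "\n";
```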

To pad fields with something other than a space, however, use substr() to ensure that the field values aren’t too long and str_pad() to ensure that the field values aren’t too short. Example 1-32 transforms an array of records into fixed-width records with .⁠-⁠padded fields.

Example 1-32. Generating fixed-width field data records without pack( )
$books = array( array('Elmer Gantry', 'Sinclair Lewis', 1927),
                array('The Scarlatti Inheritance','Robert Ludlum', 1971),
                array('The Parsifal Mosaic','Robert Ludlum', 1982) );

foreach ($books as $book) {
    $title  = str_pad(substr($book[0], 0, 25), 25, '.');
    $author = str_pad(substr($book[1], 0, 15), 15, '.');
    $year   = str_pad(substr($book[2], 0, 4), 4, '.');
    print "$title$author$year\n";
}

See Also

Documentation on pack() and on str_pad(). Storing Binary Data in Strings discusses pack() format strings in more detail.

Parsing Fixed-Width Field Data Records

Problem

You need to break apart fixed-width records in strings.

Solution

Use substr() as shown in Example 1-33.

Example 1-33. Parsing fixed-width records with substr( )
$fp = fopen('fixed-width-records.txt','r',true) or die ("can't open file");
while ($s = fgets($fp,1024)) {
    $fields[1] = substr($s,0,25);  // first field:  first 25 characters of the line
    $fields[2] = substr($s,25,15); // second field: next 15 characters of the line
    $fields[3] = substr($s,40,4);  // third field:  next 4 characters of the line
    $fields = array_map('rtrim', $fields); // strip the trailing whitespace
    // a function to do something with the fields
    process_fields($fields);
}
fclose($fp) or die("can't close file");

Or use unpack(), as shown in Example 1-34.

Example 1-34. Parsing fixed-width records with unpack( )
function fixed_width_unpack($format_string,$data) {
  $r = array();
  for ($i = 0, $j = count($data); $i < $j; $i++) {
    $r[$i] = unpack($format_string,$data[$i]);
  }
  return $r;
}

Discussion

Data in which each field is allotted a fixed number of characters per line may look like this list of book titles, authors, and publication dates:

$booklist=<<<END
Elmer Gantry             Sinclair Lewis 1927
The Scarlatti InheritanceRobert Ludlum  1971
The Parsifal Mosaic      Robert Ludlum  1982
Sophie's Choice          William Styron 1979
END;

In each line, the title occupies the first 25 characters, the author’s name the next 15 characters, and the publication year the next 4 characters. Knowing those field widths, you can easily use substr() to parse the fields into an array:

$books = explode("\n",$booklist);

for($i = 0, $j = count($books); $i < $j; $i++) {
  $book_array[$i]['title'] = substr($books[$i],0,25);
  $book_array[$i]['author'] = substr($books[$i],25,15);
  $book_array[$i]['publication_year'] = substr($books[$i],40,4);
}

Exploding $booklist into an array of lines makes the looping code the same whether it’s operating over a string or a series of lines read in from a file.

The loop can be made more flexible by specifying the field names and widths in a separate array that can be passed to a parsing function, as shown in the fixed_width_substr() function in Example 1-35.

Example 1-35. fixed_width_substr( )
function fixed_width_substr($fields,$data) {
  $r = array();
  for ($i = 0, $j = count($data); $i < $j; $i++) {
    $line_pos = 0;
    foreach($fields as $field_name => $field_length) {
      $r[$i][$field_name] = rtrim(substr($data[$i],$line_pos,$field_length));
      $line_pos += $field_length;
    }
  }
  return $r;
}

$book_fields = array('title' => 25,
                     'author' => 15,
                     'publication_year' => 4);

// Pass the array of lines ($books), not the original string ($booklist)
$book_array = fixed_width_substr($book_fields,$books);

The variable $line_pos keeps track of the start of each field and is advanced by the previous field’s width as the code moves through each line. Use rtrim() to remove trailing whitespace from each field.

You can use unpack() as a substitute for substr() to extract fields. Instead of specifying the field names and widths as an associative array, create a format string for unpack(). A fixed-width field extractor using unpack() looks like the fixed_width_unpack() function shown in Example 1-36.

Example 1-36. fixed_width_unpack( )
function fixed_width_unpack($format_string,$data) {
  $r = array();
  for ($i = 0, $j = count($data); $i < $j; $i++) {
    $r[$i] = unpack($format_string,$data[$i]);
  }
  return $r;
}

Because the A format to unpack() means space-padded string, there’s no need to rtrim() off the trailing spaces.
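
To make that concrete, here is a minimal sketch that unpacks a single record. The line is assembled with str_pad() only so that the field widths in the example are unambiguous:

```php
// Build one 44-character record: 25 + 15 + 4.
$line = str_pad('Elmer Gantry', 25) . str_pad('Sinclair Lewis', 15) . '1927';

// Each A code is followed by a width and then the array key to use.
$fields = unpack('A25title/A15author/A4publication_year', $line);

print '[' . $fields['title'] . "]\n";             // [Elmer Gantry]
print '[' . $fields['author'] . "]\n";            // [Sinclair Lewis]
print '[' . $fields['publication_year'] . "]\n";  // [1927]
```

The padding spaces present in $line do not appear in the unpacked values.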

Once the fields have been parsed into $book_array by either function, the data can be printed as an HTML table, for example:

$book_array = fixed_width_unpack('A25title/A15author/A4publication_year',
                                    $books);
print "<table>\n";
// print a header row
print '<tr><td>';
print join('</td><td>',array_keys($book_array[0]));
print "</td></tr>\n";
// print each data row
foreach ($book_array as $row) {
    print '<tr><td>';
    print join('</td><td>',array_values($row));
    print "</td></tr>\n";
}
print "</table>\n";

Joining data on </td><td> produces a table row that is missing its first <td> and last </td>. We produce a complete table row by printing out <tr><td> before the joined data and </td></tr> after the joined data.

substr() and unpack() are equally capable when the fixed-width fields are strings, but unpack() is the better solution when the fields hold something other than strings.

If all of your fields are the same size, str_split() is a handy shortcut for chopping up incoming data. It returns an array made up of sections of a string. Example 1-37 uses str_split() to break apart a string into 32-byte pieces.

Example 1-37. Chopping up a string with str_split( )
$fields = str_split($line_of_data,32);
// $fields[0] is bytes 0 - 31
// $fields[1] is bytes 32 - 63
// and so on
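
One detail worth knowing is what happens when the data length is not a multiple of the chunk size. The sketch below uses made-up filler data to show that the final piece is simply shorter:

```php
// 67 bytes of filler data: two full 32-byte chunks plus 3 left over.
$line_of_data = str_repeat('a', 32) . str_repeat('b', 32) . 'ccc';

$fields = str_split($line_of_data, 32);

print count($fields) . "\n";      // 3
print strlen($fields[0]) . "\n";  // 32
print $fields[2] . "\n";          // ccc
```

If every record must be complete, check the length of the last element before processing it.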

See Also

For more information about unpack(), see Storing Binary Data in Strings and the PHP website; documentation on str_split(); Turning an Array into a String discusses join().

Taking Strings Apart

Problem

You need to break a string into pieces. For example, you want to access each line that a user enters in a <textarea> form field.

Solution

Use explode() if what separates the pieces is a constant string:

$words = explode(' ','My sentence is not very complicated');

Use preg_split() if you need a Perl-compatible regular expression to describe the separator:

$words = preg_split('/\d\. /','my day: 1. get up 2. get dressed 3. eat toast');
$lines = preg_split('/[\n\r]+/',$_POST['textarea']);

Use the /i flag to preg_split() for case-insensitive separator matching:

$words = preg_split('/ x /i','31 inches x 22 inches X 9 inches');

Discussion

The simplest solution of the bunch is explode(). Pass it your separator string, the string to be separated, and an optional limit on how many elements should be returned:

$dwarves = 'dopey,sleepy,happy,grumpy,sneezy,bashful,doc';
$dwarf_array = explode(',',$dwarves);

This makes $dwarf_array a seven-element array, so print_r($dwarf_array) prints:

Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy
    [5] => bashful
    [6] => doc
)

If the specified limit is less than the number of possible chunks, the last chunk contains the remainder:

$dwarf_array = explode(',',$dwarves,5);
print_r($dwarf_array);

This prints:

Array
(
    [0] => dopey
    [1] => sleepy
    [2] => happy
    [3] => grumpy
    [4] => sneezy,bashful,doc
)

The separator is treated literally by explode(). If you specify a comma and a space as a separator, it breaks the string only on a comma followed by a space, not on a comma or a space.
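
The following sketch, using an invented input string, contrasts the literal separator with a character class that matches either character:

```php
$s = 'dopey, sleepy,happy grumpy';

// explode() splits only where a comma is immediately followed by a space.
$pieces = explode(', ', $s);
print_r($pieces);   // two elements: "dopey" and "sleepy,happy grumpy"

// To split on a comma OR a space, describe the separator as a pattern.
$either = preg_split('/[, ]+/', $s);
print_r($either);   // four elements: dopey, sleepy, happy, grumpy
```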

With preg_split(), you have more flexibility. Instead of a string literal as a separator, it uses a Perl-compatible regular expression engine. With preg_split(), you can take advantage of various Perl-ish regular expression extensions, as well as tricks such as including the separator text in the returned array of strings:

$math = "3 + 2 / 7 - 9";
$stack = preg_split('/ *([+\-\/*]) */',$math,-1,PREG_SPLIT_DELIM_CAPTURE);
print_r($stack);

This prints:

Array
(
    [0] => 3
    [1] => +
    [2] => 2
    [3] => /
    [4] => 7
    [5] => -
    [6] => 9
)

The separator regular expression looks for the four mathematical operators (+, -, /, *), surrounded by optional leading or trailing spaces. The PREG_SPLIT_DELIM_CAPTURE flag tells preg_split() to include in the returned array whatever text was matched by the parenthesized parts of the separator regular expression. Only the mathematical operator character class is in parentheses, so the returned array doesn’t have any spaces in it.
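
As a quick comparison, this sketch runs the same input through two variations: one without the flag, and one in which the parentheses are moved so that they enclose the surrounding spaces as well:

```php
$math = "3 + 2 / 7 - 9";

// Without PREG_SPLIT_DELIM_CAPTURE, the operators are discarded.
$numbers = preg_split('/ *([+\-\/*]) */', $math);
print_r($numbers);   // 3, 2, 7, 9

// With the spaces inside the parentheses, they are captured too.
$spaced = preg_split('/( *[+\-\/*] *)/', $math, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($spaced);    // "3", " + ", "2", " / ", "7", " - ", "9"
```

What ends up in the array is controlled entirely by where the parentheses sit in the pattern.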

See Also

Regular expressions are discussed in more detail in Chapter 23; documentation on explode() and preg_split().

Wrapping Text at a Certain Line Length

Problem

You need to wrap lines in a string. For example, you want to display text by using <pre> and </pre> tags but have it stay within a regularly sized browser window.

Solution

Use wordwrap():

$s = "Four score and seven years ago our fathers brought forth on this continent a new nation, conceived in liberty and dedicated to the proposition that all men are created equal.";

print "<pre>\n".wordwrap($s)."\n</pre>";

This prints:

<pre>
Four score and seven years ago our fathers brought forth on this continent
a new nation, conceived in liberty and dedicated to the proposition that
all men are created equal.
</pre>

Discussion

By default, wordwrap() wraps text at 75 characters per line. An optional second argument specifies a different line length:

print wordwrap($s,50);

This prints:

Four score and seven years ago our fathers brought
forth on this continent a new nation, conceived in
liberty and dedicated to the proposition that all
men are created equal.

Other characters besides \n can be used for line breaks. For double spacing, use "\n\n":

print wordwrap($s,50,"\n\n");

This prints:

Four score and seven years ago our fathers brought

forth on this continent a new nation, conceived in

liberty and dedicated to the proposition that all

men are created equal.

There is an optional fourth argument to wordwrap() that controls the treatment of words that are longer than the specified line length. If this argument is 1, these words are wrapped. Otherwise, they span past the specified line length:

print wordwrap('jabberwocky',5) . "\n";
print wordwrap('jabberwocky',5,"\n",1);

This prints:

jabberwocky
jabbe
rwock
y

See Also

Documentation on wordwrap().

Storing Binary Data in Strings

Problem

You want to parse a string that contains values encoded as a binary structure or encode values into a string. For example, you want to store numbers in their binary representation instead of as sequences of ASCII characters.

Solution

Use pack() to store binary data in a string:

$packed = pack('S4',1974,106,28225,32725);

Use unpack() to extract binary data from a string:

$nums = unpack('S4',$packed);

Discussion

The first argument to pack() is a format string that describes how to encode the data that’s passed in the rest of the arguments. The format string S4 tells pack() to produce four unsigned short 16-bit numbers in machine byte order from its input data. Given 1974, 106, 28225, and 32725 as input on a little-endian machine, this returns eight bytes: 182, 7, 106, 0, 65, 110, 213, and 127. Each two-byte pair corresponds to one of the input numbers: 7 * 256 + 182 is 1974; 0 * 256 + 106 is 106; 110 * 256 + 65 = 28225; 127 * 256 + 213 = 32725.
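
To inspect those bytes on any machine, regardless of its native byte order, one option is to use the explicitly little-endian v format in place of S. The sketch below does that and also shows the big-endian n format for contrast:

```php
// v = unsigned 16-bit, always little endian (low byte first).
$little = pack('v4', 1974, 106, 28225, 32725);
$bytes = array_map('ord', str_split($little));
print implode(', ', $bytes) . "\n";
// 182, 7, 106, 0, 65, 110, 213, 127

// n = unsigned 16-bit, always big endian (high byte first).
$big = pack('n4', 1974, 106, 28225, 32725);
$bytes_be = array_map('ord', str_split($big));
print implode(', ', $bytes_be) . "\n";
// 7, 182, 0, 106, 110, 65, 127, 213
```

When packed data is written to a file or sent over a network, choosing n or v rather than S keeps the format independent of the machine that produced it.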

The first argument to unpack() is also a format string, and the second argument is the data to decode. Passing a format string of S4, the eight-byte sequence that pack() produced returns a four-element array of the original numbers. print_r($nums) prints:

Array
(
    [1] => 1974
    [2] => 106
    [3] => 28225
    [4] => 32725
)

In unpack(), format characters and their count can be followed by a string to be used as an array key. For example:

$nums = unpack('S4num',$packed);
print_r($nums);

This prints:

Array
(
    [num1] => 1974
    [num2] => 106
    [num3] => 28225
    [num4] => 32725
)

Multiple format characters must be separated with / in unpack():

$nums = unpack('S1a/S1b/S1c/S1d',$packed);
print_r($nums);

This prints:

Array
(
    [a] => 1974
    [b] => 106
    [c] => 28225
    [d] => 32725
)

The format characters that can be used with pack() and unpack() are listed in Table 1-2.

Table 1-2. Format characters for pack( ) and unpack( )
Format character    Data type
a                   NUL-padded string
A                   Space-padded string
h                   Hex string, low nibble first
H                   Hex string, high nibble first
c                   signed char
C                   unsigned char
s                   signed short (16 bit, machine byte order)
S                   unsigned short (16 bit, machine byte order)
n                   unsigned short (16 bit, big endian byte order)
v                   unsigned short (16 bit, little endian byte order)
i                   signed int (machine-dependent size and byte order)
I                   unsigned int (machine-dependent size and byte order)
l                   signed long (32 bit, machine byte order)
L                   unsigned long (32 bit, machine byte order)
N                   unsigned long (32 bit, big endian byte order)
V                   unsigned long (32 bit, little endian byte order)
f                   float (machine-dependent size and representation)
d                   double (machine-dependent size and representation)
x                   NUL byte
X                   Back up one byte
@                   NUL-fill to absolute position

For a, A, h, and H, a number after the format character indicates how long the string is. For example, A25 means a 25-character space-padded string. For other format characters, a following number means how many of that type appear consecutively in a string. Use * to take the rest of the available data.
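
The three meanings of the trailing number or * can be seen side by side in this small sketch (the input values are arbitrary):

```php
// String formats: the number is a length.
$nul_padded   = pack('a5', 'ab');   // "ab" followed by three NUL bytes
$space_padded = pack('A5', 'ab');   // "ab" followed by three spaces

// Numeric formats: the number is a repeat count.
// Two big-endian 16-bit values are read from four bytes.
$two = unpack('n2', "\x00\x01\x00\x02");
print_r($two);    // [1] => 1, [2] => 2

// * takes however much data remains.
$rest = unpack('C*', "\x01\x02\x03");
print_r($rest);   // [1] => 1, [2] => 2, [3] => 3
```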

You can convert between data types with unpack(). This example fills the array $ascii with the ASCII values of each character in $s:

$s = 'platypus';
$ascii = unpack('c*',$s);
print_r($ascii);

This prints:

Array
(
    [1] => 112
    [2] => 108
    [3] => 97
    [4] => 116
    [5] => 121
    [6] => 112
    [7] => 117
    [8] => 115
)
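
The conversion also works in the other direction. As a sketch, assuming a PHP version that supports argument unpacking with ... (5.6 or later), the array of values can be handed back to pack() to rebuild the string:

```php
$s = 'platypus';
$ascii = unpack('c*', $s);

// pack() takes each value as a separate argument, so spread the array.
// array_values() renumbers the keys from 0 before spreading.
$rebuilt = pack('c*', ...array_values($ascii));

print $rebuilt . "\n";   // platypus
```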

See Also

Documentation on pack() and unpack().

Program: Downloadable CSV File

Combining the header() function to change the content type of what your PHP program outputs with the fputcsv() function for data formatting lets you send CSV files to browsers that will be automatically handed off to a spreadsheet program (or whatever application is configured on a particular client system to handle CSV files). Example 1-38 formats the results of an SQL SELECT query as CSV data and provides the correct headers so that it is properly handled by the browser.

Example 1-38. Downloadable CSV file
$db = new PDO('sqlite:/usr/local/data/sales.db');
$query = $db->query('SELECT region, start, end, amount FROM sales', PDO::FETCH_NUM);
// fetchAll() is a method of the statement returned by query(), not of the PDO object
$sales_data = $query->fetchAll();

// Open filehandle for fputcsv()
$output = fopen('php://output','w') or die("Can't open php://output");
$total = 0;

// Tell browser to expect a CSV file
header('Content-Type: application/csv');
header('Content-Disposition: attachment; filename="sales.csv"');

// Print header row
fputcsv($output,array('Region','Start Date','End Date','Amount'));
// Print each data row and increment $total
foreach ($sales_data as $sales_line) {
    fputcsv($output, $sales_line);
    $total += $sales_line[3];
}
// Print total row and close file handle
fputcsv($output,array('All Regions','--','--',$total));
fclose($output) or die("Can't close php://output");

Example 1-38 sends two headers to ensure that the browser handles the CSV output properly. The first header, Content-Type, tells the browser that the output is not HTML, but CSV. The second header, Content-Disposition, tells the browser not to display the output but to attempt to load an external program to handle it. The filename attribute of this header supplies a default filename for the browser to use for the downloaded file.

If you want to provide different views of the same data, you can combine the formatting code in one page and use a query string variable to determine which kind of data formatting to do. In Example 1-39, the format query string variable controls whether the results of an SQL SELECT query are returned as an HTML table or CSV.

Example 1-39. Dynamic CSV or HTML
$db = new PDO('sqlite:/usr/local/data/sales.db');
$query = $db->query('SELECT region, start, end, amount FROM sales', PDO::FETCH_NUM);
// fetchAll() is a method of the statement returned by query(), not of the PDO object
$sales_data = $query->fetchAll();

$total = 0;
$column_headers = array('Region','Start Date','End Date','Amount');
// Decide what format to use
// Default to HTML when no format is supplied in the query string
$format = (isset($_GET['format']) && $_GET['format'] == 'csv') ? 'csv' : 'html';

// Print format-appropriate beginning
if ($format == 'csv') {
    $output = fopen('php://output','w') or die("Can't open php://output");
    header('Content-Type: application/csv');
    header('Content-Disposition: attachment; filename="sales.csv"');
    fputcsv($output,$column_headers);
} else {
    echo '<table><tr><th>';
    echo implode('</th><th>', $column_headers);
    echo '</th></tr>';
}

foreach ($sales_data as $sales_line) {
    // Print format-appropriate line
    if ($format == 'csv') {
        fputcsv($output, $sales_line);
    } else {
        echo '<tr><td>' . implode('</td><td>', $sales_line) . '</td></tr>';
    }
    $total += $sales_line[3];
}
$total_line = array('All Regions','--','--',$total);

// Print format-appropriate footer
if ($format == 'csv') {
    fputcsv($output,$total_line);
    fclose($output) or die("Can't close php://output");
} else {
    echo '<tr><td>' . implode('</td><td>', $total_line) . '</td></tr>';
    echo '</table>';
}

Accessing the program in Example 1-39 with format=csv in the query string causes it to return CSV-formatted output. Any other format value in the query string causes it to return HTML output. The logic that sets $format to CSV or HTML could easily be extended to other output formats such as JSON. If you have many places where you want to offer for download the same data in multiple formats, package the code in Example 1-39 into a function that accepts an array of data and a format specifier and then displays the right results.
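
One way such a function might look is sketched below. The name format_rows() and its parameters are hypothetical, chosen only for this illustration. Unlike Example 1-39, this version returns the formatted data as a string and leaves the sending of headers to the caller. It also escapes values in the HTML branch, which is an addition to what Example 1-39 does.

```php
// Hypothetical helper: format_rows() is an illustrative name, not a PHP built-in.
// $headers is an array of column titles, $rows an array of row arrays, and
// $format is 'csv' for CSV output; anything else produces an HTML table.
function format_rows(array $headers, array $rows, $format) {
    if ($format == 'csv') {
        // Write to an in-memory stream so the CSV can be returned as a string.
        $fh = fopen('php://memory', 'w+');
        fputcsv($fh, $headers, ',', '"', '\\');
        foreach ($rows as $row) {
            fputcsv($fh, $row, ',', '"', '\\');
        }
        rewind($fh);
        $out = stream_get_contents($fh);
        fclose($fh);
        return $out;
    }

    // Escape each value so that characters such as & and < are safe in HTML.
    $escape = function ($value) {
        return htmlspecialchars((string) $value);
    };
    $out = '<table><tr><th>'
         . implode('</th><th>', array_map($escape, $headers))
         . "</th></tr>\n";
    foreach ($rows as $row) {
        $out .= '<tr><td>'
              . implode('</td><td>', array_map($escape, $row))
              . "</td></tr>\n";
    }
    return $out . "</table>\n";
}

// Sample data invented for this sketch.
$headers = array('Region', 'Amount');
$rows = array(array('East', 100), array('North & South', 250));

$csv  = format_rows($headers, $rows, 'csv');
$html = format_rows($headers, $rows, 'html');
print $csv;
print $html;
```

A calling page would still send the Content-Type and Content-Disposition headers itself before printing the CSV version.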
