Manipulating and Searching Strings

PHP has many functions to work with strings. The most commonly used functions for searching and modifying strings are those that use regular expressions to describe the string in question. The functions described in this section do not use regular expressions—they are faster than regular expressions, but they work only when you’re looking for a fixed string (for instance, if you’re looking for "12/11/01" rather than “any numbers separated by slashes”).

Substrings

If you know where in a larger string the interesting data lies, you can copy it out with the substr( ) function:

$piece = substr(string, start [, length ]);

The start argument is the position in string at which to begin copying, with 0 meaning the start of the string. The length argument is the number of characters to copy (the default is to copy until the end of the string). For example:

$name  = "Fred Flintstone";
$fluff = substr($name, 6, 4);          // $fluff is "lint"
$sound = substr($name, 11);            // $sound is "tone"
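
Both arguments may also be negative, which the examples above do not show. The following is a minimal sketch using the same $name value; the other variable names are illustrative only:

```php
<?php
$name = "Fred Flintstone";

// A negative start counts back from the end of the string.
$tail = substr($name, -5);             // $tail is "stone"

// A negative length stops that many characters before the end.
$first  = substr($name, 0, -11);       // $first is "Fred"
$middle = substr($name, 5, -5);        // $middle is "Flint"
```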

To learn how many times a smaller string occurs in a larger one, use substr_count( ) :

$number = substr_count(big_string, small_string);

For example:

$sketch = <<< End_of_Sketch
Well, there's egg and bacon; egg sausage and bacon; egg and spam;
egg bacon and spam; egg bacon sausage and spam; spam bacon sausage
and spam; spam egg spam spam bacon and spam; spam sausage spam spam
bacon spam tomato and spam;
End_of_Sketch;
$count = substr_count($sketch, "spam");
print("The word spam occurs $count times.");
The word spam occurs 14 times.
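
Two properties of substr_count( ) are easy to overlook: matches are counted without overlapping, and the comparison is case-sensitive. A short sketch, with sample strings chosen purely for illustration:

```php
<?php
// Matches do not overlap: "aa" fits twice in "aaaa", not three times.
$overlap = substr_count("aaaa", "aa");                 // $overlap is 2

// The search is case-sensitive, so only the lowercase word is counted.
$exact = substr_count("Spam, spam, SPAM", "spam");     // $exact is 1

// Lowercasing the text first gives a case-insensitive count.
$any_case = substr_count(strtolower("Spam, spam, SPAM"), "spam");  // $any_case is 3
```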

The substr_replace( ) function permits many kinds of string modifications:

$string = substr_replace(original, new, start [, length ]);

The function replaces the part of original indicated by the start (0 means the start of the string) and length values with the string new. If no fourth argument is given, substr_replace( ) removes the text from start to the end of the string.

For instance:

$greeting = "good morning citizen";
$farewell = substr_replace($greeting, "bye", 5, 7);
// $farewell is "good bye citizen"

Use a length value of 0 to insert without deleting:

$farewell = substr_replace($farewell, "kind ", 9, 0);
// $farewell is "good bye kind citizen"

Use a replacement of "" to delete without inserting:

$farewell = substr_replace($farewell, "", 8);
// $farewell is "good bye"

Here’s how you can insert at the beginning of the string:

$farewell = substr_replace($farewell, "now it's time to say ", 0, 0);
// $farewell is "now it's time to say good bye"

A negative value for start indicates the number of characters from the end of the string from which to start the replacement:

$farewell = substr_replace($farewell, "riddance", -3);
// $farewell is "now it's time to say good riddance"

A negative length indicates the number of characters from the end of the string at which to stop deleting:

$farewell = substr_replace($farewell, "", -8, -5);
// $farewell is "now it's time to say good dance"
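
As one possible application of these rules, the sketch below hides all but the last few characters of a string. The function name mask_leading and its $visible parameter are hypothetical, introduced only for this example:

```php
<?php
// Hypothetical helper: replace all but the last $visible characters with '*'.
function mask_leading($text, $visible = 4) {
  // Number of characters to hide; never negative for short strings.
  $hidden = max(0, strlen($text) - $visible);
  // Overwrite the first $hidden characters with the same number of asterisks.
  return substr_replace($text, str_repeat('*', $hidden), 0, $hidden);
}

$masked = mask_leading("4111111111111111");   // $masked is "************1111"
$short  = mask_leading("123");                // $short is "123" (nothing to hide)
```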

Miscellaneous String Functions

The strrev( ) function takes a string and returns a reversed copy of it:

$string = strrev(string);

For example:

echo strrev("There is no cabal");
labac on si erehT

The str_repeat( ) function takes a string and a count and returns a new string consisting of the argument string repeated count times:

$repeated = str_repeat(string, count);

For example, to build a crude horizontal rule:

echo str_repeat('-', 40);

The str_pad( ) function pads one string with another. Optionally, you can say what string to pad with, and whether to pad on the left, right, or both:

$padded = str_pad(to_pad, length [, with [, pad_type ]]);

The default is to pad on the right with spaces:

$string = str_pad('Fred Flintstone', 30);
echo "$string:35:Wilma";
Fred Flintstone               :35:Wilma

The optional third argument is the string to pad with:

$string = str_pad('Fred Flintstone', 30, '. ');
echo "{$string}35";
Fred Flintstone. . . . . . . .35

The optional fourth argument can be either STR_PAD_RIGHT (the default), STR_PAD_LEFT, or STR_PAD_BOTH (to center). For example:

echo '[' . str_pad('Fred Flintstone', 30, ' ', STR_PAD_LEFT) . "]\n";
echo '[' . str_pad('Fred Flintstone', 30, ' ', STR_PAD_BOTH) . "]\n";
[               Fred Flintstone]
[       Fred Flintstone        ]
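
Padding is often used to line up columns of plain text. The sketch below assumes a hypothetical format_row helper with arbitrary column widths of 18 and 4; it also shows that str_pad( ) never shortens a string that is already longer than length:

```php
<?php
// Hypothetical helper: a left-aligned name column and a right-aligned number column.
function format_row($name, $age) {
  return str_pad($name, 18) . str_pad($age, 4, ' ', STR_PAD_LEFT);
}

$row1 = format_row('Fred Flintstone', '35');
$row2 = format_row('Wilma', '33');
// Both rows are 22 characters wide, so the numbers line up.

// A string longer than the requested length is returned unchanged.
$same = str_pad('Fred Flintstone', 5);       // $same is 'Fred Flintstone'
```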

Decomposing a String

PHP provides several functions to let you break a string into smaller components. In increasing order of complexity, they are explode( ), strtok( ), and sscanf( ).

Exploding and imploding

Data often arrives as strings, which must be broken down into an array of values. For instance, you might want to separate out the comma-separated fields from a string such as "Fred,25,Wilma". In these situations, use the explode( ) function:

$array = explode(separator, string [, limit]);

The first argument, separator, is a string containing the field separator. The second argument, string, is the string to split. The optional third argument, limit, is the maximum number of values to return in the array. If the limit is reached, the last element of the array contains the remainder of the string:

$input  = 'Fred,25,Wilma';
$fields = explode(',', $input);        
// $fields is array('Fred', '25', 'Wilma')
$fields = explode(',', $input, 2);     
// $fields is array('Fred', '25,Wilma')

The implode( ) function does the exact opposite of explode( )—it creates a large string from an array of smaller strings:

$string = implode(separator, array);

The first argument, separator, is the string to put between the elements of the second argument, array. To reconstruct the simple comma-separated value string, simply say:

$fields = array('Fred', '25', 'Wilma');
$string = implode(',', $fields);       // $string is 'Fred,25,Wilma'

The join( ) function is an alias for implode( ).
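
A few edge cases are worth trying out. The sketch below uses made-up input strings; the negative limit assumes PHP 5.1 or later:

```php
<?php
// Adjacent separators yield empty strings rather than being skipped.
$fields = explode(',', 'Fred,,Wilma');
// $fields is array('Fred', '', 'Wilma')

// A negative limit drops that many elements from the end.
$trimmed = explode(',', 'Fred,25,Wilma', -1);
// $trimmed is array('Fred', '25')

// implode( ) reverses explode( ) when the same separator is used.
$round_trip = implode(',', explode(',', 'Fred,25,Wilma'));
// $round_trip is 'Fred,25,Wilma'
```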

Tokenizing

The strtok( ) function lets you iterate through a string, getting a new chunk (token) each time. The first time you call it, you need to pass two arguments: the string to iterate over and the token separator:

$first_chunk = strtok(string, separator);

To retrieve the rest of the tokens, repeatedly call strtok( ) with only the separator:

$next_chunk  = strtok(separator);

For instance, consider this invocation:

$string = "Fred,Flintstone,35,Wilma";
$token  = strtok($string, ",");
while ($token !== false) {
  echo("$token<br>");
  $token = strtok(",");
}
Fred
Flintstone
35
Wilma

The strtok( ) function returns false when there are no more tokens to be returned.

Call strtok( ) with two arguments to reinitialize the iterator. This restarts the tokenizer from the start of the string.
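
The separator argument may hold more than one character, in which case each character acts as a delimiter on its own. The sketch below wraps the loop shown earlier in a hypothetical tokenize helper (the name is an assumption for this example) and collects the tokens into an array:

```php
<?php
// Hypothetical helper: collect every token into an array.
function tokenize($string, $separators) {
  $tokens = array();
  $token = strtok($string, $separators);
  while ($token !== false) {
    $tokens[] = $token;
    $token = strtok($separators);
  }
  return $tokens;
}

// Comma, semicolon, and space all separate tokens,
// and runs of delimiters produce no empty tokens.
$tokens = tokenize("Fred, Flintstone;35,,Wilma", ",; ");
// $tokens is array('Fred', 'Flintstone', '35', 'Wilma')
```

Unlike explode( ), which returns an empty string for each pair of adjacent separators, strtok( ) skips them.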

sscanf( )

The sscanf( ) function decomposes a string according to a printf( )-like template:

$array = sscanf(string, template);
$count = sscanf(string, template, var1, ... );

If used without the optional variables, sscanf( ) returns an array of fields:

$string = "Fred\tFlintstone (35)";
$a = sscanf($string, "%s\t%s (%d)");
print_r($a);
Array
(
    [0] => Fred
    [1] => Flintstone
    [2] => 35
)

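Pass variables after the template to have the fields stored in those variables. sscanf( ) receives them by reference, so no & is needed at the call site (writing &$var in a function call is an error in PHP 5.4 and later). The number of fields assigned is returned:

$string = "Fred\tFlintstone (35)";
$n = sscanf($string, "%s\t%s (%d)", $first, $last, $age);
echo "Matched $n fields: $first $last is $age years old";
Matched 3 fields: Fred Flintstone is 35 years old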
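
The same approach works for fixed formats such as the "12/11/01" date mentioned at the start of this section. A minimal sketch, with variable names chosen for illustration:

```php
<?php
$date = "12/11/01";

// %d reads an integer, so "01" becomes the number 1.
$parts = sscanf($date, "%d/%d/%d");
// $parts is array(12, 11, 1)

// The returned array can be unpacked with list( ).
list($first, $second, $third) = sscanf($date, "%d/%d/%d");

// sprintf( ) with the mirror-image template rebuilds the original string.
$rebuilt = sprintf("%02d/%02d/%02d", $first, $second, $third);
// $rebuilt is "12/11/01"
```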

String-Searching Functions

Several functions find a string or character within a larger string. They come in three families: strpos( ) and strrpos( ), which return a position; strstr( ), strchr( ), and friends, which return the string they find; and strspn( ) and strcspn( ), which return how much of the start of the string matches a mask.

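In older versions of PHP, if you specify a number as the “string” to search for, PHP treats that number as the ordinal value of the character to search for. Thus, in those versions, these function calls are identical because 44 is the ASCII value of the comma (as of PHP 8.0 a number is instead converted to its string form, so pass chr(44) when you mean the character):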

$pos = strpos($large, ",");            // find first comma
$pos = strpos($large, 44);             // find first comma

All the string-searching functions return false if they can’t find the substring you specified. If the substring occurs at the start of the string, the functions return 0. Because false casts to the number 0, always compare the return value with === when testing for failure:

if ($pos === false) {
  // wasn't found
} else {
  // was found, $pos is offset into string
}
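
To see why the strict comparison matters, consider a match at position 0. The sketch below uses an illustrative string and variable names:

```php
<?php
$name = "Fred Flintstone";
$pos = strpos($name, "Fred");          // $pos is 0: found at the very start

// Loose test: 0 compares equal to false, so the match is missed.
$loose_found = ($pos != false);        // false, which is wrong

// Strict test: only a real false means "not found".
$strict_found = ($pos !== false);      // true
```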

Searches returning position

The strpos( ) function finds the first occurrence of a small string in a larger string:

$position = strpos(large_string, small_string);

If the small string isn’t found, strpos( ) returns false.

The strrpos( ) function finds the last occurrence of a character in a string. It takes the same arguments and returns the same type of value as strpos( ).

For instance:

$record = "Fred,Flintstone,35,Wilma";
$pos = strrpos($record, ",");          // find last comma
echo("The last comma in the record is at position $pos");
The last comma in the record is at position 18

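In PHP 4, if you pass a string as the second argument to strrpos( ), only the first character is searched for; PHP 5 and later search for the whole string. To find the last occurrence of a multicharacter string in a way that does not depend on that behavior, reverse the strings and use strpos( ):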

$long = "Today is the day we go on holiday to Florida";
$to_find = "day";
$pos = strpos(strrev($long), strrev($to_find));
if ($pos === false) {
  echo("Not found");
} else {
  // $pos is offset into reversed strings
  // Convert to offset into regular strings
  $pos = strlen($long) - $pos - strlen($to_find);
  echo("Last occurrence starts at position $pos");
}
Last occurrence starts at position 30
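
Assuming PHP 5 or later, where strrpos( ) matches a multicharacter string in full, the same position can be obtained directly. A short sketch for comparison:

```php
<?php
$long = "Today is the day we go on holiday to Florida";

// First occurrence: the "day" in "Today".
$first = strpos($long, "day");         // $first is 2

// Last occurrence: the "day" in "holiday".
$last = strrpos($long, "day");         // $last is 30
```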

Searches returning rest of string

The strstr( ) function finds the first occurrence of a small string in a larger string and returns from that small string on. For instance:

$record = "Fred,Flintstone,35,Wilma"; 
$rest = strstr($record, ",");      // $rest is ",Flintstone,35,Wilma"

The variations on strstr( ) are:

stristr( )

Case-insensitive strstr( )

strchr( )

Alias for strstr( )

strrchr( )

Find last occurrence of a character in a string

As with strrpos( ), strrchr( ) searches backward in the string, but only for a character, not for an entire string.
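
The sketch below tries each variation on sample data invented for this example. The third argument to strstr( ) assumes PHP 5.3 or later:

```php
<?php
$email = "fred@example.com";

// strstr( ) returns the match and everything after it.
$domain = strstr($email, "@");                 // $domain is "@example.com"

// A true third argument returns the part before the match instead.
$user = strstr($email, "@", true);             // $user is "fred"

// stristr( ) ignores case but returns the text as it appears in the original.
$found = stristr("Fred Flintstone", "FLINT");  // $found is "Flintstone"

// strrchr( ) returns from the last occurrence of the character.
$last = strrchr("/usr/local/bin/php", "/");    // $last is "/php"
$file = substr($last, 1);                      // $file is "php"
```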

Searches using masks

If you thought strrchr( ) was esoteric, you haven’t seen anything yet. The strspn( ) and strcspn( ) functions tell you how many characters at the beginning of a string consist of certain characters:

$length = strspn(string, charset);

For example, this function tests whether a string holds an octal number:

function is_octal ($str) {
  return strspn($str, '01234567') == strlen($str);
}

The c in strcspn( ) stands for complement: it tells you how much of the start of the string is not composed of the characters in the character set. Use it when the number of interesting characters is greater than the number of uninteresting characters. For example, this function tests whether a string has any NUL-bytes, tabs, or newlines by checking whether the run of acceptable characters stops before the end of the string:

function has_bad_chars ($str) {
  return strcspn($str, "\n\t\0") != strlen($str);
}
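
Both functions return a length, which is easiest to see with concrete values. A minimal sketch on sample strings chosen for illustration:

```php
<?php
// Length of the leading run of digits.
$digits = strspn("42 is the answer", "0123456789");    // $digits is 2

// Length of the leading run that contains no "@".
$before_at = strcspn("fred@example.com", "@");         // $before_at is 4

// When no character from the set occurs, strcspn( ) returns the full length.
$clean = strcspn("Fred", "\n\t\0") == strlen("Fred");  // $clean is true
```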

Decomposing URLs

The parse_url( ) function returns an array of components of a URL:

$array = parse_url(url);

For example:

$bits = parse_url('http://me:secret@example.com/cgi-bin/board?user=fred');
print_r($bits);
Array
(
    [scheme] => http
    [host] => example.com
    [user] => me
    [pass] => secret
    [path] => /cgi-bin/board
    [query] => user=fred
)

The possible keys of the array are scheme, host, port, user, pass, path, query, and fragment. Only the components present in the URL appear in the result.
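
When only one component is needed, a second argument can select it. The sketch below assumes PHP 5.1.2 or later for the PHP_URL_* constants, and uses parse_str( ) to split the query component further:

```php
<?php
$url = 'http://me:secret@example.com/cgi-bin/board?user=fred';

// A second argument selects a single component.
$host = parse_url($url, PHP_URL_HOST);      // $host is "example.com"

// A component missing from the URL comes back as null.
$port = parse_url($url, PHP_URL_PORT);      // $port is null

// parse_str( ) splits the query component into an array of parameters.
parse_str(parse_url($url, PHP_URL_QUERY), $params);
// $params is array('user' => 'fred')
```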
