Cleaning Strings

Often, the strings we get from files or users need to be cleaned up before we can use them. Two common problems with raw data are the presence of extraneous whitespace and incorrect capitalization (uppercase versus lowercase).

Removing Whitespace

You can remove leading or trailing whitespace with the trim(), ltrim(), and rtrim() functions:

$trimmed = trim(string [, charlist ]);
$trimmed = ltrim(string [, charlist ]);
$trimmed = rtrim(string [, charlist ]);

trim() returns a copy of string with whitespace removed from the beginning and the end. ltrim() (the l is for left) does the same, but removes whitespace only from the start of the string. rtrim() (the r is for right) removes whitespace only from the end of the string. The optional charlist argument is a string that specifies all the characters to strip. The default characters to strip are given in Table 4-3.

Table 4-3. Default characters removed by trim(), ltrim(), and rtrim()

Character    ASCII value    Meaning
" "          0x20           Space
"\t"         0x09           Tab
"\n"         0x0A           Newline (line feed)
"\r"         0x0D           Carriage return
"\0"         0x00           NUL-byte
"\x0B"       0x0B           Vertical tab

For example:

$title = "   Programming PHP  \n";
$str1 = ltrim($title);   // $str1 is "Programming PHP  \n"
$str2 = rtrim($title);   // $str2 is "   Programming PHP"
$str3 = trim($title);    // $str3 is "Programming PHP"
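
The same call scales to a whole batch of raw strings. The following is a minimal sketch, assuming the input has already been read into an array; the $rawLines data is hypothetical and stands in for lines from a file or a form:

```php
<?php
// Hypothetical raw input, e.g. lines read from a file or submitted by a user.
$rawLines = ["  apple \n", "\tbanana\r\n", "   \n", "cherry"];

// Apply trim() to every element; the default character list is used.
$trimmed = array_map('trim', $rawLines);

// Drop the lines that became empty after trimming, then reindex from 0.
$cleaned = array_values(array_filter($trimmed, function ($line) {
    return $line !== '';
}));

// $cleaned is ["apple", "banana", "cherry"]
print_r($cleaned);
```

Comparing against the empty string explicitly, rather than relying on the default behavior of array_filter(), keeps a line such as "0" from being discarded.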

Given a line of tab-separated data, use the charlist argument to remove leading or trailing whitespace without deleting the tabs:

$record = "  Fred\tFlintstone\t35\tWilma\t   \n";
$record = trim($record, " \r\n\0\x0B");
// $record is "Fred\tFlintstone\t35\tWilma\t" (the trailing tab is kept)
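
A few more uses of charlist may help show how it behaves. This is a short sketch with made-up sample values; note that supplying charlist replaces the default set rather than adding to it, and that a range of characters can be written with "..":

```php
<?php
// Strip a custom character from both ends (hypothetical path value).
$path  = "/var/www/html/";
$inner = trim($path, "/");              // "var/www/html"

// Strip from one side only.
$price = "0042.50";
$noLeadingZeros = ltrim($price, "0");   // "42.50"

// A range written with ".." covers every character between its endpoints.
$code = "123abc456";
$lettersOnly = trim($code, "0..9");     // "abc"

// charlist replaces the defaults: spaces are no longer stripped
// unless they are listed explicitly.
$tag  = "  --hi--  ";
$same = trim($tag, "-");                // "  --hi--  " (unchanged)
$bare = trim($tag, "- ");               // "hi"
```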

Changing Case

PHP has several functions for changing the case of strings: strtolower() and strtoupper() operate on entire strings, ucfirst() operates only on the first character of the string, and ucwords() operates on the first character of each word in the string. Each function takes a string to operate on as an argument and returns a copy of that string, appropriately changed. For example:

$string1 = "FRED flintstone";
$string2 = "barney rubble";
print(strtolower($string1) . "\n");   // fred flintstone
print(strtoupper($string1) . "\n");   // FRED FLINTSTONE
print(ucfirst($string2) . "\n");      // Barney rubble
print(ucwords($string2) . "\n");      // Barney Rubble

If you’ve got a mixed-case string that you want to convert to “title case,” where the first letter of each word is in uppercase and the rest of the letters are in lowercase (and you are not sure what case the string is in to begin with), use a combination of strtolower() and ucwords():

print(ucwords(strtolower($string1)));   // Fred Flintstone
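
If this conversion is needed in several places, it can be wrapped in a small function. The sketch below assumes plain ASCII input; titleCase() is a hypothetical name, not a built-in PHP function:

```php
<?php
// Hypothetical helper: normalize a string of unknown case to title case.
function titleCase(string $s): string
{
    // Lowercase everything first, then capitalize the start of each word.
    return ucwords(strtolower($s));
}

$names = ["FRED flintstone", "bARNEY rUBBLE", "wilma"];
foreach ($names as $name) {
    print(titleCase($name) . "\n");
}
// Fred Flintstone
// Barney Rubble
// Wilma

// ucwords() on its own leaves existing uppercase letters untouched:
print(ucwords("FRED flintstone") . "\n");   // FRED Flintstone
```

The lowercasing step matters because ucwords() only changes the first character of each word. For text that may contain non-ASCII letters, the mbstring extension's mb_convert_case() is probably a better fit where that extension is available.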
