4.7. Reformatting Database Content as HTML

Problem

You need to convert characters in text stored in or retrieved from a database to their proper HTML entities.

Solution

Use PHP's built-in string formatting functions, such as htmlentities( ) and str_replace( ), to build your own on-the-fly reformatting function:

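	function processText( $text )
	{
	   // Restore the angle brackets that htmlentities() encoded
	   $text = str_replace("&gt;",">",$text);
	   $text = str_replace("&lt;","<",$text);
	   // Turn line breaks into paragraph breaks, most complex pattern first:
	   // Windows-style (\r\n) line endings...
	   $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
	   $text = str_replace("\r\n "," </p>\n<p> ",$text);
	   // ...then Unix-style (\n) line endings
	   $text = str_replace("\n\n"," </p>\n<p> ",$text);
	   $text = str_replace("\n "," </p>\n<p> ",$text);
	   return $text;
	}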

Discussion

The articles and other content displayed on a dynamic or template-driven web site are stored in database tables, where each part of a page (headline, subhead, byline, and main text) likely has its own field, or slot, in an individual article record. The logic of the template file, written in PHP or another server-side scripting language, retrieves a specific article based on a browser request and formats the contents of the article record as an HTML web page. A template may be designed to display just a handful of different articles, or thousands of different entries.

PHP has some built-in tools for handling the special requirements of text that moves from a database to a web page and back again. Combining these tools into one master function that meets the specific needs of your database-driven site ensures that all the content on your site gets formatted the same way. When you need to make a change, editing this single function does the trick.

addslashes( ) and stripslashes( ) are two built-in PHP functions that escape and unescape single-quote, double-quote, and backslash characters in text strings inserted into and retrieved from a database. By adding a backslash before each of those characters, addslashes( ) keeps them from being misinterpreted as the delimiters of a string in an SQL query. For example, addslashes( ) changes "St. Patrick's Day" into "St. Patrick\'s Day", while stripslashes( ) reverses the process.
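A minimal sketch of that round trip, using the same example string. It only shows what the two functions do to a string; it does not connect to a database:

```php
<?php
// Escape quotes and backslashes, then undo the escaping
$original = "St. Patrick's Day";
$escaped  = addslashes($original);   // St. Patrick\'s Day
$restored = stripslashes($escaped);  // St. Patrick's Day

echo $escaped, "\n", $restored, "\n";
```

When you build the query itself, the PHP Manual recommends database-specific escaping functions or prepared statements rather than addslashes( ).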

Tip

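If your PHP installation has magic_quotes_gpc enabled, PHP automatically escapes incoming GET, POST, and cookie data, so you should not call addslashes( ) on that data yourself; escaping it twice leaves stray backslashes in your content. (The magic_quotes_gpc setting was deprecated in PHP 5.3.0 and removed in PHP 5.4.0, so current versions of PHP never add the slashes for you.) You can easily check this and other PHP configuration settings by uploading a file named test.php to your web site containing this one line: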

	<?php phpinfo( ) ?>

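Then request the file through your web browser (http://domain.com/test.php); the status of magic_quotes_gpc should be listed as "On" or "Off." Delete the file when you are done, because it reveals details of your server's configuration to anyone who requests it.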
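If the same script has to run on several servers, you can also test the setting from code. The following is a minimal sketch built around a hypothetical helper named escapeInput( ). Because get_magic_quotes_gpc( ) was deprecated in PHP 7.4 and removed in PHP 8.0, the sketch checks that the function exists before calling it:

```php
<?php
// Hypothetical helper: add slashes only when magic quotes have not already done so.
function escapeInput($text)
{
    // get_magic_quotes_gpc() does not exist in PHP 8.0+, where magic quotes cannot be on.
    $magicQuotesOn = function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc();
    return $magicQuotesOn ? $text : addslashes($text);
}

echo escapeInput("It's done"), "\n"; // It\'s done (on PHP 5.4 and later)
```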

Beyond that convenience, PHP makes no assumptions about how the text going into and coming out of your database should be formatted. But it provides a handful of built-in functions that you can use to do it yourself, such as nl2br( ), which inserts an HTML line break tag (<br /> by default) before each newline character (\n).
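For example, here is a minimal sketch of nl2br( ) on a two-line string. Note that the newline itself is kept and the tag is inserted in front of it. The second argument, available since PHP 5.3, switches to the HTML-style tag:

```php
<?php
$text = "First line\nSecond line";

$xhtml = nl2br($text);        // "First line<br />" + newline + "Second line"
$html  = nl2br($text, false); // "First line<br>" + newline + "Second line"

echo $xhtml, "\n", $html, "\n";
```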

But using <br> tags to separate blocks of text is an outdated technique. The <br> tag is still valid HTML, but the W3C's stricter document types favor structural markup (see Recipe 4.1), and <br> is meant for line breaks within a block of content, such as the lines of a postal address, not for separating paragraphs. Instead, you should use block element markup, typically the paragraph tags <p> and </p>, along with a stylesheet that defines styles for the text blocks between them. If the rest of your site is formatted this way (and it should be), text converted with nl2br( ) will not pick up your paragraph styles.

PHP's all-purpose find-and-replace function, str_replace( ), offers a way to wrap text blocks in paragraph tags:

	$text = str_replace("\n\n","</p>\n<p>",$text);

Here, str_replace( ) replaces each pair of newline characters with a closing and an opening paragraph tag, keeping a single newline between them to make the resulting code more readable. So this:

	The quick brown fox jumped over the lazy dogs\n\nNow is the time for all good men and
	women to come to the aid of their country.

Becomes this:

	The quick brown fox jumped over the lazy dogs</p>
	<p>Now is the time for all good men and women to come to the aid of their country.

When you print the text block in the PHP template, enclose it in paragraph tags. The replacement only inserts tags between paragraphs, so the first paragraph still needs an opening <p> and the last one a closing </p>:

	echo "<p>".$text."</p>";

So, the result would be:

	<p>The quick brown fox jumped over the lazy dogs</p>
	<p>Now is the time for all good men and women to come to the aid of their country.</p>

The str_replace( ) function looks for an exact match, so you may need to include some alternate searches, depending on how text is stored in your database. Paragraphs may be separated by a single newline character, by two newline characters, or by pairs of carriage return and newline characters (\r\n), as in text that came from Windows or from a web form. Here's a function that handles a few likely scenarios. Because each str_replace( ) call searches the text that the previous calls have already changed, always start with the most complex pattern and work toward the simplest to avoid double replacements. First, add this function to your PHP template:

	function processText( $text )
	{
	   $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
	   $text = str_replace("\r\n "," </p>\n<p> ",$text);
	   $text = str_replace("\n\n"," </p>\n<p> ",$text);
	   $text = str_replace("\n "," </p>\n<p> ",$text);
	   return $text;
	}

Then apply the function to your text:

	$text = processText($text);
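The ordering rule matters because str_replace( ) will happily match characters that an earlier call inserted, including the newline inside the paragraph tags. As an alternative sketch, which is not the recipe's own approach, PHP's strtr( ) with an array argument tries the longest matching key first and never searches text it has already replaced, so a single call can safely break paragraphs at lone newlines too. The name wrapParagraphs( ) is hypothetical, and unlike processText( ) it also adds the outer <p> and </p>:

```php
<?php
// Hypothetical alternative to processText(): swaps str_replace() for strtr().
// strtr() with an array tries the longest key first at each position and never
// searches text it has already inserted, so the "\n" inside $break is not re-replaced.
function wrapParagraphs($text)
{
    $break = "</p>\n<p>";
    $text = strtr($text, array(
        "\r\n\r\n" => $break, // blank line, Windows-style line endings
        "\n\n"     => $break, // blank line, Unix-style line endings
        "\r\n"     => $break, // single Windows-style line break
        "\n"       => $break, // single Unix-style line break
    ));
    // The first and last paragraphs still need their outer tags
    return "<p>" . $text . "</p>";
}

echo wrapParagraphs("First paragraph.\n\nSecond paragraph.\nThird paragraph."), "\n";
```

Treating every single newline as a paragraph break may be too aggressive if your stored text contains hard-wrapped lines; in that case, drop the last two keys.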

PHP also has a couple of functions for converting special characters to their HTML entities. For more about entities, see Recipe 4.2. One, htmlspecialchars( ), converts only the ampersand (&), double quote ("), less-than sign (<), and greater-than sign (>) by default; it also converts single quotes when you pass the ENT_QUOTES flag, which has been part of the default since PHP 8.1. The other, htmlentities( ), converts any character for which there is an HTML entity, including the ones that htmlspecialchars( ) converts.
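Here is a minimal sketch comparing the two functions on the same string. It passes ENT_QUOTES and 'UTF-8' explicitly so the output does not depend on version-specific defaults. The sample string is only an illustration:

```php
<?php
$text = "Caf\u{00E9} <b>Tom & Jerry's</b>"; // \u{00E9} is "é" (PHP 7+ escape syntax)

$special  = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Café &lt;b&gt;Tom &amp; Jerry&#039;s&lt;/b&gt;
echo $entities, "\n"; // Caf&eacute; &lt;b&gt;Tom &amp; Jerry&#039;s&lt;/b&gt;
```

Only htmlentities( ) turns the accented letter into a named entity; both encode the markup characters and the apostrophe.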

Tip

Support for characters outside the Latin-1 (ISO-8859-1) repertoire depends on your installed version of PHP and on the character set these functions are told to use; both accept the encoding, such as 'UTF-8', as their third argument.

If you or your web site's visitors submit content to a database that is then displayed on the site, you can create a function to format and encode that text before it is saved in the database.

This function includes the addslashes( ) function to demonstrate how two built-in functions can be combined into a custom function:

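	function processInsert( $text )
	{
	   // Encode special characters as HTML entities first...
	   $text = htmlentities($text);
	   // ...then escape any remaining quotes and backslashes for the SQL query
	   $text = addslashes($text);
	   return $text;
	}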

If the content submitted to the database includes inline HTML tags, such as <em>Important</em> for emphasis, htmlentities( ) will change it to &lt;em&gt;Important&lt;/em&gt;. And that will be rendered on the page as:

	<em>Important</em>

The tags are visible, but the text lacks the emphasis the author intended. You can modify the text-processing function described above to undo a little of what htmlentities( ) has done so the tags work as markup again. Two calls to str_replace( ) restore the greater-than and less-than signs to the tags:

	$text = str_replace("&gt;",">",$text);
	$text = str_replace("&lt;","<",$text);

The complete function now looks like this:

	function processText( $text )
	{
	   $text = str_replace("&gt;",">",$text);
	   $text = str_replace("&lt;","<",$text);
	   $text = str_replace("\r\n\r\n"," </p>\n<p> ",$text);
	   $text = str_replace("\r\n "," </p>\n<p> ",$text);
	   $text = str_replace("\n\n"," </p>\n<p> ",$text);
	   $text = str_replace("\n "," </p>\n<p> ",$text);
	   return $text;
	}
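Restoring every &lt; and &gt; also restores any tag a visitor types, including <script>, which undoes the protection that htmlentities( ) provided. Here is a more cautious sketch built around a hypothetical helper named restoreAllowedTags( ). It turns back only a short list of attribute-free inline tags and leaves everything else encoded:

```php
<?php
// Hypothetical helper: decode only whitelisted, attribute-free tags such as <em>.
function restoreAllowedTags($text, $allowed = array('em', 'strong'))
{
    foreach ($allowed as $tag) {
        $text = str_replace(
            array("&lt;$tag&gt;", "&lt;/$tag&gt;"), // encoded opening and closing tags
            array("<$tag>", "</$tag>"),             // restored markup
            $text
        );
    }
    return $text;
}

$encoded = htmlentities("<em>Important</em> <script>alert(1)</script>");
echo restoreAllowedTags($encoded), "\n";
// <em>Important</em> &lt;script&gt;alert(1)&lt;/script&gt;
```

Because the helper matches exact strings, a tag with attributes (such as <em class="note">) stays encoded, which errs on the safe side.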

See Also

The online PHP Manual has detailed information on all the built-in functions described in this Recipe:
