11.11. Removing HTML and PHP Tags

Problem

You want to remove HTML and PHP tags from a string or file.

Solution

Use strip_tags( ) to remove HTML and PHP tags from a string:

$html = '<a href="http://www.oreilly.com">I <b>love computer books.</b></a>';
print strip_tags($html);
I love computer books.
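
The example above contains only HTML. As a small additional sketch, the same call also drops an embedded PHP block; the sample string and the variable name $mixed are made up for illustration:

```php
<?php
// Hypothetical sample that mixes a PHP block with HTML markup.
$mixed = '<?php echo "hidden"; ?>Hello, <em>world</em>!';

// Both the PHP block and the <em></em> pair are removed.
$plain = strip_tags($mixed);
print $plain;
```

This should print Hello, world! because everything between the opening and closing PHP tags is discarded along with the tags themselves.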

Use fgetss( ) to remove them from a file as you read in lines (fgetss( ) was deprecated in PHP 7.3 and removed in PHP 8.0, so this works only on older versions):

$fh = fopen('test.html','r') or die($php_errormsg);
// Compare against false so an empty or "0" chunk doesn't end the loop early
while (($s = fgetss($fh,1024)) !== false) {
    print $s;
}
fclose($fh)                  or die($php_errormsg);

Discussion

While fgetss( ) is convenient if you need to strip tags from a file as you read it in, it may get confused if tags span lines or if they span the buffer that fgetss( ) reads from the file. At the price of increased memory usage, reading the entire file into a string provides better results:

$no_tags = strip_tags(file_get_contents('test.html'));
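
To make the difference concrete, here is a minimal sketch that strips the same text both ways. It assumes a throwaway temporary file whose contents are invented for the example, and it calls strip_tags( ) on each line as a stand-in for line-oriented reading; fgetss( ) itself may behave differently in some versions.

```php
<?php
// Hypothetical sample file: the <a> tag is split across two lines.
$path = tempnam(sys_get_temp_dir(), 'tags');
file_put_contents(
    $path,
    "Visit <a\nhref=\"http://www.oreilly.com\">our site</a> today.\n"
);

// Line by line: each line is stripped in isolation, so the first line
// ends inside an unfinished tag and the second starts with stray
// attribute text that no longer looks like part of a tag.
$per_line = '';
foreach (file($path) as $line) {
    $per_line .= strip_tags($line);
}

// Whole file at once: strip_tags() sees the complete tag.
$whole = strip_tags(file_get_contents($path));

print $whole;

// Remove the temporary file.
unlink($path);
```

The whole-file version should print Visit our site today. while $per_line keeps leftover attribute text from the second line.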

Both strip_tags( ) and fgetss( ) can be told not to remove certain tags by specifying those tags as a last argument. The tag specification is case-insensitive, and for pairs of tags, you only have to specify the opening tag. For example, this removes all but <b></b> tags from $html:

$html = '<a href="http://www.oreilly.com">I <b>love</b> computer books.</a>';
print strip_tags($html,'<b>');
I <b>love</b> computer books.
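
As a further sketch of the allowed-tags argument, the snippet below keeps two kinds of tags and shows the case-insensitive matching. The sample strings and the names $kept and $link are invented for illustration.

```php
<?php
$html = '<A HREF="http://www.oreilly.com">I <B>love</B> <i>computer</i> books.</A>';

// Allow two tags by listing the opening tags back to back.
// Matching ignores case, so '<b>' also keeps <B> and </B>.
$kept = strip_tags($html, '<b><i>');
print $kept . "\n";

// Attributes on an allowed tag pass through untouched.
$link = strip_tags('<p><a href="/x" onclick="track()">link</a></p>', '<a>');
print $link . "\n";
```

The first call should print I <B>love</B> <i>computer</i> books. and the second should print the a element with both attributes intact. Because attributes survive on any tag you allow, the allowed-tags argument by itself is not enough to make untrusted markup safe.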

See Also

Documentation on strip_tags( ) at http://www.php.net/strip-tags and fgetss( ) at http://www.php.net/fgetss.
