You want to match any HTML, XHTML, or XML tags in a string, in order to remove, modify, count, or otherwise deal with them.
The most appropriate solution depends on several factors, including the level of accuracy, efficiency, and tolerance for erroneous markup that is acceptable to you. Once you’ve determined the approach that works for your needs, there are any number of things you might want to do with the results. But whether you want to remove the tags, search within them, add or remove attributes, or replace them with alternative markup, the first step is to find them.
Be forewarned that this will be a long recipe, fraught with subtleties, exceptions, and variations. If you’re looking for a quick fix and are not willing to put in the effort to determine the best solution for your needs, you might want to jump to the section of this recipe that offers a decent mix of tolerance versus precaution.
This first solution is simple and more commonly used than you might expect, but it’s included here mostly for comparison and for an examination of its flaws. It may be good enough when you know exactly what type of content you’re dealing with and are not overly concerned about the consequences of incorrect handling. This regex matches a < symbol, then simply continues until the first > that follows.
Regex options: None
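To make the behavior concrete, here is a minimal Python sketch of this quick-and-dirty approach. It assumes the pattern described above can be written as `<[^>]*>` (a `<`, any run of characters other than `>`, then a `>`); the helper names `find_tags` and `strip_tags` and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# Assumed pattern: "<", then any run of characters other than ">", then ">".
QUICK_TAG = re.compile(r"<[^>]*>")


def find_tags(markup):
    """Return every substring the quick-and-dirty pattern treats as a tag."""
    return QUICK_TAG.findall(markup)


def strip_tags(markup):
    """Remove everything the pattern matches, leaving the text between tags."""
    return QUICK_TAG.sub("", markup)


# Well-behaved markup: each tag is found and can be removed cleanly.
simple = '<p class="intro">Hello, <b>world</b>!</p>'
print(find_tags(simple))
print(strip_tags(simple))

# The flaw: a ">" inside a quoted attribute value ends the match too early,
# so only part of the tag is matched.
tricky = '<img alt="2 > 1" src="gt.png">'
print(find_tags(tricky))
```

The last call shows one reason this solution is offered mostly for comparison: the match stops at the `>` inside the `alt` value, so removing it would leave the remainder of the tag behind as stray text.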
This next regex is again ...