Chapter 5. Advanced Parsing with Regular Expressions

Now that you’ve mastered the parsing techniques of the previous chapter, it’s time to look at advanced parsing with regular expressions, also known as regex. Regular expressions are an extraordinarily powerful and flexible tool. At first glance, they may seem like the only tool you’ll ever need to parse web pages. On closer examination, though, you’ll discover that regular expressions shine in some situations and are either overkill or simply not appropriate in others.

Regular expressions are not the easiest thing to learn, because a fair amount of parallel information is required to get even the simplest examples working. You’ll need to first understand the concept and have some idea of how patterns ...
