Parsing for Programming

The ability to display a feed on a web page is important, no doubt about it, but it’s not really going to excite anyone. For that, you need to be able to parse feeds inside your own programs. In this section, we’ll look at the two major alternatives, MagpieRSS and the Ultraliberal Feed Parser. Both parsers are libraries; both convert feeds into native data structures; and neither cares whether a feed is RSS 1.0, RSS 2.0, or Atom. That, really, is the final word on the Great Battle of the Standards: most of the time, at a programmatic level, no one cares.
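To make "native data structures" and "neither cares" concrete, here is a minimal sketch in Python that uses only the standard library. It is not how MagpieRSS or the Feed Parser is implemented; the function names (local, parse_feed) and the two sample feeds are hypothetical, invented for illustration. The sketch assumes that ignoring XML namespaces and treating item and entry elements alike is enough to flatten the formats into one list of dictionaries.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace so RSS and Atom element names compare equal."""
    return tag.rsplit("}", 1)[-1]


def parse_feed(xml_text):
    """Return a list of {'title': ..., 'link': ...} dicts, one per entry.

    Illustrative only: a real liberal parser also copes with malformed XML,
    character encodings, dates, and many more elements.
    """
    root = ET.fromstring(xml_text)
    items = []
    for el in root.iter():
        # RSS calls an entry <item>; Atom calls it <entry>.
        if local(el.tag) not in ("item", "entry"):
            continue
        record = {}
        for child in el:
            name = local(child.tag)
            if name == "title":
                record["title"] = (child.text or "").strip()
            elif name == "link":
                # Atom keeps the URL in an href attribute; RSS uses element text.
                url = child.get("href") or (child.text or "").strip()
                record.setdefault("link", url)  # keep the first link seen
        items.append(record)
    return items


# Two hypothetical feeds carrying the same entry in different formats.
RSS2 = """<rss version="2.0"><channel><title>Example</title>
<item><title>Hello</title><link>http://example.org/1</link></item>
</channel></rss>"""

ATOM = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example</title>
<entry><title>Hello</title><link href="http://example.org/1"/></entry>
</feed>"""

if __name__ == "__main__":
    print(parse_feed(RSS2))
    print(parse_feed(ATOM))
```

Both calls produce the same list, so the code that consumes the result never has to ask which format the feed arrived in. That is the convenience the two libraries discussed here provide, with far more robustness than this sketch.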

PHP: MagpieRSS

The most popular parser in PHP, and arguably the most popular in use on the Web right now, is Kellan Elliott-McCrea’s MagpieRSS. As I write this, it stands at version 0.7, a low number indicative of modesty rather than product immaturity. MagpieRSS is a very refined product indeed.

To use MagpieRSS, first download the latest build from its web page at http://sourceforge.net/projects/magpierss/. There is also a weblog at http://laughingmeme.org/magpie_blog/.

Once you have downloaded it, you’re presented with a load of READMEs and example scripts, plus five include files:

  • rss_fetch.inc is the library you call from scripts. It deals with retrieving the feed, and marshals the other files into parsing it, before returning the results to your code.

  • rss_parse.inc deals with the nitty-gritty of feed parsing. MagpieRSS is a liberal parser, which means it doesn’t validate the feed it is given. It can ...
