Important Issues

This is where it starts to get really messy. We have discussed the production of three standards in this book, but there are hundreds of variations in the wild. This chaos has arisen because of a combination of the confusion of standards development, common misunderstandings about what constitutes valid XML, and a general agreement among the developers of feed-reading applications that they would parse invalid feeds at all costs. Indeed, it is because of these clashing versions of the RSS family that the Atom project was started in the first place.

When it comes to parsing any given feed, therefore, you need to take one specific fact into account: the feed is probably invalid. This might seem harsh, but you have to remember that the RSS community went through a long period when the specifications were so loosely defined that it was hard to pin down exactly what was and wasn’t valid. These feeds, and the systems that produce them, have not been revisited. Standards compliance aside, it’s thought that at least 10% of feeds aren’t even valid XML. This causes a lot of problems in itself.

The situation is improving, but you must remain aware of the problem. The most popular newsreader applications are built to be liberal parsers. That is, they act like modern web browsers and try to work round as many errors as they can. This tends to bring about a false sense of security that will betray you when you use your own parsing tools.

There is also a great deal of debate about ...

Get Developing Feeds with RSS and Atom now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.