Developing Feeds with RSS and Atom

A Short History of RSS and Atom

In the Developer’s Bars of the world, those dark, sordid places filled with grizzled coders and their clans, a special corner is always reserved for the developers of content-syndication standards. There, weeping into their beer, you’ll find the veterans of a long and difficult process. Most likely, they will have the Thousand Yard Stare of those who have seen more than they should. The standards you will read about in this book were not born fresh and innocent, of a streamlined process overseen by the Wise and Good. Rather, the standards described in the following chapters were dragged into the world and tempered through brawls, knife fights, and the occasional riot. What has survived, it is hoped, is hardy enough to prosper for the foreseeable future.

To fully understand these wayward children, and to get the most out of them, it is necessary to understand the motivations behind the different standards and how they evolved into what they are today.

HotSauce: MCF and RDF

The deepest, darkest origins of the current versions of RSS began in 1995 with the work of Ramanathan V. Guha. Known to most simply by his surname, Guha developed a system called the Meta Content Framework (MCF). Rooted in the work of knowledge-representation systems such as CycL, KRL, and KIF, MCF’s aim was to describe objects, their attributes, and the relationships between them.

MCF was an experimental research project funded by Apple, so it was pleasing for management that a great application came out of it: ProjectX, later renamed HotSauce. By late 1996, a few hundred sites were creating MCF files that described themselves, and Apple HotSauce allowed users to browse around these MCF representations in 3D. Documentation still exists on the Web for MCF and HotSauce. See http://www.eclectica-systems.co.uk/complex/hotsauce.php and Example 1-1 for more.

Example 1-1. An example of MCF

begin-headers:
MCFVersion: 0.95
name: "Eclectica"
end-headers:

unit: "tagging.mco" 
name: "Tagging and Acrobat Integration" 
default_genl_x: -109
default_genl_y: -65
typeOf: #"SubjectCategory"

unit: "http://www.nplum.demon.co.uk/temptin/temptin.htm" 
name: "TemptIn Information Management Template" 
genls_pos: ["tagging.mco" -85 -137]

unit: "http://www.nplum.demon.co.uk/temptin/tryout.htm" 
name: "Download Try-out Version" 
genls_pos: ["tagging.mco" -235 120]

It was popular, but experimental, and when Steve Jobs’s return to Apple’s management in 1997 heralded the end of much of Apple’s research activity, Guha left for Netscape.

There, he met with Tim Bray, one of the original XML pioneers, and started moving MCF over to an XML-based format. (XML itself was new at that time.) This project later became the Resource Description Framework (RDF). RDF is, as the World Wide Web Consortium (W3C) RDF Primer says, “a general-purpose language for representing information in the World Wide Web.” It is specifically designed for the representation of metadata and the relationships between things. In its fullest form, it is the basis for the concept known as the Semantic Web—the W3C’s vision wherein computers can understand the meaning of, and the relationships between, documents and other data. You can read http://en.wikipedia.org/wiki/Semantic_Web for more details.

Channel Definition Format

In 1997, XML was still in its infancy, and much of the Internet’s attention was focused on the increasingly frantic war between Microsoft and Netscape.

Microsoft had been watching the HotSauce experience, and early that year the Internet Explorer development team, along with some others, principally a company called PointCast, created a system called the Channel Definition Format (CDF).

Released on March 8, 1997, and submitted as a standard to the W3C the very next day, CDF was XML-based and described both a site’s content and its ratings, scheduling, logos, and metadata. It was introduced in Microsoft’s Internet Explorer 4.0 and later built into the Windows desktop itself, where it provided the backbone for what was then called Active Desktop. The CDF specification document is still online at http://www.w3.org/TR/NOTE-CDFsubmit.html, and Example 1-2 shows a sample.

Example 1-2. An example CDF document

<!DOCTYPE Channel SYSTEM "http://www.w3c.org/Channel.dtd" >
<Channel HREF="http://www.foosports.com/foosports.cdf" IsClonable="YES" >

<IntroUrl
VALUE="http://www.foosports.com/channel-setup.html" />
<LastMod VALUE="1994.11.05T08:15-0500" />
<Title VALUE="FooSports" />
<Abstract VALUE="The latest in sports and athletics from FooSports" />
<Author VALUE="FooSports" />

<Schedule>
<EndDate VALUE="1994.11.05T08:15-0500" />
<IntervalTime DAY="1" />
<EarliestTime HOUR="12" />
<LatestTime HOUR="18" />
</Schedule>

<Logo HREF="http://www.foosports.com/images/logo.gif" Type="REGULAR" />

<Item HREF="http://www.foosports.com/articles/a1.html">
<LastMod VALUE="1994.11.05T08:15-0500" />
<Title VALUE="How to get the most out of your mountain bike" />
<Abstract VALUE="20 tips on how to work your mountain-bike
to the bone and come out on top." />
<Author VALUE="FooSports" />
</Item>

<Channel IsClonable="NO" >
<LastMod VALUE="1994.11.05T08:15-0500" />
<Title VALUE="FooSports News" />
<Abstract VALUE="Up-to-date daily sports news from FooSports" />
<Author VALUE="FooSports" />

<Logo HREF="http://www.foosports.com/images/newslogo.gif" Type="REGULAR" />
<Logo HREF="http://www.foosports.com/images/newslogowide.gif" Type="WIDE" />

<Item HREF="http://www.foosports.com/articles/news1.html" >
<LastMod VALUE="1994.11.05T08:15-0500" />
<Title VALUE="Michael Jordan does it again!"/>
<Abstract VALUE="Led by Michael Jordan in scoring, the Chicago Bulls make it to the 
playoffs again!"/>
<Author VALUE="FooSports" />
</Item>

<Item HREF="http://www.foosports.com/articles/news2.html" >
<LastMod VALUE="1994.11.05T08:15-0500" />
<Title VALUE="Islanders winning streak ends"/>
<Abstract VALUE="The New York Islanders' 10-game winning streak ended with a disappointing 
loss to the Rangers" />
<Author VALUE="FooSports" />
</Item>

</Channel>

<Item HREF="http://www.foosports.com/animations/scrnsvr.html" >
<Usage VALUE="ScreenSaver"></Usage>
</Item>

<Item HREF="http://www.foosports.com/ticker.html" >
<Title VALUE="FooSports News Ticker" />
<Abstract VALUE="The latest sports headlines from FooSports" />
<Author VALUE="FooSports" />
<LastMod VALUE="1994.11.05T08:15-0500" />

<Schedule>
<StartDate VALUE="1994.11.05T08:15-0500" />
<EndDate VALUE="1994.11.05T08:15-0500" />
<IntervalTime DAY="1" />
<EarliestTime HOUR="12" />
<LatestTime HOUR="18" />
</Schedule>
</Item>

</Channel>

Very soon after its release, the potential of a standard, XML-based syndication format became apparent. By April 14, 1997, just over a month since Microsoft gave the standard its first public viewing, Dave Winer’s UserLand Software released support for the format into its Frontier product. Written by Wes Felter, and built upon by Dave Winer, it would be the company’s first foray into XML-based syndication, but by no means its last. UserLand was to become a major character in our story.

CDF was an exciting technology. It had arrived just as XML was being lauded as the Next Big Thing, and that combination—of a useful technology with a whole new thing to play with—made it rather irresistible for the nascent weblogging community. CDF, however, was really designed for the bigger publishers. A lot of the elements were overkill for the smaller content providers (who, at any rate, didn’t consider themselves content providers at all), and so a lot of webloggers started to look into creating a simpler specification.

RSS 1.0, RSS 2.0, and Atom are all deeply entrenched within the weblogging community, and I refer to weblogging, webloggers, and weblogs themselves frequently within this book. If you’ve never heard of the activity, it is easily explained. A weblog is, at heart, a personal web site, consisting of diary-like entries displayed in reverse chronological order. Weblogging, or blogging for short, is the activity of writing a weblog, or blog, upon which a weblogger, or blogger, spends his time. Weblogging is extremely popular: at time of writing, in late 2004, there are an estimated four million weblogs being written worldwide. The vast majority of these produce a syndication feed.

For good examples of weblogs, visit http://www.weblogs.com for a list of recently updated sites. My own weblog is found at http://www.benhammersley.com/weblog/index.html.

O’Reilly has also published a book on weblogging, Essential Blogging.

On December 27, 1997, Dave Winer started to publish his own weblog, Scripting News (http://www.scripting.com) in his own scriptingNews format, in addition to the CDF feed he had been providing since the spring. This, it was soon to be apparent, was a major step toward the RSS we have today. By early 1998, other formats were appearing, notably the Wilma Project, but all things considered, none were proving particularly popular. Mostly, it has to be said, because at this point, the weblogging world was very small.

RSS First Appears

By 1998, Netscape’s share of the browser market was in trouble. Microsoft’s release of Internet Explorer 4.0 the previous year was eating into Netscape’s position at the top of the market. Something had to be done, and so, in May 1998, Netscape formed a development team to work on the internally code-named “Project 60.”

When it launched on July 28, 1998, Project 60 was the My Netscape portal. It was a personalized front page that—in the traditional dot-com era business model—would capture eyeballs and provide sticky content. To this end, Netscape signed content-sharing deals with publishers like CNET to display its content within the portal.

Internally, this was done with an ever-developing set of tools that were forever being renamed. Starting out as Site Preview Format (SPF) and then called Open-SPF, the format was developed by Dan Libby and based on the work Guha was doing with RDF. Netscape, at that time, was building an RDF parser into the Netscape 5 browser; Libby ripped that out and built a feed-parsing system to drive the Netscape pages on its server. Content providers gave Netscape feeds, and Netscape incorporated those feeds into its site.

My Netscape benefited from this in many ways: it suddenly had a massive amount of content given to it for free. Of course, Netscape had no control over it or any real way to make money from it directly, but the additional usefulness of Netscape’s site made people stick around longer. In the heat of the dot-com boom, allowing people to put their own content on a Netscape page, alongside advertising sold by Netscape, was a very good idea: the portal could both save money on content and make more on ad sales. The user also benefited: having favorite sites summarized on one page meant one-stop shopping for a day’s browsing—a feature many found extremely useful. The feed provider didn’t lose out either, gaining both additional traffic and wider exposure.

The technology didn’t stop moving. The Open-SPF format was released as an Engineering Vision Statement on February 1, 1999, and a week later, Dave Winer picked up on it and suggested out loud that an XML format for webloggers might be useful (http://static.UserLand.com/UserLandDiscussArchive/msg002809.html):

I get frequent requests to channel Scripting News content thru my.netscape.com. I don’t have time to learn how it works. However, we have an always-current XML version of the last day of Scripting News, and would be happy to support Netscape and others in writing syndicators of that content flow. No royalty necessary. It would be easy to have a search engine feed off this flow of links and comments. There are starting to be a bunch of weblogs, wouldn’t it be interesting if we could agree on an XML format between us?

Then everything sped up. On February 11, Bill Humphries documented the XML format he was using for his Whump weblog, calling it “More Like This.” On February 22, Scripting News was publishing in Open-SPF and was available on the Netscape site. On March 1, 1999, after yet another name change, Dan Libby released the specification document for RDF-SPF 0.9. One final name change later, it became RSS 0.9, and RSS was here.

The first desktop aggregator, Carmen’s Headline Viewer, was released on April 25, 1999. UserLand followed Netscape with the second web-based aggregator, my.UserLand.com, on June 10. On July 2, the Syndication mailing list was started, and later Winer spoke on the telephone with the Netscape team to suggest some changes. After a short rest, the standard was off again.

The Standards Evolve

The first draft of the RSS format, as designed by Dan Libby, was a fully RDF-based data model that people inside Netscape felt was too complicated for end users. The resultant compromise, RSS 0.9, was neither truly useful RDF nor as simple as it could be.

Some felt that using RDF improperly was worse than not using it at all, so when RSS 0.91 arrived, the RDF nature of the format was dropped. As Dan Libby explained to the rss-dev email list (http://groups.yahoo.com/group/rss-dev/message/239):

At the time, the primary users of RSS (Dave Winer the most vocal among them) were asking why it needed to be so complex and why it didn’t have support for various features, e.g. update frequencies. We really had no good answer, given that we weren’t using RDF for any useful purpose. Further, because RDF can be expressed in XML in multiple ways, I was uncomfortable publishing a DTD for RSS 0.9, since the DTD would claim that technically valid RDF/RSS data conforming to the RDF graph model was not valid RSS. Anyway, it didn’t feel “clean”. The compromise was to produce RSS 0.91, which could be validated with any validating XML parser, and which incorporated much of UserLand’s vocabulary, thus removing most (I think) of Dave’s major objections. I felt slightly bad about this, but given actual usage at the time, I felt it better suited the needs of its users: simplicity, correctness, and a larger vocabulary, without RDF baggage.

On July 10, 1999, three days after the fateful phone call, RSS 0.91 was released. It incorporated new features from UserLand Software’s scriptingNews format and was completely RDF-free. So, as would become a habit whenever a new version of RSS was released, the meaning of the RSS acronym was changed. While it had previously stood for “RDF Site Summary,” in the RSS 0.91 specification Dave Winer explained:

There is no consensus on what RSS stands for, so it’s not an acronym, it’s a name. Later versions of this spec may say it’s an acronym, and hopefully this won’t break too many applications.

A great deal of research into RDF continued, however. Indeed, Netscape’s RSS development team was always keen to use it. Their original specification (the one that was watered down to produce RSS 0.9) was published on the insistence of Dan Libby, and, although it has long since gone from the Netscape servers, you can find it in the Internet Archive (http://web.archive.org/web/20001204123600/http://my.netscape.com/publish/help/futures.html).

Netscape, however, was never to release any new versions: the RSS team was disbanded as the My Netscape Network was closed. So, when work began on a new version of RSS, it was left to the development community in general to sort out. The first pressing need involved including categories in the feed. By September 9, for example, Jon Udell was suggesting the use of a category element. It was the urge to add this and other new features that broke the development community in two.

The first camp, led by O’Reilly’s Rael Dornfest, wanted to introduce some form of extensibility to the standard. The ability to add new features, perhaps through modularization, necessitated such complexities as XML namespaces and the reintroduction of RDF, as envisioned by the Netscape team.

However, the second camp, led by Dave Winer, feared that this would add a level of complexity unwelcome to users. They wanted to keep RSS as simple as possible. The thinking at the time was that RSS, like HTML, would be learned by users viewing source and experimenting. An RDF-based specification would look extremely daunting.

The First Fork

The debate raged for nearly a year. August 14, 2000 saw the start of the RSS 1.0 mailing list and increasing polarization between the simple and RDF camps. On December 6, 2000, after a great deal of heated discussion, RSS 1.0 was released. It embraced the use of modules, XML namespaces, and a return to a full RDF data model. Two weeks later, on Christmas Day 2000, Dave Winer released RSS 0.92 as a rebuttal of the RDF alternative. The standard had forked.

It remained like this for four months: Netscape published the RSS 0.91 specification; UserLand published the 0.92 specification, which was upward-compatible with 0.91; and the RSS 1.0 Working Group published a 1.0 specification, which was not. Then, in early April 2001, My Netscape closed. A few weeks later, in mid-April, the RSS 0.91 DTD document Netscape had been hosting was pulled offline. Immediately, every parser that had been verifying feeds against it stopped working. This was early on in the XML world, and people didn’t know that this sort of architecture was a bad idea. (That DTD, incidentally, was written by Lars Marius Garshol, who wasn’t working at Netscape at all. He’d created the DTD by reverse engineering the specification, and had then given it to Dan Libby.)

UserLand came to the rescue. On April 27, Winer published a copy of the Netscape DTD on his own server. It’s still there: http://www.scripting.com/dtd/rss-0_91.dtd. Through this act, more than any other, UserLand claimed the right to be seen as the guardian of the 0.9x side of the argument.
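The failure described above came from parsers that fetched a remotely hosted DTD every time they checked a feed. As a minimal sketch of the alternative, the Python below parses a feed whose DOCTYPE still points at a remote DTD, but never retrieves it: the standard library's ElementTree parser is non-validating and does not load external DTDs, so the structural checks are done locally instead. The sample feed, the `check_feed` name, and the short list of required elements are all assumptions made for illustration, not a full rendering of the RSS 0.91 specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 0.91-style feed. The DOCTYPE names a remote DTD, but the
# parser used below never fetches it.
FEED = """<!DOCTYPE rss SYSTEM "http://www.scripting.com/dtd/rss-0_91.dtd">
<rss version="0.91">
  <channel>
    <title>Example Channel</title>
    <link>http://example.org/</link>
    <description>A hypothetical feed used only for illustration.</description>
    <language>en-us</language>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
    </item>
  </channel>
</rss>"""

# An assumed, deliberately small set of structural checks.
REQUIRED_CHANNEL_CHILDREN = ("title", "link", "description")


def check_feed(feed_xml):
    """Return a list of structural problems found without touching the network."""
    root = ET.fromstring(feed_xml)  # checks well-formedness only
    problems = []
    if root.tag != "rss":
        problems.append("root element is not <rss>")
    channel = root.find("channel")
    if channel is None:
        problems.append("missing <channel>")
        return problems
    for name in REQUIRED_CHANNEL_CHILDREN:
        if channel.find(name) is None:
            problems.append(f"missing <{name}> in <channel>")
    return problems


if __name__ == "__main__":
    print(check_feed(FEED) or "no problems found")
```

A parser built this way keeps working whether or not the server hosting the DTD is still online. A tool that needs full DTD validation could do the same thing by bundling a local copy of the DTD.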

Version 0.92, therefore, superseded 0.91, and that was how it remained for two years: two standards, with RSS 0.92 as the simple, entry-level specification and RSS 1.0 as the more complex but ultimately more feature-packed one. And, of course, some people didn’t use the additional features of 0.92 and so were de facto RSS 0.91 users as well.

For the users of RSS feeds, this fork was not a major worry because the two standards remained compatible in practice. Even parsers specifically built to parse only RSS, rather than XML in general, can usually read simple examples of either version with equal ease, although the RDF implications go straight over the head of all but specifically designed RDF parsers.
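To illustrate the practical compatibility just described, here is a rough sketch of a reader that pulls item titles and links out of either flavor by ignoring XML namespaces and looking only at element names. Both sample feeds are hypothetical and heavily trimmed, and `read_items` and `local_name` are names invented for this example. As the text says, this approach ignores the RDF meaning of an RSS 1.0 feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 0.92-style feed: items nest inside <channel>, no namespaces.
RSS_092 = """<rss version="0.92">
  <channel>
    <title>Example Channel</title>
    <link>http://example.org/</link>
    <description>Illustration only.</description>
    <item>
      <title>First post</title>
      <link>http://example.org/first</link>
    </item>
  </channel>
</rss>"""

# Hypothetical RSS 1.0-style feed: RDF root, namespaced elements, and
# <item> elements that sit beside <channel> rather than inside it.
RSS_10 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.org/feed.rdf">
    <title>Example Channel</title>
    <link>http://example.org/</link>
    <description>Illustration only.</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.org/first"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.org/first">
    <title>First post</title>
    <link>http://example.org/first</link>
  </item>
</rdf:RDF>"""


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def read_items(feed_xml):
    """Return (title, link) pairs for every <item>, wherever it appears."""
    root = ET.fromstring(feed_xml)
    items = []
    for element in root.iter():
        if local_name(element.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in element}
        items.append((fields.get("title", ""), fields.get("link", "")))
    return items


if __name__ == "__main__":
    for feed in (RSS_092, RSS_10):
        print(read_items(feed))
```

Matching on local names is what lets one loop handle both layouts. It also discards the `rdf:about` identifiers and the `rdf:Seq` ordering, which only an RDF-aware parser would use.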

All this, however, was changing.

The Second Fork

In late summer 2002, the RSS community forked again, perhaps irreversibly. Ironically enough, the fork came from an effort to merge the 0.9x and 1.0 strands from the previous fork and create an RSS 2.0 that would satisfy both camps.

Once again, the argument quickly settled into two sides. On one side, Dave Winer and a few others continued to believe in the importance of simplicity above all else, and regarded RDF as a technology that had yet to show any value within RSS. Winer also, for his own reasons, didn’t want the discussion over RSS 2.0 to take place on the traditional email lists. Rather, he wanted people to express their points of view in their weblogs, to which he would link from his own at http://www.scripting.com.

On the other side, the members of the rss-dev mailing list, from which RSS 1.0 was born and nurtured to maturity, still wanted to include RDF within the specification—albeit in various simplified forms—and wished to hold the discussion on a publicly archived, centralized mailing list not subject to anyone’s filtering.

In many ways, both things happened. After a great deal of acrimony, UserLand released a specification it called RSS 2.0 and declared RSS frozen. That this was done without acknowledging, much less taking into account, the increasing concerns (both technical and social) of the rss-dev and RDF communities at large caused much unhappiness.

After RSS 2.0’s release on September 16, 2002, the members of the rss-dev list started discussions on a possible name change to their own new, RSS 1.0-based specification. This would go hand in hand with a complete retooling of the specification, based on a totally open discussion and a rethink of the use of RDF. This ended up being, as you’ll see, a far more radical effort than it started out to be.

Pie, Echo, Necho, Atom

By June 2003, it was obvious that the continual in-fighting was going to go nowhere. The RSS specification process had reached an impasse, and was socially, if not technically, dead. From this wreckage, Sam Ruby, a programmer at IBM, started to discuss, quietly, the philosophical basis of what a syndication feed should be. He based his thinking not on the business needs of Microsoft or Netscape, nor on the long and bitter history of the RSS community but, instead, decided to start afresh. The idea was to build a conceptual model of a weblog entry, then design both a syndication format and a posting and editing API around the model. It was to be new and vendor-neutral, and the specification was to be very detailed indeed, which addressed a common criticism of both RSS 2.0 and 1.0.

It would also be developed in a rather unusual way. Instead of the bickering mailing lists, or the deeply biased weblog discussions, the new format would be developed on a wiki. The standard would be continually refactored by all comers until something good was revealed, and then further polished by many hands.

Meanwhile, Dave Winer had moved from UserLand Software to take up a one-year fellowship at the Berkman Center for Internet and Society at Harvard Law School. On July 15, 2003, UserLand gave the copyright of the RSS 2.0 specification to Harvard, who then published it under the Creative Commons Attribution/Share Alike license. In addition to this, Harvard created a three-man Advisory Board to aid RSS 2.0’s evolution. It consisted of Dave Winer, Jon Udell, and Brent Simmons (the author of NetNewsWire, a very popular RSS reader application).

Now the syndication world had three different groups: the RSS 2.0 Advisory Board; the RSS 1.0 working group, which was now almost completely dormant, having long considered the specification finished; and the ad hoc community surrounding the new effort.

The ad hoc group needed to decide on a name for its project. Initially nicknamed Pie, it went through Echo and Necho before going into a long process that whittled down over 260 different suggestions. In the end, as the title of this book suggests, the group voted to call it Atom.

Today’s Scene

Atom’s development continues: this book is based on Atom 0.5. Things may have changed by this book’s publication, but in general, the furor seems to have settled down. RDF isn’t included within Atom, but each individual element is very finely specified. This, as you’ll see in later chapters, makes a good deal of difference.

About a year after it was formed, on July 1, 2004, Dave Winer resigned from the RSS 2.0 Advisory Board. The other two members did likewise, and have been replaced by Rogers Cadenhead, Adam Curry, and Steve Zellers, who remain to this day. The RSS 2.0 specification has not changed at all since then.

The core RSS 1.0 specification has remained the same for a couple of years, but RSS 1.0 is still in heavy development, though in areas far from those Atom is concerned with. When RSS 1.0 was first developed, its novelty was matched by that of the RDF standard itself. Now that RDF has matured, RSS 1.0 has matured with it and is being used very heavily, albeit in entirely different fields from RSS 2.0 or Atom. You’ll see how in later chapters.

As it stands, therefore, the version-numbering system of RSS is misleading. Taken chronologically, 0.9 was based on RDF, 0.91 was not; 1.0 was, 0.92 was not; and now 2.0 is not. Version 1.0 is, and Atom currently isn’t. It should be noted that there is an RSS 3.0, proposed by Aaron Swartz as part of a long rss-dev in-joke. (The joke culminated with a proposal to have RSS 4.0 expressed entirely through the medium of interpretive dance.) Search engine results finding these specifications are therefore wrong, though dryly funny.

In this book, therefore, I will concentrate on three flavors of syndication feeds: RSS 2.0, RSS 1.0, and Atom 0.5. For feed publishers, the three strands each have their own advantages and disadvantages, and their own specific uses. I’ll cover these in each of the relevant chapters.


Developing Feeds with RSS and Atom by Ben Hammersley