Developing Feeds with RSS and Atom

Legal Implications

The copyright implications for RSS feeds are quite simple. Feed publishers have two choices, and each has consequences for the user.

First, the publisher can decide that the feed must be licensed in some way. In this case, only authorized users can use the feed. It is good manners on the part of the publisher to make it as obvious as possible that this is the case—by providing a copyright notice in an XML comment, at least, and preferably by making it difficult for unauthorized users to get to the feed. Password protection is a reasonable minimum. Registering a pay-only feed with aggregators or allowing Google to see the feed is asking for trouble.
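Here is a minimal sketch of the password-protection approach. This small Python WSGI application serves a feed only to requests that carry valid HTTP Basic credentials, and it places the copyright notice in an XML comment at the top of the feed. The user table, the feed contents, and the publisher name are hypothetical. A real deployment would store hashed passwords and serve the feed over HTTPS, because Basic authentication sends credentials in an easily decoded form.

```python
import base64
import hmac

# Hypothetical subscriber table, for illustration only. Real code would
# store password hashes, never plaintext passwords.
AUTHORIZED_USERS = {"subscriber": "s3cret"}

# Hypothetical licensed feed; the copyright notice is an XML comment.
FEED_XML = b"""<?xml version="1.0"?>
<!-- Copyright (c) Example Publisher. Licensed subscribers only. -->
<rss version="2.0">
  <channel>
    <title>Example Licensed Feed</title>
    <link>https://www.example.com/</link>
    <description>Hypothetical pay-only feed</description>
  </channel>
</rss>
"""


def is_authorized(authorization_header):
    """Return True if an HTTP Basic Authorization header names a known user."""
    if not authorization_header or not authorization_header.startswith("Basic "):
        return False
    try:
        encoded = authorization_header[len("Basic "):]
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    username, sep, password = decoded.partition(":")
    expected = AUTHORIZED_USERS.get(username)
    if not sep or expected is None:
        return False
    # Constant-time comparison avoids leaking information through timing.
    return hmac.compare_digest(password.encode("utf-8"), expected.encode("utf-8"))


def feed_app(environ, start_response):
    """WSGI app: return the feed to authorized users, 401 to everyone else."""
    if not is_authorized(environ.get("HTTP_AUTHORIZATION")):
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", 'Basic realm="feed"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authorization required\n"]
    start_response("200 OK", [("Content-Type", "application/rss+xml")])
    return [FEED_XML]
```

When this runs under any WSGI server, such as the standard library's `wsgiref.simple_server`, an aggregator that supplies the right username and password receives the feed. Every other client gets a 401 response and never sees the content.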

Second, and most commonly, the publisher can decide that the RSS feed is entirely free to use. In this case, it is only polite for the publisher of a public RSS feed to treat it as entirely in the public domain: free to be used by anyone, for anything. This might sound a little radical to the average company vice president, but remember that there is nothing in the RSS feed that wasn't, in some form, in the source information in the first place. It is rather futile to get upset that someone might not be displaying your headlines in the company-approved font, or committing some similar infraction; that is somewhat against the spirit of the exercise.

Screen-scraping a site to create a feed, by writing a script that reads the site-specific layout, is a different matter. U.S. courts at least have already found (in the Ticketmaster versus Tickets.com case, October 1999 to March 2000) that linking to a page does not in itself constitute a breach of copyright. You can also argue, perhaps less convincingly, that reproducing headlines and excerpts from a site falls under fair-use guidelines for review purposes. However, it is extremely bad form to continue scraping a site if the site owner asks you to stop. Instead, try to evangelize RSS to the site owner and persuade the owner to start a proper feed.

Nevertheless, for private use, screen-scraping is a useful technique. In later chapters you’ll see how running screen-scraping scripts on your local machine can produce extremely useful feed-based applications. Because these are entirely self-contained, there’s no legal issue at all.
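To illustrate what "reading the site-specific layout" involves, here is a minimal Python sketch built on the standard library's `html.parser`. It assumes a hypothetical page that marks each headline as `<h2 class="headline"><a href="...">...</a></h2>`. Every real site needs its own parser, which is why a scraper breaks whenever the site's layout changes.

```python
from html.parser import HTMLParser


class HeadlineParser(HTMLParser):
    """Collect (title, link) pairs from <a> tags inside <h2 class="headline">.

    The markup pattern is a hypothetical example; a real scraper must match
    the specific layout of the site being scraped.
    """

    def __init__(self):
        super().__init__()
        self.in_headline = False
        self.current_href = None
        self.current_text = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self.in_headline = True
        elif tag == "a" and self.in_headline:
            self.current_href = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only collect text while inside a headline link.
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_href is not None:
            title = "".join(self.current_text).strip()
            self.headlines.append((title, self.current_href))
            self.current_href = None
        elif tag == "h2":
            self.in_headline = False


def scrape_headlines(html):
    """Return the (title, link) pairs found in an HTML page."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


# Hypothetical page; the link in the paragraph is not a headline and is skipped.
SAMPLE_PAGE = """
<html><body>
  <h2 class="headline"><a href="/news/1">First story</a></h2>
  <p>Some text with <a href="/about">a link</a>.</p>
  <h2 class="headline"><a href="/news/2">Second &amp; final story</a></h2>
</body></html>
"""

if __name__ == "__main__":
    print(scrape_headlines(SAMPLE_PAGE))
    # [('First story', '/news/1'), ('Second & final story', '/news/2')]
```

Run locally, the resulting list of titles and links is the raw material for a private feed.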

If You Are Scraped

If you are being scraped heavily and want it stopped, there are four ways to do so. First, scrapers should obey robots.txt; placing a robots.txt file in the root directory of your site sends a clear signal that most will follow. Second, you can contact the scraper and ask her to stop; if she is professional, she will do so immediately. Third, you can block the scraper's IP address, although this is sometimes rather like herding cats, because scrapers can move around.
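As a sketch of the robots.txt approach, the example below shows a file a publisher might place at the root of a hypothetical site, together with the check a well-behaved scraper would perform before fetching a page. It uses the standard library's `urllib.robotparser`. The bot name `HeadlineBot` and the site paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt for a hypothetical site. It shuts out a scraper that
# identifies itself as "HeadlineBot" entirely, and keeps all other robots
# out of /news/.
ROBOTS_TXT = """\
User-agent: HeadlineBot
Disallow: /

User-agent: *
Disallow: /news/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A polite scraper would normally download the live file with `RobotFileParser.set_url()` and `read()`; parsing a string here simply keeps the example self-contained. Note that robots.txt is only a request: it stops scrapers that choose to honor it, which is why the other measures remain necessary.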

The fourth and best way is to make a feed of your own. I’ll show how to do so in the following chapters.
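Here is a minimal sketch of what "a feed of your own" means at its simplest: an RSS 2.0 document with the channel's required `title`, `link`, and `description` elements and one `item` per headline, generated with Python's `xml.etree.ElementTree`. The channel details are hypothetical, and this is only an illustration, not the approach developed later in the book.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, channel_description, items):
    """Build a minimal RSS 2.0 document from (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link, and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    # ElementTree escapes characters such as & and < automatically.
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(
        "Example News",  # hypothetical site
        "https://www.example.com/",
        "Headlines from a hypothetical site",
        [("First story", "https://www.example.com/news/1")],
    ))
```

Publishing even a file this simple gives would-be scrapers a sanctioned alternative.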

Developing Feeds with RSS and Atom by Ben Hammersley