Blog by Sumana Harihareswara, Changeset founder

30 Jan 2010, 1:41 p.m.

Insta-RSS Feeds: A Case Study In Freedom

Hi, reader. I wrote this in 2010 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.

If you use Google Reader, now you can subscribe to any webpage as though it had a feed and thus automatically get alerted whenever it changes. When the Colbert Report free tickets page opens up new dates, or my slang dictionary adds items, you'll know.

Leonard started providing a version of this service years ago with his Syndication Automat. Now he needs it for only one feed: new publications from Dover. Sites have wised up and started providing their own feeds. If you want something to run on your own server to make RSS feeds for pages that don't have them, you can use his free Scrape 'N' Feed code.
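For the curious, here is roughly what "make RSS feeds for pages that don't have them" involves, as a minimal sketch using only Python's standard library. It is not Scrape 'N' Feed's code or interface. The names `LinkCollector` and `page_to_rss` are hypothetical, and the sketch assumes every link on the page deserves to become a feed item; a real scraper would narrow that down with page-specific rules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (text, absolute URL) pairs for every <a href=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []   # list of (title, absolute_url)
        self._href = None  # href of the <a> we are currently inside, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace; fall back to the URL if the link has no text.
            title = " ".join("".join(self._text).split())
            self.links.append((title or self._href, self._href))
            self._href = None


def page_to_rss(html, page_url, feed_title):
    """Turn every link on an HTML page into an RSS 2.0 <item>."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Links scraped from " + page_url
    for title, link in collector.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "guid").text = link  # the link doubles as a stable ID
    return ET.tostring(rss, encoding="unicode")
```

Run it on a freshly fetched copy of the page on a schedule, and any feed reader will notice new items as they appear.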

(I learned of this Google Reader feature via Matt Cutts, and his readers imply that there are paid services the change will undercut. Just another reminder that packaging up a free open source script with a lovely UI can make you some cash -- for a while, until it turns up as a free feature in a popular app or OS. That's the S-curve of innovation, or temporal arbitrage.)

An RSS feed gives you data in an easy-to-mess-with format. For example, it would be easy enough to plug an RSS feed into a version control system so you could track diffs, reading the change history as easily as if it were a wiki page. Or you could use it in something like the Launchpad bug tracker's remote bug watch: you can enter a bug in Launchpad, and if it's a duplicate of a bug in someone else's bug tracker, Launchpad uses that other tracker's API to keep an eye on it and lets you know when the remote bug's status changes. Enlarge your scope from software to something like MediaBugs (an RSS feed is basically the simplest possible RESTful API), and you can set up your system to automatically watch for particular journalists citing the same sources over and over, or calculate the proportion of an e-publisher's new releases that come un-DRM'd.
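Here is a rough Python sketch of the "track diffs" and "keep an eye on it" ideas. It compares two fetches of the same RSS 2.0 feed, reports new items and items whose title changed, and renders a unified diff you could commit to version control. This is not how Launchpad's bug watch works (that talks to each remote tracker's API). The helper names are hypothetical, and treating a title change as a status change assumes the feed puts the status in the title.

```python
import difflib
import xml.etree.ElementTree as ET


def feed_items(rss_xml):
    """Map each RSS 2.0 item's guid (or, failing that, its link) to its title."""
    root = ET.fromstring(rss_xml)
    items = {}
    for item in root.iter("item"):
        key = item.findtext("guid") or item.findtext("link") or ""
        items[key] = item.findtext("title", default="")
    return items


def diff_snapshots(old_xml, new_xml):
    """Compare two fetches of the same feed.

    Returns (added, changed, diff_text): keys of items that are new, keys of
    items whose title differs between snapshots (assumed here to signal a
    status change), and a unified diff of the two snapshots.
    """
    old, new = feed_items(old_xml), feed_items(new_xml)
    added = [key for key in new if key not in old]
    changed = [key for key in new if key in old and new[key] != old[key]]
    # One "key<TAB>title" line per item, sorted so the diff is stable.
    old_lines = [f"{k}\t{v}" for k, v in sorted(old.items())]
    new_lines = [f"{k}\t{v}" for k, v in sorted(new.items())]
    diff_text = "\n".join(
        difflib.unified_diff(old_lines, new_lines, "old", "new", lineterm="")
    )
    return added, changed, diff_text
```

Saving each fetch to a file in a repository gets you the same history for free; the explicit `added` and `changed` lists are what you would hang alerts on.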

If you want to do forensic economics like Suresh Naidu, then the ability to get an RSS feed of any random webpage is especially cool. And do you remember the people who used Leonard's Beautiful Soup code to catch an international arms dealer? Quote from the lead investigator:

Anyway, the ViktorFeed is a development of basic python scripts I've been using for some time to collect data on certain aircraft movements through Sharjah and Dubai Airports. Both of these place all movements on the Web, but neither of them provide anything like an RSS feed, which is why I began scripting, in order to save checking them myself.

Whether it's deliberate or negligent, making a webpage without an RSS feed is a way of disempowering readers, and of making it slightly harder to vacuum that data into the market-flattening maw. It's like how certain archives will keep a controversial document in a room and only let people read it in that room, no cameras, no notepaper. Google plays nice with these kinds of restrictions: site owners can opt out, and then Google Reader users won't be able to make or read feeds for those pages. Not an antifeature, per se, but definitely a technical restriction on the user to enforce other people's whims. Scrape 'N' Feed has no such scruples, of course. If you don't want me to know what's on that page, don't put it on the web.


Sumana Harihareswara
30 Jan 2010, 3:05 p.m.

The free ChangeDetection.com service, which has offered similar page monitoring for ten years (starting with the Central Pacific Railroad Photographic History Museum), also respects robots.txt and meta tags that let webmasters opt out of monitoring.
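Honoring robots.txt before monitoring a page takes only a few lines with Python's standard `urllib.robotparser` module. This is a minimal sketch that parses robots.txt text you have already fetched. The `may_monitor` helper and the `MyFeedBot` user-agent string are hypothetical, and it does not check per-page meta tags, which would need a separate look at the page's HTML.

```python
from urllib.robotparser import RobotFileParser


def may_monitor(robots_txt, page_url, user_agent="MyFeedBot"):
    """Return True if this robots.txt text lets user_agent fetch page_url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines from an already-downloaded robots.txt.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

In a live monitor you would more likely call `set_url()` and `read()` on the parser so it downloads robots.txt itself.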

Sumana Harihareswara
30 Jan 2010, 4:33 p.m.

On re-read: I am aware of the issue of spidering as a drain on advertising revenues & bandwidth. The canonical example is the small-time webcomics artist who makes a fair amount of her livelihood from web ads and loses those impressions if people read the content via full-graphics screenscraped RSS feeds. The webcomics I read all provide some kind of RSS feed, sometimes with the graphics and sometimes not, and sometimes with ads included. The artists are then on stronger ground to discourage or block other screenscrapers' feeds. For art/entertainment graphical content, partial-content RSS feeds make a lot of sense.