The Yahoo Firehose "feed" isn’t a feed at all


The web has been on a big trend of real-time for the past couple years. Friendfeed was one of the first services to show real-time updates across your social network and real-time feeds took the stage in a big way when Twitter started its streaming API. In April, Yahoo! announced it’s Firehose API claiming “it includes a real-time feed of every public action taken on our network”. The thing is, this isn’t a “feed” or a “stream” in the same sense that Twitter’s streaming API is. It’s a database you can poll with Yahoo’s YQL, an SQL like query language. Sure, the updates may be available in their database in near real-time, but to receive them you need to issue a new request. In fact the only way you know if there are updates is to continuously poll the service. A feed would be something like long-polling with HTTP server push (what twitter does) or PubSubHubbub.

It may be just semantics to some, but this bothers me. To those of us who build applications that publish or consumer real-time information this is a very important distinction. I plan on writing a python library that wraps flickr’s polling API into a “real-time” blocking continuous stream for a project I’m working on. I’ll publish the code on github and post it here when done.


Leave a Reply