The real-time web loves the word “firehose.” I’m not completely sure when the word was first used in relation to streaming data, but the first time I cared was with Twitter. The Twitter firehose: all the updates from everyone in real time, the full stream. It seemed like everyone wanted it, few people needed it, and even fewer got it. Some very interesting things were built and bought (Summize, now search.twitter.com). Now every company offers a firehose, right? Well, some do. WordPress pushes all their updates in real time via XMPP or PubSubHubbub, and you can even purchase the full feed. Others have stepped forward with streams of data, MySpace for example.
So what exactly is a firehose? It goes beyond being a real-time push stream of data. If you just had my Twitter stream in real time, would you call that a firehose? Hardly. A firehose is all the public data, the full thing.
As the Twitter API says:
Returns all public statuses. The Firehose is not a generally available resource. Few applications require this level of access.
Now I’m starting to see more people say “we’ve got a firehose!” when they don’t have anything close.
First, Yahoo. See the ReadWriteWeb article (which shows they didn’t look or didn’t care, and just repeated the Yahoo press release).
The first sentence makes it clear that it is not a firehose:
The Updates Firehose API gives developers access to the full, real-time Yahoo! Updates index and allows developers to search, filter and combine Updates data using YQL.
That’s right, the full firehose to query. What? It isn’t push, it isn’t a full stream of data, it isn’t even close. Instead, this is just a combination of several Yahoo APIs behind one endpoint. That’s cool for what it’s worth, but it isn’t real-time and it isn’t streaming.
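To make the push-versus-query distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical: `UpdateSource`, `subscribe`, and `query` are toy in-memory stand-ins I made up for illustration, not the Yahoo or Twitter APIs. The subscriber is handed every update as it is published, while the query caller has to ask and only sees what matches.

```python
from typing import Callable, List


class UpdateSource:
    """Toy in-memory stand-in for a service publishing public updates.

    Hypothetical: this is not any real provider's API.
    """

    def __init__(self) -> None:
        self._updates: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def subscribe(self, callback: Callable[[str], None]) -> None:
        # Register a consumer that wants every update pushed to it.
        self._subscribers.append(callback)

    def publish(self, update: str) -> None:
        self._updates.append(update)
        # Push: each subscriber receives each update as it happens.
        for callback in self._subscribers:
            callback(update)

    def query(self, keyword: str) -> List[str]:
        # Pull: the caller must ask, and only gets matching updates.
        return [u for u in self._updates if keyword in u]


source = UpdateSource()

pushed: List[str] = []
source.subscribe(pushed.append)  # firehose-style consumer

for text in ["coffee time", "new blog post", "coffee again"]:
    source.publish(text)

polled = source.query("coffee")  # query-style consumer

print("pushed:", pushed)
print("polled:", polled)
```

The pushed list ends up holding all three updates without the consumer ever asking for them. The polled list holds only the two that matched the keyword, and only because the caller went and asked.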
Second, Spinn3r. From their site:
Spinn3r listens to a new Twitter firehose API which is a sample of the full Twitter feed.
Again, if it isn’t the full stream, it isn’t the firehose. That’s like firefighters showing up with a garden hose (which is likely the API method they actually use).
I’m glad to see that we have gotten more streams of real-time data on the web. I’m glad to see some companies providing a firehose of their content, even if they charge for it. The ability to build some of the next large-scale ideas depends on that. Just don’t crowd the search results by overusing the term; use it correctly.