XMPP As A Key Component Of The Social Web
Every once in a while I dust off my master's degree in computer science and pontificate about the deeper implications of the social architecture we are composing on the web. A lot of our efforts have run into problems because the componentry that people are trying to use is inadequate for scale.
Back in March, when Twitter was bracing for the SXSW torrent, I wrote:
[from Twitter Braces For SXSW]What we would all love to hear is that they have ramped up the load-balancing and a reliable integration with a service like Amazon S3, that in principle dynamically scales with demand, so that we can do whatever we like, and the Twitter backbone will work. They threw out the Joyent solution a few months ago, professing love for the company but I guess not for the solution, and went to Verio, I think. It apparently hasn't gotten more stable, though, to my casual eye.
I really don't understand why the nice folks at Jabber, Inc., who parade their performance numbers up and down the street, aren't in the mix at Twitter. Shout out to Joe Hildebrand: give these guys some help, please? Or is this going to be a mess until Google or Amazon or eBay or Microsoft buys Twitter?
The open source protocol XMPP is what Jabber is based on, and there is increasing interest across the web in seeing XMPP become a common component whenever messaging is involved.
A presentation at the Open Source Convention (OSCON) this week digs into the use of XMPP as a replacement for REST-based web services:
[from OSCON day 1: Beyond REST? Building Data Services with XMPP PubSub - O'Reilly Radar by Robert Kaye][...]
For example, Kellan [Elliot-McCrea of Flickr, co-presenter with Evan Rabble, now of ENTP] talked about FriendFeed, a site that lets its users know when their friends share new items. In this example, Kellan pointed out that FriendFeed polls Flickr 2.9 million times in order to check on updates for 45 thousand users. And of those 45 thousand users, only 6.7 thousand are logged in at any one time. This, of course, is a poor way of checking for changed content. Kellan says: "Polling sucks!"
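A quick back-of-the-envelope check of Kellan's FriendFeed numbers, as a small Python sketch. The talk summary doesn't say over what period the 2.9 million polls were counted, so the per-user figure is per whatever window was measured; `polling_overhead` is a hypothetical helper for illustration.

```python
def polling_overhead(total_polls: int, total_users: int, logged_in: int) -> tuple[float, float]:
    """Return (polls per user, share of users who are not logged in)."""
    polls_per_user = total_polls / total_users
    idle_share = 1 - logged_in / total_users
    return polls_per_user, idle_share


# Figures quoted in the FriendFeed example above (measurement window not stated).
per_user, idle = polling_overhead(2_900_000, 45_000, 6_700)
print(f"polls per user: {per_user:.1f}")  # polls per user: 64.4
print(f"not logged in:  {idle:.0%}")      # not logged in:  85%
```

In other words, roughly 85% of those polls are on behalf of people who aren't around to see the result.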
It's clear that REST web-services provide the heavy lifting for many Web 2.0 sites, but it's also clear that REST and its inherent polling mechanism isn't the best way of building a user notification system. With social networking sites not about to fade away, we're going to see an increasing need for capable message passing sites. And since Jabber is a well established and supported system, it only makes sense to piggyback on this great technology.
- Robert Kaye
To solve this problem it's key to leave standard REST web services behind and find a way to use message passing, a direct way of notifying users of changed content. The open, mature infrastructure that Rabble and Kellan chose for this service is Jabber. Jabber has ten years of experience passing messages around the internet and has been embraced by many companies, including Google.
XMPP, Jabber's protocol, works well for message passing and does not have many of the problems/limitations of HTTP:
1. XMPP works over persistent connections
2. It is stateful (so SSL becomes cheap)
3. Designed as an event stream protocol
4. Natively federated and asynchronous
5. Identity, security and presence are built in.
6. Jabber servers are built and deployed to do this stuff.
Given this, Kellan and Rabble decided to piggyback a notification system on Jabber by sending XML fragments using a PubSub paradigm. In this context, PubSub is a simple method for passing XMPP pubsub stanzas via Jabber. PubSub is nothing more than a convention for how to send XML via Jabber, including a method for embedding Atom fragments in the XML.
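To make the "XML fragments" concrete, here is a minimal Python sketch of an XEP-0060 publish request carrying an Atom entry as its payload. The service address, node name, item id and entry text are hypothetical placeholders, the Atom entry is trimmed (a real one would also need `id` and `updated`), and in practice a Jabber client library would build and send this stanza for you.

```python
import xml.etree.ElementTree as ET

PUBSUB_NS = "http://jabber.org/protocol/pubsub"
ATOM_NS = "http://www.w3.org/2005/Atom"


def build_publish_iq(service: str, node: str, item_id: str,
                     title: str, summary: str, iq_id: str) -> str:
    """Serialize an XEP-0060 publish request whose item payload is one Atom entry."""
    iq = ET.Element("iq", {"type": "set", "to": service, "id": iq_id})
    # Declaring xmlns as a plain attribute keeps the output free of ns0: prefixes;
    # children inherit the default namespace when the XML is parsed.
    pubsub = ET.SubElement(iq, "pubsub", {"xmlns": PUBSUB_NS})
    publish = ET.SubElement(pubsub, "publish", {"node": node})
    item = ET.SubElement(publish, "item", {"id": item_id})
    entry = ET.SubElement(item, "entry", {"xmlns": ATOM_NS})
    ET.SubElement(entry, "title").text = title
    ET.SubElement(entry, "summary").text = summary
    return ET.tostring(iq, encoding="unicode")


# Hypothetical service, node and item, for illustration only.
print(build_publish_iq("pubsub.example.com", "photos/public",
                       "photo-123", "New photo", "A sunset over the bay", "pub1"))
```

The pubsub service then fans the item out to every subscriber of that node, so the publisher sends each update once no matter how many consumers there are.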
Rabble presented the use of XMPP for FireEagle, Yahoo!'s new personal geolocation service, which allows users to share their current location with other users. For a few users and a few updates, you can paginate the data stream into RSS/Atom feeds. But once you have more than a few users and frequent updates, a paginated stream cannot keep up. What if a user publishes more updates than an RSS feed can capture? Updates get lost -- and for applications using FireEagle, a missed update is a critical flaw. Using a system like XMPP, FireEagle can rely on Jabber to deliver all the updates -- exactly what Jabber was meant to do.
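The lost-update problem can be sketched with a toy model, assuming the feed only ever shows its `window` newest entries and a consumer polls it at fixed times. The timings and window size below are made up for illustration; nothing here describes how FireEagle actually paginated its feeds.

```python
from bisect import bisect_right


def lost_updates(update_times: list[float], poll_times: list[float], window: int) -> int:
    """Count updates published before the last poll that no poll ever saw.

    Assumes update_times is sorted ascending and poll_times is non-empty.
    """
    seen: set[int] = set()
    for t_poll in poll_times:
        count = bisect_right(update_times, t_poll)  # updates published so far
        # The feed exposes only the newest `window` of them.
        seen.update(range(max(0, count - window), count))
    published = bisect_right(update_times, max(poll_times))
    return published - len(seen)


# One update per second for 25 seconds, a 10-item feed, two polls.
updates = [float(t) for t in range(25)]
print(lost_updates(updates, [9.5, 24.5], window=10))  # 5
```

Updates 10 through 14 scroll off the feed between the two polls and are never seen. A push model delivers each update as it happens, so the feed window never comes into play.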
Kellan also applied XMPP/PubSub to Flickr, sketching how a Flickr update "Firehose" might work. If Flickr sends a ~2 KB Atom-enriched packet for each new public picture posted, at a rate of 60 updates a second, it would take roughly a megabit per second of traffic. Even a normal DSL line can handle 1 Mbit/s, so the network load is manageable at this level, compared to the polling system that FriendFeed uses. (Kellan also points out that FriendFeed is not doing anything wrong at all -- the current web-service-centric model is simply insufficient for this type of service.)
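The firehose arithmetic, as a Python sketch. It assumes "~2k" means 2,048 bytes per packet and ignores XMPP, TCP and IP framing overhead, so the real figure would be somewhat higher.

```python
def firehose_bitrate(packet_bytes: int, updates_per_second: float) -> float:
    """Sustained bitrate, in bits per second, for a stream of fixed-size packets."""
    return packet_bytes * 8 * updates_per_second


bps = firehose_bitrate(2 * 1024, 60)  # ~2 KB per photo, 60 photos a second
print(f"{bps / 1_000_000:.2f} Mbit/s")  # 0.98 Mbit/s
```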
To deploy your own message passing service based on XMPP/PubSub, you'll need to follow these 4 easy steps:
1. Get a Jabber client library. There are many available for all the popular languages.
2. Set up a Jabber server -- again, there are many available to choose from. Turn off the features you won't need (e.g., creating new accounts).
3. Build a component (according to Jabber XEP-0114)
4. Integrate the message passing system into your own site.
Pretty simple, overall! The beauty of this approach comes from the fact that only off-the-shelf components were used to build this new notification system. No new magic technology is being created to enable this system, which is a personal metric of mine for determining the likelihood that a new system will succeed.
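For step 3, an XEP-0114 component opens a stream in the jabber:component:accept namespace and authenticates with a <handshake/> element containing the hex-encoded SHA-1 of the server-assigned stream ID concatenated with a shared secret. A minimal Python sketch of those two pieces, assuming a hypothetical component domain, stream ID and secret; socket handling, parsing the server's reply header, and error handling are left out.

```python
import hashlib


def component_stream_header(component_domain: str) -> str:
    """Stream header a component sends when it connects to the server (XEP-0114)."""
    return (
        "<stream:stream xmlns='jabber:component:accept' "
        "xmlns:stream='http://etherx.jabber.org/streams' "
        f"to='{component_domain}'>"
    )


def component_handshake(stream_id: str, shared_secret: str) -> str:
    """<handshake/> element: lowercase hex SHA-1 of stream ID + shared secret."""
    digest = hashlib.sha1((stream_id + shared_secret).encode("utf-8")).hexdigest()
    return f"<handshake>{digest}</handshake>"


# Hypothetical values: in a real session the stream ID comes from the
# server's reply header, and the secret is configured on the server.
print(component_stream_header("pubsub.example.com"))
print(component_handshake("3BF96D32", "s3cr3t"))
```

If the digest matches, the server answers with an empty <handshake/> and the component can start sending and receiving stanzas for its domain.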
I predict that this blueprint will be applied generally.
New services like Gnip might provide the equivalent of this model as a service layer in social apps, but whether tool developers opt to use a scalable service like Gnip or roll their own with XMPP and the Rabble/Kellan model above remains to be seen.

From Jud Valeski, a Gnip co-founder:
"Many hurdles remain however. The entire network has been built out with HTTP connection assumptions. The way ports are monitored/firewalled inhibit any new connection types from taking hold. App developers think in terms of AJaX & *HttpRequest(), not publish and subscribe frameworks. XMPP is far from ubiquitous, and even if it's lucky, might not even broach HTTP's market penetration radar, but it deserves some time on the field."
http://one.valeski.org/2008/06/sockets-http-xmpp-and-leap-frog.html
Posted by: David Duey | July 24, 2008 at 01:29 PM
there's a gnip hyperlink that points to gnip.com -- shouldn't it be gnipcentral.com or something?
Posted by: Steven Michael Malloy | July 24, 2008 at 02:22 PM
It's unfortunate that technologists continue to propagate serious mistakes like "[...] its also clear that REST and its inherent polling mechanism isn't the best way of building a user notification system [...]"
REST is about state transfer - and event notifications are also state transfer.
As for HTTP, it isn't only "polling" - anyone that has posted a blog entry knows that. The 'client' can 'post' updates to the 'server' - exactly the same as event notifications via XMPP. The great thing about XMPP is the federated multi-hop capability with 'trust' built-in. Just like email, only with everyone using settings for very low latency delivery.
There have been multiple publish/subscribe over HTTP mechanisms (comet, mod_pubsub, KnowNow, etc.) over the years.
Posted by: MikeD | July 24, 2008 at 02:23 PM