Wed 3 Sep 2008
One of the sessions that I attended at OSCON was “Beyond REST? Building Data Services with XMPP PubSub” by Evan Henshaw-Plath and Kellan Elliott-McCrea. I think you can guess why that made me curious, but it was interesting to see how much that curiosity was shared by the rest of the conference: the room filled up long before the scheduled start. They certainly gave a very entertaining talk and one that spilled over into the blogosphere in posts by Stephen O’Grady, Joshua Schachter, and Debasish Ghosh.
Unfortunately, the technical argument was a gigantic paper tiger, which is a shame given that there are plenty of situations in which event-based architectures are a better solution than REST-based architectures. I made a brief comment about notification design and how they seemed to be ignoring a good twenty years of research on Internet-scale event notification systems. People need to understand that general-purpose PubSub is not a solution to scalability problems — it simply moves the problem somewhere else, and usually to a place that is inversely supported by the economics. I’ll have to explain that in a later post, since this one is focused on the technical.
Here’s the tiger:
On July 21st, 2008, friendfeed crawled flickr 2.9 million times to get the latest photos of 45,754 users, of which 6,721 of that 45,754 potentially uploaded a photo.
Conclusion: Polling sucks.
If you’d like to learn more about their XMPP solution, the slides are available from the OSCON website. I do think there is a lot to be learned from using different interaction styles and true stream-oriented protocols (the kind that don’t care about lost packets), but this FriendFeed example is ridiculous. It took me less than 30 seconds to design a better solution using nothing more than HTTP, and that while sitting in the middle of a conference session. This is exactly what Dare means by: “If a service doesn’t scale it is more likely due to bad design than to technology choice.”
They are comparing the efficiency of blind polling using HTTP crawls to a coordinated PubSub services setup with XMPP. Spidering an entire site is obviously not going to be efficient if you are only interested in what has changed. One advantage it has is that you don’t need to cooperate with the information provider. However, for the specific example of FriendFeed polling Flickr, cooperation is easy (both companies gain immensely by cooperating) and essential (2.9 million requests per day will get you blocked from any site that doesn’t want cooperation).
The solution, which I mentioned briefly at the talk, is to provide a resource that reflects all of the changes on Flickr during a given time period. Anyone (not just FriendFeed) can perform a GET on that resource to find out what has changed. In fact, one such example is the SUP (Simple Update Protocol) just introduced by FriendFeed. However, I had something more efficient in mind.
Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.
In this case, we want to represent the last-updated state of all Flickr users in a way that minimizes the lag between event and notification (let’s just assume that one minute is “fast enough” to receive a change notification). The simplest way of doing that is to log state changes by userid in a sequence of journal-style resources named according to the server’s UTC clock minutes. For example,
http://example.com/_2008/09/03/0000 http://example.com/_2008/09/03/0001 http://example.com/_2008/09/03/0002 ... http://example.com/_2008/09/03/2359
This URI pattern instantly drops the poll count from 2.9 million to 1440 (the number of minutes in a day) plus whatever pages are retrieved after we notice a user has changed their state. Alternatively, we could define a single append-only resource per day and use partial GET requests to retrieve only the bits since the last poll, but that tends to be harder on the server. Representations for the above resources can be generated by non-critical processes, cached, and even served from a separate distribution channel (like SUP).
What, then, should we include in the representation? Well, a simple list of relative URIs is good enough if the pattern is public, but that would be unwise for a site that features limited publication (obscured identifiers so that only people who have been given the URI can find the updated pictures). Likewise, the list might become unwieldy during event storms, when many users happen to publish at once. Of course, like any good CS problem, we can solve that with another layer of indirection.
Instead of a list of changed user ids or URIs, we can represent the state as a sparse bit array corresponding to all of Flickr’s users. I don’t know exactly how many users there are at Flickr, but let’s be generous and estimate it at one million. One million bits seems like a lot, but it is only 122kB in an uncompressed array. Considering that this array will only contain 1s when an update has occurred within the last minute, my guess is that it would average under 1kB per representation.
I can just imagine people reading “sparse bit array” and thinking that I must be talking about some optimal data structure that only chief scientists would think to use on the Web. I’m not. Any black-and-white GIF or PNG image is just a sparse bit array, and they have the nice side-effect of being easy to visualize. We can define our representation of 1 million Flickr users to be a 1000×1000 pixel black-and-white image and use existing tools for its generation (again, something that is easily done outside the critical path by separate programs observing the logs of changes within Flickr). I am quite certain that a site like Flickr can deliver 1kB images all day without impacting their scalability.
Finally, we need a way to map from the bits, each indicating that a user has changed something, to the much smaller set of users that FriendFeed knows about and wishes to receive notifications. If we can assume that the mapping is reasonably persistent (a month should be long enough), then we can define another resource to act as a mapping service. Such as,
which takes as input a userid (someone that a friend already knows and wants to monitor for changes) and returns the coordinate within the sparse array (the pixel within the 1000×1000 image) that corresponds to that user. FriendFeed can store that accumulated set of “interesting users” as another image file, using it like an “AND mask” filter to find the interesting changes on Flickr.
Note that this is all just a quick thought experiment based on the general idea. In order to build such a thing right, I’d have to know the internals of Flickr and what kinds of information FriendFeed is looking to receive, and there are many potential variations on the representations that might better suit those needs (for example, periods could be overlapped using gray-scale instead of B&W). The implementation has many other potential uses as well, since the sequence of images provide an active visualization of Flickr health.
I should also note that the above is not yet fully RESTful, at least how I use the term. All I have done is described the service interfaces, which is no more than any RPC. In order to make it RESTful, I would need to add hypertext to introduce and define the service, describe how to perform the mapping using forms and/or link templates, and provide code to combine the visualizations in useful ways. I could even go further and define these relationships as a standard, much like Atom has standardized a normal set of HTTP relationships with expected semantics, but I have bigger fish to fry right now.
The point is that you don’t need to change technologies (or even interaction styles) to solve a problem of information transfer efficiency. Sometimes you just need to rethink the problem. There are many systems for which a different architecture is far more efficient, just as XMPP is far more efficient than HTTP for something like group chat. Large-scale collaborative monitoring is not one of them. An XMPP solution is far more applicable to peer-to-peer monitoring, where there is no central service that is interested in the entire state of a big site like Flickr, but even then we have to keep in mind that the economics of the crowd will dictate scalability, not the protocol used for information transfer.