Wed 3 Sep 2008
Paper tigers and hidden dragons
Posted by Roy T. Fielding under software architecture, web architecture
One of the sessions that I attended at OSCON was “Beyond REST? Building Data Services with XMPP PubSub” by Evan Henshaw-Plath and Kellan Elliott-McCrea. I think you can guess why that made me curious, but it was interesting to see how much that curiosity was shared by the rest of the conference: the room filled up long before the scheduled start. They certainly gave a very entertaining talk and one that spilled over into the blogosphere in posts by Stephen O’Grady, Joshua Schachter, and Debasish Ghosh.
Unfortunately, the technical argument was a gigantic paper tiger, which is a shame given that there are plenty of situations in which event-based architectures are a better solution than REST-based architectures. I made a brief comment about notification design and how they seemed to be ignoring a good twenty years of research on Internet-scale event notification systems. People need to understand that general-purpose PubSub is not a solution to scalability problems — it simply moves the problem somewhere else, and usually to a place that is inversely supported by the economics. I’ll have to explain that in a later post, since this one is focused on the technical.
Here’s the tiger:
On July 21st, 2008, friendfeed crawled flickr 2.9 million times to get the latest photos of 45,754 users, of which 6,721 of that 45,754 potentially uploaded a photo.
Conclusion: Polling sucks.
If you’d like to learn more about their XMPP solution, the slides are available from the OSCON website. I do think there is a lot to be learned from using different interaction styles and true stream-oriented protocols (the kind that don’t care about lost packets), but this FriendFeed example is ridiculous. It took me less than 30 seconds to design a better solution using nothing more than HTTP, and that while sitting in the middle of a conference session. This is exactly what Dare means by: “If a service doesn’t scale it is more likely due to bad design than to technology choice.”
They are comparing the efficiency of blind polling using HTTP crawls to a coordinated PubSub services setup with XMPP. Spidering an entire site is obviously not going to be efficient if you are only interested in what has changed. One advantage it has is that you don’t need to cooperate with the information provider. However, for the specific example of FriendFeed polling Flickr, cooperation is easy (both companies gain immensely by cooperating) and essential (2.9 million requests per day will get you blocked from any site that doesn’t want cooperation).
The solution, which I mentioned briefly at the talk, is to provide a resource that reflects all of the changes on Flickr during a given time period. Anyone (not just FriendFeed) can perform a GET on that resource to find out what has changed. In fact, one such example is the SUP (Simple Update Protocol) just introduced by FriendFeed. However, I had something more efficient in mind.
Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.
In this case, we want to represent the last-updated state of all Flickr users in a way that minimizes the lag between event and notification (let’s just assume that one minute is “fast enough” to receive a change notification). The simplest way of doing that is to log state changes by userid in a sequence of journal-style resources named according to the server’s UTC clock minutes. For example,
http://example.com/_2008/09/03/0000
http://example.com/_2008/09/03/0001
http://example.com/_2008/09/03/0002
...
http://example.com/_2008/09/03/2359
This URI pattern instantly drops the poll count from 2.9 million to 1440 (the number of minutes in a day) plus whatever pages are retrieved after we notice a user has changed their state. Alternatively, we could define a single append-only resource per day and use partial GET requests to retrieve only the bits since the last poll, but that tends to be harder on the server. Representations for the above resources can be generated by non-critical processes, cached, and even served from a separate distribution channel (like SUP).
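To make the journal idea concrete, here is a minimal consumer-side sketch in Python. The URI layout follows the example above; the response format (a whitespace-separated list of changed user ids) and the one-minute lag are assumptions for illustration, not anything Flickr actually provides.

    import time
    import urllib.request
    from datetime import datetime, timedelta, timezone

    BASE = "http://example.com/_"   # journal prefix from the example above

    def journal_uri(moment):
        # e.g. http://example.com/_2008/09/03/0001 (UTC clock minute)
        return moment.strftime(BASE + "%Y/%m/%d/%H%M")

    def poll_forever(handle_changed_users):
        while True:
            # look one minute back so the journal entry is already complete
            minute = datetime.now(timezone.utc) - timedelta(minutes=1)
            with urllib.request.urlopen(journal_uri(minute)) as resp:
                changed = resp.read().decode().split()   # assumed: whitespace-separated ids
            handle_changed_users(changed)
            time.sleep(60)

That is 1440 small, cacheable GETs per day regardless of how many users the consumer is tracking.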
What, then, should we include in the representation? Well, a simple list of relative URIs is good enough if the pattern is public, but that would be unwise for a site that features limited publication (obscured identifiers so that only people who have been given the URI can find the updated pictures). Likewise, the list might become unwieldy during event storms, when many users happen to publish at once. Of course, like any good CS problem, we can solve that with another layer of indirection.
Instead of a list of changed user ids or URIs, we can represent the state as a sparse bit array corresponding to all of Flickr’s users. I don’t know exactly how many users there are at Flickr, but let’s be generous and estimate it at one million. One million bits seems like a lot, but it is only 122kB in an uncompressed array. Considering that this array will only contain 1s when an update has occurred within the last minute, my guess is that it would average under 1kB per representation.
I can just imagine people reading “sparse bit array” and thinking that I must be talking about some optimal data structure that only chief scientists would think to use on the Web. I’m not. Any black-and-white GIF or PNG image is just a sparse bit array, and they have the nice side-effect of being easy to visualize. We can define our representation of 1 million Flickr users to be a 1000×1000 pixel black-and-white image and use existing tools for its generation (again, something that is easily done outside the critical path by separate programs observing the logs of changes within Flickr). I am quite certain that a site like Flickr can deliver 1kB images all day without impacting their scalability.
Finally, we need a way to map from the bits, each indicating that a user has changed something, to the much smaller set of users that FriendFeed knows about and wishes to receive notifications about. If we can assume that the mapping is reasonably persistent (a month should be long enough), then we can define another resource to act as a mapping service. Such as,
http://example.com/_2008/09/03/users?{userid}
which takes as input a userid (someone that a friend already knows and wants to monitor for changes) and returns the coordinate within the sparse array (the pixel within the 1000×1000 image) that corresponds to that user. FriendFeed can store that accumulated set of “interesting users” as another image file, using it like an “AND mask” filter to find the interesting changes on Flickr.
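On the consumer side, the “AND mask” step might look like the following sketch (Pillow again; the file names and the way the interest mask was built are assumptions):

    from PIL import Image, ImageChops

    SIDE = 1000

    def interesting_updates(update_png, interest_png):
        updates = Image.open(update_png).convert("1")
        interest = Image.open(interest_png).convert("1")    # white = user FriendFeed follows
        hits = ImageChops.logical_and(updates, interest)
        # pixel (x, y) corresponds to index y * SIDE + x
        return [i for i, v in enumerate(hits.getdata()) if v]

    changed = interesting_updates("updates-0001.png", "interest-mask.png")
    # each index is then resolved back to a userid via the consumer's cached mapping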
Note that this is all just a quick thought experiment based on the general idea. In order to build such a thing right, I’d have to know the internals of Flickr and what kinds of information FriendFeed is looking to receive, and there are many potential variations on the representations that might better suit those needs (for example, periods could be overlapped using gray-scale instead of B&W). The implementation has many other potential uses as well, since the sequence of images provides an active visualization of Flickr health.
I should also note that the above is not yet fully RESTful, at least as I use the term. All I have done is described the service interfaces, which is no more than any RPC. In order to make it RESTful, I would need to add hypertext to introduce and define the service, describe how to perform the mapping using forms and/or link templates, and provide code to combine the visualizations in useful ways. I could even go further and define these relationships as a standard, much like Atom has standardized a normal set of HTTP relationships with expected semantics, but I have bigger fish to fry right now.
The point is that you don’t need to change technologies (or even interaction styles) to solve a problem of information transfer efficiency. Sometimes you just need to rethink the problem. There are many systems for which a different architecture is far more efficient, just as XMPP is far more efficient than HTTP for something like group chat. Large-scale collaborative monitoring is not one of them. An XMPP solution is far more applicable to peer-to-peer monitoring, where there is no central service that is interested in the entire state of a big site like Flickr, but even then we have to keep in mind that the economics of the crowd will dictate scalability, not the protocol used for information transfer.
It’s an interesting idea, but why assume just one bit per pixel (B&W)? Think how much more information you can convey if you had gray-scale (8-bits per pixel) or full-color (32).
The extra space could be used to indicate other kinds of change that might be of significance, say that the user has updated their profile, or they’ve posted a comment (vs. a picture), etc.
Even with 8 bits at 1M users, we’re talking about a reasonably small image file by Flickr standards. Besides, these are server-to-server interactions with fat pipes and presumably hefty CPUs.
Two more thoughts:
– The image file itself can be exchanged via PubSub/XMPP between servers. That way, you get the best of timely updates AND resource optimization. In this case, it’s Peer-to-Peer but the peers happen to be servers.
– The same general concept should work with large-scale blogging services and RSS feed updates.
raminf: Yes, that’s what I meant by “there are many potential variations on the representations that might better suit those needs” and “periods could be overlapped using gray-scale instead of B&W.”
Something as simple as an 8-color GIF can convey a huge amount of information and still compress well. The visualization is better if we restrict the color choices to a visible palette instead of the full color. I just used the B&W example because it is universal (applicable to any site with large numbers of users).
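As a rough sketch of that palette idea (the event types and colors here are made up purely for illustration):

    from PIL import Image

    SIDE = 1000
    EVENT = {"none": 0, "photo": 1, "profile": 2, "comment": 3}   # hypothetical event types

    def render_minute_events(events, path):
        # events: {user_index: event_name}
        img = Image.new("P", (SIDE, SIDE), EVENT["none"])
        img.putpalette([0, 0, 0,          # none    -> black
                        255, 255, 255,    # photo   -> white
                        255, 0, 0,        # profile -> red
                        0, 0, 255])       # comment -> blue
        for i, kind in events.items():
            img.putpixel((i % SIDE, i // SIDE), EVENT[kind])
        img.save(path)                    # a small-palette GIF compresses well

    render_minute_events({42_000: "photo", 9: "comment"}, "updates-0001.gif")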
The poll vs push argument here is compelling, and certainly a “what has changed since TIMESTAMP” resource is a very good scaling solution. What push solves, however, is near-real-time delivery: you get the content the moment it goes out, basically.
The HTTP vs XMPP argument (which is only really hinted at here) is ridiculous. They are two different technologies, doing two different things. I also find the “only HTTP” comment odd: do you feel HTTP is somehow superior to other standard internet protocols?
Dear Roy:
I agree very much with your thoughts, and I like this kind of smart application of REST very much. One small issue, though: users may come and go (enter the system and be deleted), so the user-to-bit-position mapping may be different from one minute to the next. I’m sure other strategies could overcome this problem, since keeping a list of users for each minute doesn’t seem adequate. An incremental approach (for instance, always adding new users at the end of the bit field, with incrementally assigned identifiers) could be an option, but that may not be possible in all situations.
Regards,
diego.
singpolyma: I think you missed the point. I am not saying that PubSub/XMPP is bad, nor that REST/HTTP is particularly good.
The HTTP vs XMPP argument was being made in the talk. I don’t disagree with the argument that push is better than polling for many architectures. What I disagreed with is the paper tiger chosen to illustrate their point.
HTTP and XMPP have a lot in common; they are both overly verbose, for example. The primary mechanism for notifications on the Web right now, though, is email. XMPP is a slight improvement on that, but only if we can rely on XMPP server distribution in the same way that we currently rely on store-and-forward mail systems. XMPP and SMTP lists have the exact same scalability characteristics for this problem. Yet, if they had stood in front of the audience and said that Flickr should email FriendFeed every time a change is made, the reaction would have been quite different (even though the solution is equally elegant for that particular problem).
The real disadvantage of PubSub architectures is the inverse economy of scale, which I’ll have to address in a later post. The talk didn’t address that problem because the example has only one big consumer of events (FriendFeed).
dsevilla: Right, a persistent mapping is one where the correspondence between userid and pixel does not change for the life of that map. Allowing it to change monthly should be adequate to adjust for churn. The cost is that subscribers need to update their own interest map whenever the mapping function changes, so it would be better to use a versioned mapping that only changes when needed instead of the monthly one I described.
Roy: on first reading the bitmap idea seemed like forcing a square peg into a round hole. But it’s now growing on me, especially considering you could flip from bitmap pixel to URI pretty easily using something along the lines of URI templates –
http://example.org/users/{x},{y}
SMTP usually has much higher latencies than XMPP. Security in SMTP is… archaic. There is no presence in SMTP. There are drastically different protocols for MUAs and MTAs. Etc.
I agree that the problems that they solve can be thought of as roughly similar, but the differences in implication for event notification systems are more than slight.
hildjj: I am sure you know why SMTP usually has higher latencies than XMPP (reliable forwarding of large messages without persistent connections tends to add latency) and how to do security via encrypted payloads instead of connections. I have no doubt that XMPP is far superior to SMTP for just about every sort of messaging other than big email. However, none of that is relevant to the example used in the talk.
HTTP, XMPP, and SMTP are just tools — we should use whichever one best fits the deployment environment and interaction style. We should choose them based on rational design comparisons and actual data transmission needs, not paper tigers.
Have you seen the Six Apart Update Stream (http://updates.sixapart.com/)? It’s existed for a few years and is a never-ending stream of Atom with all of the new content across TypePad, Vox, and LiveJournal. One of the things we like about it is the big search engines can consume it all and get full content while smaller services can just parse out the URL of the blog that updated and then decide if they care about the content.
I’m not saying it is the right solution, and it certainly needs modernizing, but it works quite well for what we need to do.
David: I remember hearing about the Six Apart Update Stream years ago, but hadn’t looked at it until now. That must be a goldmine for academic research. It seems like a good solution to the update polling problem, but only if we assume that listeners need/want to consume the full text of entries. It looks like Brad specifically had aggregators like FriendFeed in mind, but I can’t see how the full stream can remain scalable in the long run unless you can find a way to monetize the stream or restrict the feed to invited listeners.
Personally, I would suggest changing it to two streams — one with the full feed and another with just a URI and single-line title ending with a linefeed character. The latter is all that an event broker needs and is much easier to parse than Atom XML. Alternatively, an XMPP stream would be appropriate for small summaries published to many consumers, especially if you could convince someone else to broker the subscriptions (e.g., it would be a good demonstration stream for Jabber).
One advantage of SUP vs the bitmap idea is that it works for any HTTP resource — there doesn’t need to be any understanding of userids or the like. Also, SUP is already quite compact (about 8 bytes per update when compressed), but it would be fairly straightforward to create a Bloom filter variant of SUP using the SUP-IDs as keys in the Bloom filter, though in practice I doubt the extra compression is necessary.
If you’re interested in discussing any of these ideas further, please email me paul @ friendfeed.
paul: The bitmap could also represent arbitrary URIs without the need for generated feed identifiers, since each URI is globally unique and can be used as the range in the identifier to pixel mapping. Likewise, SUP could be simplified by using a consistent hash of the feed URI instead of having the user generate a pseudo-unique id.
In any case, these solutions are very similar. The real differences are in the mapping function’s persistence, discovery of what URIs are monitored for updates, the representation format for update notifications, and the use of a single feed URI versus a template of separate resources. Each choice has trade-offs.
We chose opaque ids instead of URI hashes in order to avoid canonicalization issues (www vs not, .com vs .co.uk, etc) and to allow multiple URIs to share the same ID (RSS and Atom feeds of the same content, for example). It’s also helpful for sites that don’t want to expose their user namespace, since in practice fast hashes such as md5 are reversible through brute force.
That said, sites are free to assign SUP ids by any method of their choosing, including URI hashes, so in practice this extra flexibility only makes things easier for SUP producers.
The SUP tag/header also provides for discovery of which URIs provide updates, and where those updates are published. This allows sites to publish multiple SUP feeds or even contribute to a shared feed published elsewhere.
See Joe Gregorio’s suggestion for using Bloom Filters instead of a sparse bit array. The trade-off of false positives for size is intriguing. It would allow us to describe the mapping directly, in the form of a set of independent hashing functions on the URI path, which would make it much easier to standardize the technique for any website.
Joe has fleshed-out his description in Bloom Filter Resources. Great stuff. Note that this can be generalized for any kind of “cool URI” (finite identifier set) update checks by using the URI path as key.
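For concreteness, here is a minimal sketch of that Bloom-filter idea (the parameters are illustrative; a real deployment would size the bit array and hash count for the expected update rate and acceptable false-positive rate):

    import hashlib

    M_BITS = 2 ** 20        # 1 Mbit array (~128 kB uncompressed), same order as the bitmap
    K_HASHES = 4

    def _positions(uri_path):
        for k in range(K_HASHES):
            digest = hashlib.sha256(f"{k}:{uri_path}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % M_BITS

    def add(bits, uri_path):
        for pos in _positions(uri_path):
            bits[pos // 8] |= 1 << (pos % 8)

    def maybe_updated(bits, uri_path):
        # may return a false positive, but never a false negative
        return all(bits[pos // 8] & (1 << (pos % 8)) for pos in _positions(uri_path))

    bits = bytearray(M_BITS // 8)
    add(bits, "/photos/joe/")
    assert maybe_updated(bits, "/photos/joe/")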