Archive for September, 2008

About three weeks ago, a new “standard” for Content Management Interoperability Services (CMIS) was announced by EMC, IBM, and Microsoft with the usual fanfare of being the best thing since sliced bread and compliant with the latest buzzwords. One of those buzzwords, REST (as in Representational State Transfer), happens to be defined by my dissertation. I am getting tired of big companies making idiotic claims about REST and their so-called RESTful architectures. The only similarity between CMIS and REST is that they both have four-letter acronyms.

Note to our technology industry rags: A standard is an approved measure against which multiple independent organizations have agreed (by choice or by force) to have their products tested for compliance. A standards-effort is what we call a proposal when it is being actively worked on by some standards development organization. CMIS, on the other hand, is just a vendor proposal that is being submitted to OASIS. It only becomes a standards-effort once the OASIS members agree to host it, which shouldn’t be a problem given the pay-to-play nature of OASIS, and might become a standard if the final specification is approved.

CMIS is a thin veneer on RDBMS-based data repositories that provides a data model for document-like objects within filesystem-like folders, basic file versioning, and access via SQL queries and local object references. It is exactly the kind of document model one would expect within a legacy document management system that is backed by a large relational database and authored via Microsoft Office applications. No surprise, given the sponsors, and there are plenty of good reasons why folks would want to support such data models. For the interface, CMIS includes both a Web Services SOAP/WSDL protocol binding, tightly coupled to the data model, and a REST protocol binding, which also happens to be tightly coupled to the data model.

REST is an architectural style, not a protocol, and thus announcing it as a protocol binding is absurdly ignorant behavior for a group of technology companies. The RESTish protocol binding actually being proposed by CMIS is AtomPub, or at least it would be if not for the huge number of unnecessary protocol extensions that tunnel the Web Services interface through fake-Atom and fake-HTTP. The examples assume a single-script gateway that accepts methods in query strings with CMIS-* header fields to bind search scope, just like SOAP envelopes and bodies are used to tunnel object-specific protocols over HTTP. Are there any REST constraints that this binding doesn’t violate?

The SOAP protocol binding, in contrast, is more direct: half the number of pages and defined with WSDL and XSD. It is obvious that the SOAP binding was designed first and the AtomPub binding added for marketing reasons. I don’t think much of SOAP bindings, in general, but at least this one is consistent with the limited data model, the design of other SOAP-based services, and the goal of providing a control-oriented API for document management.

CMIS is a classic example of what happens when a control-oriented interface is slapped onto an HTTP-based protocol instead of redesigning the interface to be data-oriented. All of the lowest-common-denominator constraints of CMIS’ data model, which are necessary for the SOAP interface because its operations are object-specific, are completely unnecessary for an HTTP interface that is properly designed to be data-oriented. An HTTP interface doesn’t need to be limited to Atom feed formats for traversing folder hierarchies; hypertext is a lot more powerful than temporally-ordered query results. An HTTP interface doesn’t need to forbid the versioning of folders; hypertext can tell the client what operations are allowed on each folder. An HTTP interface doesn’t need a special query media type that (insanely) consists of a raw SQL statement embedded in XML sugar coating; any HTTP resource can be a stored procedure and any hypertext response can contain a list of results. An HTTP interface doesn’t need to traverse folders or query databases in order to access an object summary that points to a content stream that might then be downloaded; hypertext allows each object to be identified by a URI and manipulated independent of the discovery process.

CMIS is a Web Services interface for document management. It should be renamed WS-DMS and tossed on the same pile of other specs from that genre. WebDAV is a far more capable interface that has already been standardized to provide document-level write-access and versioning over HTTP. WebDAV isn’t very RESTful either, because it relies on folder operations instead of hypertext, but at least the WebDAV interface doesn’t interfere with the read-only side of HTTP, WebDAV is already supported by authoring tools with filesystem semantics, and Microsoft has already deployed hundreds of proprietary extensions to WebDAV within its Exchange and SharePoint server products. For that matter, CMIS would be a lot more interesting if it were designed as an extension to CIFS instead of HTTP or SOAP.

My bet is that the document repository vendors will continue to focus on making their own native HTTP interfaces more efficient, since that is how customers will evaluate their performance when integrated within heterogeneous architectures. At best, CMIS will become another method for data migration away from Web Services and legacy repositories, which may be justification enough for implementation. However, they should stop calling their AtomPub derivative a REST binding. Even if they manage to redesign the HTTP interface during the standards process, it will still only be an HTTP binding and only one part of an overall application architecture that could be called RESTful.

I need to address that teaser I included in my last post on paper tigers and hidden dragons:

People need to understand that general-purpose PubSub is not a solution to scalability problems — it simply moves the problem somewhere else, and usually to a place that is inversely supported by the economics.

I am talking here about the economics of sustainable systems, which I consider to be part of System Dynamics and Systems Engineering. Software Architecture and System Dynamics are deeply intertwined, just as Software Engineering and Systems Engineering are deeply intertwined, but they are not the same thing.

Fortunately, this discussion of economics has nothing to do with the financial economy or the proposed bailout on Wall Street. At least, I don’t think it does, though event-based integration and publish-subscribe middleware are most popular within financial institutions (and rightly so, since it matches their natural data stream of buy/sell events and news notifications that drive financial investing/speculation).

Economists have a well-known (but often misunderstood) notion of the economies of scale, which can be summarized as the effect of increased production on the average cost of production (see also Wikipedia and Investopedia). For most types of business, the average cost of producing some item is expected to go down as the business expands. This is a natural consequence of both the spreading of fixed costs over many more customers and the ability of larger distributors to demand more competitive pricing from their upstream suppliers (e.g., Walmart).

If we look at such a business in terms of its system dynamics, we should see “load” on the system increase at a much lower rate than the increase in customers. In other words, the additional customers should not, on average, create more cost than they generate in revenue once the business passes the initial fixed infrastructure cost break-even point. If they do create more cost, then the business is not sustainable at larger scales. For many businesses, that’s just fine — most of the personal services economy is based on models with a small window between “enough customers to survive” and “too many customers to handle without degrading service.”

The same is true of software architectures for network-based applications. As the number of consumers increase, the per-consumer cost of the overall system must decrease in order to sustain the system. Relative economies of scale were used by Nikunj Mehta in his dissertation to compare architectural choices: “A system is considered to scale economically if it responds to increased processing requirements with a sub-linear growth in the resources used for processing.” I like that as a quantifiable definition for the architectural property of scalability. However, Nikunj is focused on comparing the relative scalability of particular implementations within an architecture-driven modular framework, not the type of economics that I alluded to in my post.

It is important to think of system dynamics in terms of economics and not just system throughput, since the impact of growth in non-sustainable systems is often unrelated to the individual data transfers or protocols in use. In fact, some architectures that are more efficient from the standpoint of per-event network usage are also more susceptible to inverse economies of scale. PubSub is one such example.

PubSub (short for publish/subscribe) is a derivative of one of the most analyzed architectural styles in software: event-based integration (EBI). From a purely theoretical perspective, event-based integration is a natural fit for systems that monitor real-time activities, particularly when the number of event sources outnumber the recipients of event notifications (e.g, graphical user interfaces and process control systems). However, EBI does not scale well when the number of recipients greatly outnumbers the sources. Researchers have been studying Internet-scale event notification systems for over two decades. Several derivative styles have been proposed to help reduce the issue of scale, including PubSub, EventBus, and flood distribution. There are probably others that I have left out, but they usually boil down into the use of event brokers (intermediaries) and/or a shared data stream (multicast and flood).

PubSub improves EBI scalability by filtering events, either by publisher-driven topics or subscriber-chosen content, and only delivering notifications to the matching subscribers of those topics/content. In theory, if we are careful in choosing the topics such that they partition the events into relatively equal, non-overlapping subsets, and if the consumers of those events are only interested in subscribing to a small subset of those topics, then the corresponding reduction in notification traffic is substantial. Group chat, as in Jabber rooms, is one such example. For most applications, however, such a partitioning does not exist.

EventBus takes a slightly different tack on scalability by forcing traffic onto shared distribution streams, usually IP multicast or flood-distribution channels, and having all subscribers pick their own events off that stream. Unfortunately, multicast does not scale socially (a tragedy of the commons) and rarely succeeds across organizational domains, whereas flooding only succeeds when the signal/noise ratio is high.

Of course, there are many examples of successful EBI systems. USENET news, for example, combined both hierarchical PubSub and flood-distribution to great effect. Adam Rifkin and Rohit Khare did an extensive survey of Internet-scale event notification systems that still stands the test of time. And, as I said at the beginning, EBI is the most popular style of system architecture within the financial industry, where scalability is offset by relatively high subscription costs.

So, what’s the point of this rambling post? Ah, yes, now I remember: the inverse economics of general-purpose PubSub.

People are greedy. They tend to be event-gluttons, wishing to receive far more information than they actually intend to read, and rarely remember to unsubscribe from event streams. When they stop using a service, they tend to remove the software that reports the notifications locally, not the software or subscriptions that cause it to be delivered. If it were not for automatic bounce handling, Apache mailing lists would have melted down years ago. The only way to constrain such users is a tax on either subscriptions or deliveries.

PubSub systems have an imbalance of effort to subscribe versus effort to deliver. In other words, a single user request results in a potentially large number of server obligations. As such, a benevolent user can produce a disproportionate load on the publisher or broker that is distributing notifications. On the Internet, we don’t have the luxury of designing just for benevolent users, and thus in HTTP systems we call such requests a denial-of-service exploit. Any request architecture that allows a malevolent user to create load as a multiple of the effort required to make the request will result in denial of service attacks, since they succeed without overwhelming the attacker’s own connectivity.

The only solution to the inverse economics of PubSub that I know of is to match the costs to the revenue stream. In other words, charge consumers for the subscriptions they choose to receive, at a rate corresponding to the cost of delivering the subscribed stream at the subscribed rate and over the subscribed medium. That is exactly why there is no standard mechanism for notifications in HTTP: there is no scalable solution for subscribing to events without also standardizing the creation of user accounts and event filtering mechanisms, neither of which tend to be standardized because user management is an application of its own and event filtering is always domain-specific.

What does this mean for a system like Jabber? It means that each jabber server (the broker) needs to keep watch on the types of systems being supported via its infrastructure and adjust their subscription model accordingly. Applications like chat will work great, as will most applications wherein the event sources far outnumber the notification recipients. Applications with inverse economics need to pay their own way.

What does this mean for a system like Twitter? Well, that’s an interesting case because it incorporates both subscription-based following of events and RESTful interfaces for event logs. When a user only follows a small set of twitters, or the set of twitters being followed are mostly quiet, then they might be more economically supported by an EBI system instead of polling their twitter home stream. However, that won’t be true for people who want to follow many twitters or who are only interested in looking at twits in batch (basically, anyone who polls at a lower frequency than their incoming twit rate). In particular, malevolent followers (those robots that follow everyone, for whatever reason) are much more dangerous to an EBI system.

My guess is that Twitter will eventually move to an ad-supported revenue stream on their Web interfaces, since ad revenue increases at the same rate as RESTful applications need to scale (i.e., RESTful systems have a natural economy of scale that makes them sustainable even when their popularity isn’t anticipated). Likewise, Twitter’s event-based interfaces will move toward a more sustainable subscription model that charges for excessive following and premium services like real-time event delivery via XMPP or SMS. That kind of adjustment is necessary to rebalance the system dynamics.

One of the sessions that I attended at OSCON was “Beyond REST? Building Data Services with XMPP PubSub” by Evan Henshaw-Plath and Kellan Elliott-McCrea. I think you can guess why that made me curious, but it was interesting to see how much that curiosity was shared by the rest of the conference: the room filled up long before the scheduled start. They certainly gave a very entertaining talk and one that spilled over into the blogosphere in posts by Stephen O’Grady, Joshua Schachter, and Debasish Ghosh.

Unfortunately, the technical argument was a gigantic paper tiger, which is a shame given that there are plenty of situations in which event-based architectures are a better solution than REST-based architectures. I made a brief comment about notification design and how they seemed to be ignoring a good twenty years of research on Internet-scale event notification systems. People need to understand that general-purpose PubSub is not a solution to scalability problems — it simply moves the problem somewhere else, and usually to a place that is inversely supported by the economics. I’ll have to explain that in a later post, since this one is focused on the technical.

Here’s the tiger:

On July 21st, 2008, friendfeed crawled flickr 2.9 million times to get the latest photos of 45,754 users, of which 6,721 of that 45,754 potentially uploaded a photo.

Conclusion: Polling sucks.

If you’d like to learn more about their XMPP solution, the slides are available from the OSCON website. I do think there is a lot to be learned from using different interaction styles and true stream-oriented protocols (the kind that don’t care about lost packets), but this FriendFeed example is ridiculous. It took me less than 30 seconds to design a better solution using nothing more than HTTP, and that while sitting in the middle of a conference session. This is exactly what Dare means by: “If a service doesn’t scale it is more likely due to bad design than to technology choice.

They are comparing the efficiency of blind polling using HTTP crawls to a coordinated PubSub services setup with XMPP.  Spidering an entire site is obviously not going to be efficient if you are only interested in what has changed. One advantage it has is that you don’t need to cooperate with the information provider. However, for the specific example of FriendFeed polling Flickr, cooperation is easy (both companies gain immensely by cooperating) and essential (2.9 million requests per day will get you blocked from any site that doesn’t want cooperation).

The solution, which I mentioned briefly at the talk, is to provide a resource that reflects all of the changes on Flickr during a given time period. Anyone (not just FriendFeed) can perform a GET on that resource to find out what has changed. In fact, one such example is the SUP (Simple Update Protocol) just introduced by FriendFeed. However, I had something more efficient in mind.

Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.

In this case, we want to represent the last-updated state of all Flickr users in a way that minimizes the lag between event and notification (let’s just assume that one minute is “fast enough” to receive a change notification). The simplest way of doing that is to log state changes by userid in a sequence of journal-style resources named according to the server’s UTC clock minutes.  For example,

This URI pattern instantly drops the poll count from 2.9 million to 1440 (the number of minutes in a day) plus whatever pages are retrieved after we notice a user has changed their state. Alternatively, we could define a single append-only resource per day and use partial GET requests to retrieve only the bits since the last poll, but that tends to be harder on the server. Representations for the above resources can be generated by non-critical processes, cached, and even served from a separate distribution channel (like SUP).

What, then, should we include in the representation? Well, a simple list of relative URIs is good enough if the pattern is public, but that would be unwise for a site that features limited publication (obscured identifiers so that only people who have been given the URI can find the updated pictures). Likewise, the list might become unwieldy during event storms, when many users happen to publish at once. Of course, like any good CS problem, we can solve that with another layer of indirection.

Instead of a list of changed user ids or URIs, we can represent the state as a sparse bit array corresponding to all of Flickr’s users. I don’t know exactly how many users there are at Flickr, but let’s be generous and estimate it at one million. One million bits seems like a lot, but it is only 122kB in an uncompressed array. Considering that this array will only contain 1s when an update has occurred within the last minute, my guess is that it would average under 1kB per representation.

I can just imagine people reading “sparse bit array” and thinking that I must be talking about some optimal data structure that only chief scientists would think to use on the Web. I’m not. Any black-and-white GIF or PNG image is just a sparse bit array, and they have the nice side-effect of being easy to visualize. We can define our representation of 1 million Flickr users to be a 1000×1000 pixel black-and-white image and use existing tools for its generation (again, something that is easily done outside the critical path by separate programs observing the logs of changes within Flickr). I am quite certain that a site like Flickr can deliver 1kB images all day without impacting their scalability.

Finally, we need a way to map from the bits, each indicating that a user has changed something, to the much smaller set of users that FriendFeed knows about and wishes to receive notifications. If we can assume that the mapping is reasonably persistent (a month should be long enough), then we can define another resource to act as a mapping service. Such as,{userid}

which takes as input a userid (someone that a friend already knows and wants to monitor for changes) and returns the coordinate within the sparse array (the pixel within the 1000×1000 image) that corresponds to that user. FriendFeed can store that accumulated set of “interesting users” as another image file, using it like an “AND mask” filter to find the interesting changes on Flickr.

Note that this is all just a quick thought experiment based on the general idea. In order to build such a thing right, I’d have to know the internals of Flickr and what kinds of information FriendFeed is looking to receive, and there are many potential variations on the representations that might better suit those needs (for example, periods could be overlapped using gray-scale instead of B&W). The implementation has many other potential uses as well, since the sequence of images provide an active visualization of Flickr health.

I should also note that the above is not yet fully RESTful, at least how I use the term. All I have done is described the service interfaces, which is no more than any RPC. In order to make it RESTful, I would need to add hypertext to introduce and define the service, describe how to perform the mapping using forms and/or link templates, and provide code to combine the visualizations in useful ways. I could even go further and define these relationships as a standard, much like Atom has standardized a normal set of HTTP relationships with expected semantics, but I have bigger fish to fry right now.

The point is that you don’t need to change technologies (or even interaction styles) to solve a problem of information transfer efficiency. Sometimes you just need to rethink the problem. There are many systems for which a different architecture is far more efficient, just as XMPP is far more efficient than HTTP for something like group chat. Large-scale collaborative monitoring is not one of them. An XMPP solution is far more applicable to peer-to-peer monitoring, where there is no central service that is interested in the entire state of a big site like Flickr, but even then we have to keep in mind that the economics of the crowd will dictate scalability, not the protocol used for information transfer.

Hmm, over five months since my last (published) post. I don’t know how folks like Tim and Sam keep writing every day. I’ll have to ask when I see them at ApacheCon US in New Orleans.

I fell off the blogon while trying to prepare my two presentations for ApacheCon Europe and have yet to catch up with my email since then. Seriously, I spend way too much time reading and deleting email, so if you’ve sent me something and haven’t received a response then please understand that it isn’t because I don’t like you (unless, of course, you sent it to several of my email addresses at once, in which case it was probably blocked as spam).

Whenever I did find time to blog, I found myself wanting to update the software first, fiddle with the comment settings, or try out various blogspam avoidance tricks. My personal blog runs on WordPress, both because it’s cool cutting-edge open source software and because my ISP won’t let me run anything written in Java (let alone Day’s commercial products). It also has the nice side-effect of forcing me to pay attention to everyday web content management issues, instead of just the more abstract (implementation independent) issues of Web protocols. In other words, it keeps me in the loop in terms of what people are looking for in public-facing web content management. And, if I want to see what it looks like through our own software, I can just wander over to Planet Day.

One area in which open source blogging software is way ahead of the curve is support for collaborative antispam solutions. I’ve been much happier with my blog since I installed the TypePad Antispam plugin. The social filtering of blog spam is even more effective than it has been for Gmail, especially given the setting to automatically discard comments that look like spam on old posts. I wonder how long it would take Michael Marth to write an Apache Sling plugin for comment filtering using the TypePad service?