Mon 24 Mar 2008
Information networks
Posted by Roy T. Fielding under web architecture
1 Comment
One of the keys to understanding how the Web works is to understand how social networks impact the structure of information (and vice versa). Jon Kleinberg at Cornell University has organized a couple of great courses on the subject, at both the undergraduate and graduate levels. One of these days, I’ll find the time to (re)read all of those papers… for now, all I can do is shorten the distance.
Hey, I saw your talk about ‘Waka’. It seems to me that shortening things like ‘Content-Encoding’ -> ‘CE’ doesn’t really solve the problem: new properties could just as easily come in, and if the names are short and two people introduce their own fields, they could accidentally collide in the namespace. Instead I would suggest keeping long names, but using a dictionary approach of sorts to make things shorter…
Why not just choose some header compression algorithm with very low maximum growth, combine it with a default dictionary (containing the usual field names), and otherwise just keep HTTP? You could call it Header-Compression-Support: blah, and have the server reply to the first GET request on each connection with something similar if it is supported, with further requests actually going the compressed route. Due to their repetitive nature the headers would likely compress very, very well, as long as the compression dictionary is kept separate from that of the data being sent.
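A minimal sketch of the default-dictionary idea, assuming deflate via Python's zlib module and its preset dictionary (zdict) support. The dictionary contents, the sample request, and the function names are all made up for illustration; a real default dictionary would have to be agreed on by both ends.

```python
import zlib

# Hypothetical default dictionary of common header text. zlib prefers matches
# near the end of the dictionary, so the most frequent strings go last.
DEFAULT_DICT = (
    b"Content-Encoding: Content-Length: Content-Type: "
    b"User-Agent: Accept-Language: Accept-Encoding: gzip, deflate\r\n"
    b"Accept: text/html\r\nHost: GET / HTTP/1.1\r\n"
)


def compress_headers(raw: bytes) -> bytes:
    """Compress one header block against the shared default dictionary."""
    comp = zlib.compressobj(level=9, zdict=DEFAULT_DICT)
    return comp.compress(raw) + comp.flush()


def decompress_headers(blob: bytes) -> bytes:
    """Reverse compress_headers; the receiver must hold the same dictionary."""
    decomp = zlib.decompressobj(zdict=DEFAULT_DICT)
    return decomp.decompress(blob) + decomp.flush()


# Illustrative request, not taken from any real trace.
request = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: example.org\r\n"
    b"Accept: text/html\r\n"
    b"Accept-Encoding: gzip, deflate\r\n"
    b"\r\n"
)

packed = compress_headers(request)
restored = decompress_headers(packed)

# Sizes: raw, deflate without a dictionary, deflate with the dictionary.
print(len(request), len(zlib.compress(request, 9)), len(packed))
```

Because the long field names already sit in the dictionary, they cost little on the wire even in the very first request, which is why they would not need to be abbreviated.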
To deal with async issues, Header-Compression-Support would only advertise that a compression type is supported, while Header-Compression-Next would announce that the sender will use that compression type from then on.
1) Client sends support header
2) Server replies with normal packet and ‘header compression next’, meaning all its further sends will be using the specified form of compression
3) Client keeps sending normally (perhaps including the support header, but it’s redundant after the first, so maybe not) until it receives the server’s ‘header compression next’
4) Client then, knowing the server supports it, sends its own ‘header compression next’
So, being async, the server will send one response with classic HTTP headers, and the client will send two, or maybe three or four due to latency, before compression is activated.
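The four steps can be sketched as a small state machine. This is only a model under assumptions: the Peer class and its methods are hypothetical, headers are plain dicts, and "compressed" is just a label rather than real encoding. The client pipelines a second request before the first response arrives, to show the latency effect.

```python
SUPPORT = "Header-Compression-Support"
NEXT = "Header-Compression-Next"
ALGO = "blah"  # placeholder algorithm name


class Peer:
    """One end of a connection in the proposed negotiation (hypothetical model)."""

    def __init__(self, is_client: bool) -> None:
        self.is_client = is_client
        self.sent_support = False   # client only: Support already advertised
        self.announce_next = False  # attach Next to the next outgoing message
        self.tx_compressed = False  # our later messages use compressed headers
        self.rx_compressed = False  # the peer's later messages are compressed

    def outgoing(self):
        """Return (encoding, negotiation headers) for the next message sent."""
        if self.tx_compressed:
            return "compressed", {}
        extra = {}
        if self.announce_next:
            # The message carrying Next is itself still classic HTTP.
            extra[NEXT] = ALGO
            self.announce_next = False
            self.tx_compressed = True
        elif self.is_client and not self.sent_support:
            extra[SUPPORT] = ALGO
            self.sent_support = True
        return "plain", extra

    def incoming(self, headers) -> None:
        """Update state from the negotiation headers of a received message."""
        if SUPPORT in headers and not self.is_client:
            self.announce_next = True      # step 2: server will announce Next
        if NEXT in headers:
            self.rx_compressed = True
            if self.is_client:
                self.announce_next = True  # step 4: client announces its own


client, server = Peer(is_client=True), Peer(is_client=False)

req1 = client.outgoing()    # step 1: plain, carries Support
req2 = client.outgoing()    # step 3: pipelined before any response arrives
server.incoming(req1[1])
resp1 = server.outgoing()   # step 2: plain, carries Next
server.incoming(req2[1])
resp2 = server.outgoing()   # compressed from here on
client.incoming(resp1[1])
client.incoming(resp2[1])
req3 = client.outgoing()    # step 4: plain, carries the client's Next
server.incoming(req3[1])
req4 = client.outgoing()    # compressed from here on

for name, msg in [("req1", req1), ("req2", req2), ("resp1", resp1),
                  ("resp2", resp2), ("req3", req3), ("req4", req4)]:
    print(name, msg)
```

In this run the server sends one plain response and the client sends three plain requests; without the pipelined request the client would send two.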
A big benefit of this over what I saw on your slides is that the field *values* will, over a couple of requests, eventually disappear from the data stream as well, as that information is repetitive too. It’s a not-too-complex solution that could greatly reduce the bandwidth spent on headers.
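To illustrate how repeated values drop out of the stream, here is a rough sketch that keeps one deflate context per connection for headers only, separate from any body compression. It again assumes Python's zlib; the class names, the make_request helper, and the sample headers are invented for the example.

```python
import zlib


class HeaderCompressor:
    """Sender side: one deflate context for all headers on a connection."""

    def __init__(self) -> None:
        self._comp = zlib.compressobj(level=9)

    def pack(self, headers: bytes) -> bytes:
        # Z_SYNC_FLUSH emits everything so far but keeps the history window,
        # so later header blocks can refer back to earlier ones.
        return self._comp.compress(headers) + self._comp.flush(zlib.Z_SYNC_FLUSH)


class HeaderDecompressor:
    """Receiver side: mirrors the sender's context message by message."""

    def __init__(self) -> None:
        self._decomp = zlib.decompressobj()

    def unpack(self, blob: bytes) -> bytes:
        return self._decomp.decompress(blob)


def make_request(path: str) -> bytes:
    """Build an illustrative request; only the path changes between calls."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        "Host: example.org\r\n"
        "User-Agent: ExampleBrowser/1.0\r\n"
        "Accept: text/html,application/xhtml+xml\r\n"
        "Accept-Language: en-us,en;q=0.5\r\n"
        "Accept-Encoding: gzip, deflate\r\n"
        "\r\n"
    ).encode("ascii")


sender, receiver = HeaderCompressor(), HeaderDecompressor()
sizes = []
round_trips = []
for path in ("/a.html", "/b.html", "/c.html"):
    raw = make_request(path)
    blob = sender.pack(raw)
    round_trips.append(receiver.unpack(blob) == raw)
    sizes.append(len(blob))

# Compressed size of each successive header block on the same connection.
print(sizes)
```

After the first request, nearly all of each header block is a back-reference into the history window, so only the changed path and a few bytes of framing remain.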