Network Working Group                                        R. Fielding
INTERNET-DRAFT                                               U.C. Irvine
Expires six months after publication date.                 26 March 1997


                      Age Header Field in HTTP/1.1


Status of this Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other
   documents at any time. It is inappropriate to use Internet-Drafts
   as reference material or to cite them other than as ``work in
   progress.''

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts
   Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
   (Europe), munnari.oz.au (Pacific Rim), ds.internic.net (US East
   Coast), or ftp.isi.edu (US West Coast).

   Discussion of this memo should take place within the HTTP working
   group (http-wg@cuckoo.hpl.hp.com).

Abstract

   The "Age" response-header field in HTTP/1.1 [RFC 2068] is intended
   to provide a lower-bound for the estimation of a response message's
   age (time since generation) by explicitly indicating the amount of
   time that is known to have passed since the response message was
   retrieved or revalidated. However, there has been considerable
   controversy over when the Age header field should be added to a
   response. This document explains the issues and provides a set of
   proposed changes for the revision of RFC 2068.

1. Problem Statement

   HTTP/1.1 [1] defines the Age header field in section 14.6:

      The Age response-header field conveys the sender's estimate of
      the amount of time since the response (or its revalidation) was
      generated at the origin server. A cached response is "fresh" if
      its age does not exceed its freshness lifetime. Age values are
      calculated as specified in section 13.2.3.

           Age = "Age" ":" age-value

           age-value = delta-seconds

      Age values are non-negative decimal integers, representing time
      in seconds.

      If a cache receives a value larger than the largest positive
      integer it can represent, or if any of its age calculations
      overflows, it MUST transmit an Age header with a value of
      2147483648 (2^31). HTTP/1.1 caches MUST send an Age header in
      every response. Caches SHOULD use an arithmetic type of at least
      31 bits of range.

   This document focuses on the ambiguous use of the term "caches" in
   the second-to-last sentence above. The ambiguity is due to the fact
   that a cache never sends responses; only a server application
   (proxy, gateway, or origin server), which may or may not include a
   cache, is capable of sending a response. HTTP/1.1 defines a "cache"
   as

      A program's local store of response messages and the subsystem
      that controls its message storage, retrieval, and deletion. A
      cache stores cachable responses in order to reduce the response
      time and network bandwidth consumption on future, equivalent
      requests. Any client or server may include a cache, though a
      cache cannot be used by a server that is acting as a tunnel.

   There are two possible interpretations of

      HTTP/1.1 caches MUST send an Age header in every response.

   Either

      a) An HTTP/1.1 server that includes a cache MUST send an Age
         header field in every response.

   or

      b) An HTTP/1.1 server that includes a cache MUST include an Age
         header field in every response generated from its own cache.

   The remainder of this document discusses the relative merits of
   these two options, referred to as "Option A" and "Option B",
   concluding in section 5 with a set of proposed changes to remove
   the ambiguity from future editions of the HTTP/1.1 specification.

2. Review of HTTP/1.1 Response Age Calculation

   HTTP/1.1 defines an algorithm for calculating the age of a response
   message upon receipt by a cache.
   This document does not propose any modification of this algorithm;
   we describe it here in order to provide the background necessary to
   understand the later analyses. We only provide a brief summary
   here; for a full explanation, see section 13.2.3 (Age Calculations)
   of RFC 2068 [1].

   Summary of age calculation algorithm, when a cache receives a
   response:

      /*
       * age_value
       *      is the value of Age: header received by the cache with
       *      this response.
       * date_value
       *      is the value of the origin server's Date: header
       * request_time
       *      is the (local) time when the cache made the request
       *      that resulted in this cached response
       * response_time
       *      is the (local) time when the cache received the
       *      response
       * now
       *      is the current (local) time
       */
      apparent_age = max(0, response_time - date_value);
      corrected_received_age = max(apparent_age, age_value);
      response_delay = response_time - request_time;
      corrected_initial_age = corrected_received_age + response_delay;
      resident_time = now - response_time;
      current_age = corrected_initial_age + resident_time;

3. Analysis of Option A

   If we were to assume that

      An HTTP/1.1 server that includes a cache MUST send an Age header
      field in every response.

   is true, then an HTTP/1.1 proxy containing a cache would be
   required to add an Age header field value to every response that
   was forwarded, including those that were obtained first-hand from
   the origin server and never touched by the caching mechanism. This
   would directly contradict the paragraph in section 13.2.1 of RFC
   2068 that states:

      The expiration mechanism applies only to responses taken from a
      cache and not to first-hand responses forwarded immediately to
      the requesting client.

   and would also directly contradict the last paragraph of section
   13.2.3 of RFC 2068 that states:

      Note that a client cannot reliably tell that a response is
      first-hand, but the presence of an Age header indicates that a
      response is definitely not first-hand.

   If we further assume that the above two paragraphs are in error,
   then the following example illustrates the effect of the age
   calculation when a first-hand response passes through a
   hierarchical system of proxy caches (A, B, C), with each segment
   taking (a, b, c, d) amount of time to satisfy the request:

      UA -------> A -------> B ---------> C -------> OS
            a          b           c           d

   Since the age calculation includes an estimation of clock skew by
   each recipient (apparent_age), we also have the variables

      skewC  = max(0, response_time(C)  - date_value(OS));
      skewB  = max(0, response_time(B)  - date_value(OS));
      skewA  = max(0, response_time(A)  - date_value(OS));
      skewUA = max(0, response_time(UA) - date_value(OS));

   The received age will then be calculated as follows:

      At  C: age=max(skewC,0)+d
          B: age=max(skewB,max(skewC,0)+d)+(c+d)
          A: age=max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+(b+c+d)
         UA: age=max(skewUA,max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+
                 (b+c+d))+(a+b+c+d)

   Because the response is first-hand, we know that the real age at UA
   must be less than (a+b+c+d). Note that (a+b+c+d) will always be
   added by UA, so the cumulative overestimation of the age will be at
   least

      max(skewUA,max(skewA,max(skewB,max(skewC,0)+d)+(c+d))+(b+c+d))

   If we further assume that all clocks are synchronized (the minimum
   case), then the age at UA will be estimated as

      d+(c+d)+(b+c+d)+(a+b+c+d)

   Note that the above is the minimum overestimation; since the
   variables skewC, skewB, skewA, and skewUA are all unbounded, the
   clock skew of each host on the request path adds to the perceived
   response age of all downstream recipients. Furthermore, a fast
   clock on the origin will add to the overestimated age at each hop.

   However, in section 13.2.3 of RFC 2068, we also find

      In essence, the Age value is the sum of the time that the
      response has been resident in each of the caches along the path
      from the origin server, plus the amount of time it has been in
      transit along network paths.

   which in our example would imply an age value of (a+b+c+d). Thus,
   Option A would result in an incorrect calculation of the age value,
   resulting in an overestimation of age in all cases, with the amount
   of error bounded only by the synchronization of clocks for each and
   every recipient along the request chain, plus the cumulative
   overestimation of the network transit time by each recipient.

4. Analysis of Option B

   If we were to assume that

      An HTTP/1.1 server that includes a cache MUST include an Age
      header field in every response generated from its own cache.

   then an Age header field would not be added to a response that is
   received first-hand, and thus we would not contradict the sections
   of RFC 2068 that were quoted above.

   Using the same example as in the analysis of Option A, the
   calculation of age with Option B would be as follows:

      At  C: age=max(skewC,0)+d
          B: age=max(skewB,0)+(c+d)
          A: age=max(skewA,0)+(b+c+d)
         UA: age=max(skewUA,0)+(a+b+c+d)

   Note that there is no cumulative overestimation of the age. The
   estimated age value at each recipient depends only on the skew
   between the recipient's clock and that of the origin server, plus
   the total amount of time the request and response have been in
   transit along the network path. The minimum estimated age at UA is

      (a+b+c+d)

   which matches the description provided in section 13.2.3 of RFC
   2068.

5. Counter-arguments

   The only argument voiced against Option B is that the calculation
   is "less conservative" than Option A, and that being "conservative"
   is better in order to "reduce as much as possible the probability
   of inadvertently delivering a stale response to a user."

   If "conservative" means "always overestimates more than the other
   option", then the argument is certainly true. However, if the
   purpose of Age were to provide an overestimate, then why stop
   there? Why not add arbitrary amounts of age to forwarded responses,
   just in case? Why not disable caching entirely?

   The reason is that HTTP caching is good for the Internet as a
   whole, and in particular for the owners of the network bandwidth
   that would be used to satisfy a request that has already been
   cached. Overestimating response age reduces the effectiveness of
   caching, and thus results in increased network congestion, added
   bandwidth requirements, and in some cases additional per-packet
   charges.

   Age was created to compensate for the possibility that clock skew
   between the origin server (represented by the Date header field)
   and the user agent (represented by the request time) might result
   in the age of a response being underestimated. Age was created so
   that HTTP/1.1 caches can communicate the actual observed age, thus
   providing a lower bound for the age calculation that would be more
   reliable than simply calculating the difference between the date
   stamps.

   If Age is to be useful, it must be trusted by cache implementers.
   In order to be trusted by cache implementers, the value of the Age
   header field must match its definition: the age of the response as
   observed by the application that generated the response message.

   Furthermore, Option B is guaranteed to be conservative if all of
   the applications involved are HTTP/1.1-compliant or if the
   recipient's clock is equal to or ahead of the origin server clock.
   The only case in which Option A *might* result in a better
   estimation than Option B is where one or more HTTP/1.0 caches are
   in the request chain AND the response came from one of those
   HTTP/1.0 caches in which it resided for some time AND the user
   agent's system clock is running behind the origin server's clock.
   In this one case, Option A would compensate for the clock skew if
   there existed an HTTP/1.1 cache between the user agent and the
   HTTP/1.0 cache generating the response AND the HTTP/1.1 cache is
   better-synchronized to the origin server clock.

   The above scenario would require a minimum of two proxies in the
   chain, with at least one outer proxy being an old HTTP/1.0 cache
   and at least one inner proxy using HTTP/1.1. Given that, for many
   other reasons (described in RFC 2068), an HTTP/1.0 proxy is
   incapable of reliably caching HTTP messages in a proxy hierarchy,
   this scenario is not compelling.

   In contrast, Option A would overestimate the age on all HTTP/1.1
   requests, even when there are no longer any HTTP/1.0 proxies. It
   would also make the age calculation dependent on the clock
   synchronization of every recipient along the request chain, with
   the possibility for drastic overestimation if any of the recipients
   has a bad clock. Option A would therefore make the Age header field
   value consistently less reliable than simple comparison of date
   stamps.

5. Conclusion and Proposed Changes

   Option B is the correct interpretation of when the Age header field
   should be added to an HTTP/1.1 response. The following changes to
   RFC 2068 will remove the ambiguity.

   In section 14.6 (Age), replace the sentence

      HTTP/1.1 caches MUST send an Age header in every response.

   with

      An HTTP/1.1 server that includes a cache MUST include an Age
      header field in every response generated from its own cache.

   In section 13.2.3 (Age Calculations), replace the paragraph

      HTTP/1.1 uses the Age response-header to help convey age
      information between caches. The Age header value is the sender's
      estimate of the amount of time since the response was generated
      at the origin server. In the case of a cached response that has
      been revalidated with the origin server, the Age value is based
      on the time of revalidation, not of the original response.

   with

      HTTP/1.1 uses the Age response-header to convey the estimated
      age of the response message when obtained from a cache. The Age
      field value is the cache's estimate of the amount of time since
      the response was generated or revalidated by the origin server.

   Delete the following paragraph from section 13.2.3:

      Note that this correction is applied at each HTTP/1.1 cache
      along the path, so that if there is an HTTP/1.0 cache in the
      path, the correct received age is computed as long as the
      receiving cache's clock is nearly in sync. We don't need
      end-to-end clock synchronization (although it is good to have),
      and there is no explicit clock synchronization step.

   Replace the following two paragraphs from section 13.2.3:

      When a cache sends a response, it must add to the
      corrected_initial_age the amount of time that the response was
      resident locally. It must then transmit this total age, using
      the Age header, to the next recipient cache.

      Note that a client cannot reliably tell that a response is
      first-hand, but the presence of an Age header indicates that a
      response is definitely not first-hand. Also, if the Date in a
      response is earlier than the client's local request time, the
      response is probably not first-hand (in the absence of serious
      clock skew).

   with

      The current_age of a cache entry is calculated by adding the
      amount of time (in seconds) since the cache entry was last
      validated by the origin server to the corrected_initial_age.
      When a response is generated from a cache entry, the server must
      include a single Age header field in the response with a value
      equal to the cache entry's current_age.

      The presence of an Age header field in a response implies that a
      response is not first-hand. However, the converse is not true,
      since the lack of an Age header field in a response does not
      imply that the response is first-hand unless all caches along
      the request path are compliant with HTTP/1.1 (i.e., older HTTP
      caches did not implement the Age header field).

6. Security Considerations

   The proposed changes close a potential security problem with
   HTTP/1.1, which would become manifest if a proxy with a slow clock
   (due to a hardware malfunction, a failure to set it properly, or a
   reset by some malevolent agent) added an Age header field to every
   response it forwarded, instead of only to those retrieved from its
   own cache, thereby eliminating the ability of a compliant
   downstream cache to reduce bandwidth usage on a congested network.
   Although this is not a serious concern with today's use of HTTP
   caching, future use of hierarchical cache networks would be
   impacted.

7. Acknowledgements

   This document was derived from discussions by the author within the
   HTTP working group, particularly with Jeffrey C. Mogul.

8. References

   [1] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, and T.
       Berners-Lee. "Hypertext Transfer Protocol -- HTTP/1.1."
       RFC 2068, U.C. Irvine, DEC, MIT/LCS, January 1997.

9. Author's Address

   Roy T. Fielding
   Department of Information and Computer Science
   University of California, Irvine
   Irvine, CA 92697-3425

   Fax: +1(714)824-1715
   EMail: fielding@ics.uci.edu
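
Appendix. Illustrative Sketch of the Age Calculations

   The following Python sketch is offered only as a non-normative
   illustration of the algorithm summarized in section 2 and of the
   worked examples in sections 3 and 4. It is not part of RFC 2068 or
   of the changes proposed in this document. The function names
   (initial_age, current_age, chain_ages), the forward_age parameter,
   and the sample segment times are assumptions made for this sketch.
   Times are integer seconds, and the skew terms are set to zero,
   which corresponds to the synchronized-clock case.

```python
def initial_age(age_value, date_value, request_time, response_time):
    """corrected_initial_age, following the summary in section 2."""
    apparent_age = max(0, response_time - date_value)
    corrected_received_age = max(apparent_age, age_value)
    response_delay = response_time - request_time
    return corrected_received_age + response_delay


def current_age(corrected_initial_age, response_time, now):
    """Age of a cache entry at local time `now`."""
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def chain_ages(skews, delays, forward_age):
    """Ages computed by each recipient of a first-hand response.

    skews and delays are ordered from the recipient nearest the origin
    server (C) outward to the user agent (UA). skews[i] is that
    recipient's apparent_age term; delays[i] is its response_delay.
    forward_age=True models Option A (every hop forwards its computed
    age in an Age field); forward_age=False models Option B (a
    first-hand response is forwarded without an Age field).
    """
    ages = []
    received_age = 0  # the origin server sends no Age field
    for skew, delay in zip(skews, delays):
        age = max(skew, received_age) + delay
        ages.append(age)
        received_age = age if forward_age else 0
    return ages


# Hypothetical segment times, in seconds, for UA -> A -> B -> C -> OS.
a, b, c, d = 1, 2, 3, 4
delays = [d, c + d, b + c + d, a + b + c + d]  # as seen at C, B, A, UA
skews = [0, 0, 0, 0]                           # synchronized clocks

option_a = chain_ages(skews, delays, forward_age=True)
option_b = chain_ages(skews, delays, forward_age=False)
print(option_a)  # [4, 11, 20, 30]
print(option_b)  # [4, 7, 9, 10]
```

   With these sample values, the user agent's estimate under Option A
   is 30 seconds, which equals d+(c+d)+(b+c+d)+(a+b+c+d), while under
   Option B it is 10 seconds, which equals (a+b+c+d). Non-zero entries
   in skews can be used to explore the effect of clock skew at
   individual recipients.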