Optimizing caching is one of the quickest ways to get a significant performance boost from your AEM architecture. This article explains how to optimize each of the caches available within that architecture.
AEM Architecture and Caching
In all AEM architectures, the user encounters multiple cache layers when visiting your site. A standard AEM architecture has four cache layers to consider: the web browser, the CDN, the Dispatcher, and the AEM instances.
The first level of cache a user encounters on a repeat visit to your site is their own browser. Caching at the browser level is commonly controlled via the Cache-Control: max-age=… response header. The max-age setting tells the browser how many seconds it may cache the file before attempting to “revalidate” it or request it from the site again. This concept is commonly referred to as “Cache Expiration” or TTL (“Time to Live”).
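As a rough illustration of the expiration rule described above, the following Python sketch treats a cached copy as fresh while its age is below max-age. The function names max_age_seconds and is_fresh are hypothetical and chosen only for this example; real browsers apply further rules (heuristic freshness, no-cache, no-store) that are left out here.

```python
import re
from typing import Optional


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value of a Cache-Control header, or None if absent."""
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh(cache_control: str, age_seconds: int) -> bool:
    """True while the cached copy is younger than its max-age (its TTL)."""
    ttl = max_age_seconds(cache_control)
    return ttl is not None and age_seconds < ttl


# A file cached 120 seconds ago with a 300 second TTL can be served from cache.
print(is_fresh("max-age=300", 120))
# After 300 seconds the browser has to revalidate or request the file again.
print(is_fresh("max-age=300", 300))
```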
There are various options (or “directives”) within the Cache-Control header that affect how caching occurs. Here are some common directives:
private - the private directive in the Cache-Control header means the file may be cached only in the browser, not in intermediate (shared) caches such as CDNs. A practical use for this directive is a page that includes personalized / user-specific content.
Cache-Control: max-age=300, private
s-maxage - the s-maxage directive in the Cache-Control header lets you set a different TTL for shared caches such as CDNs. When it is set, the browser uses the max-age value, while shared caches use the s-maxage value instead.
Cache-Control: max-age=600, s-maxage=300
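To make the interplay of these directives concrete, here is a minimal Python sketch that works out which TTL a browser and a shared cache would each take from the two example headers above. The helper names parse_directives and ttl_for are hypothetical, and the comma-splitting parser is a simplification that ignores quoted directive values.

```python
from typing import Dict, Optional


def parse_directives(cache_control: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.lower()] = value.strip('"') or None
    return directives


def ttl_for(cache_control: str, shared: bool) -> Optional[int]:
    """TTL in seconds for a browser (shared=False) or a CDN-style cache (shared=True).

    Returns None when the header gives this cache no explicit lifetime;
    for `private`, a shared cache must not store the response at all.
    """
    d = parse_directives(cache_control)
    if shared and "private" in d:
        return None
    if shared and d.get("s-maxage") is not None:
        return int(d["s-maxage"])
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return None


for header in ("max-age=300, private", "max-age=600, s-maxage=300"):
    print(header, "| browser:", ttl_for(header, shared=False),
          "| shared cache:", ttl_for(header, shared=True))
```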
Modern browsers all support the Cache-Control header. However, two legacy headers from HTTP/1.0, Expires and Pragma, may still have an effect on caching. If you don’t need to support very old browsers, do not send those response headers.
Beyond expiration, revalidation is an important concept: once a cached copy has expired, the browser can ask the server whether the file has changed instead of downloading it again. Revalidation relies on the Last-Modified (response) / If-Modified-Since (request) headers, and the ETag (response) / If-None-Match (request) headers.
When testing caching in Google Chrome over HTTPS with a self-signed certificate, nothing will get cached. Chrome won’t cache responses or perform revalidation when there is an untrusted or invalid certificate.
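The ETag / If-None-Match exchange can be sketched from the server's side as follows. This is a simplified Python illustration, not a complete implementation: the function name revalidate is hypothetical, and Last-Modified handling is left out. If the tag sent by the browser still matches the current one, the server answers 304 Not Modified without a body; otherwise it sends the full response.

```python
from typing import Optional, Tuple


def _opaque(tag: str) -> str:
    """Drop the weak-validator prefix so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def revalidate(current_etag: str, if_none_match: Optional[str]) -> Tuple[int, bool]:
    """Return (status code, whether to send a body) for a conditional GET."""
    if if_none_match is not None:
        candidates = [_opaque(t) for t in if_none_match.split(",")]
        if "*" in candidates or _opaque(current_etag) in candidates:
            return 304, False  # cached copy is still valid, no body needed
    return 200, True  # no validator sent, or the file has changed


print(revalidate('"v2"', '"v2"'))  # browser copy is current
print(revalidate('"v2"', '"v1"'))  # browser copy is outdated
```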
Note on Dispatcher:
There is an issue in AEM Dispatcher v4.2.3 and earlier versions where /enableTTL considers only the max-age directive. This means that even when the private or s-maxage directive is set, the response is still cached as long as max-age is set. This issue is resolved in Dispatcher 4.2.4 and later versions.
A CDN, or “Content Delivery Network”, is a distributed network of web servers designed to cache and serve content from the location nearest to your users. This reduces network hops and distance from the user’s computer to your content, thereby reducing “Round Trip Time” (RTT). RTT is the time it takes for the browser to send a request to your site and receive a response. Competition in the CDN provider space has made CDNs very cost effective, which makes the decision to use one an easy one. If you are not using a CDN yet, you should incorporate one into your site.
There are many CDN providers, and each one offers different features and configurations.
How CDN Caching Works
CDNs cache content following rules similar to browsers. They rely on the Cache-Control HTTP response header and generally fall back to the Expires header if no Cache-Control header is found.
Most CDNs provide some way to trigger a manual flush of the cache. In many cases, a cache flush takes some time (e.g. 15 minutes) to propagate to all edge servers that hold your files.
Optimizing CDN Usage
There are a few things to do to ensure you are caching files optimally in the CDN:
Use a CDN that supports the stale-while-revalidate and stale-if-error directives in the Cache-Control header.
GZip compress responses for all file types that are not pre-compressed.
On Apache, this can be done via the AddOutputFilterByType directive together with the DEFLATE output filter.
If your CDN provider supports Edge-side Includes (ESI) then leverage this feature.
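To show why the stale-while-revalidate and stale-if-error directives from the list above are worth having, here is a minimal Python sketch of the decision a cache could make for a stored response. The function name cache_decision and its return labels are hypothetical, and the logic is a simplified reading of the two directives rather than any particular CDN's behavior.

```python
def cache_decision(age: int, max_age: int, swr: int = 0, sie: int = 0,
                   origin_failed: bool = False) -> str:
    """Decide how to answer a request for a cached response.

    age           -- seconds since the response was stored
    max_age       -- freshness lifetime from Cache-Control
    swr           -- stale-while-revalidate window in seconds
    sie           -- stale-if-error window in seconds
    origin_failed -- True if the origin is returning errors
    """
    if age < max_age:
        return "serve-fresh"
    staleness = age - max_age
    if origin_failed:
        # Within the stale-if-error window the stale copy beats an error page.
        return "serve-stale-on-error" if staleness < sie else "error"
    if staleness < swr:
        # Answer immediately from cache and refresh in the background.
        return "serve-stale-and-revalidate"
    return "revalidate-first"


# Corresponds to: Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=3600
print(cache_decision(320, 300, swr=60, sie=3600))
print(cache_decision(400, 300, swr=60, sie=3600, origin_failed=True))
```

Without the two directives (both windows at 0), every request that arrives after expiry waits for the origin, and an origin outage turns directly into errors for users.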
Popular CDN Providers
Here’s a list of some popular CDN providers:
Be careful with the Vary response header. In certain cases, Vary can cause both the CDN and browser to skip caching entirely. As a rule of thumb, avoid adding Vary except for Vary: Accept-Encoding (applied only when the response is gzip compressed). In other words, if you need to “vary” the output of a response, use a different URL.
For example, if you have different versions of the HTML for mobile versus desktop, then use a different URL for each. This will allow CDNs and browsers to cache more effectively.
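The cost of Vary can be illustrated with a small Python sketch of how a cache might build its lookup key. The cache_key function and the example URLs are hypothetical; real caches differ in the details. The point is that every distinct value of a header named in Vary produces a separate cache entry for the same URL.

```python
from typing import Dict, Tuple


def cache_key(url: str, vary: str, request_headers: Dict[str, str]) -> Tuple:
    """Build a cache key from the URL plus the request headers named in Vary."""
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple((n, lowered.get(n, "")) for n in names)


phone = {"User-Agent": "ExamplePhoneBrowser/1.0"}
laptop = {"User-Agent": "ExampleDesktopBrowser/2.0"}

# With Vary: User-Agent, each distinct browser string gets its own entry.
print(cache_key("/home.html", "User-Agent", phone) ==
      cache_key("/home.html", "User-Agent", laptop))

# With separate URLs and no Vary, all mobile users share one cache entry.
print(cache_key("/m/home.html", "", phone) ==
      cache_key("/m/home.html", "", laptop))
```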
AEM Dispatcher Caching
If the CDN cache has expired, the request reaches the AEM Dispatcher cache. At this level, there are many things that can be done to optimize caching.
Since this is a larger topic, see this article for details on how to optimize the dispatcher cache.
AEM Publish Instances
At the AEM level, there are a few things that should be done to optimize the various cache layers:
Set the following HTTP response headers, which are not set by AEM by default.
If the site has personalized / dynamic content:
Optimize client libraries.