Optimizing AEM Site Caches

Optimizing caching within your AEM architecture is one of the quickest ways to get a big performance boost. This article focuses on explaining how to optimize the various caches that are available within an AEM architecture.

Description

Environment

Adobe Experience Manager

Issues/Symptoms

How to optimize AEM site caches?

AEM Architecture and Caching

In every AEM architecture, a user passes through multiple cache layers when visiting your site. A standard AEM architecture has four cache layers to consider: the web browser, the CDN, the Dispatcher, and the AEM instances.


Resolution

A. Browser Caching

The first level of cache a user encounters on a repeat visit to your site is their own browser. Caching at the browser level is commonly done via the Cache-Control: max-age=… response header. The max-age setting tells the browser how many seconds it may cache the file before attempting to “revalidate” it or request it from the site again. This concept of cache max-age is commonly referred to as “Cache Expiration” or TTL (“Time to Live”).

There are various options (or “directives”) within the Cache-Control header that affect how caching occurs. Here are some common directives:

  1. private - the private directive in the Cache-Control header means the file is cached only in the browser, not in intermediate (shared) caches such as CDNs. A practical use for this directive is a page that includes personalized or user-specific content.

    Example usage: Cache-Control: max-age=300, private

  2. s-maxage - the s-maxage directive in the Cache-Control header lets you set a different TTL for shared caches such as CDNs. When it is set, the browser uses the max-age value and shared caches use the s-maxage value instead.

    Example usage: Cache-Control: max-age=600, s-maxage=300
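
The way these directives interact can be sketched in a few lines of Python. This is a simplified illustration rather than a complete HTTP cache: the function names parse_cache_control and freshness_lifetime are hypothetical, quoted directive values containing commas are not handled, and heuristic freshness is ignored.

```python
def parse_cache_control(header: str) -> dict:
    """Parse a Cache-Control value into {directive: value or None}."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def freshness_lifetime(header: str, shared_cache: bool) -> int:
    """Return how many seconds a cache may reuse the response (0 = do not reuse)."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return 0
    if shared_cache:
        # Shared caches (CDN, proxy) must not store private responses.
        if "private" in d:
            return 0
        # s-maxage overrides max-age, but only for shared caches.
        if d.get("s-maxage") is not None:
            return int(d["s-maxage"])
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0


if __name__ == "__main__":
    for value in ("max-age=300, private", "max-age=600, s-maxage=300"):
        print(value,
              "| browser:", freshness_lifetime(value, shared_cache=False),
              "| CDN:", freshness_lifetime(value, shared_cache=True))
```

With the two example headers above, the browser keeps the first response for 300 seconds while a CDN does not store it at all, and the second response lives for 600 seconds in the browser but only 300 seconds in the CDN.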

All modern browsers support the Cache-Control header. However, two older headers from HTTP/1.0, Expires and Pragma, may still have an effect on caching. If you don’t need to support very old browsers, do not send those response headers.

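In addition to caching, revalidation is an important concept. Revalidation relies on the Last-Modified (response) / If-Modified-Since (request) headers and the ETag (response) / If-None-Match (request) headers. When a cached copy has expired, the browser sends the stored validator in a conditional request, and the server answers 304 Not Modified (with no body) if the content has not changed.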
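
As a rough illustration of the server side of revalidation, the following Python sketch decides between 200 and 304 from the conditional request headers. The function name revalidate and the sample values are hypothetical, and only strong ETag comparison is shown; it is not how AEM or the Dispatcher implements this.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def revalidate(request_headers: dict, etag: str, last_modified: datetime) -> int:
    """Return 304 if the client's cached copy is still valid, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When present, If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if "*" in candidates or etag in candidates else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore the condition
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200


if __name__ == "__main__":
    modified = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
    headers = {"If-Modified-Since": format_datetime(modified, usegmt=True)}
    print(revalidate(headers, '"abc"', modified))
```

A 304 response carries no body, so a successful revalidation costs a round trip but almost no bandwidth.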

Caution on browser testing:

When testing caching in Google Chrome over HTTPS with a self-signed certificate, nothing will get cached. Chrome won’t cache responses or perform revalidation when the certificate is untrusted or invalid.

Note on dispatcher:

There is an issue in AEM Dispatcher v4.2.3 and earlier versions where /enableTTL considers only the max-age directive. This means that a response is still cached when max-age is set, even if the private or s-maxage directive is also present. This issue is resolved in Dispatcher 4.2.4 and later versions.

B. CDN Caching

A CDN, or “Content Delivery Network”, is a distributed network of web servers designed to cache and serve content from the location nearest to your users. This reduces network hops and distance from the user’s computer to your content, thereby reducing “Round Trip Time” (RTT). RTT is the time it takes for the browser to send a request to your site and receive a response. Competition among CDN providers has made CDNs very cost-effective, which makes the decision to use one an easy one. If you are not using a CDN yet, you should add one to your site.

There are many CDN providers, and each one offers different features and configurations.

How does CDN Caching work?

CDNs cache content following rules similar to browsers. They rely on the Cache-Control HTTP response header and generally fall back to the Expires header if no Cache-Control header is found.

Most CDNs provide some way to trigger a manual flush of the cache. In many cases, a flush takes some time (e.g. 15 minutes) to propagate to all edge servers that hold your files.

Optimizing CDN Usage

There are a few things to do to ensure you are caching files optimally in the CDN:

  1. Use a CDN that supports the stale-while-revalidate and stale-if-error directives in the Cache-Control header.

    • stale-while-revalidate - this directive tells the CDN to keep serving the old (already cached) version of the file after it has expired while it retrieves a new one in the background.
    • stale-if-error - similarly, this directive tells the CDN to serve the old (already cached) version of the file when the origin responds with an error during revalidation.
  2. GZip compress responses for all file types that are not pre-compressed.

    You should do this at the dispatcher level. This reduces the number of bytes sent to the CDN. CDNs commonly charge by bytes transferred, so compressing responses reduces cost.

    • Enable GZip compression at the Dispatcher (web server) level:

      • Apache - use mod_deflate. Be careful with mod_deflate’s use of the Vary header. In certain cases, the Vary header can cause the CDN and browser to skip caching entirely.
      • Microsoft IIS - use Dynamic Compression.

    • Do not allow gzip compression of large files or files that are already compressed. Note that most image and video formats are already precompressed. Compressing them on the fly at the web server level comes at a very high cost to performance.

      • On Apache, this can be done via AddOutputFilterByType directive: AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript
      • On IIS, this can be controlled via the <dynamicTypes> configuration.
  3. If your CDN provider supports Edge-side Includes (ESI), use this feature. AEM components can be broken up using ESI. To do this, use Apache Sling Dynamic Includes or implement a custom solution. This is useful when pages are fairly static but a few parts of the page serve more dynamic content. In these cases you are essentially breaking the page up into multiple files in the CDN, so that different parts of the page can be cached for different periods of time.
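
The stale-while-revalidate and stale-if-error behavior described in the list above can be sketched as a small decision function. This is a simplified model under stated assumptions: the function name cache_decision and its return labels are hypothetical, and real CDNs differ in how they implement these directives.

```python
def cache_decision(age: int, max_age: int,
                   stale_while_revalidate: int = 0,
                   stale_if_error: int = 0,
                   origin_error: bool = False) -> str:
    """Decide how a cache answers a request for an object that is `age` seconds old."""
    if age < max_age:
        return "fresh"  # serve from cache, origin is not contacted

    stale_for = age - max_age
    if stale_for <= stale_while_revalidate:
        # Serve the stale copy now and refresh it in the background.
        return "stale-while-revalidate"
    if origin_error and stale_for <= stale_if_error:
        # The origin failed, so the stale copy is better than an error page.
        return "stale-if-error"
    # Otherwise the client waits for the origin (and may see its error).
    return "revalidate"


if __name__ == "__main__":
    # Modeled on: Cache-Control: max-age=300, stale-while-revalidate=60, stale-if-error=86400
    for age, error in ((100, False), (330, False), (500, True), (500, False)):
        print(age, error, cache_decision(age, 300, 60, 86400, origin_error=error))
```

The practical effect is that users rarely wait on the origin: within the stale-while-revalidate window the refresh happens in the background, and within the stale-if-error window an origin outage is masked by the cached copy.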

Popular CDN Providers

Here’s a list of some popular CDN providers:

There are several more, each with different features.

Caution:

Be careful with the Vary response header. In certain cases, Vary can cause both the CDN and browser to skip caching entirely. As a general rule of thumb, avoid adding Vary except for Vary: Accept-Encoding (applied only when the response is gzip compressed). In other words, if you need to “vary” the output of a response, use a different URL.

For example, if you have different versions of the HTML for mobile versus desktop, then use a different URL for each. This will allow CDNs and browsers to cache more effectively.
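
To see why Vary can hurt cache efficiency, consider how a cache builds its lookup key. The Python sketch below is a simplified model: the function name cache_key and the sample header values are hypothetical, and real caches usually normalize headers such as Accept-Encoding before using them.

```python
from typing import Optional


def cache_key(url: str, request_headers: dict, vary: str) -> Optional[tuple]:
    """Build the lookup key for a response that was sent with the given Vary header.

    Returns None for "Vary: *", which makes the response unusable from cache.
    """
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    if "*" in names:
        return None
    lowered = {k.lower(): v for k, v in request_headers.items()}
    # Every header named in Vary becomes part of the key.
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


if __name__ == "__main__":
    chrome = {"User-Agent": "Chrome/120", "Accept-Encoding": "gzip"}
    firefox = {"User-Agent": "Firefox/121", "Accept-Encoding": "gzip"}
    # Same key: both browsers share one cached copy.
    print(cache_key("/page.html", chrome, "Accept-Encoding"))
    print(cache_key("/page.html", firefox, "Accept-Encoding"))
    # Different keys: one cached copy per distinct User-Agent string.
    print(cache_key("/page.html", chrome, "User-Agent"))
    print(cache_key("/page.html", firefox, "User-Agent"))
```

Because there are thousands of distinct User-Agent strings, varying on that header fragments the cache so badly that most requests miss, whereas a separate URL per variant keeps each variant fully cacheable.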

C. AEM Dispatcher Caching

If the CDN cache has expired, the request reaches the AEM Dispatcher cache. At this level, there are many things you can do to optimize caching.

Since this is a larger topic, see this article for details on how to optimize the dispatcher cache.

D. AEM Publish Instances

At the AEM level, there are a few things that should be done to optimize the various cache layers:

  1. Set the following HTTP response headers, which are not set by AEM by default.

    1. Cache-Control: max-age=… - To set this header, you could use ACS Commons - Dispatcher TTL, or you could implement custom code to set it.
    2. Last-Modified - If the page content is relatively static, such as an article, then you could set its Last-Modified header to the cq:lastModified date/time (the last time the article was modified). However, if the page is dynamic, with JCR query results contained in component content, then it would be best to set it to the current date/time.
    3. ETag - If you decide to use this instead of Last-Modified, you could write a ReplicationEventListener that listens for page activations and generates an md5 hash of the page content. This could be set as a property on the jcr:content node of the page on the author instance. When pages are replicated, it would be sent to the publish instances. For page components with content that is relatively static, this could work well. However, if the page is somewhat dynamic or references a lot of content, then ETag would have to be omitted (or calculated).
  2. If the site has personalized / dynamic content:

    1. Use Apache Sling Dynamic Includes to break up the content so that different parts of the page can be cached for different periods of time.

      1. The caching can be done with the following technologies:

        • Edge-side Includes (ESI) on the CDN
        • Server-side Includes (SSI) on the web server
        • Asynchronous Javascript and XML (AJAX) on the browser
      2. Components on the page can be broken up into separate requests which can be cached for different periods of time. Parts of the page that are relatively static could be cached for much longer periods of time.

    2. Consider using the HTTP Cache feature of ACS Commons.

  3. Optimize client libraries.

    1. Enable minification on client libraries.

    2. If files in the client library are pre-minified, then disable minification on that client library. Edit the cq:ClientLibraryFolder node:

      1. Set property jsProcessor of type String[] with value min:none
      2. Set property cssProcessor of type String[] with value min:none
      3. See more details, here.
    3. Embed client libraries to combine them and reduce the number of js and css files.

    4. Implement versioned-clientlibs from ACS Commons to allow the CDN and Dispatcher to cache js and css files for longer periods of time.
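
The idea behind versioned client libraries is content fingerprinting: a hash of the file content becomes part of the URL, so the URL changes whenever the content does and caches can safely keep each version for a long time. The Python sketch below illustrates the general technique only; the function name versioned_name and the sample path are hypothetical, and this is not the ACS Commons implementation.

```python
import hashlib
import posixpath


def versioned_name(path: str, content: bytes) -> str:
    """Insert an MD5 fingerprint of the content before the file extension."""
    digest = hashlib.md5(content).hexdigest()
    root, ext = posixpath.splitext(path)
    return f"{root}.{digest}{ext}"


if __name__ == "__main__":
    path = "/etc.clientlibs/site/clientlib.min.js"  # hypothetical clientlib path
    v1 = versioned_name(path, b"console.log('v1');")
    v2 = versioned_name(path, b"console.log('v2');")
    print(v1)
    print(v2)
    # A changed file gets a new URL, so a long TTL is safe for each version,
    # for example: Cache-Control: max-age=31536000
```

Since pages reference the fingerprinted URL, a deployment that changes the js or css causes browsers, the CDN, and the Dispatcher to fetch the new file immediately without any cache flush.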
