How to optimize the Dispatcher cache?

This article offers detailed instructions on the different ways to optimize the Dispatcher cache. It also describes how to enable TTL (“Time to Live” or expiration) style invalidations, disable Dispatcher flush agents, and enable re-fetching Dispatcher flush, among other topics.

Description

Environment

Adobe Experience Manager

Issues/Symptoms

This article focuses on the latest optimizations in the AEM Dispatcher and how best to leverage them. The AEM Dispatcher is a caching reverse proxy server designed for use with Adobe Experience Manager. It is installed and runs as a module within existing web server software. At the time of writing, the Dispatcher module is supported on Apache HTTP Server, Microsoft IIS, and iPlanet.

Resolution

How does Dispatcher caching work?

At the most basic level, the AEM Dispatcher is a reverse proxy that performs caching, cache flushing, and cache invalidation.
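To make the invalidation part concrete, here is a simplified Python model of the stat-file check. It assumes the common setup in which a flush touches a .stat file and any cached file older than that .stat file is treated as stale. The helper names (is_cached_file_fresh, touch) are hypothetical and this is not the Dispatcher's own code; real behavior also depends on settings such as /statfileslevel and the /invalidate rules.

```python
import os
import tempfile
import time


def is_cached_file_fresh(cached_file: str, stat_file: str) -> bool:
    """Simplified model: a cached file is stale once .stat is newer than it."""
    if not os.path.exists(cached_file):
        return False  # cache miss: the request would go to AEM
    if not os.path.exists(stat_file):
        return True  # no flush has touched .stat yet
    return os.path.getmtime(cached_file) >= os.path.getmtime(stat_file)


def touch(path: str, mtime: float) -> None:
    """Create the file if needed and set its modification time."""
    with open(path, "a"):
        pass
    os.utime(path, (mtime, mtime))


with tempfile.TemporaryDirectory() as docroot:
    page = os.path.join(docroot, "foo.html")
    stat = os.path.join(docroot, ".stat")
    now = time.time()
    touch(page, now - 60)                    # page cached a minute ago
    print(is_cached_file_fresh(page, stat))  # True: nothing flushed yet
    touch(stat, now)                         # a flush touches .stat
    print(is_cached_file_fresh(page, stat))  # False: the page is now stale
```

Note that the flush in this model deletes nothing: marking files stale by comparing timestamps is what makes invalidation cheap.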

See the related links for more details on the Dispatcher.

Optimizing the Dispatcher cache

Here are some ways to optimize the Dispatcher cache:

  1. Cache almost everything - Cache any content that users request more than once.

  2. Cache personalized content for different periods of time - If your site has personalized content, consider using Apache Sling Dynamic Includes in your AEM application to cache different parts of the page for different periods of time. It can leverage Ajax (Asynchronous JavaScript and XML) calls at the browser level, SSI (Server Side Includes) at the web server level, and ESI (Edge Side Includes) at the CDN level.

  3. Never delete the Dispatcher cache on a live Dispatcher - If a Dispatcher is serving live content and you delete its cache, requests for content that is no longer cached go back to AEM, causing a massive flood of requests. For this reason, never delete the cache on a Dispatcher that is serving live traffic.

  4. Prime the cache - If you need to delete the Dispatcher cache, first pull the Dispatcher off your load balancer, then delete the cache and run a web crawler tool to re-cache files on the Dispatcher before putting it back on the load balancer.

  5. Cache error pages - Use the Apache Web Server specific DispatcherPassError 1 directive so that error status codes such as 404 are passed to Apache, where an ErrorDocument page can be served from the Dispatcher cache.

  6. GZip compress all file types except those that are already compressed - In Apache Web Server, you can use mod_deflate, but make sure the Vary: User-Agent header isn't set. In Microsoft IIS, use Dynamic Compression.

    Apache configuration example (specifying only certain content types to avoid precompressed file types):

    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/javascript

  7. Enable /serveStaleOnError in the /cache configuration - Serve the stale (invalidated) cache file when the AEM instances return errors.

  8. Add /gracePeriod to the /cache configuration - Define the number of seconds a stale, auto-invalidated resource may still be served from the cache after the last content publish event (“activation”). This reduces the number of requests that go back to the publish instances during a large content publishing activity such as a “Tree Activation”.

  9. Add rules to /ignoreUrlParams - Ignore query string parameters that the application does not require or use. Requests whose query string contains only ignored parameters can then be cached and served from the cache.

  10. Cache the Cache-Control and Last-Modified response headers - Use the /headers configuration to cache the Cache-Control and Last-Modified HTTP response headers (and the ETag header, if you send it from AEM). This simplifies and optimizes caching at the CDN and browser levels. Because the headers are cached along with the file, only AEM sets them, not the web server. Note that this requires your AEM application to send these headers.

  11. Cache content for as long as possible and reduce requests that go back to AEM - Optimize flush requests by enabling re-fetching flush on all flush agents (see the section Re-fetching Dispatcher Flush below), or use /enableTTL and send a Cache-Control: max-age=… header so that files stay cached as long as possible (see Using TTLs below).
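Item 9 can be illustrated with a small Python model of an /ignoreUrlParams rule list. It assumes a deny-all rule followed by "allow" rules, where "allow" means the parameter is ignored, and assumes that, as with other Dispatcher rule sections, the last matching rule wins. The rule list and function names are hypothetical; a request is treated as cacheable only when every query parameter it carries is ignored.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qsl, urlsplit

# Hypothetical rules mirroring an /ignoreUrlParams section:
# deny everything by default, then "allow" (= ignore) tracking parameters.
IGNORE_RULES = [
    ("*", "deny"),
    ("utm_*", "allow"),
    ("fbclid", "allow"),
]


def is_ignored(param: str, rules=IGNORE_RULES) -> bool:
    """The last matching rule wins; "allow" means the parameter is ignored."""
    ignored = False
    for glob, rule_type in rules:
        if fnmatchcase(param, glob):
            ignored = rule_type == "allow"
    return ignored


def is_cacheable(url: str) -> bool:
    """Cacheable in this model only if every query parameter is ignored."""
    query = urlsplit(url).query
    names = [name for name, _ in parse_qsl(query, keep_blank_values=True)]
    return all(is_ignored(name) for name in names)


print(is_cacheable("/content/foo.html?utm_source=mail"))  # True
print(is_cacheable("/content/foo.html?page=2"))           # False
```

A deny-all default keeps the list safe: a parameter that the application actually uses is never ignored by accident unless someone adds an explicit rule for it.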

Using TTLs

As of Dispatcher version 4.1.11, /enableTTL 1 can be set in the /cache section of a farm configuration. This setting makes the Dispatcher respect the cache expiration set in the HTTP Cache-Control response header. In other words, the Dispatcher functions similarly to a CDN, where the primary form of cache invalidation is file expiration. Once you implement this and start sending Cache-Control: max-age=… for all responses from AEM, you can safely disable the Dispatcher flush agents on the publish instances.
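The expiry rule can be sketched in Python as follows. This is a simplified model, assuming a cached file expires max-age seconds after it was cached and that a response without max-age simply stays cached until it is flushed; the function names are hypothetical and the Dispatcher's own bookkeeping differs.

```python
import re
from typing import Optional

# Matches a max-age directive as a whole item in a comma-separated list,
# so that names such as s-maxage are not picked up.
MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)\s*(?:,|$)", re.IGNORECASE)


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if it is absent."""
    match = MAX_AGE.search(cache_control)
    return int(match.group(1)) if match else None


def is_expired(cached_at: float, cache_control: str, now: float) -> bool:
    """True once the cached copy has outlived its max-age."""
    max_age = max_age_seconds(cache_control)
    if max_age is None:
        return False  # no TTL in this model: only a flush would invalidate it
    return now >= cached_at + max_age


cached_at = 1_000.0
print(is_expired(cached_at, "public, max-age=300", now=cached_at + 299))  # False
print(is_expired(cached_at, "public, max-age=300", now=cached_at + 300))  # True
```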

After disabling the flush agents on the publish instances, you may still want to be able to flush the Dispatcher cache. In that case, you can use ACS Commons - Dispatcher Flush UI. This tool is installed on the author instance and gives users a UI for issuing manual cache flush requests.

I. Steps to enable TTL (“Time to Live” or expiration) style invalidations:

  1. Modify the AEM application source code to send the Cache-Control and Last-Modified headers for all responses where they are not already set.
  2. Install Dispatcher 4.1.11 or later.
  3. Set /enableTTL 1 in the /cache section of each farm configuration for the site.
  4. Set the /headers configuration to cache the Cache-Control and Last-Modified headers.
  5. Restart the web server.
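Steps 1 and 4 can be spot-checked with a short Python script. This is a minimal sketch using only the standard library: check_url and missing_ttl_headers are hypothetical helpers, and the URL you pass in is your own. Running it against a URL served from the Dispatcher cache after step 4 is one way to confirm that the headers are being cached along with the file.

```python
from typing import List, Mapping
from urllib.request import Request, urlopen

REQUIRED = ("Cache-Control", "Last-Modified")


def missing_ttl_headers(headers: Mapping[str, str]) -> List[str]:
    """Return the required headers that are absent (case-insensitive)."""
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED if name.lower() not in present]


def check_url(url: str) -> List[str]:
    """Send a HEAD request and report which required headers are missing."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return missing_ttl_headers(dict(response.headers.items()))


# Example (needs network access; the URL is a placeholder):
# print(check_url("https://www.example.com/content/foo.html"))
```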

II. Disable Dispatcher flush agents on the publish instances:

The Dispatcher now uses the Cache-Control header to control invalidation of the cached files. Because of this, Dispatcher flushing from the publish instances is no longer required.

  1. Go to /etc/replication/agents.publish.html on each publish instance.
  2. Go to each flush agent’s configuration and disable the agent.

III. Allow manual Dispatcher flush requests from the author instance:

Now that the flush agents are disabled, you rely entirely on the Cache-Control header to control when content is refreshed on the Dispatcher. You can still allow users to issue manual flushes of the Dispatcher cache:

  1. Install ACS Commons - Dispatcher Flush UI on the author instance.
  2. Configure flush agents on the author instance.
  3. In each agent configuration, enable the Triggers => Ignore Default option. This option makes the flush agents ignore the (Un)Publish and (De)Activate actions that users perform in the AEM UI.
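For reference, a flush ultimately reaches the Dispatcher as an HTTP invalidation request. The sketch below builds, but does not send, such a request with Python's standard library. It assumes the /dispatcher/invalidate.cache endpoint and the CQ-Action and CQ-Handle headers described in the Dispatcher documentation on manual cache invalidation; build_flush_request is a hypothetical helper, the host is a placeholder, and the Dispatcher only accepts such requests from clients it is configured to allow.

```python
from urllib.request import Request, urlopen


def build_flush_request(dispatcher: str, handle: str,
                        action: str = "Activate") -> Request:
    """Build (but do not send) a Dispatcher cache invalidation request."""
    return Request(
        url=f"{dispatcher}/dispatcher/invalidate.cache",
        data=b"",
        method="POST",
        headers={
            "CQ-Action": action,   # e.g. Activate; see the Dispatcher docs
            "CQ-Handle": handle,   # content path to invalidate
            "Content-Type": "application/octet-stream",
            "Content-Length": "0",
        },
    )


req = build_flush_request("http://dispatcher.example.com", "/content/foo")
# urlopen(req, timeout=10)  # uncomment to actually send the flush
```

Note that urllib normalizes header names (CQ-Action is stored as Cq-action); HTTP header names are case-insensitive, so this does not change the request's meaning.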

Re-fetching Dispatcher Flush

To optimize Dispatcher flush requests, enable the re-fetching flush feature on all Dispatcher flush agents.

To enable re-fetching Dispatcher flush, do the following:

  1. Go to http://aemhost:port/crx/packmgr/index.jsp and log in as admin.

  2. Download the package from here.

  3. Upload and install the package using Package Manager.

  4. Go to your Dispatcher flush agent configuration, for example: /etc/replication/agents.author/flush.html

  5. Click Edit.

  6. Set the following:

    • Serialization Type = Re-fetch Dispatcher Flush
    • Extended => HTTP Method = POST

  7. Click Save.

Note - The package installed above is only a basic example. To customize and optimize re-fetching flush, you can modify the list of URIs it sends. The code is open source and can be found here. It adds a list of URIs to the request body as parameters, telling the Dispatcher which paths to re-fetch. You can add more paths according to your application's requirements to optimize your site's caching.
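As an illustration of what such a request carries, the sketch below builds (without sending) an invalidate-and-re-fetch request in Python. It assumes the format shown in the Dispatcher documentation's manual invalidation example, with a text/plain body listing one URI per line; the exact format produced by the package may differ, since the note above describes the URIs as parameters in the body. The function name, host, and URI list are hypothetical.

```python
from typing import Iterable
from urllib.request import Request


def build_refetch_flush_request(dispatcher: str, handle: str,
                                refetch_uris: Iterable[str]) -> Request:
    """Build (not send) an invalidate request that also lists URIs to re-fetch.

    Assumption: the body holds one URI per line, which the Dispatcher
    re-fetches and replaces in the cache instead of deleting.
    """
    body = "\n".join(refetch_uris).encode("utf-8")
    return Request(
        url=f"{dispatcher}/dispatcher/invalidate.cache",
        data=body,
        method="POST",
        headers={
            "CQ-Action": "Activate",
            "CQ-Handle": handle,
            "Content-Type": "text/plain",
        },
    )


req = build_refetch_flush_request(
    "http://dispatcher.example.com",
    "/content/foo",
    ["/content/foo.html", "/content/foo.json"],  # hypothetical URIs to re-fetch
)
```

Adding more URIs to the list is the Python equivalent of the customization the note describes: every listed URI is refreshed in place rather than left to the next user request.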

Detailed explanation of re-fetching flush

Normally a Dispatcher flush works by deleting files:

  1. Touch .stat file(s)
  2. Delete /content/foo.*
  3. Delete /content/foo/_jcr_content
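The delete steps can be modelled in a few lines of Python. This hypothetical helper works on a list of cached paths rather than a real docroot and leaves out the .stat touch (step 1) and any /invalidate rules; it shows that /content/foo.html and the _jcr_content subtree are removed, while a child page such as /content/foo/bar.html is not deleted by steps 2 and 3.

```python
from typing import List


def files_deleted_by_flush(handle: str, cached_paths: List[str]) -> List[str]:
    """Return the cached paths that delete steps 2 and 3 would remove."""
    prefix = handle + "."                   # step 2: <handle>.*
    jcr_content = handle + "/_jcr_content"  # step 3: <handle>/_jcr_content
    deleted = []
    for path in cached_paths:
        # Like a shell glob, "*" is not allowed to cross a "/" here.
        matches_step_2 = path.startswith(prefix) and "/" not in path[len(prefix):]
        matches_step_3 = path == jcr_content or path.startswith(jcr_content + "/")
        if matches_step_2 or matches_step_3:
            deleted.append(path)
    return deleted


cached = [
    "/content/foo.html",
    "/content/foo.json",
    "/content/foo/_jcr_content/par.html",
    "/content/foo/bar.html",
    "/content/foobar.html",
]
print(files_deleted_by_flush("/content/foo", cached))
# ['/content/foo.html', '/content/foo.json', '/content/foo/_jcr_content/par.html']
```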

Because the files are deleted in step 2, the next request for a file such as /content/foo.html or /content/foo.json goes to the publish instance, and while that file is being fetched, any further requests for the same file are also sent to the publish instances until the file is cached again. For pages with slow responses or heavy traffic, such as home pages, this can flood the publish tier.

To solve this issue, enable the Dispatcher's re-fetching feature. It lets you send a list of URIs that the Dispatcher proactively “re-fetches” and replaces in the cache instead of deleting them.
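The difference in publish load can be shown with a toy Python model using made-up numbers. It assumes that on the delete path every request arriving before the first fetch completes is forwarded to publish (no request coalescing), and that with re-fetching the Dispatcher fetches each URI once while still serving the old copy. The function name and the traffic figures are hypothetical.

```python
from typing import List


def publish_requests(request_times_ms: List[int], fetch_ms: int,
                     refetch: bool) -> int:
    """Count requests for one URI that reach publish after a flush at t=0."""
    if refetch:
        # One fetch issued by the Dispatcher; users keep getting the old
        # copy until it is replaced, so their requests are cache hits.
        return 1
    if not request_times_ms:
        return 0
    # Delete-style flush: the first request starts a fetch, and every
    # request arriving before that fetch completes is also a miss.
    cached_again_at = min(request_times_ms) + fetch_ms
    return sum(1 for t in request_times_ms if t < cached_again_at)


# Hypothetical load: 200 requests over 2 seconds, a 1.5-second page render.
times = list(range(0, 2000, 10))
print(publish_requests(times, 1500, refetch=False))  # 150
print(publish_requests(times, 1500, refetch=True))   # 1
```

The slower the page renders and the busier it is, the larger the gap, which is why the section singles out slow or heavily trafficked pages such as home pages.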

See 22:41-27:05 in this presentation recording for a demo of how it works and how to configure it.
