Introduction

Traffic passes through the CDN to an Apache web server layer, which supports modules including Dispatcher. In order to increase performance, Dispatcher is used primarily as a cache to limit processing on the publish nodes.
Rules can be applied to the Dispatcher configuration to modify any default cache expiration settings, resulting in caching at the CDN. Note that Dispatcher also respects the resulting cache expiration headers if enableTTL is enabled in the Dispatcher configuration, meaning that it refreshes specific content even when that content has not been republished.

This page also describes how the Dispatcher cache is invalidated and how caching works at the browser level with regard to client-side libraries.

Caching

HTML/Text

  • by default, cached by the browser for five minutes, based on the cache-control header emitted by the Apache layer. The CDN also respects this value.
  • the default HTML/Text caching setting can be disabled by defining the DISABLE_DEFAULT_CACHING variable in global.vars:
Define DISABLE_DEFAULT_CACHING

This can be useful, for example, when your business logic requires fine tuning of the age header (with a value based on calendar day) since by default the age header is set to 0. That said, please exercise caution when turning off default caching.

  • can be overridden for all HTML/Text content by defining the EXPIRATION_TIME variable in global.vars using the AEM as a Cloud Service SDK Dispatcher tools.

  • can be overridden on a finer grained level, including controlling CDN and browser cache independently, with the following Apache mod_headers directives:

    <LocationMatch "^/content/.*\.(html)$">
         Header set Cache-Control "max-age=200"
         Header set Surrogate-Control "max-age=3600"
         Header set Age 0
    </LocationMatch>
    
    NOTE

    The Surrogate-Control header applies to the Adobe managed CDN. If using a customer managed CDN, a different header may be required depending on your CDN provider.

    Exercise caution when setting either global cache control headers or those that match a wide regex, so that they are not applied to content that you need to keep private. Consider using multiple directives to ensure rules are applied in a fine-grained manner. That said, AEM as a Cloud Service removes the cache header if it detects that the header has been applied to content that Dispatcher considers uncacheable, as described in the Dispatcher documentation. To force AEM to always apply the caching headers, add the always option as follows:

    <LocationMatch "^/content/.*\.(html)$">
         Header unset Cache-Control
         Header unset Expires
         Header always set Cache-Control "max-age=200"
         Header set Age 0
    </LocationMatch>
    
    

    You must ensure that a file under src/conf.dispatcher.d/cache has the following rule (which is in the default configuration):

    /0000
    { /glob "*" /type "allow" }
    
    
  • To prevent specific content from being cached at the CDN, set the Cache-Control header to private. For example, the following would prevent html content under a directory named secure from being cached at the CDN:

       # Replace the pattern below with the regex that matches your private content
       <LocationMatch "/content/secure/.*\.(html)$">
       Header unset Cache-Control
       Header unset Expires
       Header always set Cache-Control "private"
       </LocationMatch>
    
    
  • While HTML content set to private will not be cached at the CDN, it can be cached at the dispatcher if Permission Sensitive Caching is configured, ensuring that only authorized users can be served the content.

    NOTE

    The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override values.

    NOTE

    Please note that Dispatcher might still cache content according to its own caching rules. To make the content truly private you should ensure that it is not cached by Dispatcher.
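
The independent browser and CDN lifetimes described in this section can be illustrated with a minimal Java sketch. It assumes a simplified model in which the CDN uses the max-age from Surrogate-Control when that header is present and otherwise falls back to the max-age from Cache-Control, while the browser only ever sees Cache-Control. The class and method names are hypothetical, and directives such as private or s-maxage are deliberately ignored here.

```java
import java.util.Locale;
import java.util.OptionalLong;

// Hypothetical helper: a simplified model of how long each layer keeps a response.
final class CacheLifetimes {

    /** Extracts the max-age value (in seconds) from a header value, if present. */
    static OptionalLong maxAge(String headerValue) {
        if (headerValue == null) {
            return OptionalLong.empty();
        }
        for (String directive : headerValue.split(",")) {
            String d = directive.trim().toLowerCase(Locale.ROOT);
            if (d.startsWith("max-age=")) {
                try {
                    return OptionalLong.of(Long.parseLong(d.substring("max-age=".length()).trim()));
                } catch (NumberFormatException e) {
                    return OptionalLong.empty();
                }
            }
        }
        return OptionalLong.empty();
    }

    /** Browser lifetime: taken from Cache-Control only; 0 when no max-age is set. */
    static long browserTtl(String cacheControl) {
        return maxAge(cacheControl).orElse(0L);
    }

    /** CDN lifetime (assumption): Surrogate-Control wins when it carries a max-age. */
    static long cdnTtl(String cacheControl, String surrogateControl) {
        OptionalLong surrogate = maxAge(surrogateControl);
        return surrogate.isPresent() ? surrogate.getAsLong() : browserTtl(cacheControl);
    }
}
```

With the header values from the LocationMatch example above (Cache-Control "max-age=200" and Surrogate-Control "max-age=3600"), this model yields 200 seconds for the browser and 3600 seconds for the CDN.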

Client-Side libraries (js,css)

  • When using AEM’s Client-Side library framework, JavaScript and CSS code is generated in such a way that browsers can cache it indefinitely, since any changes manifest as new files with a unique path. In other words, HTML that references the client libraries will be produced as needed so customers can experience new content as it is published. The cache-control is set to “immutable”, or to 30 days for older browsers that do not respect the “immutable” value.
  • see the section Client-side libraries and version consistency for additional details.

Images and any content large enough to be stored in blob storage

The default behavior for programs created after mid-May 2022 (specifically, for program ids that are higher than 65000) is to cache by default, while also respecting the request’s authentication context. Older programs (program ids equal to or lower than 65000) do not cache blob content by default.

In both cases, the caching headers can be overridden on a finer grained level at the Apache/Dispatcher layer by using the Apache mod_headers directives, for example:

   <LocationMatch "^/content/.*\.(jpeg|jpg)$">
     Header set Cache-Control "max-age=222"
     Header set Age 0
   </LocationMatch>

When modifying the caching headers at the Dispatcher layer, be careful not to cache too widely; see the discussion in the HTML/text section above. Also, make sure that assets that are meant to be kept private (rather than cached) are not part of the LocationMatch directive filters.

New default caching behavior

The AEM layer will set cache headers depending on whether the cache header has already been set and the value of the request type. Please note that if no cache control header has been set, public content is cached and authenticated traffic is set to private. If a cache control header has been set, the cache headers will be left untouched.

Cache control header exists? | Request type  | AEM sets cache headers to
No                           | public        | Cache-Control: public, max-age=600, immutable
No                           | authenticated | Cache-Control: private, max-age=600, immutable
Yes                          | any           | unchanged
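
The decision table above can be read as a small function. The following Java sketch only restates the table; the class and method names are hypothetical, and it is not the code AEM runs internally.

```java
// Hypothetical restatement of the decision table for blob content cache headers.
final class BlobCacheHeaders {

    /**
     * Returns the Cache-Control value that ends up on the response.
     *
     * @param existingCacheControl value already set on the response, or null if none
     * @param authenticated        true for authenticated traffic, false for public traffic
     */
    static String resolve(String existingCacheControl, boolean authenticated) {
        // A header that is already present is left untouched.
        if (existingCacheControl != null && !existingCacheControl.isEmpty()) {
            return existingCacheControl;
        }
        // Otherwise public content is cached and authenticated traffic is marked private.
        return authenticated
                ? "private, max-age=600, immutable"
                : "public, max-age=600, immutable";
    }
}
```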

While not recommended, it is possible to change the new default behavior to follow the older behavior (program ids equal to or lower than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to false.

Older default caching behavior

The AEM layer will not cache blob content by default.

NOTE

It is recommended to change the older default behavior to be consistent with the new behavior (program ids that are higher than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to true. If the program is already live, make sure you verify that after the changes, content behaves as you expect.

At present, images in blob storage that are marked private cannot be cached at the dispatcher using Permission Sensitive Caching. The image is always requested from the AEM origin and served if the user is authorized.

NOTE

The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override the values.

Other content file types in node store

  • no default caching
  • default cannot be set with the EXPIRATION_TIME variable used for html/text file types
  • cache expiration can be set with the same LocationMatch strategy described in the html/text section by specifying the appropriate regex

Further Optimizations

  • Avoid using User-Agent as part of the Vary header. Older versions of the default Dispatcher setup (prior to archetype version 28) included this, and we recommend that you remove it by using the steps below.

    • Locate the vhost files in <Project Root>/dispatcher/src/conf.d/available_vhosts/*.vhost
    • Remove or comment out the line: Header append Vary User-Agent env=!dont-vary from all vhost files, with the exception of default.vhost, which is read-only
  • Use the Surrogate-Control header to control CDN caching independently of browser caching

  • Consider applying stale-while-revalidate and stale-if-error directives to allow background refresh and avoid cache misses, keeping your content fast and fresh for users.

    • There are many ways to apply these directives, but adding a 30 minute stale-while-revalidate to all cache control headers is a good starting point.
  • Some examples follow for various content types, which can be used as a guide when setting up your own caching rules. Please carefully consider and test for your specific setup and requirements:

    • Cache mutable client library resources for 12h and background refresh after 12h.

      <LocationMatch "^/etc\.clientlibs/.*\.(?i:json|png|gif|webp|jpe?g|svg)$">
         Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200,public" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
      
    • Cache immutable client library resources long-term (30 days) with background refresh to avoid MISS.

      <LocationMatch "^/etc\.clientlibs/.*\.(?i:js|css|ttf|woff2)$">
         Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
      
    • Cache HTML pages for 5min with background refresh 1h on browser and 12h on CDN. Cache-Control headers will always be added so it is important to ensure that matching html pages under /content/* are intended to be public. If not, consider using a more specific regex.

      <LocationMatch "^/content/.*\.html$">
         Header unset Cache-Control
         Header always set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
         Header always set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
      
    • Cache content services/Sling model exporter json responses for 5min with background refresh 1h on browser and 12h on CDN.

      <LocationMatch "^/content/.*\.model\.json$">
         Header set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
         Header set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
      
    • Cache immutable URLs from the core image component long-term (30 days) with background refresh to avoid MISS.

      <LocationMatch "^/content/.*\.coreimg.*\.(?i:jpe?g|png|gif|svg)$">
         Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
      
    • Cache mutable resources from the DAM, such as images and video, for 12h (max-age=43200), with background refresh for a further 12h (24h in total) to avoid MISS.

      <LocationMatch "^/content/dam/.*\.(?i:jpe?g|gif|js|mov|mp4|png|svg|txt|zip|ico|webp|pdf)$">
         Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
         Header set Age 0
      </LocationMatch>
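
As a way to reason about the suggestion of adding a 30 minute stale-while-revalidate to all cache control headers, the sketch below shows the string transformation involved. It is plain Java for illustration only (the class and method names are hypothetical); in a real setup the value would be set through mod_headers directives like the ones above.

```java
import java.util.Locale;

// Hypothetical helper that appends a stale-while-revalidate directive to a Cache-Control value.
final class StaleWhileRevalidate {

    /** Returns the header value with stale-while-revalidate added, unless one is already present. */
    static String withStaleWhileRevalidate(String cacheControl, long seconds) {
        String directive = "stale-while-revalidate=" + seconds;
        if (cacheControl == null || cacheControl.trim().isEmpty()) {
            return directive;
        }
        // Keep an explicit, existing value rather than overriding it.
        if (cacheControl.toLowerCase(Locale.ROOT).contains("stale-while-revalidate")) {
            return cacheControl;
        }
        return cacheControl + "," + directive;
    }
}
```

For example, passing "max-age=300" and 1800 (30 minutes in seconds) produces "max-age=300,stale-while-revalidate=1800".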
      

HEAD request behavior

When a HEAD request is received at the Adobe CDN for a resource that is not cached, the request is transformed and received by the Dispatcher and/or AEM instance as a GET request. If the response is cacheable, then subsequent HEAD requests will be served from the CDN. If the response is not cacheable, then subsequent HEAD requests will be passed to the Dispatcher and/or AEM instance for a period of time that depends on the Cache-Control TTL.

Marketing campaign parameters

Website URLs frequently include marketing campaign parameters that are used to track a campaign’s success. To use the Dispatcher cache effectively, it is recommended that you set the ignoreUrlParams property in the Dispatcher configuration, as described in the Dispatcher documentation.

The ignoreUrlParams section must be uncommented and should reference the file conf.dispatcher.d/cache/marketing_query_parameters.any. The file can be modified by uncommenting the lines corresponding to the parameters that are relevant to your marketing channels. You may add other parameters as well.

/ignoreUrlParams {
    /0001 { /glob "*" /type "deny" }
    $include "../cache/marketing_query_parameters.any"
}
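
The effect of this configuration can be sketched in Java as follows. The sketch assumes the usual Dispatcher semantics, where a request is treated like a request without a query string only if every parameter it carries is on the ignore list. The class name, method name, and the parameter names used in the usage note (utm_source, gclid, page) are illustrative assumptions, not a copy of marketing_query_parameters.any.

```java
import java.util.Set;

// Hypothetical model of the ignoreUrlParams decision; not Dispatcher's actual implementation.
final class IgnoreUrlParams {

    /**
     * Returns true when the query string does not prevent the page from being cached,
     * that is, when it is empty or contains only ignored parameters.
     */
    static boolean cacheableDespiteQuery(String query, Set<String> ignoredParams) {
        if (query == null || query.isEmpty()) {
            return true;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            // A single parameter that is not ignored makes the request bypass the cache.
            if (!ignoredParams.contains(name)) {
                return false;
            }
        }
        return true;
    }
}
```

With an ignore list of utm_source and gclid, a request for ?utm_source=news&gclid=abc would be served from the cached page, while ?utm_source=news&page=2 would not, because page is not ignored.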

Dispatcher Cache Invalidation

In general, it will not be necessary to invalidate the Dispatcher cache. Instead you should rely on the Dispatcher refreshing its cache when content is being republished and the CDN respecting cache expiration headers.

Dispatcher Cache Invalidation during Activation/Deactivation

As in previous versions of AEM, publishing or unpublishing pages clears the content from the Dispatcher cache. If a caching issue is suspected, you should republish the pages in question and ensure that a virtual host is available that matches the ServerAlias localhost, which is required for Dispatcher cache invalidation.

NOTE

For proper Dispatcher invalidation, make sure that requests from “127.0.0.1”, “localhost”, “.local”, “.adobeaemcloud.com”, and “.adobeaemcloud.net” are all matched and handled by a vhost configuration so that those requests can be served. You can do this either by globally matching “*” in a catch-all vhost configuration, following the pattern in the reference AEM archetype, or by ensuring that the previously mentioned list is caught by one of the vhosts.

When the publish instance receives a new version of a page or asset from the author, it uses the flush agent to invalidate the appropriate paths on its Dispatcher. The updated path is removed from the Dispatcher cache, together with its parents, up to a level that you can configure with the statfileslevel property.
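
To make the role of statfileslevel more concrete, the following Java sketch lists the .stat files that would be touched for an invalidated file. It assumes the commonly described Dispatcher model in which the docroot is level 0 and one .stat file exists per folder down to the configured level. The class and method names are hypothetical, and the sketch works on path strings only rather than on a real cache directory.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of hierarchical invalidation; paths are relative to the cache docroot.
final class StatFiles {

    /**
     * Lists the .stat files touched when the given cached file is invalidated.
     *
     * @param invalidatedPath absolute path of the cached file, for example /content/site/en/page.html
     * @param statFilesLevel  configured statfileslevel (0 means docroot only)
     */
    static List<String> touchedStatFiles(String invalidatedPath, int statFilesLevel) {
        List<String> touched = new ArrayList<>();
        touched.add("/.stat"); // level 0: the docroot

        // segments[0] is empty because the path starts with "/"; the last segment is the file itself.
        String[] segments = invalidatedPath.split("/");
        int folderCount = segments.length - 2;

        StringBuilder folder = new StringBuilder();
        for (int level = 1; level <= statFilesLevel && level <= folderCount; level++) {
            folder.append('/').append(segments[level]);
            touched.add(folder + "/.stat");
        }
        return touched;
    }
}
```

Under these assumptions, invalidating /content/site/en/page.html with a statfileslevel of 2 touches /.stat, /content/.stat and /content/site/.stat, so cached files below those folders are treated as stale while sibling trees at deeper levels are not affected.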

Explicit invalidation of the Dispatcher cache

Adobe recommends relying on standard cache headers to control the content delivery life cycle. However, if needed, it is possible to invalidate content directly in Dispatcher.

The following list contains scenarios where you might want to explicitly invalidate the cache (while optionally listening for the completion of the invalidation):

  • After publishing content such as experience fragments or content fragments, invalidating published and cached content that references those elements.
  • Notifying an external system when referenced pages have been successfully invalidated.

There are two approaches to explicitly invalidate the cache:

  • The preferred approach is to use Sling Content Distribution (SCD) from Author.
  • The alternative is to use the Replication API to invoke the publish Dispatcher flush replication agent.

The approaches differ in terms of tier availability, the ability to deduplicate events and event processing guarantee. The table below summarizes these options:

API | Tier availability | Deduplication | Guarantee | Action | Impact | Description
Sling Content Distribution (SCD) API | Author | Possible using either the Discovery API or enabling the deduplication mode. | At least once. | 1. ADD | 1. Hierarchical/Stat Level | 1. Publishes content and invalidates the cache.
 | | | | 2. DELETE | 2. Hierarchical/Stat Level | 2. Removes content and invalidates the cache.
 | | | | 3. INVALIDATE | 3. Level Resource-Only | 3. Invalidates content without publishing it.
Replication API | Publish | Not possible, event raised on every publish instance. | Best effort. | 1. ACTIVATE | 1. Hierarchical/Stat Level | 1. Publishes content and invalidates the cache.
 | | | | 2. DEACTIVATE | 2. Hierarchical/Stat Level | 2. From Author/Publish Tier - Removes content and invalidates the cache.
 | | | | 3. DELETE | 3. Hierarchical/Stat Level | 3. From Author Tier - Removes content and invalidates the cache (if triggered from AEM Author tier on the Publish agent). From Publish Tier - Invalidates only the cache (if triggered from AEM Publish tier on the Flush or Resource-only-flush agent).

Please note that the two actions directly related to cache invalidation are Sling Content Distribution (SCD) API Invalidate and Replication API Deactivate.

Also, from the table, we observe that:

  • SCD API is needed when every event must be guaranteed, for example, syncing with an external system that requires accurate knowledge. If there is a publish tier upscaling event at the time of the invalidation call, an additional event is raised when each new publish processes the invalidation.

  • Using the Replication API isn’t a common use case, but should be used in cases where the trigger to invalidate the cache comes from the publish tier and not the author tier. This might be useful if dispatcher TTL is configured.

In conclusion, if you are looking to invalidate the Dispatcher cache, the recommended option is to use the SCD API Invalidate action from Author. Additionally, you can also listen for the event so you can then trigger further downstream actions.

Sling Content Distribution (SCD)

NOTE

When using the instructions presented below, please be aware that you should test the custom code in an AEM Cloud Service Dev environment and not locally.

When using the SCD action from Author, the implementation pattern is as follows:

  1. From Author, write custom code to invoke the sling content distribution API, passing the invalidate action with a list of paths:
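import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.distribution.DistributionRequest;
import org.apache.sling.distribution.DistributionRequestType;
import org.apache.sling.distribution.Distributor;
import org.apache.sling.distribution.SimpleDistributionRequest;
import org.osgi.service.component.annotations.Reference;

@Reference
private Distributor distributor;

ResourceResolver resolver = ...; // the resource resolver used for authorizing the request
String agentName = "publish";    // the name of the agent used to distribute the request

String pathToInvalidate = "/content/to/invalidate";
// INVALIDATE clears the cached copy without publishing; "false" requests a shallow (non-deep) distribution
DistributionRequest distributionRequest = new SimpleDistributionRequest(DistributionRequestType.INVALIDATE, false, pathToInvalidate);
distributor.distribute(agentName, resolver, distributionRequest);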

  2. (Optionally) Listen for an event that reflects the resource being invalidated for all Dispatcher instances:
package org.apache.sling.distribution.journal.shared;

import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.distribution.journal.impl.event.DistributionEvent;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.apache.sling.distribution.DistributionRequestType.INVALIDATE;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_PATHS;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_TYPE;
import static org.apache.sling.distribution.event.DistributionEventTopics.AGENT_PACKAGE_DISTRIBUTED;
import static org.osgi.service.event.EventConstants.EVENT_TOPIC;

@Component(immediate = true, service = EventHandler.class, property = {
        EVENT_TOPIC + "=" + AGENT_PACKAGE_DISTRIBUTED
})
public class InvalidatedHandler implements EventHandler {
    private static final Logger LOG = LoggerFactory.getLogger(InvalidatedHandler.class);

    @Reference
    private DiscoveryService discoveryService;

    @Override
    public void handleEvent(Event event) {

        String distributionType = (String) event.getProperty(DISTRIBUTION_TYPE);

        if (INVALIDATE.name().equals(distributionType)) {
            boolean isLeader = discoveryService.getTopology().getLocalInstance().isLeader();
            // process the OSGi event on the leader author instance
            if (isLeader) {
                String[] paths = (String[]) event.getProperty(DISTRIBUTION_PATHS);
                String packageId = (String) event.getProperty(DistributionEvent.PACKAGE_ID);
                invalidated(paths, packageId);
            }
        }
    }

    private void invalidated(String[] paths, String packageId) {
        // custom logic
        LOG.info("Successfully applied package with id {}, paths {}", packageId, paths);
    }
}

  3. (Optionally) Execute business logic in the invalidated(String[] paths, String packageId) method above.
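
Because the SCD guarantee is "at least once", the same package may be reported more than once, so business logic in the handler is easier to reason about when it is idempotent. The following is a minimal sketch of one possible approach, tracking package ids that were already processed. The class is hypothetical, keeps its state in memory on a single instance only, and never evicts entries, so treat it as an illustration rather than a production pattern.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory guard against processing the same distribution package twice.
final class PackageDeduplicator {

    // Thread-safe set of package ids that have already been handled.
    private final Set<String> seenPackageIds = ConcurrentHashMap.newKeySet();

    /** Returns true only the first time a given package id is offered. */
    boolean firstDelivery(String packageId) {
        return seenPackageIds.add(packageId);
    }
}
```

A handler could call firstDelivery(packageId) before running its custom logic and skip the work when the method returns false.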
NOTE

The Adobe CDN is not flushed when Dispatcher is invalidated. The Adobe-managed CDN respects TTLs and thus there is no need for it to be flushed.

Replication API

Presented below is the implementation pattern when using the replication API Deactivate action:

  1. On the publish tier, call the Replication API to trigger the publish Dispatcher flush replication agent.

The flush agent endpoint is not configurable but rather preconfigured to point to Dispatcher, matched with the publish service running alongside the flush agent.

The flush agent can typically be triggered by custom code based on OSGi events or workflows.

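import com.day.cq.replication.Agent;
import com.day.cq.replication.AgentFilter;
import com.day.cq.replication.ReplicationActionType;
import com.day.cq.replication.ReplicationOptions;
import com.day.cq.replication.Replicator;

@Reference
private Replicator replicator;

String[] paths = …
ReplicationOptions options = new ReplicationOptions();
options.setSynchronous(true);
// Restrict the replication to the Dispatcher flush agent
options.setFilter(new AgentFilter() {
  @Override
  public boolean isIncluded(Agent agent) {
    return agent.getId().equals("flush");
  }
});
replicator.replicate(session, ReplicationActionType.DELETE, paths, options);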

Client-Side libraries and Version Consistency

Pages are composed of HTML, JavaScript, CSS, and images. Customers are encouraged to use the Client-Side Libraries (clientlibs) framework to import JavaScript and CSS resources into HTML pages, taking into account dependencies between JS libraries.

The clientlibs framework provides automatic version management, meaning that developers can check in changes to JS libraries in source control and the latest version will be made available when a customer pushes their release. Without this, developers would need to manually change HTML with references to the new version of the library, which is especially onerous if many HTML templates share the same library.

When the new versions of libraries are released to production, the referencing HTML pages are updated with new links to those updated library versions. Once the browser cache has expired for a given HTML page, there is no concern that the old libraries will be loaded from the browser cache since the refreshed page (from AEM) now is guaranteed to reference the new versions of the libraries. In other words, a refreshed HTML page will include all the latest library versions.

The mechanism for this is a serialized hash, which is appended to the client library link, ensuring a unique, versioned URL for the browser to cache the CSS/JS. The serialized hash is only updated when the contents of the client library change. This means that if unrelated updates occur (that is, no changes to the underlying CSS/JS of the client library), even with a new deployment, the reference remains the same, ensuring less disruption to the browser cache.

Enabling Longcache versions of Client Side Libraries - AEM as a Cloud Service SDK Quickstart

Default clientlib includes on an HTML page look like the following example:

<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.css" type="text/css">

When strict clientlib versioning is enabled, a long term hash key is added as a selector to the client library. As a result, the clientlib reference looks like this:

<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.lc-7c8c5d228445ff48ab49a8e3c865c562-lc.css" type="text/css">

Strict clientlib versioning is enabled by default in all AEM as a Cloud Service environments.
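
The idea behind the versioned reference can be illustrated with a short Java sketch that derives a URL from the library contents. The lc-...-lc selector format follows the example above, but the choice of MD5 is purely an assumption for illustration (it happens to yield 32 hexadecimal characters); the hashing that AEM performs internally is not specified here, and the class and method names are hypothetical.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration of content-based URL versioning; not AEM's implementation.
final class VersionedClientlibUrl {

    /** Returns a 32-character hex digest of the content (MD5 is an assumption). */
    static String hashOf(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Builds a reference such as /path/clientlib-base.lc-<hash>-lc.css. */
    static String versionedUrl(String basePath, String extension, String content) {
        return basePath + ".lc-" + hashOf(content) + "-lc." + extension;
    }
}
```

Because the hash depends only on the content, an unchanged library keeps the same URL across deployments, while any change to the CSS or JS produces a new URL that browsers have not cached yet.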

To enable strict clientlib versioning in the local SDK Quickstart perform the following actions:

  1. Navigate to the OSGi Configuration manager <host>/system/console/configMgr
  2. Find the OSGi Config for Adobe Granite HTML Library Manager:
    • Check the checkbox to enable Strict Versioning
    • In the field labeled Long term client side cache key, enter the value of /.*;hash
  3. Save the changes. It is not necessary to save this configuration in source control since AEM as a Cloud Service automatically enables this configuration in dev, stage and production environments.
  4. Any time the contents of the client library are changed, a new hash key is generated and the HTML reference will be updated.
