Traffic passes through the CDN to an apache web server layer, which supports modules including the dispatcher. In order to increase performance, the dispatcher is used primarily as a cache to limit processing on the publish nodes.
Rules can be applied to the dispatcher configuration to modify any default cache expiration settings, resulting in caching at the CDN. Note that dispatcher also respects the resulting cache expiration headers if enableTTL is enabled in the dispatcher configuration, implying that it will refresh specific content even outside of content being republished.
This page also describes how dispatcher cache is invalidated, as well as how caching works at the browser level with regards to client-side libraries.
cache-control header emitted by the apache layer. The CDN also respects this value.DISABLE_DEFAULT_CACHING variable in global.vars:Define DISABLE_DEFAULT_CACHING
This can be useful, for example, when your business logic requires fine tuning of the age header (with a value based on calendar day) since by default the age header is set to 0. That said, please exercise caution when turning off default caching.
can be overridden for all HTML/Text content by defining the EXPIRATION_TIME variable in global.vars using the AEM as a Cloud Service SDK Dispatcher tools.
can be overridden on a finer grained level, including controlling CDN and browser cache independently, with the following apache mod_headers directives:
<LocationMatch "^/content/.*\.(html)$">
Header set Cache-Control "max-age=200"
Header set Surrogate-Control "max-age=3600"
Header set Age 0
</LocationMatch>
The Surrogate-Control header applies to the Adobe managed CDN. If using a customer managed CDN, a different header may be required depending on your CDN provider.
Exercise caution when setting global cache control headers or those that match a wide regex so they are not applied to content that you might intend to keep private. Consider using multiple directives to ensure rules are applied in a fine-grained manner. With that said, AEM as a Cloud Service will remove the cache header if it detects that it has been applied to what it detects to be uncacheable by dispatcher, as described in dispatcher documentation. In order to force AEM to always apply the caching headers, one can add the always option as follows:
<LocationMatch "^/content/.*\.(html)$">
Header unset Cache-Control
Header unset Expires
Header always set Cache-Control "max-age=200"
Header set Age 0
</LocationMatch>
You must ensure that a file under src/conf.dispatcher.d/cache has the following rule (which is in the default configuration):
/0000
{ /glob "*" /type "allow" }
To prevent specific content from being cached at the CDN, set the Cache-Control header to private. For example, the following would prevent html content under a directory named secure from being cached at the CDN:
<LocationMatch "/content/secure/.*\.(html)$">. // replace with the right regex
Header unset Cache-Control
Header unset Expires
Header always set Cache-Control “private”
</LocationMatch>
The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override values.
Please note that dispatcher might still cache content according to its own caching rules. To make the content truly private you should ensure that it is not cached by dispatcher.
The default behavior for programs created after mid-May 2022 (specifically, for program ids that are higher than 65000) is to cache by default, while also respecting the request’s authentication context. Older programs (program ids equal or lower than 65000) do not cache blob content by default.
In both cases, the caching headers can be overridden on a finer grained level at the apache/dispatcher layer by using the apache mod_headers directives, for example:
<LocationMatch "^/content/.*\.(jpeg|jpg)$">
Header set Cache-Control "max-age=222"
Header set Age 0
</LocationMatch>
When modifying the caching headers at the dispatcher layer, please be cautious not to cache too widely, see the discussion in the HTML/text section above. Also, make sure that assets that are meant to be kept private (rather than cached) are not part of the LocationMatch directive filters.
The AEM layer will set cache headers depending on whether the cache header has already been set and the value of the request type. Please note that if no cache control header has been set, public content is cached and authenticated traffic is set to private. If a cache control header has been set, the cache headers will be untouched.
| Cache control header exists? | Request type | AEM sets cache headers to |
|---|---|---|
| No | public | Cache-Control: public, max-age=600, immutable |
| No | authenticated | Cache-Control: private, max-age=600, immutable |
| Yes | any | unchanged |
While not recommended, it is possible to change the new default behavior to follow the older behavior (program ids equal or lower than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to false.
The AEM layer will not cache blob content by default.
It is recommended to change the older default behavior to be consistent with the new behavior (program ids that are higher than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to true. If the program is already live, make sure you verify that after the changes, content behaves as you expect.
The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override the values.
EXPIRATION_TIME variable used for html/text file typesAvoid using User-Agent as part of the Vary header. Older versions of the default dispatcher setup (prior to archetype version 28) included this and we recommend you remove that using the steps below.
<Project Root>/dispatcher/src/conf.d/available_vhosts/*.vhostHeader append Vary User-Agent env=!dont-vary from all vhost files, with the exception of default.vhost, which is read-onlyUse the Surrogate-Control header to control CDN caching independent from browser caching
Consider applying stale-while-revalidate and stale-if-error directives to allow background refresh and avoid cache misses, keeping your content fast and fresh for users.
stale-while-revalidate to all cache control headers is a good starting point.Some examples follow for various content types, which can be used as a guide when setting up your own caching rules. Please carefully consider and test for your specific setup and requirements:
Cache mutable client library resources for 12h and background refresh after 12h.
<LocationMatch "^/etc\.clientlibs/.*\.(?i:json|png|gif|webp|jpe?g|svg)$">
Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200,public" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache immutable client library resources long-term (30 days) with background refresh to avoid MISS.
<LocationMatch "^/etc\.clientlibs/.*\.(?i:js|css|ttf|woff2)$">
Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache HTML pages for 5min with background refresh 1h on browser and 12h on CDN. Cache-Control headers will always be added so it is important to ensure that matching html pages under /content/* are intended to be public. If not, consider using a more specifc regex.
<LocationMatch "^/content/.*\.html$">
Header unset Cache-Control
Header always set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
Header always set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache content services/Sling model exporter json responses for 5min with background refresh 1h on browser and 12h on CDN.
<LocationMatch "^/content/.*\.model\.json$">
Header set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
Header set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache immutable URLs from the core image component long-term (30 days) with background refresh to avoid MISS.
<LocationMatch "^/content/.*\.coreimg.*\.(?i:jpe?g|png|gif|svg)$">
Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache mutable resources from the DAM like images and video for 24h and background refresh after 12h to avoid MISS
<LocationMatch "^/content/dam/.*\.(?i:jpe?g|gif|js|mov|mp4|png|svg|txt|zip|ico|webp|pdf)$">
Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
When a HEAD request is received at the Adobe CDN for a resource that is not cached, the request is transformed and received by the dispatcher and/or AEM instance as a GET request. If the response is cacheable then subsequent HEAD requests will be served from the CDN. If the response is not cacheable then subsequent HEAD requests will be passed to the dispatcher and/or AEM instance for a period of time that depends on the Cache-Control TTL.
In general, it will not be necessary to invalidate the dispatcher cache. Instead you should rely on the dispatcher refreshing its cache when content is being republished and the CDN respecting cache expiration headers.
Like previous versions of AEM, publishing or unpublishing pages will clear the content from the dispatcher cache. If a caching issue is suspected, customers should republish the pages in question and ensure that a virtual host is available that matches ServerAlias localhost, which is required for dispatcher cache invalidation.
When the publish instance receives a new version of a page or asset from the author, it uses the flush agent to invalidate appropriate paths on its dispatcher. The updated path is removed from the dispatcher cache, together with its parents, up to a level (you can configure this with the statfileslevel).
Adobe recommends to rely on standard cache headers to control the content delivery life cycle. However, if needed, it is possible to invalidate content directly in dispatcher.
The following list contains scenarios where you might want to explicitly invalidate the cache (while optionally listening for the completion of the invalidation):
There are two approaches to explicitly invalidate the cache:
The approaches differ in terms of tier availability, the ability to deduplicate events and event processing guarantee. The table below summarizes these options:
| N/A | Tier availability | Deduplication | Guarantee | Action | Impact | Description |
|---|---|---|---|---|---|---|
| Sling Content Distribution (SCD) API | Author | Possible using either the Discovery API or enabling the deduplication mode. | At least once. |
|
|
|
| Replication API | Publish | Not possible, event raised on every publish instance. | Best effort. |
|
|
|
Please note that the two actions directly related to cache invalidation are Sling Content Distribution (SCD) API Invalidate and Replication API Deactivate.
Also, from the table, we observe that:
SCD API is needed when every event must be guaranteed, for example, syncing with an external system that requires accurate knowledge. Note that if there is a publish tier upscaling event at the time of the invalidation call, an additional event will be raised when each new publish processes the invalidation.
Using the Replication API isn’t a common use case, but should be used in cases where the trigger to invalidate the cache comes from the publish tier and not the author tier. This might be useful if dispatcher TTL is configured.
In conclusion, if you are looking to invalidate the dispatcher cache, the recommended option is to use the SCD API Invalidate action from Author. Additionally, you can also listen for the event so you can then trigger further downstream actions.
When using the instructions presented below, please be aware that you should test the custom code in an AEM Cloud Service Dev environment and not locally.
When using the SCD action from Author, the implementation pattern is as follows:
@Reference
private Distributor distributor;
ResourceResolver resolver = ...; // the resource resolver used for authorizing the request
String agentName = "publish"; // the name of the agent used to distribute the request
String pathToInvalidate = "/content/to/invalidate";
DistributionRequest distributionRequest = new SimpleDistributionRequest(DistributionRequestType.INVALIDATE, false, pathToInvalidate);
distributor.distribute(agentName, resolver, distributionRequest);
package org.apache.sling.distribution.journal.shared;
import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.distribution.journal.impl.event.DistributionEvent;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import static org.apache.sling.distribution.DistributionRequestType.INVALIDATE;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_PATHS;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_TYPE;
import static org.apache.sling.distribution.event.DistributionEventTopics.AGENT_PACKAGE_DISTRIBUTED;
import static org.osgi.service.event.EventConstants.EVENT_TOPIC;
@Component(immediate = true, service = EventHandler.class, property = {
EVENT_TOPIC + "=" + AGENT_PACKAGE_DISTRIBUTED
})
public class InvalidatedHandler implements EventHandler {
private static final Logger LOG = LoggerFactory.getLogger(InvalidatedHandler.class);
@Reference
private DiscoveryService discoveryService;
@Override
public void handleEvent(Event event) {
String distributionType = (String) event.getProperty(DISTRIBUTION_TYPE);
if (INVALIDATE.name().equals(distributionType)) {
boolean isLeader = discoveryService.getTopology().getLocalInstance().isLeader();
// process the OSGi event on the leader author instance
if (isLeader) {
String[] paths = (String[]) event.getProperty(DISTRIBUTION_PATHS);
String packageId = (String) event.getProperty(DistributionEvent.PACKAGE_ID);
invalidated(paths, packageId);
}
}
}
private void invalidated(String[] paths, String packageId) {
// custom logic
LOG.info("Successfully applied package with id {}, paths {}", packageId, paths);
}
}
invalidated(String[] paths, String packageId) method above.The Adobe CDN is not flushed when dispatcher is invalidated. The Adobe-managed CDN respects TTLs and thus there is no need for it to be flushed.
Presented below is the implementation pattern when using the replication API Deactivate action:
The flush agent endpoint is not configurable but rather preconfigured to point to dispatcher, matched with the publish service running alongside the flush agent.
The flush agent can typically be triggered by custom code based on OSGi events or workflows.
String[] paths = …
ReplicationOptions options = new ReplicationOptions();
options.setSynchronous(true);
options.setFilter( new AgentFilter {
public boolean isIncluded (Agent agent) {
return agent.getId().equals(“flush”);
}
});
Replicator.replicate (session,ReplicationActionType.DELETE,paths, options);
Pages are composed of of HTML, Javascript, CSS, and images. Customers are encouraged to leverage the Client-Side Libraries (clientlibs) framework to import Javascript and CSS resources into HTML pages, taking into account dependencies between JS libraries.
The clientlibs framework provides automatic version management, meaning that developers can check in changes to JS libraries in source control and the latest version will be made available when a customer pushes their release. Without this, developers would need to manually change HTML with references to the new version of the library, which is especially onerous if many HTML templates share the same library.
When the new versions of libraries are released to production, the referencing HTML pages are updated with new links to those updated library versions. Once the browser cache has expired for a given HTML page, there is no concern that the old libraries will be loaded from the browser cache since the refreshed page (from AEM) now is guaranteed to reference the new versions of the libraries. In other words, a refreshed HTML page will include all the latest library versions.
The mechanism for this is a serialized hash, which is appended to the client library link, ensuring a unique, versioned url for the browser to cache the CSS/JS. The serialized hash is only updated when the contents of the client library changes. This means that if unrelated updates occur (i.e no changes to the underlying css/js of the client library) even with a new deployment, the reference remains the same, ensuring less disruption to the browser cache.
Default clientlib includes on an HTML page look like the following example:
<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.css" type="text/css">
When strict clientlib versioning is enabled, a long term hash key is added as a selector to the client library. As a result, the clientlib reference look like this:
<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.lc-7c8c5d228445ff48ab49a8e3c865c562-lc.css" type="text/css">
Strict clientlib versioning is enabled by default in all AEM as a Cloud Service environments.
To enable strict clientlib versioning in the local SDK Quickstart perform the following actions:
<host>/system/console/configMgr