Traffic passes through the CDN to an Apache web server layer, which supports modules including Dispatcher. In order to increase performance, Dispatcher is used primarily as a cache to limit processing on the publish nodes.
Rules can be applied to the Dispatcher configuration to modify any default cache expiration settings, resulting in caching at the CDN. Note that Dispatcher also respects the resulting cache expiration headers if enableTTL is enabled in the Dispatcher configuration, meaning that it will refresh specific content even when that content has not been republished.
This page also describes how the Dispatcher cache is invalidated and how caching works at the browser level with regard to client-side libraries.
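By default, HTML/text content is cached based on the cache-control header emitted by the Apache layer. The CDN also respects this value.
The default caching can be disabled by defining the DISABLE_DEFAULT_CACHING variable in global.vars:
Define DISABLE_DEFAULT_CACHING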
This can be useful, for example, when your business logic requires fine tuning of the age header (with a value based on calendar day) since by default the age header is set to 0. That said, please exercise caution when turning off default caching.
The default can be overridden for all HTML/text content by defining the EXPIRATION_TIME variable in global.vars using the AEM as a Cloud Service SDK Dispatcher tools.
The default can also be overridden at a finer-grained level, including controlling CDN and browser cache independently, with the following Apache mod_headers directives:
<LocationMatch "^/content/.*\.(html)$">
Header set Cache-Control "max-age=200"
Header set Surrogate-Control "max-age=3600"
Header set Age 0
</LocationMatch>
The Surrogate-Control header applies to the Adobe managed CDN. If using a customer managed CDN, a different header may be required depending on your CDN provider.
Exercise caution when setting either global cache control headers or those that match a wide regex, so that they are not applied to content that you need to keep private. Consider using multiple directives to ensure rules are applied in a fine-grained manner. However, AEM as a Cloud Service removes the cache header if it detects that the header has been applied to content that Dispatcher considers uncacheable, as described in the Dispatcher documentation. To force AEM to always apply the caching headers, add the always option as follows:
<LocationMatch "^/content/.*\.(html)$">
Header unset Cache-Control
Header unset Expires
Header always set Cache-Control "max-age=200"
Header set Age 0
</LocationMatch>
You must ensure that a file under src/conf.dispatcher.d/cache has the following rule (which is in the default configuration):
/0000 { /glob "*" /type "allow" }
To prevent specific content from being cached at the CDN, set the Cache-Control header to private. For example, the following would prevent html content under a directory named secure from being cached at the CDN:
# replace with the right regex
<LocationMatch "/content/secure/.*\.(html)$">
Header unset Cache-Control
Header unset Expires
Header always set Cache-Control "private"
</LocationMatch>
While HTML content set to private will not be cached at the CDN, it can be cached at the dispatcher if Permission Sensitive Caching is configured, ensuring that only authorized users can be served the content.
The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override values.
Please note that Dispatcher might still cache content according to its own caching rules. To make the content truly private you should ensure that it is not cached by Dispatcher.
The default behavior for programs created after mid-May 2022 (specifically, for program IDs higher than 65000) is to cache blob content by default, while also respecting the request's authentication context. Older programs (program IDs equal to or lower than 65000) do not cache blob content by default.
In both cases, the caching headers can be overridden at a finer-grained level at the Apache/Dispatcher layer by using Apache mod_headers directives, for example:
<LocationMatch "^/content/.*\.(jpeg|jpg)$">
Header set Cache-Control "max-age=222"
Header set Age 0
</LocationMatch>
When modifying the caching headers at the Dispatcher layer, be careful not to cache too widely; see the discussion in the HTML/text section above. Also, make sure that assets that are meant to be kept private (rather than cached) are not matched by the LocationMatch directive filters.
The AEM layer will set cache headers depending on whether the cache header has already been set and the value of the request type. Please note that if no cache control header has been set, public content is cached and authenticated traffic is set to private. If a cache control header has been set, the cache headers will be left untouched.
Cache control header exists? | Request type | AEM sets cache headers to |
---|---|---|
No | public | Cache-Control: public, max-age=600, immutable |
No | authenticated | Cache-Control: private, max-age=600, immutable |
Yes | any | unchanged |
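The decision table above can be read as a small function. The following is a minimal sketch of that reading, not AEM's actual implementation; the class and method names are hypothetical and exist only for illustration.

```java
/** Hypothetical model of the decision table above; not AEM's actual code. */
final class BlobCacheHeaderPolicy {

    private BlobCacheHeaderPolicy() {}

    /**
     * @param existingCacheControl Cache-Control value already set on the response, or null if none
     * @param authenticated        true if the request is authenticated, false if it is public
     * @return the Cache-Control value the response ends up with
     */
    static String resolve(String existingCacheControl, boolean authenticated) {
        // Row "Yes | any | unchanged": an existing header is left untouched
        if (existingCacheControl != null && !existingCacheControl.isEmpty()) {
            return existingCacheControl;
        }
        // Rows "No | authenticated" and "No | public"
        return authenticated
                ? "private, max-age=600, immutable"
                : "public, max-age=600, immutable";
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, false));
        System.out.println(resolve(null, true));
        System.out.println(resolve("max-age=222", true));
    }
}
```

The point of the sketch is that a header set at the Apache/Dispatcher layer, such as the max-age=222 example earlier, always takes precedence over the AEM default.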
While not recommended, it is possible to change the new default behavior to follow the older behavior (program IDs equal to or lower than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to false.
The AEM layer will not cache blob content by default.
It is recommended to change the older default behavior to be consistent with the new behavior (program ids that are higher than 65000) by setting the Cloud Manager environment variable AEM_BLOB_ENABLE_CACHING_HEADERS to true. If the program is already live, make sure you verify that after the changes, content behaves as you expect.
At present, images in blob storage that are marked private cannot be cached at the dispatcher using Permission Sensitive Caching. The image is always requested from the AEM origin and served if the user is authorized.
The other methods, including the dispatcher-ttl AEM ACS Commons project, will not successfully override the values.
EXPIRATION_TIME variable used for html/text file types
Avoid using User-Agent as part of the Vary header. Older versions of the default Dispatcher setup (prior to archetype version 28) included this, and we recommend removing it using the steps below.
Locate the vhost files in <Project Root>/dispatcher/src/conf.d/available_vhosts/*.vhost
Remove the following line from all vhost files, with the exception of default.vhost, which is read-only:
Header append Vary User-Agent env=!dont-vary
Use the Surrogate-Control header to control CDN caching independently of browser caching
Consider applying stale-while-revalidate and stale-if-error directives to allow background refresh and avoid cache misses, keeping your content fast and fresh for users.
Adding stale-while-revalidate to all cache control headers is a good starting point.
Some examples follow for various content types, which can be used as a guide when setting up your own caching rules. Please carefully consider and test for your specific setup and requirements:
Cache mutable client library resources for 12h and background refresh after 12h.
<LocationMatch "^/etc\.clientlibs/.*\.(?i:json|png|gif|webp|jpe?g|svg)$">
Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200,public" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache immutable client library resources long-term (30 days) with background refresh to avoid MISS.
<LocationMatch "^/etc\.clientlibs/.*\.(?i:js|css|ttf|woff2)$">
Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache HTML pages for 5min with background refresh 1h on browser and 12h on CDN. Cache-Control headers will always be added so it is important to ensure that matching html pages under /content/* are intended to be public. If not, consider using a more specific regex.
<LocationMatch "^/content/.*\.html$">
Header unset Cache-Control
Header always set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
Header always set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache content services/Sling model exporter json responses for 5min with background refresh 1h on browser and 12h on CDN.
<LocationMatch "^/content/.*\.model\.json$">
Header set Cache-Control "max-age=300,stale-while-revalidate=3600" "expr=%{REQUEST_STATUS} < 400"
Header set Surrogate-Control "stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache immutable URLs from the core image component long-term (30 days) with background refresh to avoid MISS.
<LocationMatch "^/content/.*\.coreimg.*\.(?i:jpe?g|png|gif|svg)$">
Header set Cache-Control "max-age=2592000,stale-while-revalidate=43200,stale-if-error=43200,public,immutable" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
Cache mutable resources from the DAM like images and video for 12h and background refresh after 12h to avoid MISS.
<LocationMatch "^/content/dam/.*\.(?i:jpe?g|gif|js|mov|mp4|png|svg|txt|zip|ico|webp|pdf)$">
Header set Cache-Control "max-age=43200,stale-while-revalidate=43200,stale-if-error=43200" "expr=%{REQUEST_STATUS} < 400"
Header set Age 0
</LocationMatch>
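As a rough illustration of what the max-age and stale-while-revalidate values in the examples above mean for a cached response, the following sketch classifies a response by its age in seconds. It is a simplified model and not the actual logic of the CDN or of any browser: the class and method names are hypothetical, and stale-if-error, Surrogate-Control, and all other directives are ignored.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper that models max-age and stale-while-revalidate only. */
final class FreshnessSketch {

    enum State { FRESH, STALE_WHILE_REVALIDATE, EXPIRED }

    private FreshnessSketch() {}

    /** Collects directives of the form name=seconds; flag directives such as "public" are skipped. */
    static Map<String, Long> parseSeconds(String cacheControl) {
        Map<String, Long> directives = new HashMap<>();
        for (String part : cacheControl.split(",")) {
            String[] nameAndValue = part.trim().split("=", 2);
            if (nameAndValue.length == 2) {
                try {
                    directives.put(nameAndValue[0].trim().toLowerCase(Locale.ROOT),
                            Long.parseLong(nameAndValue[1].trim()));
                } catch (NumberFormatException ignored) {
                    // value is not a number of seconds; skip this directive
                }
            }
        }
        return directives;
    }

    /** Classifies a cached response of the given age against its Cache-Control value. */
    static State classify(String cacheControl, long ageSeconds) {
        Map<String, Long> directives = parseSeconds(cacheControl);
        long maxAge = directives.getOrDefault("max-age", 0L);
        long staleWindow = directives.getOrDefault("stale-while-revalidate", 0L);
        if (ageSeconds < maxAge) {
            return State.FRESH; // served from cache without contacting the origin
        }
        if (ageSeconds < maxAge + staleWindow) {
            return State.STALE_WHILE_REVALIDATE; // served stale while a background refresh runs
        }
        return State.EXPIRED; // must be fetched again before it is served
    }

    public static void main(String[] args) {
        String htmlHeader = "max-age=300,stale-while-revalidate=3600";
        System.out.println(classify(htmlHeader, 100));
        System.out.println(classify(htmlHeader, 1000));
        System.out.println(classify(htmlHeader, 5000));
    }
}
```

With the HTML example above, a page stays fresh in the browser for 300 seconds and can then be served stale for up to a further 3600 seconds while it is refreshed in the background.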
When a HEAD request is received at the Adobe CDN for a resource that is not cached, the request is transformed and received by the Dispatcher and/or AEM instance as a GET request. If the response is cacheable, then subsequent HEAD requests will be served from the CDN. If the response is not cacheable, then subsequent HEAD requests will be passed to the Dispatcher and/or AEM instance for a period of time that depends on the Cache-Control TTL.
Website URLs frequently include marketing campaign parameters that are used to track a campaign's success. To use the Dispatcher cache effectively, it is recommended that you configure the ignoreUrlParams property of the Dispatcher configuration, as described in the Dispatcher documentation.
The ignoreUrlParams section must be uncommented and should reference the file conf.dispatcher.d/cache/marketing_query_parameters.any. The file can be modified by uncommenting the lines corresponding to the parameters that are relevant to your marketing channels. You may add other parameters as well.
/ignoreUrlParams {
    /0001 { /glob "*" /type "deny" }
    $include "../cache/marketing_query_parameters.any"
}
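The effect of ignoreUrlParams can be pictured with a simplified model: a request can be answered from the cached page only when every query parameter it carries is on the ignored list, and any other parameter causes the request to bypass the cache. The sketch below assumes that model. The class name, the method name, and the parameter names utm_source, utm_medium, and id are illustrative assumptions rather than values taken from the configuration file.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, simplified model of how ignored URL parameters affect cacheability. */
final class IgnoreUrlParamsSketch {

    private IgnoreUrlParamsSketch() {}

    /**
     * @param query   raw query string without the leading "?", may be null or empty
     * @param ignored names of the parameters configured as ignored
     * @return true if the request carries no parameter outside the ignored set
     */
    static boolean allParamsIgnored(String query, Set<String> ignored) {
        if (query == null || query.isEmpty()) {
            return true; // no parameters at all
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            String name = pair.split("=", 2)[0];
            if (!ignored.contains(name)) {
                return false; // a non-ignored parameter prevents serving from cache
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example parameter names; assumptions for illustration only
        Set<String> ignored = new HashSet<>(Arrays.asList("utm_source", "utm_medium"));
        System.out.println(allParamsIgnored("utm_source=mail&utm_medium=email", ignored));
        System.out.println(allParamsIgnored("utm_source=mail&id=42", ignored));
    }
}
```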
In general, it will not be necessary to invalidate the Dispatcher cache. Instead you should rely on the Dispatcher refreshing its cache when content is being republished and the CDN respecting cache expiration headers.
Like previous versions of AEM, publishing or unpublishing pages clears the content from the Dispatcher cache. If a caching issue is suspected, you should republish the pages in question and ensure that a virtual host is available that matches the ServerAlias localhost, which is required for Dispatcher cache invalidation.
For proper Dispatcher invalidation, make sure that requests from “127.0.0.1”, “localhost”, “.local”, “.adobeaemcloud.com”, and “.adobeaemcloud.net” are all matched and handled by a vhost configuration so those requests can be served. You can do this either by global matching “*” in a catch-all vhost configuration, following the pattern in the reference AEM archetype, or by ensuring that the previously mentioned list is caught by one of the vhosts.
When the publish instance receives a new version of a page or asset from the author, it uses the flush agent to invalidate appropriate paths on its Dispatcher. The updated path is removed from the Dispatcher cache, together with its parents, up to a level (you can configure this with the statfileslevel).
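To make the role of statfileslevel more concrete, the sketch below lists the cache folders whose .stat file would be touched for a given invalidated path. It assumes the behavior described in the Dispatcher documentation, where .stat files are touched from the docroot (level 0) down to the smaller of the configured level and the folder level of the invalidated file. The class and method names are hypothetical, and the code is a model rather than Dispatcher's own logic.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of which folders have their .stat file touched on invalidation. */
final class StatFileSketch {

    private StatFileSketch() {}

    /**
     * @param invalidatedPath absolute path of the invalidated file, for example /content/site/en/page.html
     * @param statFilesLevel  configured statfileslevel (0 means only the docroot)
     * @return folders relative to the docroot, starting with "/" for the docroot itself
     */
    static List<String> touchedStatFolders(String invalidatedPath, int statFilesLevel) {
        List<String> folders = new ArrayList<>();
        folders.add("/"); // level 0: the docroot
        // segments[0] is empty because the path starts with "/"; the last segment is the file
        String[] segments = invalidatedPath.split("/");
        int folderDepth = segments.length - 2;
        StringBuilder current = new StringBuilder();
        for (int level = 1; level <= statFilesLevel && level <= folderDepth; level++) {
            current.append('/').append(segments[level]);
            folders.add(current.toString());
        }
        return folders;
    }

    public static void main(String[] args) {
        System.out.println(touchedStatFolders("/content/site/en/page.html", 2));
        System.out.println(touchedStatFolders("/content/site/en/page.html", 10));
    }
}
```

Under this model, a higher statfileslevel narrows the invalidation: cached files outside the touched folders keep their validity, whereas a level of 0 marks the whole cache as stale on every publish.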
Adobe recommends relying on standard cache headers to control the content delivery life cycle. However, if needed, it is possible to invalidate content directly in Dispatcher.
The following list contains scenarios where you might want to explicitly invalidate the cache (while optionally listening for the completion of the invalidation):
There are two approaches to explicitly invalidate the cache: the Sling Content Distribution (SCD) API from Author and the Replication API from Publish.
The approaches differ in terms of tier availability, the ability to deduplicate events and event processing guarantee. The table below summarizes these options:
API | Tier availability | Deduplication | Guarantee | Action | Impact | Description |
---|---|---|---|---|---|---|
Sling Content Distribution (SCD) API | Author | Possible using either the Discovery API or enabling the deduplication mode. | At least once. | | | |
Replication API | Publish | Not possible, event raised on every publish instance. | Best effort. | | | |
Please note that the two actions directly related to cache invalidation are Sling Content Distribution (SCD) API Invalidate and Replication API Deactivate.
Also, from the table, we observe that:
SCD API is needed when every event must be guaranteed, for example, syncing with an external system that requires accurate knowledge. If there is a publish tier upscaling event at the time of the invalidation call, an additional event is raised when each new publish processes the invalidation.
Using the Replication API isn’t a common use case, but should be used in cases where the trigger to invalidate the cache comes from the publish tier and not the author tier. This might be useful if dispatcher TTL is configured.
In conclusion, if you are looking to invalidate the Dispatcher cache, the recommended option is to use the SCD API Invalidate action from Author. Additionally, you can also listen for the event so you can then trigger further downstream actions.
When using the instructions presented below, please be aware that you should test the custom code in an AEM Cloud Service Dev environment and not locally.
When using the SCD action from Author, the implementation pattern is as follows:
@Reference
private Distributor distributor;

// Inside the method that triggers the invalidation:
ResourceResolver resolver = ...; // the resource resolver used for authorizing the request
String agentName = "publish"; // the name of the agent used to distribute the request
String pathToInvalidate = "/content/to/invalidate";
// Build an INVALIDATE request for the path; the second argument (deep) is false
DistributionRequest distributionRequest = new SimpleDistributionRequest(DistributionRequestType.INVALIDATE, false, pathToInvalidate);
distributor.distribute(agentName, resolver, distributionRequest);
package org.apache.sling.distribution.journal.shared;

import org.apache.sling.discovery.DiscoveryService;
import org.apache.sling.distribution.journal.impl.event.DistributionEvent;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.osgi.service.event.Event;
import org.osgi.service.event.EventHandler;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.apache.sling.distribution.DistributionRequestType.INVALIDATE;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_PATHS;
import static org.apache.sling.distribution.event.DistributionEventProperties.DISTRIBUTION_TYPE;
import static org.apache.sling.distribution.event.DistributionEventTopics.AGENT_PACKAGE_DISTRIBUTED;
import static org.osgi.service.event.EventConstants.EVENT_TOPIC;

@Component(immediate = true, service = EventHandler.class, property = {
        EVENT_TOPIC + "=" + AGENT_PACKAGE_DISTRIBUTED
})
public class InvalidatedHandler implements EventHandler {

    private static final Logger LOG = LoggerFactory.getLogger(InvalidatedHandler.class);

    @Reference
    private DiscoveryService discoveryService;

    @Override
    public void handleEvent(Event event) {
        String distributionType = (String) event.getProperty(DISTRIBUTION_TYPE);
        if (INVALIDATE.name().equals(distributionType)) {
            boolean isLeader = discoveryService.getTopology().getLocalInstance().isLeader();
            // process the OSGi event on the leader author instance
            if (isLeader) {
                String[] paths = (String[]) event.getProperty(DISTRIBUTION_PATHS);
                String packageId = (String) event.getProperty(DistributionEvent.PACKAGE_ID);
                invalidated(paths, packageId);
            }
        }
    }

    private void invalidated(String[] paths, String packageId) {
        // custom logic
        LOG.info("Successfully applied package with id {}, paths {}", packageId, paths);
    }
}
Custom business logic can be placed in the invalidated(String[] paths, String packageId) method above.
The Adobe CDN is not flushed when Dispatcher is invalidated. The Adobe-managed CDN respects TTLs and thus there is no need for it to be flushed.
Presented below is the implementation pattern when using the replication API Deactivate action:
The flush agent endpoint is not configurable but rather preconfigured to point to Dispatcher, matched with the publish service running alongside the flush agent.
The flush agent can typically be triggered by custom code based on OSGi events or workflows.
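// Assumes a Replicator service instance is available, for example injected with
// @Reference private Replicator replicator;
String[] paths = …; // the paths to invalidate
ReplicationOptions options = new ReplicationOptions();
options.setSynchronous(true);
// Restrict the replication to the preconfigured flush agent
options.setFilter(new AgentFilter() {
    @Override
    public boolean isIncluded(Agent agent) {
        return agent.getId().equals("flush");
    }
});
replicator.replicate(session, ReplicationActionType.DELETE, paths, options);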
Pages are composed of HTML, Javascript, CSS, and images. Customers are encouraged to leverage the Client-Side Libraries (clientlibs) framework to import Javascript and CSS resources into HTML pages, taking into account dependencies between JS libraries.
The clientlibs framework provides automatic version management, meaning that developers can check in changes to JS libraries in source control and the latest version will be made available when a customer pushes their release. Without this, developers would need to manually change HTML with references to the new version of the library, which is especially onerous if many HTML templates share the same library.
When the new versions of libraries are released to production, the referencing HTML pages are updated with new links to those updated library versions. Once the browser cache has expired for a given HTML page, there is no concern that the old libraries will be loaded from the browser cache since the refreshed page (from AEM) now is guaranteed to reference the new versions of the libraries. In other words, a refreshed HTML page will include all the latest library versions.
The mechanism for this is a serialized hash, which is appended to the client library link, ensuring a unique, versioned URL for the browser to cache the CSS/JS. The serialized hash is only updated when the contents of the client library change. This means that if unrelated updates occur (that is, no changes to the underlying CSS/JS of the client library), even with a new deployment, the reference remains the same, causing less disruption to the browser cache.
Default clientlib includes on an HTML page look like the following example:
<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.css" type="text/css">
When strict clientlib versioning is enabled, a long-term hash key is added as a selector to the client library. As a result, the clientlib reference looks like this:
<link rel="stylesheet" href="/etc.clientlibs/wkndapp/clientlibs/clientlib-base.lc-7c8c5d228445ff48ab49a8e3c865c562-lc.css" type="text/css">
Strict clientlib versioning is enabled by default in all AEM as a Cloud Service environments.
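The idea behind the versioned reference shown above can be sketched as follows. This is only an illustration of content-based versioning: the source does not state which hash algorithm AEM uses, so MD5 is an assumption chosen here because it yields a 32-character hexadecimal value like the one in the example. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of content-hash versioning for client library URLs. */
final class ClientlibVersionSketch {

    private ClientlibVersionSketch() {}

    /** Hex-encoded MD5 of the library content (algorithm is an assumption for illustration). */
    static String contentHash(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide
            throw new IllegalStateException(e);
        }
    }

    /** Builds a URL of the form basePath.lc-HASH-lc.extension. */
    static String versionedUrl(String basePath, String extension, String content) {
        return basePath + ".lc-" + contentHash(content) + "-lc." + extension;
    }

    public static void main(String[] args) {
        String base = "/etc.clientlibs/wkndapp/clientlibs/clientlib-base";
        // Same content gives the same URL; changed content gives a new URL
        System.out.println(versionedUrl(base, "css", "body { margin: 0; }"));
        System.out.println(versionedUrl(base, "css", "body { margin: 1px; }"));
    }
}
```

Because the URL changes only when the content changes, a deployment that leaves the library untouched keeps the same reference, and browsers can continue to use their cached copy.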
To enable strict clientlib versioning in the local SDK Quickstart, perform the following actions:
Navigate to the OSGi configuration manager at <host>/system/console/configMgr