Client requests

When clients send HTTP requests to the web server, the URL of the requested page must be resolved to the content in the Dispatcher cache, and eventually to the content in the repository.

  1. The domain name system discovers the IP address of the Web server that is registered for the domain name in the HTTP request.
  2. The HTTP request is sent to the web server.
  3. The HTTP request is passed to the Dispatcher.
  4. Dispatcher determines whether the cached files are valid. If valid, the cached files are served to the client.
  5. If cached files are not valid, Dispatcher requests newly rendered pages from the AEM publish instance.

Cache Invalidation

When Dispatcher Flush replication agents request that Dispatcher invalidate cached files, the path of the content in the repository must resolve to the content in the cache.

  • a - A page is activated on the AEM author instance and the content is replicated to the publishing instance.
  • b - The Dispatcher Flush Agent calls Dispatcher to invalidate the cache for the replicated content.
  • c - Dispatcher touches one or more .stat files to invalidate the cached files.
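The interplay between serving cached files and invalidating them through .stat files can be sketched as follows. This is a minimal illustration, not Dispatcher's implementation: it assumes that a cached file counts as valid when it is at least as new as the .stat file, and the function names (`is_cache_valid`, `serve`, `demo`) are hypothetical.

```python
import os
import tempfile


def is_cache_valid(cached_file, stat_file):
    """Assumed rule: a cached file is valid if it is not older than the .stat file."""
    if not os.path.isfile(cached_file):
        return False
    if not os.path.isfile(stat_file):
        return True  # nothing has been invalidated yet
    return os.path.getmtime(cached_file) >= os.path.getmtime(stat_file)


def serve(cached_file, stat_file, render):
    """Serve from the cache when valid; otherwise render the page and cache it again."""
    if is_cache_valid(cached_file, stat_file):
        with open(cached_file, encoding="utf-8") as f:
            return f.read(), "cache"
    body = render()  # stands in for the request to the publish instance
    with open(cached_file, "w", encoding="utf-8") as f:
        f.write(body)
    return body, "render"


def demo():
    """Walk through: cached hit, invalidation by touching .stat, re-render, cached hit."""
    sources = []
    with tempfile.TemporaryDirectory() as docroot:
        page = os.path.join(docroot, "en.html")
        stat = os.path.join(docroot, ".stat")
        with open(page, "w", encoding="utf-8") as f:
            f.write("cached page")
        with open(stat, "w", encoding="utf-8"):
            pass
        os.utime(stat, (1_000, 1_000))  # .stat is older than the page
        os.utime(page, (2_000, 2_000))
        sources.append(serve(page, stat, lambda: "fresh page")[1])
        os.utime(stat, (3_000, 3_000))  # a flush request touches .stat
        sources.append(serve(page, stat, lambda: "fresh page")[1])
        sources.append(serve(page, stat, lambda: "fresh page")[1])
    return sources


if __name__ == "__main__":
    print(demo())
```

Touching a single .stat file is enough to mark every older file as stale, which is why no cached file has to be deleted at invalidation time in this model.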

To use Dispatcher with multiple domains, you must configure AEM, Dispatcher, and your web server. The solutions described on this page are general and apply to most environments. Due to the complexity of some AEM topologies, your solution can require further custom configurations to resolve particular issues. You will likely need to adapt the examples to fit your existing IT infrastructure and management policies.

URL Mapping

To enable domain URLs and content paths to resolve to cached files, a file path or page URL must be translated during the process. This page describes the following common strategies, which differ in the point at which the path or URL translation occurs:

  • (Recommended) The AEM publish instance uses Sling mapping for resource resolution to implement internal URL rewriting rules. Domain URLs are translated to content repository paths. See AEM Rewrites Incoming URLs.
  • The web server uses internal URL rewriting rules that translate domain URLs to cache paths. See The Web Server Rewrites Incoming URLs.

It is desirable to use short URLs for web pages. Typically, page URLs mirror the structure of the repository folders that contain the web content. However, the URLs do not reveal the topmost repository nodes, such as /content. The client is not necessarily aware of the structure of the AEM repository.

General Requirements

Your environment must implement the following configurations to support Dispatcher working with multiple domains:

  • Content for each domain resides in separate branches of the repository (see the example environment below).
  • The Dispatcher Flush replication agent is configured on the AEM publish instance. (See Invalidating Dispatcher Cache from a Publishing Instance.)
  • The domain name system resolves the domain names to the IP address of the web server.
  • The Dispatcher cache mirrors the directory structure of the AEM content repository. The file paths below the document root of the web server are the same as the paths of the files in the repository.

Environment for the Provided Examples

The example solutions that are provided apply to an environment with the following characteristics:

  • The AEM author and publish instances are deployed on Linux® systems.

  • Apache HTTPD is the web server that is deployed on a Linux® system.

  • The AEM content repository and the document root of the web server use the following file structures (the document root of the Apache web server is /usr/lib/apache/httpd-2.4.3/htdocs):

    Repository

      | - /content
        | - sitea
          | - content nodes
        | - siteb
          | - content nodes

    Document root of the web server

      | - /usr
        | - lib
          | - apache
            | - httpd-2.4.3
              | - htdocs
                | - content
                  | - sitea
                    | - content nodes
                  | - siteb
                    | - content nodes

AEM Rewrites Incoming URLs

Sling mapping for resource resolution enables you to associate incoming URLs with AEM content paths. Create mappings on the AEM publish instance so that render requests from Dispatcher resolve to the correct content in the repository.

Dispatcher requests for page rendering identify the page using the URL that the web server passes to Dispatcher. When the URL includes a domain name, Sling mappings resolve the URL to the content. For example, a mapping resolves the branda.com/en.html URL to the /content/sitea/en node.

The Dispatcher cache mirrors the repository node structure. Therefore, when page activations occur the resulting requests for invalidating the cached page require no URL or path translations.
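To see why no translation is needed, the following sketch compares the cache file reached by a client request with the cache file reached by an invalidation request. It uses the document roots of the example environment and the per-domain document roots configured later in this section. The function names and the `.html` extension default are assumptions made for illustration.

```python
import posixpath

# Document root of the web server in the example environment.
SERVER_DOCROOT = "/usr/lib/apache/httpd-2.4.3/htdocs"

# Per-domain document roots, as used by the example virtual hosts.
VHOST_DOCROOT = {
    "branda.com": SERVER_DOCROOT + "/content/sitea",
    "brandb.com": SERVER_DOCROOT + "/content/siteb",
}


def cache_file_for_request(host, url_path):
    """Client request: the URL path is appended to the docroot of the domain."""
    return posixpath.join(VHOST_DOCROOT[host], url_path.lstrip("/"))


def cache_file_for_flush(repository_path, extension=".html"):
    """Invalidation: the repository path is appended to the server docroot.

    The extension argument is an assumption of this sketch; it stands for the
    rendition of the page that was cached.
    """
    return posixpath.join(SERVER_DOCROOT, repository_path.lstrip("/")) + extension


if __name__ == "__main__":
    print(cache_file_for_request("branda.com", "/en.html"))
    print(cache_file_for_flush("/content/sitea/en"))
```

Both calls yield the same file below htdocs/content/sitea, because the cache layout mirrors the repository layout.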

Define virtual hosts on the web server

Define virtual hosts on the web server so that a different document root can be assigned to each web domain:

  • The web server must define a virtual host for each of your web domains.
  • For each domain, configure the document root to coincide with the directory in the Dispatcher cache that mirrors the repository folder containing the domain’s web content.
  • Each virtual host must also include Dispatcher-related configurations, as described on the Installing Dispatcher page.

The following example httpd.conf file configures two virtual hosts for an Apache web server:

  • The server names (which coincide with the domain names) are branda.com (line 16) and brandb.com (line 30).
  • The document root of each virtual host is the directory in the Dispatcher cache that contains the site’s pages (lines 17 and 31).

With this configuration, the web server performs the following actions when it receives a request for https://branda.com/en/products.html:

  • Associates the URL with the virtual host that has a ServerName of branda.com.

  • Forwards the URL to Dispatcher.

httpd.conf

# load the Dispatcher module
LoadModule dispatcher_module modules/mod_dispatcher.so
# configure the Dispatcher module
<IfModule disp_apache2.c>
 DispatcherConfig conf/dispatcher.any
 DispatcherLog    logs/dispatcher.log
 DispatcherLogLevel 3
 DispatcherNoServerHeader 0
 DispatcherDeclineRoot 0
 DispatcherUseProcessedURL 0
 DispatcherPassError 0
</IfModule>

# Define virtual host for brandA.com
<VirtualHost *:80>
  ServerName branda.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# define virtual host for brandB.com
<VirtualHost *:80>
  ServerName brandb.com
  DocumentRoot /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb
   <Directory /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb>
     <IfModule disp_apache2.c>
       SetHandler dispatcher-handler
       ModMimeUsePathInfo On
     </IfModule>
     Options FollowSymLinks
     AllowOverride None
   </Directory>
</VirtualHost>

# document root for web server
DocumentRoot "/usr/lib/apache/httpd-2.4.3/htdocs"

Virtual hosts inherit the DispatcherConfig property value that is configured in the main server section. Virtual hosts can include their own DispatcherConfig property to override the main server configuration.
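The way the web server associates a request with one of these virtual hosts can be sketched as follows. This is a simplified model of name-based virtual host selection, not Apache code: it assumes that host names are compared case-insensitively, that a port in the Host header is ignored, and that the first listed virtual host serves requests whose host name matches no ServerName. `VHOSTS` and `select_vhost` are hypothetical names.

```python
# (ServerName, DocumentRoot) pairs in the order they appear in httpd.conf.
VHOSTS = [
    ("branda.com", "/usr/lib/apache/httpd-2.4.3/htdocs/content/sitea"),
    ("brandb.com", "/usr/lib/apache/httpd-2.4.3/htdocs/content/siteb"),
]


def select_vhost(host_header, vhosts=VHOSTS):
    """Return the (ServerName, DocumentRoot) pair that handles the request."""
    host = host_header.split(":", 1)[0].lower()  # drop the port, ignore case
    for server_name, docroot in vhosts:
        if server_name.lower() == host:
            return server_name, docroot
    return vhosts[0]  # assumed fallback: the first listed virtual host


if __name__ == "__main__":
    print(select_vhost("branda.com"))
    print(select_vhost("BrandB.com:80"))
```

In this model a request for https://branda.com/en/products.html is served from the sitea document root, and the URL path /en/products.html is what Dispatcher receives.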

NOTE
On AEM as a Cloud Service, a separate vhost configuration must be used with a DocumentRoot at a higher level than each of the sub-pages. This is handled by default in the archetype. However, when multiple DocumentRoots are used, a higher-priority vhost configuration must be used so that cache invalidation can be handled for the whole cache, because it cannot be configured separately for each site. The ServerAlias of this new configuration must accept the host header “localhost”.

Configure Dispatcher to Handle Multiple Domains

To support URLs that include domain names and their corresponding virtual hosts, define the following Dispatcher farms:

  • Configure a Dispatcher farm for each virtual host. These farms process requests from the web server for each domain, check for cached files, and request pages from the renders.
  • Configure a Dispatcher farm that is used for invalidating content in the cache, regardless of which domain the content belongs to. This farm handles file invalidation requests from Dispatcher Flush replication agents.

Create Dispatcher farms for virtual hosts

Farms for virtual hosts must have the following configurations so that the URLs in client HTTP requests are resolved to the correct files in the Dispatcher cache:

  • The /virtualhosts property is set to the domain name. This property enables the Dispatcher to associate the farm with the domain.

  • The /filter property allows access to the path of the request URL, that is, the part of the URL that follows the domain name. For example, for the https://branda.com/en.html URL, the path is interpreted as /en.html, so the filter must allow access to this path.

  • The /docroot property is set to the root directory of the domain’s site content in the Dispatcher cache. This path is used as the prefix for the path of the URL from the original request. For example, the docroot of /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea causes the request for https://branda.com/en.html to resolve to the /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea/en.html file.

Also, the AEM publish instance must be designated as the render for the virtual host. Configure other farm properties as required. The following code is an abbreviated farm configuration for the branda.com domain:

/farm_sitea  {
    ...
    /virtualhosts { "branda.com" }
    /renders {
      /rend01  { /hostname "127.0.0.1"  /port "4503" }
    }
    /filter {
      /0001 { /type "deny"  /glob "*" }
      /0023 { /type "allow" /glob "*/en*" }
      ...
     }
    /cache {
      /docroot "/usr/lib/apache/httpd-2.4.3/htdocs/content/sitea"
      ...
   }
   ...
}
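How the /filter and /docroot properties of this farm act on a request can be approximated with the sketch below. It assumes that each /glob pattern is compared with the whole request line, that the last matching rule decides, and that a request matching no rule is denied. Python's `fnmatchcase` only approximates Dispatcher's glob syntax, and `is_allowed` and `resolve_cache_file` are hypothetical helper names.

```python
from fnmatch import fnmatchcase

# The two filter rules shown in farm_sitea, in order.
FILTER_RULES = [
    ("deny", "*"),
    ("allow", "*/en*"),
]

DOCROOT = "/usr/lib/apache/httpd-2.4.3/htdocs/content/sitea"


def is_allowed(request_line, rules=FILTER_RULES):
    """Apply the rules in order; the last rule that matches wins (assumed)."""
    decision = "deny"
    for rule_type, glob in rules:
        if fnmatchcase(request_line, glob):
            decision = rule_type
    return decision == "allow"


def resolve_cache_file(method, url_path):
    """Return the cache file for an allowed request, or None if it is denied."""
    request_line = f"{method} {url_path} HTTP/1.1"
    if not is_allowed(request_line):
        return None
    return DOCROOT + url_path  # docroot is the prefix of the URL path


if __name__ == "__main__":
    print(resolve_cache_file("GET", "/en.html"))
    print(resolve_cache_file("GET", "/de.html"))
```

The first call resolves to the en.html file below the sitea docroot; the second is rejected because only the deny rule matches its request line.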

Create a Dispatcher farm for cache invalidation

A Dispatcher farm is required for handling requests for invalidating cached files. This farm must be able to access .stat files in the docroot directories of each virtual host.

The following property configurations enable the Dispatcher to resolve files in the AEM content repository from files in the cache:

  • The /docroot property is set to the default docroot of the web server. Typically, the /docroot is the directory where the /content folder is created. An example value for Apache on Linux® is /usr/lib/apache/httpd-2.4.3/htdocs.
  • The /filter property allows access to files below the /content directory.

The /statfileslevel property must be high enough so that .stat files are created in the root directory of each virtual host. This property enables the cache of each domain to be invalidated separately. For the example setup, a /statfileslevel value of 2 creates .stat files in the <docroot>/content/sitea directory and the <docroot>/content/siteb directory.

Also, the publish instance must be designated as the render for the virtual host. Configure other farm properties as required. The following code is an abbreviated configuration for the farm that is used for invalidating the cache:

/farm_flush {
    ...
    /virtualhosts   { "invalidation_only" }
    /renders  {
      /rend01  { /hostname "127.0.0.1" /port "4503" }
    }
    /filter   {
      /0001 { /type "deny"  /glob "*" }
      /0023 { /type "allow" /glob "*/content*" }
      ...
      }
    /cache  {
       /docroot "/usr/lib/apache/httpd-2.4.3/htdocs"
       /statfileslevel "2"
       ...
   }
   ...
}
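The effect of /statfileslevel on this farm can be illustrated with a short sketch. It assumes that the docroot is level 0 and that, when a path is invalidated, the .stat files from the docroot down to either the folder of the invalidated path or the configured level (whichever is shallower) are touched. `stat_files_to_touch` is a hypothetical helper, not part of Dispatcher.

```python
import posixpath


def stat_files_to_touch(docroot, invalidated_path, statfileslevel):
    """List the .stat files touched when invalidated_path is flushed (assumed model)."""
    folders = [p for p in posixpath.dirname(invalidated_path).split("/") if p]
    depth = min(len(folders), statfileslevel)
    current = docroot
    touched = [posixpath.join(current, ".stat")]  # level 0: the docroot itself
    for folder in folders[:depth]:
        current = posixpath.join(current, folder)
        touched.append(posixpath.join(current, ".stat"))
    return touched


if __name__ == "__main__":
    docroot = "/usr/lib/apache/httpd-2.4.3/htdocs"
    for path in stat_files_to_touch(docroot, "/content/sitea/en", 2):
        print(path)
```

With a level of 2, flushing a page of sitea touches the .stat file in content/sitea but not the one in content/siteb, so the cached pages of siteb are not marked as stale.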

When you start the web server, the Dispatcher log (in debug mode) indicates the initialization of all farms:

Dispatcher initializing (build 4.1.2)
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_sitea].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs/content/sitea
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_siteb].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs/content/siteb
[Fri Nov 02 16:27:18 2012] [D] [24974(140006182991616)] farms[farm_flush].cache.docroot = /usr/lib/apache/httpd-2.4.3/htdocs
[Fri Nov 02 16:27:18 2012] [I] [24974(140006182991616)] Dispatcher initialized (build 4.1.2)

Configure Sling Mapping for Resource Resolution

Use Sling mapping for resource resolution so that domain-based URLs resolve to content on the AEM publish instance. The resource mapping translates the incoming URLs from Dispatcher (originally from client HTTP requests) to content nodes.

To learn about Sling resource mapping, see Mappings for Resource Resolution in the Sling documentation.

Typically, mappings are required for the following resources, although other mappings can be necessary:

  • The root node of the content page (below /content)
  • The design node that the pages use (below /etc/designs)
  • The /libs folder
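The idea behind these mappings can be modeled with ordered regular-expression rules. The sketch below is not Sling's configuration format (see the Sling documentation for that); the rule list, the `resolve` function, and the choice to pass /etc/designs and /libs through unchanged are assumptions made for illustration.

```python
import re

# Ordered (pattern, replacement) rules for branda.com; the first match wins.
MAPPINGS = [
    (r"^branda\.com(/etc/designs/.*)$", r"\1"),     # design nodes keep their path
    (r"^branda\.com(/libs/.*)$", r"\1"),            # /libs keeps its path
    (r"^branda\.com(/.*)$", r"/content/sitea\1"),   # everything else is site content
]


def resolve(host, url_path, mappings=MAPPINGS):
    """Translate a domain-based URL into a repository path."""
    target = host + url_path
    for pattern, replacement in mappings:
        if re.match(pattern, target):
            return re.sub(pattern, replacement, target)
    return url_path  # no rule for this host: leave the path unchanged


if __name__ == "__main__":
    print(resolve("branda.com", "/en.html"))
    print(resolve("branda.com", "/libs/cq/core.js"))
    print(resolve("branda.com", "/etc/clientlibs/foundation/jquery.js"))
```

The last call shows what happens when a mapping is missing: the request falls through to the content rule and resolves to a path below /content/sitea that does not exist in the repository.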

After you create the mapping for the content page, use a web browser to open a page on the web server to discover which other mappings are required. In the error.log file of the publish instance, locate messages about resources that are not found. The following example message indicates that a mapping for /etc/clientlibs is required:

01.11.2012 15:59:24.601 *INFO* [10.36.34.243 [1351799964599] GET /etc/clientlibs/foundation/jquery.js HTTP/1.1] org.apache.sling.engine.impl.SlingRequestProcessorImpl service: Resource /content/sitea/etc/clientlibs/foundation/jquery.js not found

NOTE
The Linkchecker transformer of the default Apache Sling rewriter automatically modifies hyperlinks in the page to prevent broken links. However, link rewriting is performed only when the link target is an HTML or HTM file. To update links to other file types, create a transformer component and add it to an HTML rewriter pipeline.