With AEM as a Cloud Service, Adobe is moving away from an AEM instance-centric model to a service-based view with n-x AEM containers, driven by CI/CD pipelines in Cloud Manager. Instead of configuring and maintaining indexes on single AEM instances, the index configuration has to be specified before a deployment. Configuration changes in production clearly break CI/CD policies. The same holds true for index changes, since they can impact system stability and performance if they are not specified, tested, and reindexed before being brought into production.
Below is a list of the main changes compared to AEM 6.5 and earlier versions:
Limitations:
Index management is currently supported only for indexes of type lucene. Internally, other indexes might be configured and used for queries, for example Elasticsearch indexes. Queries that are written against the damAssetLucene index might, on AEM as a Cloud Service, in fact be executed against an Elasticsearch version of this index. This difference is typically not visible to the application and user; however, certain tools such as the explain feature will report a different index. For differences between Lucene indexes and Elasticsearch indexes, see the Elastic documentation in Apache Jackrabbit Oak. Customers do not need to, and cannot, configure Elasticsearch indexes directly.
Defining indexes can comprise these three use cases:
For both points 1 and 2 above, you need to create a new index definition as part of your custom code base in the respective Cloud Manager release schedule. For more information, see the Deploying to AEM as a Cloud Service documentation.
An index definition can be one of the following:
1. An out-of-the-box index, for example /oak:index/cqPageLucene-2.
2. A customization of an out-of-the-box index, for example /oak:index/cqPageLucene-2-custom-1.
3. A fully custom index, for example /oak:index/acme.product-1-custom-2. To avoid naming collisions, fully custom indexes are required to have a prefix, for example, acme.
Notice that both customizations of an out-of-the-box index and fully custom indexes need to contain -custom-. Only fully custom indexes must start with a prefix.
If customizing an out-of-the-box index, for example damAssetLucene-6, copy the latest out-of-the-box index definition from a Cloud Service environment using the CRX DE Package Manager (/crx/packmgr/). Then rename the configuration, for example to damAssetLucene-6-custom-1, and add your customizations on top. This ensures that required configurations are not removed inadvertently. For example, the tika node under /oak:index/damAssetLucene-6/tika is required in the customized index on the Cloud Service. It does not exist on the Cloud SDK.
You need to prepare a new index definition package that contains the actual index definition, following this naming pattern:
<indexName>[-<productVersion>]-custom-<customVersion>
The package then needs to go under ui.apps/src/main/content/jcr_root. All customized and custom index definitions need to be stored under /oak:index.
The filter for the package needs to be set such that existing (out-of-the-box) indexes are retained. In the file ui.apps/src/main/content/META-INF/vault/filter.xml, each custom (or customized) index needs to be listed, for example as <filter root="/oak:index/damAssetLucene-6-custom-1"/>. If the index version is later changed, the filter needs to be adjusted.
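As a minimal sketch, a complete filter.xml that lists one customized index could look as follows. The /apps/acme root is a hypothetical placeholder for the project's own application content and is not taken from this document. The index entry is the one named above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<workspaceFilter version="1.0">
    <!-- Hypothetical application root; replace with the project's own paths -->
    <filter root="/apps/acme"/>
    <!-- One entry per custom or customized index; never the whole /oak:index tree -->
    <filter root="/oak:index/damAssetLucene-6-custom-1"/>
</workspaceFilter>
```

Listing each index individually, rather than /oak:index as a whole, is what keeps the out-of-the-box indexes outside the scope of the package.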
Any content package containing index definitions should have the following property set in the properties file of the content package, located at /META-INF/vault/properties.xml:
noIntermediateSaves=true
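For orientation, the property could appear in properties.xml roughly as sketched below, assuming the standard Java XML properties layout that FileVault packages use. In a Maven project this file is normally generated by the filevault-package-maven-plugin rather than written by hand, so treat this as an illustration of the expected result and not as a file to maintain manually.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <!-- Other generated entries (name, group, version, and so on) are omitted from this sketch -->
    <entry key="noIntermediateSaves">true</entry>
</properties>
```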
Index definitions are marked as custom and versioned, for example /oak:index/ntBaseLucene-custom-1.
To deploy a custom or customized index, the index definition (/oak:index/definitionname) needs to be delivered via ui.apps through Git and the Cloud Manager deployment process. In the FileVault filter, for example ui.apps/src/main/content/META-INF/vault/filter.xml, list each custom and customized index individually, for example <filter root="/oak:index/damAssetLucene-7-custom-1"/>. The custom or customized index definition itself is then stored in the file ui.apps/src/main/content/jcr_root/_oak_index/damAssetLucene-7-custom-1/.content.xml, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:dam="http://www.day.com/dam/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0" xmlns:rep="internal"
jcr:primaryType="oak:QueryIndexDefinition"
async="[async,nrt]"
compatVersion="{Long}2"
...
</indexRules>
<tika jcr:primaryType="nt:unstructured">
<config.xml jcr:primaryType="nt:file"/>
</tika>
</jcr:root>
The above example contains a configuration for Apache Tika. The Tika configuration file would be stored under ui.apps/src/main/content/jcr_root/_oak_index/damAssetLucene-7-custom-1/tika/config.xml.
Depending on which version of the Jackrabbit FileVault Maven Package Plugin is used, some more configuration in the project is required. When using version 1.1.6 or newer of the plugin, the file pom.xml needs to contain the following section in the plugin configuration for the filevault-package-maven-plugin, in configuration/validatorsSettings (just before jackrabbit-nodetypes):
<jackrabbit-packagetype>
<options>
<immutableRootNodeNames>apps,libs,oak:index</immutableRootNodeNames>
</options>
</jackrabbit-packagetype>
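To show the placement described above, the following sketch puts the section in its surrounding validatorsSettings element. The jackrabbit-nodetypes element is shown only as a position marker; its contents are whatever the project already defines and are not specified here.

```xml
<validatorsSettings>
    <!-- Added section: placed just before jackrabbit-nodetypes -->
    <jackrabbit-packagetype>
        <options>
            <immutableRootNodeNames>apps,libs,oak:index</immutableRootNodeNames>
        </options>
    </jackrabbit-packagetype>
    <jackrabbit-nodetypes>
        <!-- Existing node type validator settings of the project stay unchanged -->
    </jackrabbit-nodetypes>
</validatorsSettings>
```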
Also, in this case the vault-validation version needs to be upgraded to a newer version:
<dependency>
<groupId>org.apache.jackrabbit.vault</groupId>
<artifactId>vault-validation</artifactId>
<version>3.5.6</version>
</dependency>
Then, in ui.apps.structure/pom.xml and ui.apps/pom.xml, the configuration of the filevault-package-maven-plugin needs to have allowIndexDefinitions as well as noIntermediateSaves enabled. The option noIntermediateSaves ensures that the index configurations are added atomically.
<groupId>org.apache.jackrabbit</groupId>
<artifactId>filevault-package-maven-plugin</artifactId>
<configuration>
<allowIndexDefinitions>true</allowIndexDefinitions>
<properties>
<cloudManagerTarget>none</cloudManagerTarget>
<noIntermediateSaves>true</noIntermediateSaves>
</properties>
...
In ui.apps.structure/pom.xml, the filters section for this plugin needs to contain a filter root as follows:
<filter><root>/oak:index</root></filter>
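Putting the preceding fragments together, the plugin section of ui.apps.structure/pom.xml might look roughly like the sketch below. The /apps filter root is an assumption standing in for whatever roots the project already declares. Everything else is taken from the fragments above.

```xml
<plugin>
    <groupId>org.apache.jackrabbit</groupId>
    <artifactId>filevault-package-maven-plugin</artifactId>
    <configuration>
        <!-- Permit index definitions in this package -->
        <allowIndexDefinitions>true</allowIndexDefinitions>
        <properties>
            <cloudManagerTarget>none</cloudManagerTarget>
            <!-- Add the index configurations atomically -->
            <noIntermediateSaves>true</noIntermediateSaves>
        </properties>
        <filters>
            <!-- Assumed existing root; keep the project's own entries here -->
            <filter><root>/apps</root></filter>
            <!-- Required root for index definitions -->
            <filter><root>/oak:index</root></filter>
        </filters>
    </configuration>
</plugin>
```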
Once the new index definition is added, the new application needs to be deployed via Cloud Manager. Upon deployment, two jobs are started. They are responsible for adding (and merging, if needed) the index definitions to MongoDB and Azure Segment Store for author and publish, respectively. The underlying repositories are reindexed with the new index definitions before the blue-green switch takes place.
If you observe the following error in FileVault validation:
[ERROR] ValidationViolation: "jackrabbit-nodetypes: Mandatory child node missing: jcr:content [nt:base] inside node with types [nt:file]"
then either of the following steps can be followed to fix the issue:
<allowIndexDefinitions>true</allowIndexDefinitions>
Below is an example of where to place the above configuration in the pom.
<plugin>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>filevault-package-maven-plugin</artifactId>
<configuration>
<properties>
...
</properties>
...
<allowIndexDefinitions>true</allowIndexDefinitions>
<repositoryStructurePackages>
...
</repositoryStructurePackages>
<dependencies>
...
</dependencies>
</configuration>
</plugin>
<isDisabled>true</isDisabled>
Below is an example of where to place the above configuration in the pom.
<plugin>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>filevault-package-maven-plugin</artifactId>
...
<configuration>
...
<validatorsSettings>
...
<jackrabbit-nodetypes>
<isDisabled>true</isDisabled>
</jackrabbit-nodetypes>
</validatorsSettings>
</configuration>
</plugin>
For further details on the required package structure for AEM as a Cloud Service, see the document AEM Project Structure.
Index management is about adding, removing, and changing indexes. Changing the definition of an index is fast, but applying the change (often called “building an index”, or, for existing indexes, “reindexing”) requires time. It is not instantaneous: the repository has to be scanned for data to be indexed.
Blue-Green deployment can reduce downtime. It also allows for zero downtime upgrades and provides fast rollbacks. The old version of the application (blue) runs at the same time as the new version of the application (green).
Certain areas of the repository (read-only parts of the repository) can be different in the old (blue) and in the new (green) version of the application. The read-only areas of the repository are typically /apps and /libs. In the following example, italic is used to mark read-only areas, while bold is used for read-write areas.
The read-write areas of the repository are shared between all versions of the application, while for each version of the application, there is a specific set of /apps and /libs.
During development, or when using on-premises installations, indexes can be added, removed, or changed at runtime. Indexes are used as soon as they are available. If an index is not supposed to be used in the old version of the application yet, then the index is typically built during a scheduled downtime. The same occurs when removing an index, or changing an existing index. When removing an index, it becomes unavailable as soon as it is removed.
With blue-green deployments, there is no downtime. During an upgrade, for some time, both the old version (for example, version 1) of the application, as well as the new version (version 2), are running concurrently, against the same repository. If version 1 requires a certain index to be available, then this index must not be removed in version 2: the index should be removed later, for example in version 3, at which point it is guaranteed that version 1 of the application is no longer running. Also, applications should be written such that version 1 works well, even if version 2 is running, and if indexes of version 2 are available.
After upgrading to the new version is complete, old indexes can be garbage collected by the system. The old indexes might still stay for some time, in order to speed up rollbacks (if a rollback should be needed).
The following table shows five index definitions: index cqPageLucene is used in both versions, while index damAssetLucene-custom-1 is used only in version 2.
The naming pattern <indexName>-custom-<customerVersionNumber> is needed for AEM as a Cloud Service to mark this as a replacement for an existing index.
Index | Out-of-the-box Index | Use in Version 1 | Use in Version 2 |
---|---|---|---|
/oak:index/damAssetLucene | Yes | Yes | No |
/oak:index/damAssetLucene-custom-1 | Yes (customized) | No | Yes |
/oak:index/acme.product-custom-1 | No | Yes | No |
/oak:index/acme.product-custom-2 | No | No | Yes |
/oak:index/cqPageLucene | Yes | Yes | Yes |
The version number is incremented each time the index is changed. To avoid custom index names colliding with index names of the product itself, the names of custom indexes, as well as of changes to out-of-the-box indexes, must end with -custom-<number>.
Once Adobe changes an out-of-the-box index like “damAssetLucene” or “cqPageLucene”, a new index named damAssetLucene-2 or cqPageLucene-2 is created, or, if the index was already customized, the customized index definition is merged with the changes in the out-of-the-box index, as shown below. Merging of changes happens automatically. That means that you do not need to do anything if an out-of-the-box index changes. However, it is possible to customize the index again later.
Index | Out-of-the-box Index | Use in Version 2 | Use in Version 3 |
---|---|---|---|
/oak:index/damAssetLucene-custom-1 | Yes (customized) | Yes | No |
/oak:index/damAssetLucene-2-custom-1 | Yes (automatically merged from damAssetLucene-custom-1 and damAssetLucene-2) | No | Yes |
/oak:index/cqPageLucene | Yes | Yes | No |
/oak:index/cqPageLucene-2 | Yes | No | Yes |
Index management is currently only supported for indexes of type lucene, with compatVersion set to 2. Internally, other indexes might be configured and used for queries, for example Elasticsearch indexes. Queries that are written against the damAssetLucene index might, on AEM as a Cloud Service, in fact be executed against an Elasticsearch version of this index. This difference is invisible to the application and the end user; however, certain tools such as the explain feature will report a different index. For differences between Lucene and Elasticsearch indexes, see the Elasticsearch documentation in Apache Jackrabbit Oak. Customers cannot and do not need to configure Elasticsearch indexes directly.
Only built-in analyzers are supported (that is, those that are shipped with the product). Custom analyzers are not supported.
For best operational performance, indexes should not be excessively large. The total size of all indexes can be used as a guide: if it increases by more than 100% after custom indexes have been added and standard indexes have been adjusted on a development environment, the custom index definitions should be adjusted. AEM as a Cloud Service can prevent the deployment of indexes that would negatively impact system stability and performance.
To add a fully custom index named /oak:index/acme.product-custom-1 to be used in a new version of the application and later, the index must be configured as follows:
acme.product-1-custom-1
This works by prepending a custom identifier to the index name, followed by a dot (.). The identifier should be between 2 and 5 characters in length.
As above, this ensures the index is only used by the new version of the application.
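As an illustration only, a small definition for such a fully custom index might look like the sketch below. Following the storage pattern described earlier, it would be saved as the .content.xml file of the index under ui.apps/src/main/content/jcr_root/_oak_index/. The included path /content/acme, the indexed property sku, and the choice of nt:unstructured as the indexed node type are all hypothetical and need to be replaced with what the application's queries actually require.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<jcr:root xmlns:oak="http://jackrabbit.apache.org/oak/ns/1.0" xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"
    jcr:primaryType="oak:QueryIndexDefinition"
    async="[async,nrt]"
    compatVersion="{Long}2"
    includedPaths="[/content/acme]"
    queryPaths="[/content/acme]"
    type="lucene">
    <indexRules jcr:primaryType="nt:unstructured">
        <!-- Rule node is named after the node type to index (hypothetical choice) -->
        <nt:unstructured jcr:primaryType="nt:unstructured">
            <properties jcr:primaryType="nt:unstructured">
                <!-- Hypothetical property used in equality conditions of queries -->
                <sku
                    jcr:primaryType="nt:unstructured"
                    name="sku"
                    propertyIndex="{Boolean}true"/>
            </properties>
        </nt:unstructured>
    </indexRules>
</jcr:root>
```

Restricting includedPaths and queryPaths to the application's own content tree keeps the index small, in line with the guidance above that indexes should not be excessively large.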
When an existing index is changed, a new index needs to be added with the changed index definition. For example, consider that the existing index /oak:index/acme.product-custom-1 is changed. The old index is stored under /oak:index/acme.product-custom-1, and the new index is stored under /oak:index/acme.product-custom-2.
The old version of the application uses the following configuration:
/oak:index/acme.product-custom-1
The new version of the application uses the following (changed) configuration:
/oak:index/acme.product-custom-2
Index definitions on AEM as a Cloud Service may not fully match the index definitions on a local development instance. The development instance does not have a Tika configuration, while AEM as a Cloud Service instances do have one. If you customize an index with a Tika configuration, please retain the Tika configuration.
Sometimes, a change in an index definition needs to be reverted, either because the change was made by mistake or because it is no longer needed. For example, the index definition damAssetLucene-8-custom-3 was created by mistake and is already deployed, so you want to revert to the previous index definition damAssetLucene-8-custom-2. To do that, you need to add a new index called damAssetLucene-8-custom-4 that contains the definition of the previous index, damAssetLucene-8-custom-2.
The following only applies to custom indexes. Product indexes may not be removed as they are used by AEM.
If an index is to be removed in a later version of the application, you can define an empty index (an index that is never used and does not contain any data) with a new name. For the purpose of this example, you can name it /oak:index/acme.product-custom-3. This replaces the index /oak:index/acme.product-custom-2. Once /oak:index/acme.product-custom-2 is removed by the system, the empty index /oak:index/acme.product-custom-3 can then also be removed. An example of such an empty index is:
<acme.product-custom-3
jcr:primaryType="oak:QueryIndexDefinition"
async="async"
compatVersion="2"
includedPaths="/dummy"
queryPaths="/dummy"
type="lucene">
<indexRules jcr:primaryType="nt:unstructured">
<rep:root jcr:primaryType="nt:unstructured">
<properties jcr:primaryType="nt:unstructured">
<dummy
jcr:primaryType="nt:unstructured"
name="dummy"
propertyIndex="{Boolean}true"/>
</properties>
</rep:root>
</indexRules>
</acme.product-custom-3>
If a customization of an out-of-the-box index is no longer needed, you must copy the out-of-the-box index definition. For example, if you have already deployed damAssetLucene-8-custom-3, but no longer need the customizations and want to switch back to the default damAssetLucene-8 index, then you must add an index damAssetLucene-8-custom-4 that contains the index definition of damAssetLucene-8.
Apache Jackrabbit Oak enables flexible index configurations to efficiently handle search queries. Indexes are especially important for larger repositories. Please ensure that all queries are backed by an appropriate index. Queries without a suitable index may read thousands of nodes, which is then logged as a warning.
Please see this document for information on how queries and indexes can be optimized.