Indexing best practices in AEM

Last update: 2024-01-25
Topics: Search
Created for: Beginner, Developer

Learn about indexing best practices in Adobe Experience Manager (AEM). Apache Jackrabbit Oak powers content search in AEM. The key points are:

  • Out of the box, AEM provides various indexes to support search and query functionality, for example damAssetLucene, cqPageLucene and more.
  • All index definitions are stored in the repository under the /oak:index node.
  • AEM as a Cloud Service only supports Oak Lucene indexes.
  • Index configuration should be managed in the AEM project codebase and deployed using Cloud Manager CI/CD pipelines.
  • If multiple indexes are available for a given query, the index with the lowest estimated cost is used.
  • If no index is available for a given query, the content tree is traversed to find the matching content. However, the default limit set via org.apache.jackrabbit.oak.query.QueryEngineSettingsService is to traverse only 100,000 nodes.
  • The results of a query are filtered as the last step to ensure that the current user has read access. This means that a query may return fewer results than the number of matching indexed nodes.
  • Reindexing the repository after an index definition change takes time, and the duration depends on the size of the repository.

To keep search efficient and correct without degrading the performance of the AEM instance, it is important to understand the indexing best practices.
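
The selection rule in the key points above can be illustrated with a small standalone Python sketch. It is not Oak code: the function name, the index names, and the cost values are all hypothetical, and real cost estimates are computed by the Oak query engine.

```python
from typing import Dict, Optional


def pick_index(estimated_costs: Dict[str, float]) -> Optional[str]:
    """Return the candidate index with the lowest estimated cost.

    Returns None when no index can serve the query, which corresponds to
    the traversal fallback described in the key points.
    """
    if not estimated_costs:
        return None
    return min(estimated_costs, key=estimated_costs.get)


# Hypothetical cost estimates for one query (illustrative values only).
candidates = {"damAssetLucene-9": 120.0, "wknd.assets-1-custom-1": 35.0}
print(pick_index(candidates))  # the cheaper custom index is chosen
print(pick_index({}))          # no index applies, so the query would traverse
```

This also shows why a badly scoped custom index can be selected for queries it was never meant to serve: selection depends only on estimated cost.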

Custom vs OOTB index

At times, you must create custom indexes to support your search requirements. However, follow the guidelines below before creating custom indexes:

  • Understand the search requirements and check if the OOTB indexes can support them. Use the Query Performance Tool, available in the local SDK and in AEMCS via the Developer Console or at https://author-pXXXX-eYYYY.adobeaemcloud.com/ui#/aem/libs/granite/operations/content/diagnosistools/queryPerformance.html?appId=aemshell.

  • Define an optimal query; use the optimizing queries flow chart and the JCR Query Cheat Sheet for reference.

  • If the OOTB indexes cannot support the search requirements, you have two options. Before choosing, review the Tips for Creating Efficient Indexes.

    • Customize the OOTB index: Preferred option as it is easy to maintain and upgrade.
    • Fully custom index: Only if the above option does not work.

Customize the OOTB index

  • In AEMCS, when customizing the OOTB index, use the <OOTBIndexName>-<productVersion>-custom-<customVersion> naming convention. For example, cqPageLucene-custom-1 or damAssetLucene-8-custom-1. This helps to merge the customized index definition whenever the OOTB index is updated. See Changes to Out-of-the-Box Indexes for more details.

  • In AEM 6.X, the above naming convention does not work; instead, update the OOTB index directly with additional properties in the indexRules node.

  • Always copy the latest OOTB index definition from the AEM instance using the CRX DE Package Manager (/crx/packmgr/?lang=en), rename it, and add customizations inside the XML file.

  • Store the index definition in the AEM project at ui.apps/src/main/content/jcr_root/_oak_index and deploy it using Cloud Manager CI/CD pipelines. See Deploying Custom Index Definitions for more details.
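
As a rough illustration of the AEMCS naming convention above, the following standalone Python sketch splits a customized index name into its parts. The helper name and the regular expression are assumptions made for this example and are not part of AEM or Oak. The product version is treated as optional only because the example cqPageLucene-custom-1 above has none.

```python
import re
from typing import Optional, Tuple

# <OOTBIndexName>-<productVersion>-custom-<customVersion>
# Assumption: <productVersion> is optional, as in cqPageLucene-custom-1.
OOTB_CUSTOM_NAME = re.compile(
    r"^(?P<base>[A-Za-z][A-Za-z0-9]*)"
    r"(?:-(?P<product>\d+))?"
    r"-custom-(?P<custom>\d+)$"
)


def parse_customized_name(name: str) -> Optional[Tuple[str, Optional[int], int]]:
    """Return (OOTB index name, product version, custom version), or None if the name does not match."""
    match = OOTB_CUSTOM_NAME.match(name)
    if match is None:
        return None
    product = match.group("product")
    return (
        match.group("base"),
        int(product) if product is not None else None,
        int(match.group("custom")),
    )


print(parse_customized_name("damAssetLucene-8-custom-1"))
print(parse_customized_name("cqPageLucene-custom-1"))
print(parse_customized_name("damAssetLucene"))
```

A check like this could run in a project build to catch misnamed index folders under _oak_index before deployment.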

Fully custom index

Creating a fully custom index must be your last resort, used only if customizing the OOTB index does not work.

  • When creating a fully custom index, use the <prefix>.<customIndexName>-<version>-custom-<customVersion> naming convention. For example, wknd.adventures-1-custom-1. This helps to avoid naming conflicts. Here, wknd is the prefix and adventures is the custom index name. This convention is applicable for both AEM 6.X and AEMCS and helps to prepare for future migration to AEMCS.

  • AEMCS only supports Lucene indexes, so to prepare for future migration to AEMCS, always use Lucene indexes. See Lucene Indexes vs Property Indexes for more details.

  • Avoid creating a custom index on the same node type as the OOTB index. Instead, customize the OOTB index with additional properties in the indexRules node. For example, do not create a custom index on the dam:Asset node type but customize the OOTB damAssetLucene index. Such duplicate indexes have been a common root cause of performance and functional issues.

  • Also, avoid adding multiple node types, for example cq:Page and cq:Tag, under the indexing rules (indexRules) node. Instead, create separate indexes for each node type.

  • As mentioned in the section above, store the index definition in the AEM project at ui.apps/src/main/content/jcr_root/_oak_index and deploy it using Cloud Manager CI/CD pipelines. See Deploying Custom Index Definitions for more details.

  • The index definition guidelines are:

    • The node type (jcr:primaryType) should be oak:QueryIndexDefinition
    • The index type (type) should be lucene
    • The async property (async) should be async,nrt
    • Use the includedPaths property and avoid the excludedPaths property. Always set the queryPaths value to the same value as the includedPaths value.
    • To enforce the path restriction, set the evaluatePathRestrictions property to true.
    • Use the tags property to tag the index, and specify this tag value while querying to use the index. The general query syntax is <query> option(index tag <tagName>).
    /oak:index/wknd.adventures-1-custom-1
        - jcr:primaryType = "oak:QueryIndexDefinition"
        - type = "lucene"
        - compatVersion = 2
        - async = ["async", "nrt"]
        - includedPaths = ["/content/wknd"]
        - queryPaths = ["/content/wknd"]
        - evaluatePathRestrictions = true
        - tags = ["customAdvSearch"]
    ...
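
The guidelines above can also be expressed as a small checklist. The following standalone Python sketch is one possible way to do that, under the assumption that the index definition has already been loaded into a plain dictionary. The function name and the dictionary shape are hypothetical and are not an AEM or Oak API.

```python
import re
from typing import Dict, List

# <prefix>.<customIndexName>-<version>-custom-<customVersion>
CUSTOM_NAME = re.compile(r"^[A-Za-z0-9]+\.[A-Za-z0-9]+-\d+-custom-\d+$")


def check_index_definition(name: str, props: Dict[str, object]) -> List[str]:
    """Return a list of guideline violations; an empty list means none were found."""
    problems = []
    if not CUSTOM_NAME.match(name):
        problems.append("name should follow <prefix>.<customIndexName>-<version>-custom-<customVersion>")
    if props.get("jcr:primaryType") != "oak:QueryIndexDefinition":
        problems.append("jcr:primaryType should be oak:QueryIndexDefinition")
    if props.get("type") != "lucene":
        problems.append("type should be lucene")
    if list(props.get("async", [])) != ["async", "nrt"]:
        problems.append("async should be [async, nrt]")
    included = props.get("includedPaths")
    if not included:
        problems.append("includedPaths should be set")
    elif props.get("queryPaths") != included:
        problems.append("queryPaths should match includedPaths")
    if "excludedPaths" in props:
        problems.append("avoid excludedPaths")
    if props.get("evaluatePathRestrictions") is not True:
        problems.append("evaluatePathRestrictions should be true")
    if not props.get("tags"):
        problems.append("tags should be set")
    return problems


# The example definition shown above, as a dictionary.
definition = {
    "jcr:primaryType": "oak:QueryIndexDefinition",
    "type": "lucene",
    "compatVersion": 2,
    "async": ["async", "nrt"],
    "includedPaths": ["/content/wknd"],
    "queryPaths": ["/content/wknd"],
    "evaluatePathRestrictions": True,
    "tags": ["customAdvSearch"],
}
print(check_index_definition("wknd.adventures-1-custom-1", definition))
```

The checks mirror the bullet list only; they do not validate indexRules or any other part of a real index definition.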
    

Examples

To understand the best practices, let’s review a few examples.

Improper use of tags property

The image below shows a custom and an OOTB index definition, highlighting the tags property; both indexes use the same visualSimilaritySearch value.

[Image: Improper use of tags property]

Analysis

This is an improper use of the tags property on the custom index. The Oak query engine picks the custom index over the OOTB index because the custom index has the lower estimated cost.

The correct way is to customize the OOTB index and add additional properties in the indexRules node. See Customize the OOTB index for more details.

Index on the dam:Asset node type

The image below shows a custom index for the dam:Asset node type with the includedPaths property set to a specific path.

[Image: Index on the dam:Asset node type]

Analysis

If you perform omnisearch on Assets, it returns incorrect results because the custom index has a lower estimated cost and is picked instead of the OOTB index.

Do not create a custom index on the dam:Asset node type but customize the OOTB damAssetLucene index with additional properties in the indexRules node.

Multiple node types under indexing rules

The image below shows a custom index with multiple node types under the indexRules node.

[Image: Multiple node types under the indexing rules]

Analysis

It is not recommended to add multiple node types in a single index. However, it is fine to index node types in the same index if they are closely related, for example, cq:Page and cq:PageContent.

A valid solution is to customize the OOTB cqPageLucene and damAssetLucene indexes, adding the additional properties under the existing indexRules node.

Absence of queryPaths property

The image below shows a custom index (which also does not follow the naming convention) without the queryPaths property.

[Image: Absence of queryPaths property]

Analysis

Always set the queryPaths value to the same value as the includedPaths value. Also, to enforce the path restriction, set the evaluatePathRestrictions property to true.

Querying with index tag

The image below shows a custom index with the tags property and how to use it while querying.

[Image: Querying with index tag]

/jcr:root/content/dam//element(*,dam:Asset)[(jcr:content/@contentFragment = 'true' and jcr:contains(., '/content/sitebuilder/test/mysite/live/ja-jp/mypage'))] order by @jcr:created descending option (index tag assetPrefixNodeNameSearch)

Analysis

This example demonstrates how to set a correct, non-conflicting tags property value on the index and use it while querying. The general query syntax is <query> option(index tag <tagName>). Also see Query Option Index Tag.
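
A minimal sketch of the general syntax above, written as a standalone Python helper that appends the index tag option to a query string. The helper name, the sample query, and the restriction of tag names to letters, digits, and underscores are assumptions made for this illustration and do not come from Oak.

```python
import re

# Assumption for this sketch: tag names are simple identifiers.
TAG_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def with_index_tag(query: str, tag: str) -> str:
    """Append 'option(index tag <tagName>)' to an XPath or JCR-SQL2 query string."""
    if not TAG_PATTERN.match(tag):
        raise ValueError("unexpected characters in index tag: " + repr(tag))
    return query.rstrip() + " option(index tag " + tag + ")"


# Hypothetical query against the /content/wknd subtree.
print(with_index_tag("/jcr:root/content/wknd//element(*, cq:Page)", "customAdvSearch"))
```

Building the option in one place keeps the tag value in the query consistent with the tags property on the index definition.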

Custom index

The image below shows a custom index with a suggestion node for achieving the advanced search functionality.

[Image: Custom index]

Analysis

It is a valid use case to create a custom index for the advanced search functionality. However, the index name should follow the <prefix>.<customIndexName>-<version>-custom-<customVersion> naming convention.

Helpful tools

Let’s review a few tools that can help you define, analyze, and optimize indexes.

Index creation tool

The Oak Index Definition Generator tool generates an index definition based on the input queries. It is a good starting point for creating a custom index.

Analyze index tool

The Index Definition Analyzer tool analyzes an index definition and provides recommendations to improve it.

Query performance tool

The OOTB Query Performance Tool, available in the local SDK and in AEMCS via the Developer Console or at https://author-pXXXX-eYYYY.adobeaemcloud.com/ui#/aem/libs/granite/operations/content/diagnosistools/queryPerformance.html?appId=aemshell, helps to analyze query performance. Use it together with the JCR Query Cheat Sheet to define the optimal query.

Troubleshooting tools and tips

Most of the following apply to AEM 6.X and to local troubleshooting.

  • Index Manager, available at http://host:port/libs/granite/operations/content/diagnosistools/indexManager.html, shows index info like type, last updated, and size.

  • Detailed logging of Oak query and indexing-related Java™ packages like org.apache.jackrabbit.oak.plugins.index, org.apache.jackrabbit.oak.query, and com.day.cq.search can be enabled via http://host:port/system/console/slinglog for troubleshooting.

  • The JMX MBean of IndexStats type, available at http://host:port/system/console/jmx, shows index info like status, progress, or statistics related to asynchronous indexing. It also provides FailingIndexStats; if it lists no results, no indexes are corrupt. AsyncIndexerService marks any index that fails to update for 30 minutes (configurable) as corrupt and stops indexing it. If a query does not return the expected results, check this before reindexing, because reindexing is computationally expensive and time consuming.

  • The JMX MBean of LuceneIndex type, available at http://host:port/system/console/jmx, shows Lucene index statistics like size and number of documents per index definition.

  • The JMX MBean of QueryStat type, available at http://host:port/system/console/jmx, shows Oak query statistics, including slow and popular queries with details like query and execution time.

Additional resources

Refer to the following documentation for more information:
