AEM’s internal reindexing process collects repository data and stores it in Oak indexes to support performant querying of content. In exceptional circumstances, the process can become slow or even stuck. This page acts as a troubleshooting guide to help identify if the indexing is slow, find the cause, and resolve the issue.
It is important to distinguish between reindexing that takes an inappropriately long amount of time, and reindexing that takes a long amount of time because it’s indexing vast quantities of content. For example, the time it takes to index content scales with the amount of content, so large production repositories take longer to reindex than small development repositories.
See the Best Practices on Queries and Indexing for additional information on when and how to reindex content.
Initial detection of slow indexing requires reviewing the IndexStats JMX MBeans. On the affected AEM instance, do the following:
Open the Web Console and click the JMX tab or go to https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx).
Navigate to the IndexStats MBeans.
Open the IndexStats MBeans for "async" and "fulltext-async".
For both MBeans, check whether the Done timestamp and the LastIndexedTime timestamp are both within 45 minutes of the current time.
For either MBean, if a time value (Done or LastIndexedTime) is more than 45 minutes older than the current time, then the index job is either failing or taking too long. This problem causes the asynchronous indexes to be stale.
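The 45-minute check can be expressed as a small helper. This is only a sketch: it assumes the Done and LastIndexedTime values have already been read from the MBean and parsed into datetime objects (the MBean's display format is not specified here), and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the text above: 45 minutes.
STALE_AFTER = timedelta(minutes=45)


def lane_is_stale(done, last_indexed_time, now, threshold=STALE_AFTER):
    """Return True if either timestamp is more than `threshold` older than `now`."""
    return any(now - ts > threshold for ts in (done, last_indexed_time))


if __name__ == "__main__":
    # Hypothetical values copied by hand from the "async" IndexStats MBean.
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    done = datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc)
    last_indexed = datetime(2024, 1, 1, 11, 49, tzinfo=timezone.utc)
    print(lane_is_stale(done, last_indexed, now))
```

Run the same check once for the "async" lane and once for the "fulltext-async" lane; a stale result on either lane is enough to continue troubleshooting.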
A forced shutdown causes AEM to suspend asynchronous indexing for up to 30 minutes after the restart. It typically takes another 15 minutes to complete the first reindexing pass, for a total of about 45 minutes (which matches the 45-minute window used for initial detection). If indexing is paused after a forced shutdown:
First, determine if the AEM instance was shut down in a forced manner (the AEM process was forcefully killed, or a power failure occurred) and later restarted.
If the forced shutdown occurred, upon restart, AEM automatically suspends reindexing for up to 30 minutes.
Wait approximately 45 minutes for AEM to resume normal asynchronous indexing operations.
For AEM 6.1, ensure that AEM 6.1 CFP 11 is installed.
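The waiting period described above is simple arithmetic, sketched here for clarity. The 30-minute and 15-minute figures come from the text; the function names and the example restart time are hypothetical.

```python
from datetime import datetime, timedelta

# Figures from the text: up to 30 minutes of suspension after a forced
# shutdown, plus roughly 15 minutes for the first reindexing pass.
SUSPENSION = timedelta(minutes=30)
FIRST_PASS = timedelta(minutes=15)


def expected_recovery_time(restart_time):
    """Latest time by which asynchronous indexing would normally have resumed."""
    return restart_time + SUSPENSION + FIRST_PASS


def should_keep_waiting(restart_time, now):
    """True while the instance is still inside the expected recovery window."""
    return now < expected_recovery_time(restart_time)


if __name__ == "__main__":
    restart = datetime(2024, 1, 1, 9, 0)  # hypothetical restart time
    print(expected_recovery_time(restart))
    print(should_keep_waiting(restart, datetime(2024, 1, 1, 9, 20)))
```

If indexing has not resumed once this window has passed, the forced shutdown is unlikely to be the only cause.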
In exceptional circumstances, the thread pool used to manage asynchronous indexing may become overloaded. To isolate the indexing process, a thread pool can be configured to prevent other AEM work from interfering with Oak’s ability to index content in a timely manner. In such cases, do the following:
Define a new, isolated thread pool for the Apache Sling Scheduler to use for asynchronous indexing:
Verify that the new Apache Sling Scheduler thread pool is registered and displays in the Apache Sling Scheduler Status web console.
Navigate to the AEM OSGi Web console>Status>Sling Scheduler or go to https://<host>:<port>/system/console/status-slingscheduler (for example, http://localhost:4502/system/console/status-slingscheduler)
Verify that the following pool entries exist:
If too many changes and commits are made to the repository in a short amount of time, indexing can be delayed due to a full observation queue. First, determine if the observation queue is full:
Go to the Web Console and click the JMX tab or go to https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx)
Open the Oak Repository Statistics MBean and determine if any ObservationQueueMaxLength value is greater than 10,000.
[…] per second section) so verify that the ObservationQueueMaxLength’s seconds metrics are 0.
[…] missRate for the DocChildren cache in the Consolidated Cache statistics MBean.
To avoid exceeding acceptable observation queue limits, it is recommended to:
[…] DiffCache as described in Performance tuning tips > Mongo Storage Tuning > Document cache size.
Reindexing can be considered “completely stuck” under two conditions:
Reindexing is slow, to the point where no significant progress is reported in log files regarding the number of nodes traversed.
Reindexing is stuck in an endless loop if repeated exceptions appear in the log files (for example, OutOfMemoryError) in the indexing thread. Repetition of the same exception or exceptions in the log indicates that Oak is attempting to index the same content repeatedly and failing on the same issue each time.
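The second condition can be checked roughly by counting Java throwable names (class names ending in Exception or Error) on log lines that belong to the indexing thread. The sketch below makes several assumptions: the thread marker "async-index-update" and the repeat threshold of 3 are illustrative defaults rather than values from this page, and the log line layout in the example is hypothetical.

```python
import re
from collections import Counter

# Matches fully qualified Java throwable names such as
# "java.lang.OutOfMemoryError" or "java.io.IOException".
THROWABLE = re.compile(r"\b(?:[a-z_]\w*\.)+[A-Z]\w*(?:Exception|Error)\b")


def repeated_throwables(log_lines, thread_marker="async-index-update", min_repeats=3):
    """Count throwable names on lines containing `thread_marker`.

    Returns only the names seen at least `min_repeats` times.
    """
    counts = Counter()
    for line in log_lines:
        if thread_marker not in line:
            continue  # ignore lines from other threads
        counts.update(THROWABLE.findall(line))
    return {name: n for name, n in counts.items() if n >= min_repeats}


if __name__ == "__main__":
    # Hypothetical log excerpt; real layouts depend on the logger configuration.
    sample = [
        "*ERROR* [async-index-update-async] indexing failed java.lang.OutOfMemoryError: Java heap space",
    ] * 4
    print(repeated_throwables(sample))
```

A throwable that recurs many times on the indexing thread while the traversal counters stay flat is a strong hint that the same content is being retried.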
To identify and fix a stuck reindexing process, do the following:
To identify the cause of stuck indexing, the following information must be collected:
Collect 5 minutes of thread dumps, taking one thread dump every 2 seconds.
Set DEBUG level and logs for the appenders.
Collect data from the async IndexStats MBean:
Navigate to AEM OSGi Web Console>Main>JMX>IndexStats>async
Use oak-run.jar’s console mode to collect the details of what exists under the /:async node.
Collect a list of repository checkpoints by using the CheckpointManager MBean:
AEM OSGi Web Console>Main>JMX>CheckpointManager>listCheckpoints()
After collecting all the information outlined in Step 1, restart AEM.
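The thread dump collection in Step 1 (one dump every 2 seconds for 5 minutes, so 150 dumps) could be scripted along the following lines. This sketch assumes the JDK's jstack tool is on the PATH and that the AEM process ID is known; the output file naming and the function names are hypothetical. Run it before restarting AEM.

```python
import subprocess
import time
from pathlib import Path


def dump_schedule(duration_s=300, interval_s=2):
    """Offsets in seconds at which a dump is taken: 0, 2, 4, ... below duration_s."""
    return list(range(0, duration_s, interval_s))


def collect_thread_dumps(pid, out_dir, duration_s=300, interval_s=2,
                         run=subprocess.run, sleep=time.sleep):
    """Write one jstack dump per interval into out_dir and return the file paths.

    `run` and `sleep` are injectable so the loop can be exercised without a JVM.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for i, _offset in enumerate(dump_schedule(duration_s, interval_s)):
        result = run(["jstack", str(pid)], capture_output=True, text=True)
        path = out / f"threaddump-{i:03d}.txt"
        path.write_text(result.stdout)
        written.append(path)
        sleep(interval_s)  # simple pacing; does not subtract jstack's own run time
    return written


if __name__ == "__main__":
    print(len(dump_schedule()))  # number of dumps for the defaults
```

Because the loop sleeps for the full interval after each dump, the total run takes slightly longer than 5 minutes; that is usually acceptable for diagnostic purposes.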
Reindexing can be safely aborted (stopped before it is completed) via the async, async-reindex, and fulltext-async indexing lanes (IndexStats MBean). For more information, also see the Apache Oak documentation on How to Abort Reindexing. Also, consider the following:
[…] PropertyIndexAsyncReindexMBean.
To safely abort reindexing, follow these steps:
Identify the IndexStats MBean that controls the reindexing lane that must be stopped.
Navigate to the appropriate IndexStats MBean via the JMX console by going to either AEM OSGi Web Console>Main>JMX or https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx)
Open the IndexStats MBean based on the reindexing lane that you wish to stop (async, async-reindex, or fulltext-async)
[…] async, async-reindex, or fulltext-async.
Invoke the abortAndPause() command on the appropriate IndexStats MBean.
Mark the Oak index definition appropriately to prevent resuming reindexing when the indexing lane resumes.
When reindexing an existing index, set the reindex property to false
/oak:index/someExistingIndex@reindex=false
Or else, for a new index, either:
Set the type property to disabled
/oak:index/someNewIndex@type=disabled
or remove the index definition entirely
Commit the changes to the repository when complete.
Finally, resume asynchronous indexing on the aborted indexing lane.
On the IndexStats MBean that issued the abortAndPause() command in Step 2, invoke the resume() command.
It is best to reindex during quiet periods (for example, not during a large content ingest), and ideally during maintenance windows when AEM’s load is known and controlled. Also, ensure that the reindexing does not take place during other maintenance activities.
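For the index-definition step of the abort procedure above, one possible way to set the property is an HTTP POST handled by the Sling POST servlet. This is a sketch under stated assumptions: the host, the admin credentials, and the helper name are illustrative, the user needs write access under /oak:index, and the same change can equally be made in a repository browser. The request is only built here, not sent.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request


def index_property_request(base_url, index_path, name, value, type_hint, user, password):
    """Build (but do not send) a Sling POST request that sets one property."""
    # "<name>@TypeHint" tells the Sling POST servlet which JCR type to store.
    fields = {name: value, f"{name}@TypeHint": type_hint}
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return Request(
        url=base_url.rstrip("/") + index_path,
        data=urlencode(fields).encode(),
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


# Existing index: keep reindexing from resuming (reindex=false).
existing = index_property_request(
    "http://localhost:4502", "/oak:index/someExistingIndex",
    "reindex", "false", "Boolean", "admin", "admin",
)

# New index: disable the definition instead (type=disabled).
new = index_property_request(
    "http://localhost:4502", "/oak:index/someNewIndex",
    "type", "disabled", "String", "admin", "admin",
)

# To send a request: urllib.request.urlopen(existing)
```

Make the property change while the lane is still paused, and only then invoke resume(), so that the lane does not pick the reindex up again.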