AEM’s internal re-indexing process collects repository data and stores it in Oak indexes to support performant querying of content. In exceptional circumstances, the process can become slow or even stuck. This page is a troubleshooting guide to help you identify whether indexing is slow, find the cause, and resolve the issue.
It is important to distinguish between re-indexing that takes an inappropriately long time and re-indexing that takes a long time because it is indexing vast quantities of content. The time it takes to index content scales with the amount of content, so large production repositories take longer to re-index than small development repositories.
See the Best Practices on Queries and Indexing for additional information on when and how to re-index content.
Initial detection of slow indexing requires reviewing the IndexStats JMX MBeans. On the affected AEM instance, do the following:
Open the Web Console and click the JMX tab or go to https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx).
Navigate to the IndexStats MBeans.
Open the IndexStats MBeans for "async" and "fulltext-async".
For both MBeans, check whether the Done and LastIndexedTime timestamps are within 45 minutes of the current time.
For either MBean, if the time value (Done or LastIndexedTime) is more than 45 minutes older than the current time, the index job is either failing or taking too long. This causes the asynchronous indexes to be stale.
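The 45-minute check above can be expressed as a small function. This is a minimal sketch, assuming you copy the Done and LastIndexedTime values from the JMX console as ISO 8601 timestamps that include a time zone (for example, ending in Z); check the format your instance actually displays. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Threshold from this page: a timestamp more than 45 minutes behind "now" is suspicious.
STALE_AFTER = timedelta(minutes=45)


def parse_jmx_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp (assumed format) copied from the JMX console."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
    # so normalize it to an explicit UTC offset first.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    # Assumes the value carries an offset; naive and aware datetimes cannot be subtracted.
    return datetime.fromisoformat(value)


def is_lane_stale(done: str, last_indexed_time: str, now: datetime) -> bool:
    """True if either timestamp is more than 45 minutes older than `now`."""
    return any(
        now - parse_jmx_timestamp(ts) > STALE_AFTER
        for ts in (done, last_indexed_time)
    )
```

Run it once for the async MBean and once for the fulltext-async MBean, passing datetime.now(timezone.utc) as now. A True result for either lane means its asynchronous indexes are likely stale.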
After a forced shutdown, AEM suspends asynchronous indexing for up to 30 minutes following the restart, and the first re-indexing pass typically needs another 15 minutes to complete, for a total of approximately 45 minutes (which ties back to the 45-minute timeframe used for initial detection). If you suspect that indexing is paused after a forced shutdown:
First, determine whether the AEM instance was shut down forcibly (the AEM process was forcefully killed, or a power failure occurred) and subsequently restarted.
If a forced shutdown occurred, AEM automatically suspends re-indexing for up to 30 minutes after the restart.
Wait approximately 45 minutes for AEM to resume normal asynchronous indexing operations.
For AEM 6.1, ensure that AEM 6.1 CFP 11 is installed.
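The waiting period after a forced shutdown is simple arithmetic: up to 30 minutes of suspension plus roughly 15 minutes for the first pass. A minimal sketch that turns those figures from this page into a "keep waiting or investigate" decision; the function names are hypothetical, and the durations are the approximate values stated above, not values read from Oak.

```python
from datetime import datetime, timedelta

# Approximate figures taken from this page, not from Oak configuration.
MAX_SUSPENSION = timedelta(minutes=30)      # async indexing suspended for up to 30 minutes
TYPICAL_FIRST_PASS = timedelta(minutes=15)  # first re-indexing pass typically ~15 minutes


def expected_recovery(restart_time: datetime) -> datetime:
    """Approximate time by which normal asynchronous indexing should have resumed."""
    return restart_time + MAX_SUSPENSION + TYPICAL_FIRST_PASS


def keep_waiting(restart_time: datetime, now: datetime) -> bool:
    """True while still inside the ~45-minute post-restart window."""
    return now < expected_recovery(restart_time)
```

If keep_waiting() is already False and the IndexStats timestamps are still stale, the forced shutdown is probably not the whole explanation.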
In exceptional circumstances, the thread pool used to manage asynchronous indexing may become overloaded. To isolate the indexing process, a dedicated thread pool can be configured so that other AEM work does not interfere with Oak’s ability to index content in a timely manner. To do this:
Define a new, isolated thread pool for the Apache Sling Scheduler to use for asynchronous indexing:
Verify that the new Apache Sling Scheduler thread pool is registered and displays in the Apache Sling Scheduler Status web console.
Navigate to the AEM OSGi Web console>Status>Sling Scheduler or go to https://<host>:<port>/system/console/status-slingscheduler (for example, http://localhost:4502/system/console/status-slingscheduler)
Verify that the following pool entries exist:
If too many changes and commits are made to the repository in a short amount of time, indexing can be delayed by a full observation queue. First, determine whether the observation queue is full:
Go to the Web Console and click the JMX tab or go to https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx)
Open the Oak Repository Statistics MBean and determine whether any ObservationQueueMaxLength value is greater than 10,000.
Check the per second section in particular, and verify that the ObservationQueueMaxLength’s seconds metrics are 0.
Also check the missRate for the DocChildren cache in the Consolidated Cache statistics MBean.
To avoid exceeding acceptable observation queue limits, it is recommended to:
Increase the size of the DiffCache, as described in Performance tuning tips > Mongo Storage Tuning > Document cache size.
Re-indexing can be considered “completely stuck” under two conditions:
Re-indexing is very slow, to the point where no significant progress is reported in log files regarding the number of nodes traversed.
Re-indexing is stuck in an endless loop: repeated exceptions (for example, OutOfMemoryError) appear in the log files for the indexing thread. Repetition of the same exception(s) in the log indicates that Oak is attempting to index the same content repeatedly and failing on the same issue each time.
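To spot the second condition, you can count how often each exception appears in log entries written by the indexing thread. This is a minimal sketch with assumptions you should check against your own error.log: that log entries start with the default layout "dd.MM.yyyy HH:mm:ss.SSS *LEVEL* [thread] logger message", that stack-trace lines following an entry belong to it, and that indexing thread names contain async-index-update. The function name and the threshold of 3 repeats are hypothetical choices.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Assumed start of a log entry: "dd.MM.yyyy HH:mm:ss.SSS *LEVEL* [thread] ..."
ENTRY_START = re.compile(r"^\d{2}\.\d{2}\.\d{4} \d{2}:\d{2}:\d{2}\.\d{3} \*\w+\* \[([^\]]+)\]")
# Java throwable names such as java.lang.OutOfMemoryError or IllegalStateException.
THROWABLE = re.compile(r"\b((?:[a-z_$][\w$]*\.)*[A-Z][\w$]*(?:Exception|Error))\b")


def repeated_exceptions(
    lines: Iterable[str], thread_hint: str = "async-index-update", min_count: int = 3
) -> Dict[str, int]:
    """Count log entries per throwable name for threads matching thread_hint."""
    counts: Counter = Counter()
    current_thread = None
    seen_in_entry = set()
    for line in lines:
        start = ENTRY_START.match(line)
        if start:
            # A new entry begins; continuation lines (stack traces) inherit its thread.
            current_thread = start.group(1)
            seen_in_entry = set()
        if current_thread is None or thread_hint not in current_thread:
            continue
        for name in THROWABLE.findall(line):
            # Count each throwable at most once per log entry.
            if name not in seen_in_entry:
                seen_in_entry.add(name)
                counts[name] += 1
    return {name: n for name, n in counts.items() if n >= min_count}
```

Feed it the lines of the log file (for example, an open file object). A non-empty result that keeps growing across runs matches the "endless loop" pattern described above.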
To identify and fix a stuck re-indexing process, do the following:
To identify the cause of stuck indexing, collect the following information:
Collect thread dumps for 5 minutes, taking one thread dump every 2 seconds.
Set the log level to DEBUG for the appenders.
Collect data from the async IndexStats MBean:
Navigate to AEM OSGi Web Console>Main>JMX>IndexStats>async
Use oak-run.jar’s console mode to collect the details of what exists under the /:async node.
Collect a list of repository checkpoints by using the CheckpointManager MBean:
AEM OSGi Web Console>Main>JMX>CheckpointManager>listCheckpoints()
After collecting all the information outlined in Step 1, restart AEM.
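Taking 150 thread dumps by hand (5 minutes at one dump every 2 seconds) is tedious, so the thread-dump part of Step 1 can be scripted. A minimal sketch, assuming the JDK's jstack tool is on the PATH of the machine running AEM and that you know the process ID of the AEM Java process (jcmd with Thread.print is an alternative). The function name, output directory, and file naming are hypothetical; run and sleep are injectable only to make the sketch testable.

```python
import subprocess
import time
from pathlib import Path


def collect_thread_dumps(pid, out_dir, duration_s=300, interval_s=2,
                         run=subprocess.run, sleep=time.sleep):
    """Write one `jstack -l <pid>` dump per interval_s seconds for duration_s seconds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = duration_s // interval_s  # 300 // 2 = 150 dumps with the defaults
    written = []
    for i in range(count):
        # -l adds lock information, which helps when looking for blocked threads.
        result = run(["jstack", "-l", str(pid)],
                     capture_output=True, text=True, check=True)
        path = out / f"threaddump-{i:03d}.txt"
        path.write_text(result.stdout)
        written.append(path)
        if i < count - 1:
            # jstack itself takes time, so real spacing is slightly above interval_s.
            sleep(interval_s)
    return written
```

For example, collect_thread_dumps(12345, "threaddumps") writes threaddump-000.txt through threaddump-149.txt; with check=True, a failing jstack call raises CalledProcessError instead of silently writing empty files.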
Re-indexing can be safely aborted (stopped before it is completed) through the IndexStats MBeans of the async, async-reindex, and fulltext-async indexing lanes. For more information, see the Apache Oak documentation on How to Abort Reindexing. Additionally, take into consideration that:
PropertyIndexAsyncReindexMBean.
To safely abort re-indexing, follow these steps:
Identify the IndexStats MBean that controls the re-indexing lane that needs to be stopped.
Navigate to the appropriate IndexStats MBean via the JMX console by going to either AEM OSGi Web Console>Main>JMX or https://<host>:<port>/system/console/jmx (for example, http://localhost:4502/system/console/jmx)
Open the IndexStats MBean based on the re-indexing lane that you wish to stop (async, async-reindex, or fulltext-async).
Invoke the abortAndPause() command on the appropriate IndexStats MBean.
Mark the Oak index definition appropriately to prevent resuming re-indexing when the indexing lane resumes.
When re-indexing an existing index, set the reindex property to false:
/oak:index/someExistingIndex@reindex=false
Otherwise, for a new index, either:
Set the type property to disabled:
/oak:index/someNewIndex@type=disabled
or remove the index definition entirely.
Commit the changes to the repository when complete.
Finally, resume asynchronous indexing on the aborted indexing lane.
On the IndexStats MBean that issued the abortAndPause() command in Step 2, invoke the resume() command.
It is best to re-index during quiet periods (for example, not during a large content ingest), and ideally during maintenance windows when AEM’s load is known and controlled. Also, ensure that re-indexing does not take place during other maintenance activities.
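The abort procedure (abortAndPause() on the lane, marking the index definition, then resume()) can also be driven over HTTP instead of through the web console. This is a minimal sketch with several assumptions to verify on your instance before use: that the AEM JMX console accepts POST requests of the form /system/console/jmx/<ObjectName>/op/<operation>/, that the lane's MBean is named org.apache.jackrabbit.oak:name=<lane>,type=IndexStats, and that the index definition is updated through the Apache Sling POST servlet, which persists the change when the request succeeds. The function names, the INDEX_FORMS keys, and the credentials are hypothetical.

```python
import base64
import urllib.parse
import urllib.request

# Sling POST servlet form fields for the "mark the index definition" step.
INDEX_FORMS = {
    "stop_reindex": {"reindex": "false", "reindex@TypeHint": "Boolean"},  # existing index
    "disable": {"type": "disabled"},                                      # new index
    "delete": {":operation": "delete"},                                   # new index, removed
}


def jmx_operation_url(base_url, object_name, operation):
    """Assumed JMX console pattern: /system/console/jmx/<ObjectName>/op/<operation>/."""
    encoded = urllib.parse.quote(object_name, safe="")
    return f"{base_url.rstrip('/')}/system/console/jmx/{encoded}/op/{operation}/"


def form_post(url, fields, user, password):
    """Build a form-encoded POST request that uses HTTP basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=urllib.parse.urlencode(fields).encode(),
        method="POST",
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


def _send(request):
    """Send a request; urlopen raises HTTPError on an error status."""
    with urllib.request.urlopen(request) as response:
        return response.status


def abort_reindex(base_url, lane, index_name, action, user, password, send=_send):
    """abortAndPause() the lane, mark the index definition, then resume() the lane."""
    # Assumed ObjectName; confirm the exact name in the JMX console first.
    mbean = f"org.apache.jackrabbit.oak:name={lane},type=IndexStats"
    index_url = f"{base_url.rstrip('/')}/oak:index/{urllib.parse.quote(index_name)}"
    steps = [
        form_post(jmx_operation_url(base_url, mbean, "abortAndPause"), {}, user, password),
        # The Sling POST servlet saves the change as part of a successful request.
        form_post(index_url, INDEX_FORMS[action], user, password),
        form_post(jmx_operation_url(base_url, mbean, "resume"), {}, user, password),
    ]
    for request in steps:
        send(request)  # an exception here stops the remaining steps
    return steps
```

For example, abort_reindex("http://localhost:4502", "async", "someExistingIndex", "stop_reindex", user, password) sends the three requests in order. The sketch does not wait for the abort to take effect or check the MBean status between steps, so confirm in the JMX console that the lane is paused before relying on the resume step.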